ProServer Tutorial

Andy Jenkinson, EMBL-EBI, 26th February 2008

Mailing list: http://lists.sanger.ac.uk/mailman/listinfo/proserver-users.

Overview

This document is intended to be a quick guide to setting up ProServer to serve a custom set of data such as your own. The example uses data from a tab-separated GFF file containing a small number of UniProt features. The file is available from here. However, the tutorial may be equally useful for those possessing data in other forms, such as relational databases.

The tutorial assumes you are familiar with Perl and are operating on a Linux platform.

Basic Architecture

ProServer is a standalone server, meaning you do not need to run a separate web server such as Apache. It handles all of the communications, query parsing and XML output functions, leaving you to adapt your data to the DAS protocol. This is done by creating a subclass of the Bio::Das::ProServer::SourceAdaptor module.

The contract of a SourceAdaptor is to provide the data for a DAS query in a data structure that the ProServer core can understand. This is done by implementing a single method for each DAS command. For example, a DAS source that is to respond to the 'features' command implements the 'build_features' method, which returns an array of hashes. Each hash represents a single feature.

ProServer includes various transport modules that exist to make accessing your data easier by reducing the boilerplate code you need to write. For example, the dbi transport for relational databases handles all database connections, statements, results sets etc. Transports also exist for flat files, SRS, the BioPerl and Ensembl APIs, etc.
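As a hedged sketch of how a transport is wired in, a source backed by a relational database might be configured with an INI fragment along these lines. The 'myexample' source name, the option keys and the credentials shown are all illustrative; consult the Bio::Das::ProServer::SourceAdaptor::Transport::dbi POD for the exact options it supports:

[myexample]
state     = on
adaptor   = myexample
transport = dbi
host      = dbhost.example.com
port      = 3306
dbname    = features_db
username  = reader
password  = secret

With a transport configured, your adaptor code can concentrate on mapping query results to DAS feature structures rather than on connection handling.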

Running ProServer

Before starting to write code, you should know how to run and test it. The ProServer distribution contains a Perl script called 'proserver', located in the 'eg' directory, that you use to run the server. During development, you should run this script with the '-x' option. This prevents the process from forking and directs log output to your terminal rather than to a file. Try running the script in your terminal:

eg/proserver -x -c eg/proserver.ini

You should see the server start, along with some information about its (default) configuration. If not, the log output in your terminal should help you diagnose the problem.

ProServer uses an INI file to configure itself, which you specified using the '-c' command-line option. This INI file defines the port number the server should listen on, the root directory to look for static content and details of the DAS sources it is serving. You will write your own INI file, but for now take a quick look at the example proserver.ini. There are some comments describing the various options.

Each section of the INI file is denoted by square brackets. Server options such as port number are in the [general] section. All other sections are treated as DAS sources that the server hosts, each representing an individual source of data. Though each server can host several sources, you will define only one. Create a new file 'eg/tutorial.ini' with this content:

[mytutorial]
state        = on
adaptor      = tutorial

This file configures ProServer with a DAS source called 'mytutorial' using the 'Bio::Das::ProServer::SourceAdaptor::tutorial' adaptor, and turns it on. Now start ProServer with this file instead of the example one:

eg/proserver -x -c eg/tutorial.ini

By default, ProServer listens for HTTP requests on port 9000. Open a web browser to the URL "http://localhost:9000/das/sources". This runs the 'sources' server command, which returns an XML document listing the DAS sources the server is hosting.
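The sources document returned looks something like the following abbreviated sketch. This is a hand-written illustration based on the DAS sources document format, not ProServer's literal output, and the attribute values shown are invented:

<SOURCES>
  <SOURCE uri="mytutorial" title="Tutorial Source">
    <MAINTAINER email="user@domain.com" />
    <VERSION uri="mytutorial" created="2008-02-26">
      <COORDINATES authority="UniProt" source="Protein Sequence">UniProt,Protein Sequence</COORDINATES>
      <CAPABILITY type="das1:features" query_uri="http://localhost:9000/das/mytutorial/features" />
    </VERSION>
  </SOURCE>
</SOURCES>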

Modern web browsers will automatically apply ProServer's XSL stylesheets to transform the XML into a more human-readable HTML format. If you get some sort of error at this point, it is probably because ProServer cannot find its default stylesheets. Make sure you are running ProServer from its root directory.

To see the XML itself, use the 'view source' function of your browser. Your 'mytutorial' source should be listed, but you will see that it is not. Check your terminal to find out why. You will see that ProServer attempted to build a Bio::Das::ProServer::SourceAdaptor::tutorial object, but failed. Of course, no such module exists because you haven't written it yet.

Writing a SourceAdaptor

In its most basic form, a SourceAdaptor is a single module extending from the Bio::Das::ProServer::SourceAdaptor package with two methods. Start by creating a new file with the following skeleton content:

package Bio::Das::ProServer::SourceAdaptor::tutorial; # package names must take this form
use strict;
use base qw(Bio::Das::ProServer::SourceAdaptor); # modules must extend from this

# Set metadata such as the commands supported by this source.
sub init {
  my ($self) = @_;
  $self->{'capabilities'} = { 'features' => '1.0' }; # Implement the features command
}

# Gather the features annotated in a given segment of sequence.
sub build_features {
  my ($self, $args) = @_;
  my $segment = $args->{'segment'}; # The query segment ID
  my $start   = $args->{'start'};   # The query start position (optional)
  my $end     = $args->{'end'};     # The query end position (optional)
  my @features = ();
  # do work...
  return @features;
}

1;

Save this file as lib/Bio/Das/ProServer/SourceAdaptor/tutorial.pm. Now try running the server again. Whenever you change the code, make sure you rebuild ProServer first so that the new file is included:

perl Build.PL
./Build
eg/proserver -x -c eg/tutorial.ini

Now your source should appear in the list. The table has columns for extra details such as the description and coordinate system of the DAS source. We will add these later.

Now we shall expand our 'tutorial' SourceAdaptor to serve our features from a GFF file. To do this, the adaptor should return an array of simple hash structures. The POD documentation for the build_features method in Bio::Das::ProServer::SourceAdaptor contains full details of the format these hash structures can take. There is some flexibility here, but our features will look like this:

{
 'start'  => $feature_start,
 'end'    => $feature_end,
 'id'     => $feature_id,        # A unique ID for the feature
 'type'   => $feature_type,      # e.g. 'exon', 'snp'
 'method' => $annotation_method, # e.g. 'similarity'
 'score'  => $annotation_score,  # e.g. '96.5'
}

Take a look at the GFF file, which contains some features taken from the UniProt DAS source. GFF (Generic Feature Format) files are tab-separated files with standard columns. See the specification at http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml for details.
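A line of such a file might look like the following (a hypothetical example; the columns are tab-separated and correspond to segment, method/source, type, start, end, score, strand and frame, followed by an attribute field):

P15056	UniProt	MOD_RES	599	599	0.0	.	.	ID=P15056_599_599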

Expand the build_features method of your tutorial adaptor to do the following:

  1. Open the file for reading
  2. Iterate over each line
  3. Build a DAS feature structure for each feature that overlaps the query segment
  4. Return an array of feature structures

Test your adaptor by checking that ProServer responds appropriately to a request for features within the UniProt protein P15056 (BRAF_HUMAN):

  http://localhost:9000/das/mytutorial/features?segment=P15056
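If all is well, the response is a DAS features XML document along the lines of this abbreviated sketch, which follows the DASGFF structure of the DAS 1.53 specification; the feature values shown are illustrative rather than taken from the real file:

<DASGFF>
  <GFF version="1.0" href="http://localhost:9000/das/mytutorial/features">
    <SEGMENT id="P15056" start="1" stop="766">
      <FEATURE id="P15056_599_599">
        <TYPE id="MOD_RES">MOD_RES</TYPE>
        <METHOD id="UniProt">UniProt</METHOD>
        <START>599</START>
        <END>599</END>
        <SCORE>0.0</SCORE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>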

Once you have finished, your adaptor should look something like this:

package Bio::Das::ProServer::SourceAdaptor::tutorial; # package names must take this form
use strict;
use base qw(Bio::Das::ProServer::SourceAdaptor); # modules must extend from this

# Set metadata such as the commands supported by this source.
sub init {
  my ($self) = @_;
  $self->{'capabilities'} = { 'features' => '1.0' }; # Implement the features command
}

# Gather the features annotated in a given segment of sequence.
sub build_features {
  my ($self, $args) = @_;
  my $segment = $args->{'segment'}; # The query segment ID
  my $start   = $args->{'start'};   # The query start position (optional)
  my $end     = $args->{'end'};     # The query end position (optional)
  my @features = ();
  # do work...
  
  open my $fh, '<', '/tmp/uniprot.gff' or die "Unable to open data file: $!";
  while (defined (my $line = <$fh>)) {
    chomp $line;
    my ($f_seg, $method, $type, $f_start, $f_end, $score, $strand, $phase, $f_id) = split /\t/, $line;
    
    # Keep only features on the query segment and, if a start/end range was
    # supplied, only those that overlap it. Note the parentheses: without
    # them, '&&' would bind more tightly than '||'.
    if ($f_seg eq $segment && ((!$start || !$end) || ($f_start <= $end && $f_end >= $start))) {
      $f_id =~ s/[^=]+=//; # strip the attribute key, e.g. 'ID='
      
      my $feature = {
        'id'     => $f_id,
        'start'  => $f_start,
        'end'    => $f_end,
        'method' => $method,
        'score'  => $score,
        'type'   => $type,
      };
      
      push @features, $feature;
    }
    
  }
  close $fh;
  
  return @features;
}

1;

You now have your DAS source up and running. However, your source is of more use if it describes itself a little better. You can fill in some of the metadata properties (shown in the sources command) in several ways: in the init method, in the INI file, or by implementing the relevant method in your SourceAdaptor. It is easiest to define them in the INI file:

[mytutorial]
state        = on
adaptor      = tutorial
title        = Tutorial Source
description  = Some demonstration features taken from UniProt
coordinates  = UniProt,Protein Sequence -> P15056
dsncreated   = 2007-02-26
maintainer   = user@domain.com
doc_href     = http://beta.uniprot.org/uniprot/P15056

Once you have filled in these optional properties, start your server again, but this time allow the server to fork so that it runs as a daemon process. This is done by omitting the '-x' command-line flag.

Further Tasks

Modify your source to make use of the file Transport. See the POD documentation for Bio::Das::ProServer::SourceAdaptor::Transport::file for details.
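As a starting point, the INI file would gain a couple of options along these lines. This is a sketch only: the 'filename' key is an assumption, so check the option names against the Transport::file POD before relying on them:

[mytutorial]
state        = on
adaptor      = tutorial
transport    = file
filename     = /tmp/uniprot.gff

The adaptor would then retrieve lines via $self->transport rather than opening the file itself, keeping the file-handling boilerplate out of build_features.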

There are several other SourceAdaptor methods that may be useful to implement. For example, the segment_version method allows your source to report the version of the segment that it is annotating. This is useful for clients to verify that annotations are based on the same entity. Note that not all coordinate systems have versioned entities - for example, genomic assemblies are versioned as a whole rather than per-entity. The known_segments and length methods, if implemented, allow ProServer to automatically offer the entry_points command, and also to filter requests for unknown or out-of-range segments.
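For illustration, such methods might look like the following hedged sketch. The hard-coded values are invented for this tutorial's single protein (P15056, which is 766 residues long); a real source would look them up from its data:

sub segment_version {
  my ($self, $segment) = @_;
  return '1.0'; # the version of the annotated sequence entry
}

sub known_segments {
  my ($self) = @_;
  return ('P15056'); # all segment IDs this source annotates
}

sub length {
  my ($self, $segment) = @_;
  return 766; # the length of the segment's sequence
}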

Of course, to provide this information you would need to store the versions and lengths of all the sequences you annotate, which is worth bearing in mind if you are planning to set up your own DAS source.