UCSCTableQuery-class {rtracklayer}    R Documentation

Querying UCSC Tables

Description

The UCSC genome browser is backed by a large database, which is exposed by the Table Browser web interface. Tracks are stored as tables, so this is also the mechanism for retrieving tracks. The UCSCTableQuery class represents a query against the Table Browser. Storing the query fields in a formal class facilitates incremental construction and adjustment of a query.

Details

There are four supported fields for a table query:

session
The UCSCSession instance from which the tables are retrieved. Although all sessions are based on the same database, the set of user-uploaded tracks, which are represented as tables, generally differs between sessions.
trackName
The name of a track from which to retrieve a table. Each track can have multiple tables. Often there is a primary table that is used to display the track, while the other tables are supplemental. Sometimes, tracks are displayed by aggregating multiple tables.
tableName
The name of the specific table to retrieve. May be NULL, in which case the behavior depends on how the query is executed (see below).
range
A RangesList indicating the portion of the table to retrieve, in genome coordinates. The genome indicated by the RangesList also determines which tracks are available and must always be non-NULL. If the RangesList is empty, the table is downloaded for the entire genome.

A common workflow for querying the UCSC database is to create an instance of UCSCTableQuery using the ucscTableQuery constructor, invoke tableNames to list the available tables for a track, and finally retrieve the desired table either as a data.frame via getTable or as a RangedData track via track. See the examples.

The reason for a formal query class is to facilitate multiple queries when the differences between the queries are small. For example, one might want to query multiple tables within the same track and/or genomic region, or query the same table for multiple regions. The UCSCTableQuery instance can be incrementally adjusted for each new query. Some caching is also performed, which improves performance.

Constructor

ucscTableQuery(x, track, range = GenomicRanges(), table = NULL): Creates a UCSCTableQuery with the UCSCSession given as x and the track name given by the single string track. range should be a RangesList instance, and it effectively defaults to range(x). Any missing information in range, often the genome identifier, is filled in from range(x). The table name is given by table, which may be a single string or NULL.

Executing Queries

Below, object is a UCSCTableQuery instance.

track(object): Retrieves the indicated table as a track, i.e. a RangedData instance. Note that not all tables are available as tracks.
getTable(object): Retrieves the indicated table as a data.frame. Note that not all tables are output in parseable form.
tableNames(object): Gets the names of the tables available for the session, track and range specified by the query.

Accessor methods

In the code snippets below, x/object is a UCSCTableQuery object.

browserSession(object), browserSession(object) <- value: Get or set the UCSCSession to query.
trackName(x), trackName(x) <- value: Get or set the single string indicating the track containing the table of interest.
tableName(x), tableName(x) <- value: Get or set the single string indicating the name of the table to retrieve. May be NULL, in which case the table is automatically determined.
range(x), range(x) <- value: Get or set the RangesList indicating the portion of the table to retrieve in genomic coordinates. Any missing information, such as the genome identifier, is filled in using range(browserSession(x)).

Author(s)

Michael Lawrence

Examples

## Not run: 
session <- browserSession()
genome(session) <- "mm9"
trackNames(session) ## list the track names
## choose the Conservation track for a portion of mm9 chr12
query <- ucscTableQuery(session, "Conservation",
                        GenomicRanges(57795963, 57815592, "chr12"))
## list the table names
tableNames(query)
## get the phastCons30way track
tableName(query) <- "phastCons30way"
## retrieve the track data
track(query)
## get a data.frame summarizing the multiple alignment
tableName(query) <- "multiz30waySummary"
getTable(query)
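## A sketch of incremental adjustment (an illustration, not part of the
## original example): reuse the same query object for a narrower region.
## The end coordinate below is arbitrary, chosen only for illustration;
## the genome is assumed to be filled in from the session, as described
## for the range accessor above.
range(query) <- GenomicRanges(57795963, 57805000, "chr12")
## switch back to the phastCons30way table and retrieve it as a track
tableName(query) <- "phastCons30way"
track(query)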
## End(Not run)

[Package rtracklayer version 1.4.1 Index]