| UCSCTableQuery-class {rtracklayer} | R Documentation |
The UCSC genome browser is backed by a large database,
which is exposed by the Table Browser web interface. Tracks are
stored as tables, so this is also the mechanism for retrieving tracks. The
UCSCTableQuery class represents a query against the Table
Browser. Storing the query fields in a formal class facilitates
incremental construction and adjustment of a query.
There are four supported fields for a table query:
Session: the UCSCSession instance from which
the tables are retrieved. Although all sessions are based on the
same database, the set of user-uploaded tracks, which are represented
as tables, is not the same, in general.
Track name: the single string naming the track that contains the
table of interest.
Table name: the single string naming the table to retrieve. May be
NULL, in which case the behavior depends on how the query
is executed, see below.
Range: the RangesList
indicating the portion of the table to retrieve, in genome
coordinates. The genome indicated by the RangesList
also determines which tracks are available and must always be
non-NULL. If the RangesList is empty, the table is
downloaded for the entire genome.
A common workflow for querying the UCSC database is to create an
instance of UCSCTableQuery using the ucscTableQuery
constructor, invoke tableNames to list the available tables for
a track, and finally to retrieve the desired table either as a
data.frame via getTable or as a RangedData track
via track. See the examples.
The reason for a formal query class is to facilitate multiple queries
when the differences between the queries are small. For example, one
might want to query multiple tables within the same track and/or
genomic region, or query the same table for multiple regions. The
UCSCTableQuery instance can be incrementally adjusted for each
new query. Some caching is also performed, which enhances performance.
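The incremental adjustment described above might look like the following minimal sketch. It assumes rtracklayer is installed and that query is an existing UCSCTableQuery; fetch_tables is a hypothetical helper written for illustration, not part of rtracklayer. It relies only on the tableName replacement method and getTable, as documented on this page.

```r
## Attach rtracklayer when it is installed; the helper itself only
## relies on `tableName<-` and getTable() being available.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  library(rtracklayer)
}

## Hypothetical helper (not part of rtracklayer): fetch several tables
## through a single query by changing only the table field.
## `query` is a UCSCTableQuery, `tables` a character vector of table names.
fetch_tables <- function(query, tables) {
  results <- vector("list", length(tables))
  names(results) <- tables
  for (tbl in tables) {
    tableName(query) <- tbl            # session, track and range stay as they are
    results[[tbl]] <- getTable(query)  # one data.frame per table
  }
  results
}

## Not run (needs network access to UCSC):
## tables <- fetch_tables(query, "multiz30waySummary")
```

Further table names can be passed in the same call, keeping in mind that not all tables are output in parseable form, so getTable may not succeed for every name.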
ucscTableQuery(x, track, range = GenomicRanges(), table = NULL):
Creates a UCSCTableQuery with the
UCSCSession given as x and the track name given by
the single string track. range should be a
RangesList instance, and it effectively defaults to
range(x). Any missing information in range, often
the genome identifier, is filled in from range(x). The
table name is given by table, which may be a single string
or NULL.
Below, object is a UCSCTableQuery instance.
track(object):
Retrieves the indicated table as a track, i.e. a RangedData
instance. Note that not all tables are available as tracks.
getTable(object): Retrieves the indicated table as a
data.frame. Note that not all tables are output in
parseable form.
tableNames(object): Gets the names of the tables available
for the session, track and range specified by the query.
In the code snippets below, x/object is a
UCSCTableQuery object.
browserSession(object),
browserSession(object) <- value:
Get or set the UCSCSession to query.
trackName(x), trackName(x) <- value: Get or
set the single string indicating the track containing the table of
interest.
tableName(x), tableName(x) <- value: Get or
set the single string indicating the name of the table to
retrieve. May be NULL, in which case the table is
automatically determined.
range(x), range(x) <- value: Get or set the
RangesList indicating the portion of the table to retrieve in
genomic coordinates. Any missing information, such as the genome
identifier, is filled in using range(browserSession(x)).
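The range setter can be used to query the same table for multiple regions. The following is a minimal sketch under stated assumptions: rtracklayer is installed, query is an existing UCSCTableQuery whose table is output in parseable form, and regions is a list of values accepted by the range replacement method. fetch_regions is a hypothetical helper, not part of rtracklayer.

```r
## Attach rtracklayer when it is installed; the helper itself only
## relies on `range<-` and getTable() being available.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  library(rtracklayer)
}

## Hypothetical helper (not part of rtracklayer): retrieve the same
## table for each region in turn.
## `regions` is a list of range specifications accepted by `range<-`.
fetch_regions <- function(query, regions) {
  lapply(regions, function(region) {
    range(query) <- region  # adjusts a local copy of the query
    getTable(query)         # table restricted to this region
  })
}

## Not run (needs network access to UCSC):
## per_region <- fetch_regions(query, list(region1, region2))
```

Here region1 and region2 stand for RangesList instances built by the caller. Because the replacement happens inside the helper, the query held by the caller keeps its original range.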
Michael Lawrence
## Not run:
session <- browserSession()
genome(session) <- "mm9"
trackNames(session) ## list the track names
## choose the Conservation track for a portion of mm9 chr12
query <- ucscTableQuery(session, "Conservation",
                        GenomicRanges(57795963, 57815592, "chr12"))
## list the table names
tableNames(query)
## get the phastCons30way track
tableName(query) <- "phastCons30way"
## retrieve the track data
track(query)
## get a data.frame summarizing the multiple alignment
tableName(query) <- "multiz30waySummary"
getTable(query)
## End(Not run)