UCSCTableQuery-class {rtracklayer}    R Documentation
The UCSC genome browser is backed by a large database, which is exposed by
the Table Browser web interface. Tracks are stored as tables, so this is
also the mechanism for retrieving tracks. The UCSCTableQuery class
represents a query against the Table Browser. Storing the query fields in a
formal class facilitates incremental construction and adjustment of a query.
There are four supported fields for a table query:
session: The UCSCSession instance from which the tables are retrieved.
Although all sessions are based on the same database, the set of
user-uploaded tracks, which are represented as tables, generally differs
between sessions.
trackName: A single string naming the track that contains the table of
interest.
tableName: A single string naming the table to retrieve. May be NULL, in
which case the behavior depends on how the query is executed, see below.
range: A RangesList indicating the portion of the table to retrieve, in
genome coordinates. The genome indicated by the RangesList also determines
which tracks are available and must always be non-NULL. If the RangesList
is empty, the table is downloaded for the entire genome.
A common workflow for querying the UCSC database is to create an instance
of UCSCTableQuery using the ucscTableQuery constructor, invoke tableNames
to list the available tables for a track, and finally retrieve the desired
table either as a data.frame via getTable or as a RangedData track via
track. See the examples.
The reason for a formal query class is to facilitate multiple queries when
the differences between the queries are small. For example, one might want
to query multiple tables within the same track and/or the same genomic
region, or query the same table for multiple regions. The UCSCTableQuery
instance can be incrementally adjusted for each new query. Some caching is
also performed, which improves performance.
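As an illustration of incremental adjustment, the following sketch reuses one query object to fetch the same table for two regions. It uses only functions named on this page and needs a network connection to UCSC. The coordinates of the second region are hypothetical and chosen only for illustration, and the table is assumed to be returned in parseable form.

```r
## Not run:
library(rtracklayer)

session <- browserSession()
genome(session) <- "mm9"

## build the query once, for the Conservation track
query <- ucscTableQuery(session, "Conservation",
                        GenomicRanges(57795963, 57815592, "chr12"))
tableName(query) <- "multiz30waySummary"

## the second region is a hypothetical example, not taken from the docs
regions <- list(GenomicRanges(57795963, 57815592, "chr12"),
                GenomicRanges(57815593, 57835222, "chr12"))

## adjust only the range between queries; session, track and table are kept
summaries <- vector("list", length(regions))
for (i in seq_along(regions)) {
  range(query) <- regions[[i]]
  summaries[[i]] <- getTable(query)
}
## End(Not run)
```

Only the range changes between iterations, so the rest of the query does not have to be specified again.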
ucscTableQuery(x, track, range = GenomicRanges(), table = NULL):
Creates a UCSCTableQuery with the UCSCSession given as x and the track name
given by the single string track. range should be a RangesList instance,
and it effectively defaults to range(x). Any missing information in range,
often the genome identifier, is filled in from range(x). The table name is
given by table, which may be a single string or NULL.
Below, object is a UCSCTableQuery instance.
track(object): Retrieves the indicated table as a track, i.e. a RangedData
instance. Note that not all tables are available as tracks.
getTable(object): Retrieves the indicated table as a data.frame. Note that
not all tables are output in parseable form.
tableNames(object): Gets the names of the tables available for the session,
track and range specified by the query.
In the code snippets below, x/object is a UCSCTableQuery object.
browserSession(object), browserSession(object) <- value: Get or set the
UCSCSession to query.
trackName(x), trackName(x) <- value: Get or set the single string
indicating the track containing the table of interest.
tableName(x), tableName(x) <- value: Get or set the single string
indicating the name of the table to retrieve. May be NULL, in which case
the table is automatically determined.
range(x), range(x) <- value: Get or set the RangesList indicating the
portion of the table to retrieve in genomic coordinates. Any missing
information, such as the genome identifier, is filled in using
range(browserSession(x)).
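The accessors can be combined to inspect and then refine a query. The sketch below is illustrative only: it uses the functions listed above, requires a network connection to UCSC, and assumes that the mm9 Conservation track and its phastCons30way table are available.

```r
## Not run:
library(rtracklayer)

session <- browserSession()
genome(session) <- "mm9"

## start with only a session and a track; range and table take their defaults
query <- ucscTableQuery(session, "Conservation")

trackName(query)       # the track the query points at
tableName(query)       # table defaults to NULL in the constructor
browserSession(query)  # the UCSCSession behind the query

## refine the query step by step
tableName(query) <- "phastCons30way"
range(query) <- GenomicRanges(57795963, 57815592, "chr12")
range(query)           # genome is filled in from the session
## End(Not run)
```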
Michael Lawrence
## Not run:
session <- browserSession()
genome(session) <- "mm9"
trackNames(session) ## list the track names
## choose the Conservation track for a portion of mm9 chr12
query <- ucscTableQuery(session, "Conservation",
                        GenomicRanges(57795963, 57815592, "chr12"))
## list the table names
tableNames(query)
## select the phastCons30way table
tableName(query) <- "phastCons30way"
## retrieve the track data
track(query)
## get a data.frame summarizing the multiple alignment
tableName(query) <- "multiz30waySummary"
getTable(query)
## End(Not run)