| RangedData-class {IRanges} | R Documentation |
RangedData supports storing data, i.e. a set of
variables, on a set of ranges spanning multiple spaces
(e.g. chromosomes). Although the data is split across spaces, it can
still be treated as one cohesive dataset when
desired. In order to handle large datasets, the data values are
stored externally to avoid copying, and the rdapply
function facilitates the processing of each space separately (divide and
conquer).
A RangedData object consists of two primary components:
a RangesList holding the ranges over multiple
spaces and a parallel SplitXDataFrameList,
holding the split data. There is also a universe slot
for denoting the source (e.g. the genome) of the ranges and/or
data.
There are two different modes of interacting with a
RangedData. The first mode treats the object as a contiguous
"data frame" annotated with range information. The accessors
start, end, and width get the corresponding
fields in the ranges as atomic integer vectors, undoing the division
over the spaces. The [[ and matrix-style [, extraction
and subsetting functions unroll the data in the same way. [[<-
does the inverse. The number
of rows is defined as the total number of ranges and the number of
columns is the number of variables in the data. It is often convenient
and natural to treat the data this way, at least when the data is
small and there is no need to distinguish the ranges by their space.
The other mode is to treat the RangedData as a list, with an
element (a virtual Ranges/XDataFrame pair) for each
space. The length of the object is defined as the number of spaces and
the value returned by the names accessor gives the names of
the spaces. The list-style [ subset function behaves
analogously. The rdapply function provides a convenient and
formal means of applying an operation over the spaces separately. This
mode is helpful when ranges from different spaces must be treated
separately or when the data is too large to process over all spaces at
once.
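The two modes described above can be contrasted with a short sketch. It assumes an IRanges release that still provides RangedData, and it follows the Examples section in passing the space assignment through the space argument. The object name rd and the toy values are illustrative only.

```r
library(IRanges)

## Seven ranges: the first three on "chr1", the last four on "chr2"
ranges <- IRanges(start = c(1, 2, 3, 15, 45, 20, 1),
                  end   = c(4, 5, 6, 15, 100, 80, 5))
score  <- c(10L, 2L, NA, 0L, 3L, NA, 22L)
filter <- c(1L, 0L, 1L, 0L, 1L, NA, 0L)
chrom  <- rep(c("chr1", "chr2"), c(3, 4))

## 'space' is used here as in the Examples section (an assumption about
## the installed version of the constructor)
rd <- RangedData(ranges, score, filter, space = chrom)

## Mode 1: one contiguous "data frame" annotated with ranges
nrow(rd)    # total number of ranges over all spaces
ncol(rd)    # number of data variables
dim(rd)     # essentially c(nrow(rd), ncol(rd))

## Mode 2: a list with one element per space
length(rd)  # number of spaces
names(rd)   # names of the spaces
```

The same object answers both sets of queries; only the accessor chosen decides whether the division into spaces is visible.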
In the code snippets below, x is a RangedData object.
The following accessors treat the data as a contiguous dataset, ignoring the division into spaces:
nrow(x): The number of ranges in x.
ncol(x): The number of data variables in x.
dim(x): An integer vector of length two, essentially
c(nrow(x), ncol(x)).
rownames(x), rownames(x) <- value: Gets or sets
the names of the ranges in x.
colnames(x), colnames(x) <- value: Gets or sets the
names of the variables in x.
dimnames(x): A list with two elements, essentially
list(rownames(x), colnames(x)).
dimnames(x) <- value: Sets the row and column names,
where value is a list as described above.
Ranges. For IRanges, an integer
vector. Regardless, the number of elements is always equal to
nrow(x).
start(x): The start value of each range.
width(x): The width of each range.
end(x): The end value of each range.
These accessors make the object seem like a list along the spaces:
length(x):
The number of spaces (e.g. chromosomes) in x.
names(x), names(x) <- value: Get or set the names of
the spaces (e.g. "chr1").
The names are either NULL or a character vector of the same length as x.
Other accessors:
universe(x), universe(x) <- value: Get or set the
scalar string identifying the scope of the data in some way (e.g. genome,
experimental platform, etc). The universe may be NULL.
ranges(x): Gets the ranges in x as a
RangesList.
space(x): Gets the spaces from ranges(x).
values(x): Gets the data values in x as a
SplitXDataFrameList.
RangedData(ranges = IRanges(), ..., splitter = NULL,
universe = NULL):
Creates a RangedData with the ranges in ranges and
variables given by the arguments in .... See the
constructor XDataFrame for how the ...
arguments are interpreted. If splitter is NULL, all
of the ranges and values are placed into the same space, resulting
in a single-space (length one) RangedData. Otherwise, the
ranges and values are split into spaces according to
splitter, which is treated as a factor, like the f
argument in split. The universe may be specified
as a scalar string by the universe argument.
as.data.frame(x, row.names=NULL, optional=FALSE, ...):
Copies the start, end, and width of the ranges and all of the variables
as columns in a data.frame. This is a bridge to existing
functionality in R, but of course care must be taken if the data
is large. Note that optional and ... are ignored.
as(from, "XDataFrame"): Like as.data.frame above,
except the result is an XDataFrame and it
probably involves less copying, especially if there is only a
single space.
as(from, "RangedData"): Coerce an Rle or an
XRle to a RangedData by converting each run
to a range and storing the run values in a column named "score".
In the code snippets below, x is a RangedData object.
x[i]:
Subsets x by indexing into its spaces, so the
result is of the same class, with a different set of spaces.
i can be numeric, logical, NULL or missing.
x[i,j]:
Subsets x by indexing into its rows and columns. The result
is of the same class, with a different set of rows and columns.
Note that this differs from the subset form
above, because we are now treating x as one contiguous dataset.
x[[i]]:
Extracts a variable from x, where i can be
a character, numeric, or logical scalar that indexes into the
columns. The variable is unlisted over the spaces.
x$name: similar to above, where name is taken
literally as a column name in the data.
x[[i]] <- value:
Sets value as column i in x, where i can be
a character, numeric, or logical scalar that indexes into the
columns. The length of value should equal
nrow(x). x[[i]] should be identical to value
after this operation.
x$name <- value: similar to above, where name is taken
literally as a column name in the data.
In the code snippets below, x is a RangedData object.
split(x, f, drop = FALSE): Split x according to
f, which should be of length equal to nrow(x). Note
that drop is ignored here. The result is a
RangedDataList where every element has the same
length (number of spaces) but different sets of ranges within each
space.
rbind(...): Matches the spaces from
the RangedData objects in ... by name and combines
them row-wise. In a way, this is the reverse of the split
operation described above.
c(x, ..., recursive = FALSE): Combines x with
arguments specified in ..., which must all be
RangedData objects. This combination acts as if x is
a list of spaces, meaning that the result will contain the spaces
of the first concatenated with the spaces of the second, and so
on. This function is useful when creating RangedData
objects on a space-by-space basis and then needing to
combine them.
There are two explicitly supported ways to apply a function over
the spaces of a RangedData. The richest interface is
rdapply, which is described in its own man page. The
simpler interface is an lapply method:
lapply(X, FUN, ...):
Applies FUN to each space in X with extra parameters
in ....
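As a small illustration of the lapply interface, the sketch below sums one variable within each space. It assumes an IRanges release that still provides RangedData, relies on [[ working on each per-space element as it does in the Examples section, and passes the space assignment through the space argument as those examples do. The helper sum_score is hypothetical and not part of the package.

```r
library(IRanges)

ranges <- IRanges(c(1, 2, 3), c(4, 5, 6))
score  <- c(10L, 2L, NA)

## Ranges 1 and 3 go to space "1", range 2 goes to space "2"
rd <- RangedData(ranges, score, space = c(1, 2, 1))

## Hypothetical helper: total score within one space, ignoring NAs
sum_score <- function(sp) sum(sp[["score"]], na.rm = TRUE)

## One result per space
totals <- lapply(rd, sum_score)
```

Because FUN only ever sees a single space at a time, the same pattern applies when the full dataset is too large to process in one pass.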
Michael Lawrence
RangedData-utils for utilities and the rdapply
function for applying a function to each space separately.
ranges <- IRanges(c(1,2,3),c(4,5,6))
filter <- c(1L, 0L, 1L)
score <- c(10L, 2L, NA)
## constructing RangedData instances
## no variables
rd <- RangedData()
rd <- RangedData(ranges)
ranges(rd)
## one variable
rd <- RangedData(ranges, score)
rd[["score"]]
## multiple variables
rd <- RangedData(ranges, filter, vals = score)
rd[["vals"]] # same as rd[["score"]] above
rd$vals
rd[["filter"]]
rd <- RangedData(ranges, score + score)
rd[["score...score"]] # names made valid
## use a universe
rd <- RangedData(ranges, universe = "hg18")
universe(rd)
## split some data over chromosomes
range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
both <- c(ranges, range2)
score <- c(score, c(0L, 3L, NA, 22L))
filter <- c(filter, c(0L, 1L, NA, 0L))
chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")
rd <- RangedData(both, score, filter, space = chrom, universe = "hg18")
rd[["score"]] # identical to score
rd[1][["score"]] # identical to score[1:3]
## subsetting
## list style: [i]
rd[numeric()] # these three are all empty
rd[logical()]
rd[NULL]
rd[] # missing, full instance returned
rd[FALSE] # logical, supports recycling
rd[c(FALSE, FALSE)] # same as above
rd[TRUE] # like rd[]
rd[c(TRUE, FALSE)]
rd[1] # numeric index
rd[c(1,2)]
rd[-2]
## matrix style: [i,j]
rd[,NULL] # no columns
rd[NULL,] # no rows
rd[,1]
rd[,1:2]
rd[,"filter"]
rd[1,] # now by the rows
rd[c(1,3),]
rd[1:2, 1] # row and column
rd[c(1:2,1,3),1] ## repeating rows
## dimnames
colnames(rd)[2] <- "foo"
colnames(rd)
rownames(rd) <- head(letters, nrow(rd))
rownames(rd)
## space names
names(rd)
names(rd)[1] <- "chr1"
## variable replacement
count <- c(1L, 0L, 2L)
rd <- RangedData(ranges, count, space = c(1, 2, 1))
## adding a variable
score <- c(10L, 2L, NA)
rd[["score"]] <- score
rd[["score"]] # same as 'score'
## replacing a variable
count2 <- c(1L, 1L, 0L)
rd[["count"]] <- count2
## numeric index also supported
rd[[2]] <- score
rd[[2]] # gets 'score'
## removing a variable
rd[[2]] <- NULL
ncol(rd) # is only 1
rd$score2 <- score
## combining/splitting
rd <- RangedData(ranges, score, space = c(1, 2, 1))
c(rd[1], rd[2]) # equal to 'rd'
rd2 <- RangedData(ranges, score)
unlist(split(rd2, c(1, 2, 1))) # same as 'rd'
## applying
lapply(rd, `[[`, 1) # get first column in each space