| XStringSet-class {Biostrings} | R Documentation |
The BStringSet class is a container for storing a set of
BString objects and for making its manipulation
easy and efficient.
Similarly, the DNAStringSet (or RNAStringSet, or AAStringSet) class is
a container for storing a set of DNAString
(or RNAString, or AAString) objects.
All those containers derive directly (and with no additional slots) from the XStringSet virtual class.
## Constructors:
BStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
DNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
RNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
AAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)

## Accessor-like methods:
## S4 method for signature 'XStringSet':
length(x)
## S4 method for signature 'character':
width(x)
## S4 method for signature 'XStringSet':
width(x)
## S4 method for signature 'XStringSet':
names(x)
## S4 method for signature 'XStringSet':
nchar(x, type="chars", allowNA=FALSE)

## Efficient subsequence extraction:
## S4 method for signature 'character':
subseq(x, start=NA, end=NA, width=NA)
## S4 method for signature 'XStringSet':
subseq(x, start=NA, end=NA, width=NA)

## ... and more (see below)
x | Either a character vector (with no NAs), or an XString, XStringSet or XStringViews object. |
start,end,width | Either NA, a single integer, or an integer vector of the same length as x specifying how x should be "narrowed" (see ?narrow for the details). |
use.names | TRUE or FALSE. Should names be preserved? |
type,allowNA | Ignored. |
The BStringSet, DNAStringSet, RNAStringSet and
AAStringSet functions are constructors that can be used to
"naturally" turn x into an XStringSet object of the desired
base type.
They also allow the user to "narrow" the sequences contained in x
via proper use of the start, end and/or width
arguments. In this context, "narrowing" means dropping a prefix and/or
a suffix of each sequence in x.
The "narrowing" capabilities of these constructors can be illustrated
by the following property: if x is a character vector
(with no NAs), or an XStringSet (or XStringViews) object,
then the 3 following transformations are equivalent:
BStringSet(x, start=mystart, end=myend, width=mywidth)
subseq(BStringSet(x), start=mystart, end=myend, width=mywidth)
BStringSet(subseq(x, start=mystart, end=myend, width=mywidth))
Note that, besides being more convenient, the first form is also more efficient on character vectors.
In the code snippets below,
x is an XStringSet object.
length(x):
The number of sequences in x.
width(x):
A vector of non-negative integers containing the number
of letters for each element in x.
Note that width(x) is also defined for a character vector
with no NAs and is equivalent to nchar(x, type="bytes").
names(x):
NULL or a character vector of the same length as x containing
a short user-provided description or comment for each element in x.
These are the only data in an XStringSet object that can safely
be changed by the user. All the other data are immutable!
As a general recommendation, the user should never try to modify
an object by accessing its slots directly.
alphabet(x):
Return NULL, DNA_ALPHABET, RNA_ALPHABET or
AA_ALPHABET depending on whether x is a BStringSet,
DNAStringSet, RNAStringSet or AAStringSet object.
nchar(x):
The same as width(x).
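The accessors above can be tried on a small object. This is a minimal sketch, assuming the Biostrings package is installed; the sequences and the names seqA, seqB, seqC and renamed are made up for illustration.

```r
suppressPackageStartupMessages(library(Biostrings))

# A small DNAStringSet built from a named character vector
# (use.names=TRUE is the default, so the names are kept).
x <- DNAStringSet(c(seqA="ACGT", seqB="GGTTAACC", seqC="A"))

length(x)    # number of sequences in x
width(x)     # number of letters in each sequence
nchar(x)     # same values as width(x)
names(x)     # the user-provided names
alphabet(x)  # the alphabet associated with the base type of x

# Names are the only data meant to be changed by the user.
names(x)[2] <- "renamed"
names(x)
```

Note that the names are replaced through the names<- accessor rather than by touching the slots of x directly, in line with the recommendation above.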
In the code snippets below,
x is a character vector (with no NAs),
or an XStringSet (or XStringViews) object.
subseq(x, start=NA, end=NA, width=NA):
Applies subseq on each element in x.
See ?subseq for the details.
Note that this is similar to what substr does on a
character vector. However there are some noticeable differences:
(1) the arguments are start and stop for substr;
(2) the SEW (start/end/width) interface of subseq
is richer (e.g. support for negative start or end values);
and (3) subseq checks that the specified start/end/width values
are valid i.e., unlike substr, it throws an error if
they define "out of limits" subsequences or subsequences with a
negative width.
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE):
Same as subseq. The only differences are: (1) narrow
has a use.names argument; and (2) the types of objects they
work on are not the same (IRanges, XStringSet or
XStringViews objects for narrow,
XSequence or XStringSet objects for subseq).
But they both work and do the same thing on an XStringSet object.
threebands(x, start=NA, end=NA, width=NA):
Like the method for IRanges objects, the
threebands methods for character vectors and XStringSet
objects extend the capability of narrow by returning the 3
sets of subsequences (the left, middle and right subsequences)
associated with the narrowing operation.
See ?threebands in the IRanges package for
the details.
subseq(x, start=NA, end=NA, width=NA) <- value:
A vectorized version of the subseq<-
method for XSequence objects.
See ?`subseq<-` for the details.
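The differences between subseq, narrow and substr described above can be illustrated as follows. This is a minimal sketch, assuming the Biostrings package is installed; the sequences are made up, and the helper variable out_of_limits is only there to capture whether an error was raised.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c("ACGTACGT", "TTGGCCAA"))

# Drop the first 2 and the last 2 letters of each sequence.
# A negative 'end' is counted from the right end of each sequence.
s1 <- subseq(x, start=3, end=-3)
s2 <- narrow(x, start=3, end=-3)
as.character(s1)
as.character(s2)

# subseq validates start/end/width: asking for letters beyond the end of
# an 8-letter sequence raises an error ...
out_of_limits <- tryCatch({
    subseq(x, start=1, end=20)
    "no error"
}, error=function(e) "error")
out_of_limits

# ... whereas substr silently truncates on a character vector.
substr("ACGTACGT", 1, 20)
```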
In the code snippets below,
x and values are XStringSet objects,
and i should be an index specifying the elements to extract.
x[i]:
Return a new XStringSet object made of the selected elements.
x[[i]]:
Extract the i-th XString object from x.
append(x, values, after=length(x)):
Add sequences in values to x.
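A short sketch of subsetting and appending, assuming the Biostrings package is installed; the sequences and their names are invented for the example.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c(a="ACGT", b="GGCC", c="TTAA"))
values <- DNAStringSet(c(d="CCCC"))

# Single-bracket subsetting returns a (shorter) XStringSet object.
x13 <- x[c(1, 3)]
x13

# Double-bracket subsetting extracts one element as an XString object.
x2 <- x[[2]]
x2

# Append at the end (the default), or after a given position.
appended <- append(x, values)
inserted <- append(x, values, after=1)
as.character(appended)
as.character(inserted)
```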
In the code snippets below,
x is an XStringSet object.
order(x):
Return a permutation which rearranges x into ascending or
descending order.
sort(x):
Sort x into ascending order (equivalent to x[order(x)]).
rank(x):
Rank x in ascending order.
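The relationship between order and sort stated above can be checked on a toy object. This is a minimal sketch, assuming the Biostrings package is installed; the sequences are made up.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c("TTGA", "ACGT", "GGCC", "ACGA"))

o <- order(x)    # permutation that rearranges x into ascending order
sorted <- sort(x)
r <- rank(x)     # rank of each element of x

# sort(x) is expected to give the same result as x[order(x)].
identical(as.character(sorted), as.character(x[o]))
```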
In the code snippets below,
x is an XStringSet object.
duplicated(x):
Return a logical vector whose elements denote duplicates in x.
unique(x):
Return an XStringSet containing the unique values in x.
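For instance, with a made-up set in which one sequence appears three times (a minimal sketch, assuming the Biostrings package is installed):

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c("ACGT", "TTGA", "ACGT", "ACGT"))

# TRUE marks an element that already occurred earlier in x.
dup <- duplicated(x)
dup

# Keep only the first occurrence of each distinct sequence.
ux <- unique(x)
as.character(ux)
```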
In the code snippets below,
x and y are XStringSet objects.
union(x, y):
Union of x and y.
intersect(x, y):
Intersection of x and y.
setdiff(x, y):
Asymmetric set difference of x and y.
setequal(x, y):
Set equality of x to y.
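The set operations can be sketched as follows, assuming the Biostrings package is installed; x and y are made-up objects sharing one sequence. The results are converted with as.character and sorted only to make them easy to compare, since the order of the elements in the returned object is not discussed above.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c("ACGT", "TTGA", "GGCC"))
y <- DNAStringSet(c("GGCC", "AAAA"))

u <- union(x, y)       # sequences found in x or in y
i <- intersect(x, y)   # sequences found in both x and y
d <- setdiff(x, y)     # sequences found in x but not in y

sort(as.character(u))
as.character(i)
sort(as.character(d))

# Set equality ignores the order of the elements.
setequal(x, rev(x))
setequal(x, y)
```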
In the code snippets below,
x is a character vector, XString, or XStringSet object and
table is an XStringSet object.
x %in% table:
Returns a logical vector indicating which elements in x match
identically with an element in table.
match(x, table, nomatch = NA_integer_, incomparables = NULL):
Returns an integer vector containing the first positions of an identical
match in table for the elements in x.
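A minimal sketch of exact matching, assuming the Biostrings package is installed. Both x and table are made-up XStringSet objects here, and table deliberately contains a repeated sequence to show that match reports the first position.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c("ACGT", "AAAA", "GGCC"))
table <- DNAStringSet(c("GGCC", "ACGT", "ACGT"))

# Which elements of x occur (identically) somewhere in table?
found <- x %in% table
found

# Position in table of the first identical match, or NA if there is none.
pos <- match(x, table)
pos
```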
In the code snippets below,
x is an XStringSet object.
unlist(x):
Turns x into an XString object by combining the
sequences in x together.
Fast equivalent to do.call(c, as.list(x)).
as.character(x, use.names):
Convert x to a character vector of the same length as x.
use.names controls whether or not names(x) should be
used to set the names of the returned vector (default is TRUE).
as.matrix(x, use.names):
Return a character matrix containing the "exploded" representation of
the strings. This can only be used on an XStringSet object with
equal-width strings.
use.names controls whether or not names(x) should be used
to set the row names of the returned matrix (default is TRUE).
toString(x):
Equivalent to toString(as.character(x)).
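The coercion methods above can be compared on a small object with equal-width sequences. This is a minimal sketch, assuming the Biostrings package is installed; the sequences and the names a and b are made up.

```r
suppressPackageStartupMessages(library(Biostrings))

x <- DNAStringSet(c(a="ACGT", b="TTGA"))

# Concatenate all the sequences into a single XString object.
ux <- unlist(x)
ux

# Character vector of the same length as x, with or without names.
chr <- as.character(x)
chr_nonames <- as.character(x, use.names=FALSE)

# One row per sequence, one column per letter (equal widths required).
m <- as.matrix(x)
dim(m)

# Comma-separated string, as for toString(as.character(x)).
s <- toString(x)
s
```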
H. Pages
BString-class,
DNAString-class,
RNAString-class,
AAString-class,
XStringViews-class,
substr,
subseq,
narrow
## ---------------------------------------------------------------------
## A. USING THE XStringSet CONSTRUCTORS ON A CHARACTER VECTOR
## ---------------------------------------------------------------------
## Note that there is no XStringSet() constructor, but an XStringSet
## family of constructors: BStringSet(), DNAStringSet(), RNAStringSet(),
## etc...
x0 <- c("#CTC-NACCAGTAT", "#TTGA", "TACCTAGAG")
width(x0)
x1 <- BStringSet(x0)
x1
## 3 equivalent ways to obtain the same BStringSet object:
BStringSet(x0, start=4, end=-3)
subseq(x1, start=4, end=-3)
BStringSet(subseq(x0, start=4, end=-3))
dna0 <- DNAStringSet(x0, start=4, end=-3)
dna0
names(dna0)
names(dna0)[2] <- "seqB"
dna0
## ---------------------------------------------------------------------
## B. USING THE XStringSet CONSTRUCTORS ON AN XStringSet OBJECT
## ---------------------------------------------------------------------
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe$sequence)
probes
RNAStringSet(probes, start=2, end=-5) # does NOT copy the sequence data!
## ---------------------------------------------------------------------
## C. USING subseq() ON AN XStringSet OBJECT
## ---------------------------------------------------------------------
subseq(probes, start=2, end=-5)
subseq(probes, start=13, end=13) <- "N"
probes
## Add/remove a prefix:
subseq(probes, start=1, end=0) <- "--"
probes
subseq(probes, end=2) <- ""
probes
## Do more complicated things:
subseq(probes, start=4:7, end=7) <- c("YYYY", "YYY", "YY", "Y")
subseq(probes, start=4, end=6) <- subseq(probes, start=-2:-5)
probes
## ---------------------------------------------------------------------
## D. UNLISTING AN XStringSet OBJECT
## ---------------------------------------------------------------------
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe$sequence)
unlist(probes)