| BSgenome-class {BSgenome} | R Documentation |
A container for the complete genome sequence of a given species.
[TODO: Put some details here]
In the code snippets below,
x is a BSgenome object.
seqnames(x):
Returns the index of the single sequences contained in x.
Each single sequence is stored in a BString (or derived)
object and comes from a source file (FASTA) with a single record.
The names returned by seqnames(x) usually reflect the names
of those source files, but a common prefix or suffix may have been
removed to keep them as short as possible.
mseqnames(x):
Returns the index of the multiple sequences contained in x.
Each multiple sequence is stored in a BStringViews
object and comes from a source file (FASTA) with multiple records.
The names returned by mseqnames(x) usually reflect the names
of those source files, but a common prefix or suffix may have been
removed to keep them as short as possible.
names(x):
Returns the index of all sequences contained in x.
This is the same as c(seqnames(x), mseqnames(x)).
In the code snippets below,
x is a BSgenome object
and name is the name of a sequence (a character string).
length(x):
Returns the length of x, i.e., the number of all sequences
that it contains. This is the same as length(names(x)).
x[[name]]:
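Returns the sequence named name. The sequence is loaded into memory
the first time it is accessed, which can take a long time for a large
chromosome. Later accesses are much faster because the sequence is
already in memory and is not loaded again.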
x$name:
Equivalent to x[[name]]. For example, Celegans$chrV is the same as
Celegans[["chrV"]].
In the code snippets below,
x is a BSgenome object
and name is the name of a sequence (a character string).
unload(x, name):
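Removes the reference to the sequence named name that x itself holds,
i.e., the first reference, created when the sequence was loaded with
x[[name]] or x$name. The memory used by the sequence is freed only once
all other objects referencing it have been removed too, so unload()
should be called last.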
H. Pages
available.genomes,
BString,
DNAString,
BStringViews,
getSeq,
matchPattern,
rm,
gc
library(BSgenome.Celegans.UCSC.ce2) # This doesn't load the chromosome
# sequences into memory.
length(Celegans) # Number of sequences in this genome.
Celegans # Displays index of all the sequences
# in this genome.
mem0 <- gc()["Vcells", "(Mb)"] # Current amount of data in memory (in
# Mb).
Celegans[["chrV"]] # Loads chromosome V into memory (hence
# takes a long time).
gc()["Vcells", "(Mb)"] - mem0 # Chromosome V occupies 20Mb of memory.
Celegans[["chrV"]] # Much faster (sequence is already in
# memory, hence it's not loaded again).
Celegans$chrV # Equivalent to Celegans[["chrV"]].
class(Celegans$chrV) # Chromosome V (like any other
# chromosome sequence) is a DNAString
# object.
nchar(Celegans$chrV) # It has 20922231 letters (nucleotides).
x <- Celegans$chrV # Very fast because a BString object
# doesn't contain the sequence, only a
# pointer to the sequence, hence chrV
# seq is not duplicated in memory. But
# we now have 2 objects pointing to the
# same place in memory.
y <- substr(x, 10, 100) # A 3rd object pointing to chrV seq.
## We must remove all references to chrV seq if we want the 20Mb of memory
## used by it to be freed (note that it can be hard to keep track of all the
## references to a given sequence).
## IMPORTANT: The 1st reference to this seq (Celegans$chrV) should be removed
## last. This is achieved with unload(). All other references are removed by
## just removing the referencing object.
rm(x)
rm(y)
unload(Celegans, "chrV")
gc()["Vcells", "(Mb)"]
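
The load-once-then-reuse behaviour shown above, and the effect of unload(), can be imitated with a small toy model in base R. This is only a sketch of the idea, not how BSgenome is implemented: new_toy_genome(), toy_get() and toy_unload() are hypothetical names made up for this illustration, and the "sequence" is a short character string produced by a stand-in loader function. The toy models the caching only; it does not model several objects pointing to the same sequence in memory.

```r
# Toy model only: none of these functions belong to BSgenome.

# Create a "genome": a set of loader functions plus an initially empty cache.
new_toy_genome <- function(loaders) {
  genome <- new.env()
  genome$loaders <- loaders   # named list of functions, each returns a sequence
  genome$cache <- new.env()   # sequences loaded so far
  genome$loads <- 0L          # number of times a loader actually ran
  genome
}

# Return the sequence called 'name', loading it only if it is not cached yet.
toy_get <- function(genome, name) {
  if (!exists(name, envir = genome$cache, inherits = FALSE)) {
    assign(name, genome$loaders[[name]](), envir = genome$cache)
    genome$loads <- genome$loads + 1L
  }
  get(name, envir = genome$cache, inherits = FALSE)
}

# Drop the cached copy of 'name' (does nothing if it is not loaded).
toy_unload <- function(genome, name) {
  if (exists(name, envir = genome$cache, inherits = FALSE)) {
    rm(list = name, envir = genome$cache)
  }
  invisible(genome)
}

# Stand-in loader that builds a 20-letter sequence.
toy <- new_toy_genome(
  list(chrV = function() paste(rep("ACGT", 5), collapse = ""))
)

s1 <- toy_get(toy, "chrV")   # first access: the loader runs
s2 <- toy_get(toy, "chrV")   # second access: the cached copy is reused
toy_unload(toy, "chrV")      # the cached copy is dropped
s3 <- toy_get(toy, "chrV")   # the loader has to run again
toy$loads                    # number of real loads so far
```

Keeping the cache in an environment means the genome object can be passed to toy_get() and toy_unload() and updated in place, without being copied or returned.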