reverseComplement {Biostrings}    R Documentation
Description
Use these functions to reverse a sequence and/or complement a DNA or RNA sequence.
Usage
reverse(x, ...)
complement(x, ...)
reverseComplement(x, ...)
Arguments
x: An IRanges, NormalIRanges, MaskCollection, XString, XStringSet, XStringViews or MaskedXString object for reverse. A DNAString, RNAString, DNAStringSet, RNAStringSet, XStringViews (with DNAString or RNAString subject), MaskedDNAString or MaskedRNAString object for complement and reverseComplement.
...: Additional arguments to be passed to or from methods.
Details
Given an XString object x, reverse(x) returns an object of the same
XString subtype as x in which the letters of x appear in reverse order.
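For intuition only, the same reordering can be sketched in base R on plain character strings. The helper name reverse_chars() is hypothetical and not part of Biostrings, which operates on XString objects rather than character vectors.

```r
# Hypothetical base-R helper (not part of Biostrings): reverse the
# letters of each string in a character vector, mimicking what
# reverse() does to the letters of an XString object.
reverse_chars <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

reverse_chars("ACGT-YN-")        # "-NY-TGCA"
reverse_chars(c("ACG", "TTAC"))  # "GCA" "CATT"
```

Each string is reversed independently, loosely mirroring how reverse() treats every element of an XStringSet on its own.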
If x is a DNAString or RNAString object,
complement(x) returns an object in which each base of x
is "complemented", i.e., A, C, G, T in a DNAString object are replaced
by T, G, C, A respectively, and A, C, G, U in an RNAString object
are replaced by U, G, C, A respectively.
Letters belonging to the "IUPAC extended genetic alphabet"
are also replaced by their complement (M <-> K, R <-> Y, S <-> S, V <-> B,
W <-> W, H <-> D, N <-> N), while the gap ("-") and hard-masking
("+") letters are left unchanged.
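The pairing rules above amount to a one-to-one character translation, which base R's chartr() (listed under See Also) can express directly. This is a sketch under the assumption of plain uppercase strings; complement_chars() is a hypothetical helper, not a Biostrings function, and unlike Biostrings it does not validate its input.

```r
# Hypothetical base-R helper (not part of Biostrings): complement plain
# uppercase DNA or RNA strings with chartr(), using the base and IUPAC
# pairings listed above. Letters absent from `from` (such as "-" and "+")
# are passed through unchanged by chartr().
complement_chars <- function(x, type = c("DNA", "RNA")) {
  type <- match.arg(type)
  from <- if (type == "DNA") "ACGTMRWSYKVHDBN" else "ACGUMRWSYKVHDBN"
  to   <- if (type == "DNA") "TGCAKYWSRMBDHVN" else "UGCAKYWSRMBDHVN"
  chartr(from, to, x)
}

complement_chars("ACGT-YN-")            # "TGCA-RN-"
complement_chars("ACGU", type = "RNA")  # "UGCA"
```

Because every pairing is symmetric, complementing twice returns the original string.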
reverseComplement(x) is equivalent to reverse(complement(x))
but is faster and more memory efficient.
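One way to see why a combined operation can save work: reading the letters in reverse order and looking up each complement directly avoids building a complemented copy first. The base-R sketch below assumes plain uppercase DNA strings and compares a two-step version with a one-step lookup. Both helpers (rc_two_step(), rc_one_step()) are hypothetical, and the sketch does not claim to reproduce how Biostrings implements reverseComplement() internally.

```r
# Base-R sketch (assumption: plain uppercase DNA strings). The
# hypothetical helpers below are not part of Biostrings.
# Two-step version: complement the whole string, then reverse it.
rc_two_step <- function(s) {
  comp <- chartr("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN", s)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}
# One-step version: walk the letters in reverse order and look up each
# complement directly, so no complemented copy of `s` is built first.
comp_lut <- c(A = "T", C = "G", G = "C", T = "A", M = "K", K = "M",
              R = "Y", Y = "R", S = "S", W = "W", V = "B", B = "V",
              H = "D", D = "H", N = "N", "-" = "-", "+" = "+")
rc_one_step <- function(s) {
  out <- comp_lut[rev(strsplit(s, "", fixed = TRUE)[[1]])]
  if (anyNA(out)) stop("unsupported letter in input")
  paste(out, collapse = "")
}

rc_two_step("ACGT-YN-")  # "-NR-ACGT"
rc_one_step("ACGT-YN-")  # "-NR-ACGT"
```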
Value
An object of the same class and length as the original object.
See Also
IRanges-class, NormalIRanges-class, MaskCollection-class, DNAString-class,
RNAString-class, DNAStringSet-class, RNAStringSet-class, XStringViews-class,
MaskedXString-class, strrev, chartr, findPalindromes
Examples
## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------
x <- DNAString("ACGT-YN-")
reverseComplement(x)
library(drosophila2probe)
x <- DNAStringSet(drosophila2probe$sequence)
x
alphabetFrequency(x, collapse=TRUE)
rcx <- reverseComplement(x)
rcx
alphabetFrequency(rcx, collapse=TRUE)
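Since reverse-complementing only reorders positions and swaps A with T and C with G, the collapsed A/T and C/G counts from the two alphabetFrequency() calls above should trade places. The base-R sketch below checks this on toy strings; it assumes plain uppercase A/C/G/T input, and count_bases() and rev_comp() are hypothetical helpers rather than Biostrings functions.

```r
# Base-R sketch (assumption: plain uppercase A/C/G/T strings; the
# helpers are hypothetical and not part of Biostrings).
count_bases <- function(x) {
  letters_all <- unlist(strsplit(x, "", fixed = TRUE))
  table(factor(letters_all, levels = c("A", "C", "G", "T")))
}
rev_comp <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE),
         function(ch) paste(rev(chartr("ACGT", "TGCA", ch)), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

seqs   <- c("ACGTTTGA", "CCAGT")
before <- count_bases(seqs)            # A=3 C=3 G=3 T=4
after  <- count_bases(rev_comp(seqs))  # A=4 C=3 G=3 T=3
```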
## ---------------------------------------------------------------------
## B. SEARCHING THE REVERSE STRAND OF A CHROMOSOME
## ---------------------------------------------------------------------
## Applying reverseComplement() to the pattern before calling
## matchPattern() is the recommended way to search for hits on the reverse
## strand of a chromosome.
library(BSgenome.Dmelanogaster.UCSC.dm3)
chrX <- Dmelanogaster$chrX
chrX
alphabetFrequency(chrX) # 90100 N's
## Activate "assembly gaps" and "RepeatMasker" masks:
active(masks(chrX))[1:2] <- TRUE
chrX
alphabetFrequency(chrX) # no more N's
pattern <- DNAString("ACCAACNNGGTTG")
matchPattern(pattern, chrX, fixed=FALSE) # 3 hits on strand +
rcpattern <- reverseComplement(pattern)
rcpattern
m0 <- matchPattern(rcpattern, chrX, fixed=FALSE) # 5 hits on strand -
## Applying reverseComplement() to the subject instead of the pattern is not
## a good idea, for two reasons:
## (1) Chromosome sequences are generally large, sometimes very large,
##     so computing the reverse complement of the positive strand takes
##     time and memory proportional to the sequence length.
chrXminus <- reverseComplement(chrX) # needs to allocate 22M of memory!
chrXminus
## (2) Chromosome locations are generally given relative to the positive
##     strand, even for features located on the negative strand, so after
##     doing this:
m1 <- matchPattern(pattern, chrXminus, fixed=FALSE)
## the start/end of the matches are now relative to the negative strand.
## You need to apply reverseComplement() again on the result if you want
## them to be relative to the positive strand:
m2 <- reverseComplement(m1) # allocates 22M of memory, again!
## Finally, apply rev() to sort the matches from left to right
## (5' to 3' direction), as in m0:
m3 <- rev(m2) # same as m0, finally!
## WARNING: Before you try the example below on human chromosome 1, be aware
## that it will require the allocation of about 500Mb of memory!
if (interactive()) {
library(BSgenome.Hsapiens.UCSC.hg18)
chr1 <- Hsapiens$chr1
matchPattern(pattern, reverseComplement(chr1), fixed=FALSE) # DON'T DO THIS!
matchPattern(reverseComplement(pattern), chr1, fixed=FALSE) # DO THIS INSTEAD
}
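Point (2) in section B comes down to index arithmetic: on a subject of length L, a hit spanning [s, e] on the reverse-complemented subject covers positions [L - e + 1, L - s + 1] on the positive strand, and the order of the hits flips. The base-R sketch below makes this concrete on a toy string. It assumes exact matching with gregexpr() instead of matchPattern(), and rev_comp(), find_fixed() and minus_to_plus() are hypothetical helpers, not Biostrings functions.

```r
# Base-R sketch (assumption: plain uppercase DNA strings, exact matching
# via gregexpr(); Biostrings' matchPattern() is not used here).
comp_lut <- c(A = "T", C = "G", G = "C", T = "A", N = "N")
rev_comp <- function(s) {
  paste(comp_lut[rev(strsplit(s, "", fixed = TRUE)[[1]])], collapse = "")
}
# Non-overlapping exact matches as a data frame of 1-based start/end.
find_fixed <- function(pattern, subject) {
  m <- gregexpr(pattern, subject, fixed = TRUE)[[1]]
  if (m[1] == -1) return(data.frame(start = integer(0), end = integer(0)))
  start <- as.integer(m)
  data.frame(start = start,
             end = start + as.integer(attr(m, "match.length")) - 1L)
}
# Hypothetical helper: map coordinates on the reverse-complemented
# subject back to the positive strand, then reverse the order so the
# hits read left to right (the role rev() plays in m3 <- rev(m2)).
minus_to_plus <- function(hits, subject_len) {
  plus <- data.frame(start = subject_len - hits$end + 1L,
                     end   = subject_len - hits$start + 1L)
  plus <- plus[rev(seq_len(nrow(plus))), , drop = FALSE]
  rownames(plus) <- NULL
  plus
}

subject <- "AGTTACTTGTTACA"
pattern <- "GTAAC"
# Recommended: reverse-complement the (short) pattern.
m0 <- find_fixed(rev_comp(pattern), subject)
# Discouraged: reverse-complement the (long) subject, then convert back.
m1 <- find_fixed(pattern, rev_comp(subject))
m3 <- minus_to_plus(m1, nchar(subject))
```

Both routes report hits at positions 2-6 and 9-13, but only the second one has to copy the whole subject.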