srFilter {ShortRead} | R Documentation |
These functions create user-defined (srFilter) or built-in instances
of SRFilter objects. Filters can be applied to objects from ShortRead,
returning a logical vector to be used to subset the objects to include
only those components satisfying the filter.
srFilter(fun, name = NA_character_, ...)
## S4 method for signature 'missing':
srFilter(fun, name=NA_character_)
## S4 method for signature 'function':
srFilter(fun, name=NA_character_)

compose(filt, ..., .name)

idFilter(regex=character(0), fixed=FALSE, exclude=FALSE, .name="idFilter")
chromosomeFilter(regex=character(0), fixed=FALSE, exclude=FALSE, .name="ChromosomeFilter")
positionFilter(min=-Inf, max=Inf, .name="PositionFilter")
strandFilter(strandLevels=character(0), .name="StrandFilter")
uniqueFilter(withSread=TRUE, .name="UniqueFilter")
nFilter(threshold=0L, .name="CleanNFilter")
polynFilter(threshold=0L, nuc=c("A", "C", "T", "G", "other"), .name="PolyNFilter")
dustyFilter(threshold=Inf, .name="DustyFilter")
srdistanceFilter(subject=character(0), threshold=0L, .name="SRDistanceFilter")
alignQualityFilter(threshold=0L, .name="AlignQualityFilter")
alignDataFilter(expr=expression(), .name="AlignDataFilter")
fun |
An object of class function to be used as a filter. fun must accept
a single named argument x, and is expected to return a logical vector
such that x[fun(x)] selects only those elements of x satisfying the
conditions of fun. |
name |
A character(1) object to be used as the name of the
filter. The name is useful for debugging and reference. |
filt |
An SRFilter object, to be used with additional arguments to create a
composite filter. |
.name |
An optional character(1) object used to override the name applied to
default filters. |
regex |
Either character(0) or a character(1) regular expression used as
grep(regex, chromosome(x)) to filter based on chromosome, or as
grep(regex, id(x)) to filter based on read identifier. The default
(character(0)) performs no filtering. |
fixed |
logical(1) passed to grep, influencing how pattern matching occurs. |
exclude |
logical(1) which, when TRUE, uses regex to exclude, rather than
include, reads. |
min, max |
numeric(1) values defining the closed interval in which position must
be found, min <= position <= max. |
strandLevels |
Either character(0) or character(1) containing strand levels to be
selected. ShortRead objects have standard strand levels NA, "+", "-",
"*", with NA meaning strand information not available and "*" meaning
strand information not relevant. |
withSread |
A logical(1) indicating whether uniqueness includes the read sequence
(withSread=TRUE) or is based only on chromosome, position, and strand
(withSread=FALSE). |
threshold |
A numeric(1) value representing a minimum (srdistanceFilter,
alignQualityFilter) or maximum (nFilter, polynFilter, dustyFilter)
criterion for the filter. The minima and maxima are closed-interval
(i.e., x >= threshold, x <= threshold for some property x of the
object being filtered). |
nuc |
A character vector containing IUPAC symbols for nucleotides or the
value "other" corresponding to all non-nucleotide symbols, e.g., N. |
subject |
A character() of any length, to be used as the corresponding argument
to srdistance. |
expr |
An expression to be evaluated with pData(alignData(x)). |
... |
Additional arguments for subsequent methods; these arguments are not currently used. |
srFilter allows users to construct their own filters. The fun argument
to srFilter must be a function accepting a single argument x and
returning a logical vector that can be used to select elements of x
satisfying the filter with x[fun(x)]. The signature(fun="missing")
method creates a default filter that returns a vector of TRUE values
with length equal to length(x).
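The contract for fun (one argument x, logical result usable as
x[fun(x)]) can be illustrated without ShortRead. The following is only
a sketch under stated assumptions: a plain character vector stands in
for a ShortRead object, and min_width_filter and pass_all are
hypothetical names that are not part of the package.

```r
# Stand-in data: a plain character vector instead of a ShortRead object.
reads <- c("ACGT", "AC", "ACGTACGT", "A")

# Hypothetical filter function: keep elements at least 4 characters wide.
# It takes a single argument x and returns a logical vector of length(x).
min_width_filter <- function(x) nchar(x) >= 4

keep <- min_width_filter(reads)   # one logical value per element
selected <- reads[keep]           # the x[fun(x)] idiom

# Analogue of the default filter: every element passes.
pass_all <- function(x) rep(TRUE, length(x))
```

A function of this shape is what srFilter expects as fun. For real
ShortRead objects the body would use accessors such as sread(x) or
quality(x) rather than nchar.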
compose constructs a new filter from one or more existing filters. The
result is a filter that returns a logical vector with indices
corresponding to components of x that pass all filters. If not
provided, the name of the filter consists of the names of all
component filters, each separated by " o ".
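The behaviour described for compose (an element passes only if every
component filter passes it, and the default name joins component names
with " o ") might be sketched in base R as follows. compose_sketch is a
hypothetical helper written for illustration; it is not how ShortRead
implements compose, and plain functions on a character vector stand in
for SRFilter objects.

```r
# Hypothetical helper mimicking the documented behaviour of compose():
# the result passes an element only if every component filter passes it.
compose_sketch <- function(...) {
    filters <- list(...)
    stopifnot(length(filters) >= 1L, !is.null(names(filters)))
    fun <- function(x) {
        keep <- rep(TRUE, length(x))
        for (f in filters) keep <- keep & f(x)
        keep
    }
    # Default name: component names joined by " o ".
    list(fun = fun, name = paste(names(filters), collapse = " o "))
}

reads <- c("ACGT", "ACNT", "AC", "ACGTAC")
long_enough <- function(x) nchar(x) >= 4
no_n <- function(x) !grepl("N", x, fixed = TRUE)

both <- compose_sketch(LongEnough = long_enough, NoN = no_n)
composed_name <- both$name
passing <- reads[both$fun(reads)]
```

Combining with a logical AND means the order of the component filters
does not affect which elements are selected.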
The remaining functions documented on this page are built-in filters
that accept an argument x and return a logical vector of length(x)
indicating which components of x satisfy the filter.
idFilter selects elements satisfying grep(regex, id(x), fixed=fixed).
chromosomeFilter selects elements satisfying
grep(regex, chromosome(x), fixed=fixed).
positionFilter selects elements satisfying min <= position(x) <= max.
strandFilter selects elements satisfying
match(strand(x), strandLevels, nomatch=0) > 0.
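The match() idiom used by strandFilter can be tried on a plain vector.
This is a small sketch with assumed data: strand_values is a made-up
character vector standing in for strand(x), and strandLevels holds the
levels to keep.

```r
# Assumed stand-in for strand(x); real objects store strand as a factor.
strand_values <- c("+", "-", NA, "*", "+")
strandLevels <- c("+", "-")   # levels to keep

# match() returns 0 (the nomatch value) for entries not found in
# strandLevels, including NA, so comparing with 0 gives a logical vector.
keep <- match(strand_values, strandLevels, nomatch = 0) > 0
```

Using nomatch=0 avoids NA values in the result, so the vector is safe
to use directly for subsetting.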
uniqueFilter selects elements satisfying !srduplicated(x) when
withSread=TRUE, and
!(duplicated(chromosome(x)) & duplicated(position(x)) & duplicated(strand(x)))
when withSread=FALSE.
nFilter selects elements with no more than threshold 'N' symbols in
each element of sread(x).
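The threshold argument is documented as a closed-interval maximum
(x <= threshold) for nFilter. A base R sketch of that comparison, using
a plain character vector in place of sread(x) and counting 'N' symbols
with string matching (an assumption made for illustration, not the
package's implementation), could look like this:

```r
# Assumed stand-in for sread(x).
reads <- c("ACGT", "ACNT", "NNGT", "ANNN")

# Count 'N' symbols per read using base R string matching.
n_count <- lengths(regmatches(reads, gregexpr("N", reads, fixed = TRUE)))

threshold <- 1L
# Closed interval: a read with exactly `threshold` N's still passes.
keep <- n_count <= threshold
```

With the default threshold=0L, the same comparison keeps only reads
that contain no 'N' at all.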
polynFilter selects elements with no more than threshold copies of any
nucleotide indicated by nuc.
dustyFilter selects elements with high sequence complexity, as
characterized by their dustyScore. This emulates the dust command from
WindowMasker software.
srdistanceFilter selects elements at an edit distance of at least
threshold from all sequences in subject.
alignQualityFilter selects elements with alignQuality(x) of at least
threshold.
alignDataFilter selects elements with pData(alignData(x)) satisfying
expr. expr should be formulated as though it were to be evaluated as
eval(expr, pData(alignData(x))).
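The eval(expr, data) pattern that alignDataFilter relies on can be seen
with an ordinary data frame. In this sketch pd is an assumed stand-in
for pData(alignData(x)), with made-up x and y columns:

```r
# Assumed stand-in for pData(alignData(x)): one row per read.
pd <- data.frame(x = c(100, 500, 900), y = c(100, 520, 950))

expr <- expression(abs(x - 500) > 200 & abs(y - 500) > 200)

# Column names of pd are visible as variables while expr is evaluated,
# so the result is a logical vector with one value per row.
keep <- eval(expr, pd)
```

Because the columns are looked up by name, expr can only refer to
variables that are actually present in the alignment data.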
srFilter returns an object of class SRFilter.
Built-in filters return a logical vector of length(x), with TRUE
indicating components that pass the filter.
Martin Morgan <mtmorgan@fhcrc.org>
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt") # Solexa export file, as example

# a 'chromosome 5' filter
filt <- chromosomeFilter("chr5.fa")
aln[filt(aln)]
# filter during input
readAligned(sp, "s_2_export.txt", filter=filt)

# x- and y- coordinates stored in alignData, when source is SolexaExport
xy <- alignDataFilter(expression(abs(x-500) > 200 & abs(y-500) > 200))
aln[xy(aln)]

# both filters
chr5xy <- compose(filt, xy)
aln[chr5xy(aln)]

# custom filter: minimum calibrated base call quality >20
goodq <- srFilter(function(x) {
    apply(as(quality(x), "matrix"), 1, min) > 20
}, name="GoodQualityBases")
goodq
aln[goodq(aln)]