xval-methods {MLInterfaces}                              R Documentation
support for cross-validatory machine learning with ExpressionSets
Usage:

xval(data, classLab, proc, xvalMethod, group, indFun, niter,
     fsFun=NULL, fsNum=NULL, decreasing=TRUE, cluster=NULL, ...)

balKfold(K)

xvalML(formula, data, proc, xvalMethod="LOO", group, indFun, niter,
       fsFun=NULL, fsNum=10, decreasing=TRUE, cluster=NULL, ...)
Arguments:

data: instance of class ExpressionSet

formula: a model formula, typically with a dot on the right-hand side,
    and a response variable chosen from the pData columns

classLab: character string identifying the phenoData variable whose
    values label the classifications

proc: an MLInterfaces method that returns an instance of "classifOutput"

xvalMethod: character string identifying the cross-validation procedure
    to use; the default is "LOO" (leave one out), and the alternatives
    are "LOG" (leave group out) and "FUN" (user-supplied partition
    extraction function; see Details below)

group: a vector (length equal to the number of samples) enumerating
    groups for the "LOG" method

indFun: a function that returns the set of indices to be used as the
    training set for a given iteration; this function must have
    parameters data, clab, and iternum (see Details)

niter: number of iterations for which the user-specified partition
    function is to be run

fsFun: function computing ranks of features for feature selection

fsNum: number of features to be kept for learning in each iteration

decreasing: logical; should be TRUE if fsFun assigns high scores to
    high-performing features (e.g., the absolute value of a test
    statistic) and FALSE if it assigns low scores to high-performing
    features (e.g., the p-value of a test)

cluster: NULL, or an S4 object with a defined xvalLoop method; use
    this to execute xval on several nodes of a computer cluster (see
    the documentation for xvalLoop for more information)

...: arguments passed to the MLInterfaces generic proc

K: number of partitions to be used when balKfold is used as indFun
Value:

For fixed feature sets (fsFun not specified), a vector or matrix whose
length equals the number of cross-validation assignments; each element
contains the label resulting from that cross-validation.

For dynamic feature sets (fsFun specified), a list with element out
containing the labels from the cross-validations and element fs.memory
recording the features used in each cross-validation.
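For instance (a minimal sketch, using the lk1 and lk3f objects
constructed in the Examples below):

length(lk1)    # fixed feature set: one predicted label per assignment
names(lk3f)    # dynamic feature set: elements "out" and "fs.memory"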
Note:

This is now regarded as a legacy approach. See MLearn for an approach
in which the cross-validation specification is a parameter to MLearn.
Details:

If xvalMethod is "FUN", then indFun must be a function with parameters
data, clab, and iternum. This function returns the indices that
identify the training set for the cross-validation iteration passed as
the value of iternum. An example function (balKfold) is printed out
when the example on this page is executed.
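As a concrete illustration, a minimal indFun could be sketched as
follows (simpleKfold is a hypothetical helper, not part of
MLInterfaces; unlike balKfold, it makes no attempt to balance class
frequencies across folds):

simpleKfold <- function(K) function(data, clab, iternum) {
    n <- ncol(exprs(data))             # number of samples
    fold <- rep(1:K, length.out=n)     # cyclic fold assignment
    (1:n)[fold != iternum]             # training-set indices for this fold
}
# usage: xval(smallG, "ALL.AML", knnB, xvalMethod="FUN", group=0:0,
#             indFun=simpleKfold(5), niter=5)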
If fsFun is not NULL, then it must be a function with two arguments:
the first is the data, from which a feature matrix can be derived (in
the Examples below this is done via exprs, so rows are features and
columns are samples), and the second is the name of the phenoData
variable holding the class labels (compare t.fun in the Examples).
The function returns a vector of scores, one for each feature. The
scores are interpreted according to the value of decreasing to select
the fsNum top-ranked features. Thanks to Stephen Henderson of
University College London for this functionality.
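For example, a label-free variance filter satisfying this contract
could be sketched as follows (var.fun is hypothetical, not part of the
package; it ignores its second argument and favors high-variance
features, so decreasing=TRUE applies):

var.fun <- function(data, fac) {
    xd <- matrix(as.double(exprs(data)), nrow=nrow(exprs(data)))
    apply(xd, 1, var)    # one score per feature (row)
}
# usage: xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=0:0,
#             fsFun=var.fun, fsNum=10)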
Note that if fsFun is non-null, then the right-hand side of formula is
ignored and assumed to be "."; we will attempt to ameliorate this in a
future revision. If you wish to subset the features in data before
applying cross-validated feature selection, do so manually, as
sketched below, rather than by specifying a nontrivial formula.
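A minimal sketch of such manual subsetting, using the smallG object
from the Examples below:

smallG.sub <- smallG[1:25, ]    # keep only the first 25 features
# then, e.g.: xval(smallG.sub, "ALL.AML", knnB, xvalMethod="LOO",
#                  group=0:0, fsFun=t.fun)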
Examples:

library(golubEsets)
data(Golub_Merge)
smallG <- Golub_Merge[200:250,]
# leave-one-out; group must be supplied but is not used by the LOO method
lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=as.integer(0))
table(lk1, smallG$ALL.AML)
lk2 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOG",
            group=as.integer(rep(1:8, each=9)))
table(lk2, smallG$ALL.AML)
# print the partition-generating function used as indFun below
balKfold
lk3 <- xval(smallG, "ALL.AML", knnB, xvalMethod="FUN", group=0:0,
            indFun=balKfold(5), niter=5)
table(lk3, smallG$ALL.AML)
#
# illustrate the xval FUN method in comparison to LOO
#
LOO2 <- xval(smallG, "ALL.AML", knnB, "FUN", group=0:0,
             indFun=function(x, y, i) (1:ncol(exprs(x)))[-i], niter=72)
table(lk1, LOO2)
#
# use Stephen Henderson's feature selection extensions
#
t.fun <- function(data, fac)
{
    require(genefilter)
    # deal with the integer storage of the expression values
    xd <- matrix(as.double(exprs(data)), nrow=nrow(exprs(data)))
    # absolute row-wise t statistics: one score per feature
    abs(rowttests(xd, pData(data)[[fac]], tstatOnly=FALSE)$statistic)
}
lk3f <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=0:0, fsFun=t.fun)
table(lk3f$out, smallG$ALL.AML)
# use MLearn-style cross-validation via xvalML
XXml <- xvalML(ALL.AML~., smallG, "knn", "LOO")
# show that it agrees with the knnB-based xval approach
table(XXml, lk1)
# use MLearn-style cross-validation with feature selection
XXmlfs <- xvalML(ALL.AML~., smallG, "knn", "LOO", fsFun=t.fun)
# show that it agrees with the previous approach
table(XXmlfs$out, lk3f$out)