| read.wtccc.signals {snpMatrix} | R Documentation |
read.wtccc.signals takes a file and a list of SNP IDs (either
Affymetrix ProbeSet IDs or rs numbers), and extracts the matching entries
into a form suitable for plotting and further analysis.
read.wtccc.signals(file, snp.list)
file |
The file containing the signals. Gzipped files can be read directly; there is no need to gunzip. |
snp.list |
A list of SNP IDs. Because some Affymetrix SNPs do not have rs numbers, both rs numbers and Affymetrix ProbeSet IDs are accepted. |
Do not specify both the rs number and the Affymetrix ProbeSet ID of the same SNP in the input; one of them is enough.
The signal file is formatted as follows: the first 5 columns are the Affymetrix ProbeSet ID, rs number, chromosome position, AlleleA and AlleleB. The rest of the header contains the sample IDs, each appended with "_A" and "_B".
AFFYID         RSID        pos       AlleleA  AlleleB  12999A2_A  12999A2_B  ...
SNP_A-4295769  rs915677    14433758  C        T        0.318183   0.002809
SNP_A-1781681  rs9617528   14441016  A        G        1.540461   0.468571
SNP_A-1928576  rs11705026  14490036  G        T        0.179653   2.261650
The routine matches the input list against the first and second columns.
(Some early signal files have the leading "AFFYID" missing; this routine can cope with that as well.)
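The layout and the two-column matching described above can be illustrated with a small base R sketch. This is not the package's implementation, only a toy model of the behaviour: the helper name `toy_extract` and the id `rs0000` are hypothetical, and the data are just the three rows and single sample (12999A2) shown above. It does not attempt to handle files with the leading "AFFYID" missing.

```r
# Toy signal file built from the rows shown above (one sample, 12999A2).
lines <- c(
  "AFFYID RSID pos AlleleA AlleleB 12999A2_A 12999A2_B",
  "SNP_A-4295769 rs915677 14433758 C T 0.318183 0.002809",
  "SNP_A-1781681 rs9617528 14441016 A G 1.540461 0.468571",
  "SNP_A-1928576 rs11705026 14490036 G T 0.179653 2.261650"
)
# check.names = FALSE keeps headers such as "12999A2_A" unmangled.
signals <- read.table(text = lines, header = TRUE,
                      stringsAsFactors = FALSE, check.names = FALSE)

# Hypothetical helper: look each id up in column 1 (AFFYID) or column 2 (RSID),
# then reshape the interleaved _A/_B columns into a samples x 2 matrix.
toy_extract <- function(signals, snp.list) {
  sig.cols <- 6:ncol(signals)
  samples <- unique(sub("_[AB]$", "", names(signals)[sig.cols]))
  out <- lapply(snp.list, function(id) {
    hit <- which(signals[[1]] == id | signals[[2]] == id)
    if (length(hit) == 0) return(NULL)  # SNP not found
    matrix(as.numeric(unlist(signals[hit[1], sig.cols])),
           ncol = 2, byrow = TRUE,
           dimnames = list(samples, c("A", "B")))
  })
  names(out) <- snp.list
  out
}

# One ProbeSet ID, one rs number, and one id that is not in the file.
res <- toy_extract(signals, c("SNP_A-4295769", "rs9617528", "rs0000"))
```

Here `res` has one element per requested id; the entry for the made-up `rs0000` is NULL, while the other two are 1 x 2 matrices with row name `12999A2` and columns `A` and `B`.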
The routine returns a list of named matrices, one for each input SNP
(NULL if the SNP is not found); the row names are sample IDs
and the columns are the "A" and "B" signals.
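Because entries for SNPs that were not found are NULL, it can help to separate them before further analysis. The following is a minimal sketch on a hand-built list that mimics the documented return shape; the list is a stand-in (not produced by read.wtccc.signals here), and the id `rs_not_in_file` is hypothetical.

```r
# Stand-in for a returned list: one matched SNP and one that was not found.
answer <- list(
  "SNP_A-4284341" = matrix(c(1.446261, 0.831480,
                             1.500956, 0.551987),
                           ncol = 2, byrow = TRUE,
                           dimnames = list(c("12999A2", "12999A3"),
                                           c("A", "B"))),
  "rs_not_in_file" = NULL
)

# Ids whose entry is NULL, i.e. not present in the signal file.
not.found <- names(answer)[vapply(answer, is.null, logical(1))]

# Keep only the SNPs that were found.
found <- Filter(Negate(is.null), answer)

# Number of samples per matched SNP.
n.samples <- vapply(found, nrow, integer(1))
```

Each remaining matrix can then be passed on for plotting, for example with the "A" signal on one axis and the "B" signal on the other.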
TODO: There is a built-in limit to the input line buffer (65535) which should be sufficient for 2000 samples and 30 characters each. May want to seek backwards, re-read and dynamically expand if the buffer is too small.
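As a rough way to check a file against the limit mentioned in the TODO, the sketch below measures the longest line of a (possibly gzipped) file. The helper `longest_line` is hypothetical, and whether the limit is counted in bytes and whether it includes the line terminator are assumptions.

```r
# Limit quoted in the TODO above (assumed here to be counted in bytes).
line.limit <- 65535

# Budget quoted above: 2000 samples at 30 characters each.
within.budget <- 2000 * 30 < line.limit

# Hypothetical helper: length in bytes of the longest line in a file.
longest_line <- function(path) {
  con <- gzfile(path, "r")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  max(nchar(readLines(con), type = "bytes"))
}

# Demonstration on a temporary gzipped file holding two lines of the layout.
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "w")
writeLines(c("AFFYID RSID pos AlleleA AlleleB 12999A2_A 12999A2_B",
             "SNP_A-4295769 rs915677 14433758 C T 0.318183 0.002809"), con)
close(con)

n <- longest_line(tmp)
fits <- n < line.limit
```

For a real signal file with many samples, a value of `n` close to or above the limit would suggest the lines may not fit in the buffer.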
Hin-Tak Leung htl10@users.sourceforge.net
## Not run:
answer <-
read.wtccc.signals("NBS_22_signals.txt.gz", c("SNP_A-4284341","rs4239845"))
> summary(answer)
Length Class Mode
SNP_A-4284341 2970 -none- numeric
rs4239845 2970 -none- numeric
> head(answer$"SNP_A-4284341")
A B
12999A2 1.446261 0.831480
12999A3 1.500956 0.551987
12999A4 1.283652 0.722847
12999A5 1.549140 0.604957
12999A6 1.213645 0.966151
12999A8 1.439892 0.509547
>
## End(Not run)