sc_trim_barcode {scPipe} | R Documentation |
Reformat fastq files so barcode and UMI sequences are moved from the sequence into the read name.
sc_trim_barcode(outfq, r1, r2 = NULL, read_structure = list(bs1 = -1, bl1 = 0, bs2 = 6, bl2 = 8, us = 0, ul = 6), filter_settings = list(rmlow = TRUE, rmN = TRUE, minq = 20, numbq = 2))
outfq |
the output fastq file, in which the barcode and UMI have been moved into the read name. |
r1 |
read one for paired-end reads. This read should contain the transcript. |
r2 |
read two for paired-end reads, NULL if single read. (default: NULL) |
read_structure |
a list containing the read structure configuration:
|
filter_settings |
a list containing the read filter settings:
|
Positions used in this function are 0-indexed, so they start from 0
rather than 1. The default read structure in this function represents
CEL-seq paired-end reads. These contain a transcript in the first read, and
a UMI in the first 6bp of the second read followed by an 8bp barcode. So the
read structure will be: list(bs1=-1, bl1=0, bs2=6, bl2=8, us=0, ul=6).
bs1=-1, bl1=0 indicates a negative start position and zero length for the
barcode on read one, which is used to denote "no barcode" on read one.
bs2=6, bl2=8 indicates there is a barcode in read two that starts at the
7th base with length 8bp. us=0, ul=6 indicates a UMI starting from the
first base of read two with a length of 6bp.
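The 0-indexed positions can be easy to misread next to R's 1-indexed string functions. The following is a minimal base-R sketch of how the default CEL-seq structure maps onto a read two sequence. The helper extract_tag and the example sequence are hypothetical and only illustrate the indexing; they are not part of scPipe and do not reflect how the package extracts tags internally.

```r
# Illustrative only: maps a 0-indexed (start, length) pair onto base R's
# 1-indexed substr(). This is not scPipe's internal implementation.
extract_tag <- function(seq, start0, len) {
  # A negative start or zero length denotes "no tag on this read".
  if (start0 < 0 || len == 0) return("")
  substr(seq, start0 + 1, start0 + len)
}

# Hypothetical 14bp read two: a 6bp UMI followed by an 8bp barcode.
read2 <- "ACGTACGGGGTTTT"
rs <- list(bs1 = -1, bl1 = 0, bs2 = 6, bl2 = 8, us = 0, ul = 6)

umi     <- extract_tag(read2, rs$us,  rs$ul)   # bases 1 to 6
barcode <- extract_tag(read2, rs$bs2, rs$bl2)  # bases 7 to 14
print(c(umi = umi, barcode = barcode))
```

Here us=0 selects bases 1 to 6 and bs2=6 selects bases 7 to 14, matching the description above.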
For a typical Drop-seq experiment the read structure will be
list(bs1=-1, bl1=0, bs2=0, bl2=12, us=12, ul=8), which means that read one
contains only the transcript, and the first 12bp of read two are the index,
followed by an 8bp UMI.
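As a sketch of how the Drop-seq structure might be passed to the function, the snippet below builds the list and calls sc_trim_barcode with the argument names shown in the usage line. The file names are hypothetical placeholders, and the call is guarded so that it only runs when scPipe is installed and the input files exist.

```r
# Drop-seq read structure as described above: no barcode on read one,
# a 12bp index at the start of read two, followed by an 8bp UMI.
dropseq_structure <- list(bs1 = -1, bl1 = 0, bs2 = 0, bl2 = 12,
                          us = 12, ul = 8)

# Hypothetical input and output paths; replace these with real files.
r1_file  <- "dropseq_R1.fastq"
r2_file  <- "dropseq_R2.fastq"
out_file <- "dropseq_trimmed.fastq"

# Only run when scPipe is installed and the placeholder inputs exist.
if (requireNamespace("scPipe", quietly = TRUE) &&
    file.exists(r1_file) && file.exists(r2_file)) {
  scPipe::sc_trim_barcode(out_file, r1_file, r2_file,
                          read_structure = dropseq_structure)
}
```

Note that the UMI start (us=12) equals the barcode start plus its length (bs2 + bl2), so the two regions sit back to back on read two without overlapping.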
Generates a trimmed fastq file named outfq.
data_dir="celseq2_demo"
## Not run: 
# for the complete workflow, refer to the vignettes
...
sc_trim_barcode(file.path(data_dir, "combined.fastq"),
                file.path(data_dir, "simu_R1.fastq"),
                file.path(data_dir, "simu_R2.fastq"))
...
## End(Not run)