Name | Modified | Size |
---|---|---|
README.txt | 2016-12-20 | 3.2 kB |
SCIseq_RenameCellIDs.pl | 2016-12-20 | 1.2 kB |
SCIseq_FilterBamToCellIDList.pl | 2016-12-20 | 496 Bytes |
SCIseq_AddRGtoBam.pl | 2016-12-20 | 665 Bytes |
SCIseq_FilterBamToReadThreshold.pl | 2016-12-20 | 759 Bytes |
SCIseq_MakeCellIDList.pl | 2016-12-20 | 3.3 kB |
SCIseq_NextSeqFastq_to_SCIseqFastq.pl | 2016-12-20 | 7.2 kB |
SCIseq.TSase8nt_i5A_i7A.PCR10nt_i5ABCDEF_i7ABCDEF.index.txt | 2016-12-20 | 4.1 kB |
SCIseq_RemoveDuplicates.pl | 2016-12-20 | 1.6 kB |
SCIseq_RemoveDuplicatesPlot.r | 2016-12-20 | 818 Bytes |
SCIseq_SplitRunFastq.pl | 2016-12-20 | 2.5 kB |
Totals: 11 items | | 26.0 kB |
SCI-seq data processing readme

Contact: Andrew Adey (adey@ohsu.edu)

The scripts here are for processing raw read data from Single cell Combinatorial Indexing and Sequencing (SCI-seq). The typical workflow is as follows and requires samtools to be callable from the command line:

1) After sequencing with SCI-seq chemistry (the same as CPT-seq), run bcl2fastq v2 as you normally would for a NextSeq sequencing run, but be sure to include the following options: --with-failed-reads and --create-fastq-for-index-reads

2) In the folder with the Undetermined... fastq files, run SCIseq_NextSeqFastq_to_SCIseqFastq.pl. The first argument is the directory ("." if it is the current directory). The second argument is the index file, in the format: IndexID (tab) Index Number (1-4) (tab) Index Sequence. The file SCIseq.TSase8nt_i5A_i7A.PCR10nt_i5ABCDEF_i7ABCDEF.index.txt is provided for this. The third argument is an output prefix. The script outputs forward and reverse reads for the reads matching the indexes, as well as the rejected reads. Read names are written in the format used for downstream processing: the barcode (cell identifier) followed by a unique number.

3) If the entire run is for one sample, no further split is necessary. Otherwise, samples can be split out at this stage using SCIseq_SplitRunFastq.pl. Its first two arguments are the two passing (index-matched) fastq files, and the third is the output prefix for reads that do not belong to any sample. These are followed by one set of arguments per sample: the sample prefix, then an index file in the same format as the one used for the initial split.
   Note: Only split at this stage if the samples belong to different projects or require different alignment processes. Otherwise it is typically easier to align all samples at the same time and split them afterwards, once the data are in BAM format.

4) Next, align the reads using your preferred aligner to produce an aligned BAM file.
   Do not perform duplicate removal at this stage, because standard duplicate removal does not account for the cell identifier.

5) Remove duplicates using SCIseq_RemoveDuplicates.pl, with the aligned and sorted BAM file as the first argument and the output BAM file as the second. If the path to the plotting R script (SCIseq_RemoveDuplicatesPlot.r) is given as a third argument, it will also plot some complexity figures. This R script requires ggplot2.

Additional processing:

Splitting BAM files: Duplicate-removed BAM files can then be split into their respective samples using SCIseq_FilterBamToCellIDList.pl, where the first argument is the input BAM, the second is a list of barcodes to include, and the third is the output BAM. A list of cell IDs can be generated using SCIseq_MakeCellIDList.pl.

Filtering to a read count threshold: SCI-seq BAM files can also be filtered to include only cells that meet a minimum read count threshold, using SCIseq_FilterBamToReadThreshold.pl.

Adding standard RG header lines to a BAM file: Use SCIseq_AddRGtoBam.pl.

Renaming cell IDs: If you do not want to use the barcodes as cell names, cells can be renamed using SCIseq_RenameCellIDs.pl.
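The index file format in step 2, and the way matched reads end up named by barcode, can be illustrated with a minimal Python sketch. It assumes that each of the four index sequences has already been extracted from the index reads, that matching is exact, that the barcode is the matched sequences concatenated in index-number order, and that read names join barcode and number with a colon. All of these, and the function names, are hypothetical; SCIseq_NextSeqFastq_to_SCIseqFastq.pl may work differently (for example by tolerating mismatches). The index rows below are made-up toy data, not entries from the provided index file.

```python
import io
from collections import defaultdict


def load_index_file(handle):
    """Parse lines of: IndexID <tab> IndexNumber (1-4) <tab> IndexSequence.

    Returns {index_number: {sequence: index_id}}.
    """
    table = defaultdict(dict)
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        index_id, index_number, sequence = line.split("\t")
        table[int(index_number)][sequence.upper()] = index_id
    return dict(table)


def assemble_barcode(observed, table):
    """Build a cell barcode from the observed index sequences.

    observed: {index_number: sequence}, assumed already extracted from the index reads.
    Returns None when any index has no exact match (such reads would be rejected).
    """
    parts = []
    for number in sorted(table):
        sequence = observed.get(number, "").upper()
        if sequence not in table[number]:
            return None
        parts.append(sequence)
    return "".join(parts)


def read_name(barcode, counter):
    # Hypothetical naming scheme: barcode, a colon, then a unique running number.
    return f"{barcode}:{counter}"


index_text = "A1\t1\tACGT\nB1\t2\tTTGG\nC1\t3\tAACC\nD1\t4\tGGTT\n"
table = load_index_file(io.StringIO(index_text))
barcode = assemble_barcode({1: "ACGT", 2: "TTGG", 3: "AACC", 4: "GGTT"}, table)
print(read_name(barcode, 1))  # ACGTTTGGAACCGGTT:1
```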
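Sample assignment in step 3 can be sketched in the same spirit: a barcode belongs to a sample when every segment of it appears in that sample's index file. This hypothetical sketch assumes the barcode is a plain concatenation of fixed-length index sequences in index-number order, and the toy data use only two index positions for brevity. It is an illustration of the idea, not the logic of SCIseq_SplitRunFastq.pl.

```python
def split_barcode(barcode, lengths):
    """Cut a concatenated barcode into consecutive segments of the given lengths."""
    parts, start = [], 0
    for length in lengths:
        parts.append(barcode[start:start + length])
        start += length
    return parts


def assign_sample(barcode, sample_tables):
    """Return the first sample whose index file contains every segment of the barcode.

    sample_tables: {sample_prefix: {index_number: set_of_sequences}}; all sequences
    for one index number are assumed to share the same length.
    Returns None for reads that belong to no sample (the non-sample output).
    """
    for sample, table in sample_tables.items():
        numbers = sorted(table)
        lengths = [len(next(iter(table[n]))) for n in numbers]
        if sum(lengths) != len(barcode):
            continue  # barcode layout does not fit this index file
        segments = split_barcode(barcode, lengths)
        if all(seg in table[n] for seg, n in zip(segments, numbers)):
            return sample
    return None


samples = {
    "sampleA": {1: {"ACGT"}, 2: {"TTGG", "TTGA"}},
    "sampleB": {1: {"CCCC"}, 2: {"TTGG"}},
}
print(assign_sample("CCCCTTGG", samples))  # sampleB
```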
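The reason for the separate duplicate-removal step (steps 4 and 5) is that reads from different cells can legitimately start at the same position, so the duplicate key must include the cell barcode. A minimal sketch of barcode-aware duplicate removal over coordinate-sorted reads is below. It is a simplification under stated assumptions: reads are (name, chromosome, position, strand) tuples, the barcode is the read-name prefix before a colon, and "position" is simply the reported alignment position. Dedicated duplicate markers usually use the unclipped 5' end and, for paired reads, the mate position too, and SCIseq_RemoveDuplicates.pl may define duplicates differently.

```python
def remove_duplicates_sorted(reads, sep=":"):
    """Yield one read per (cell barcode, chrom, pos, strand) from coordinate-sorted reads.

    reads: iterable of (read_name, chrom, pos, strand) tuples, sorted by (chrom, pos).
    The barcode is assumed to be the part of the read name before the first `sep`.
    """
    seen = set()
    current = None
    for name, chrom, pos, strand in reads:
        if (chrom, pos) != current:
            # Sorted input: once the position changes, earlier keys cannot recur.
            seen.clear()
            current = (chrom, pos)
        key = (name.split(sep, 1)[0], strand)
        if key not in seen:
            seen.add(key)
            yield name, chrom, pos, strand


reads = [
    ("AAA:1", "chr1", 100, "+"),
    ("AAA:2", "chr1", 100, "+"),  # same cell, position and strand: duplicate
    ("CCC:3", "chr1", 100, "+"),  # different cell at the same position: kept
    ("AAA:4", "chr1", 100, "-"),  # opposite strand: kept
    ("AAA:5", "chr1", 200, "+"),
]
kept = [r[0] for r in remove_duplicates_sorted(reads)]
print(kept)  # ['AAA:1', 'CCC:3', 'AAA:4', 'AAA:5']
```

A barcode-blind key of (chromosome, position, strand) would have discarded CCC:3 as a duplicate of AAA:1, which is the loss the readme warns about. Clearing the set whenever the position changes keeps memory proportional to the reads at a single position, and relies on the input being coordinate-sorted.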
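The complexity figures produced by SCIseq_RemoveDuplicatesPlot.r are not described further in this readme. As an assumption, per-cell library complexity is commonly summarized as total reads, unique reads remaining after duplicate removal, and the fraction of reads that are unique. The sketch below computes those numbers per barcode (colon-separated read names assumed); it does not reproduce the R script's actual output.

```python
from collections import Counter


def per_cell_complexity(all_names, kept_names, sep=":"):
    """Return {barcode: (total_reads, unique_reads, fraction_unique)}.

    all_names: read names before duplicate removal; kept_names: names that survived it.
    """
    total = Counter(name.split(sep, 1)[0] for name in all_names)
    unique = Counter(name.split(sep, 1)[0] for name in kept_names)
    return {bc: (total[bc], unique[bc], unique[bc] / total[bc]) for bc in total}


stats = per_cell_complexity(
    ["AAA:1", "AAA:2", "CCC:3", "AAA:4"],
    ["AAA:1", "CCC:3", "AAA:4"],
)
print(stats["CCC"])  # (1, 1, 1.0)
```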
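Building a cell ID list, filtering to a read count threshold, and filtering to a given list of cells all come down to counting or checking reads per barcode. The hypothetical sketch below works on read names only (barcode before a colon, assumed) and illustrates the ideas behind SCIseq_MakeCellIDList.pl, SCIseq_FilterBamToReadThreshold.pl and SCIseq_FilterBamToCellIDList.pl. Whether those scripts count reads before or after duplicate removal, or apply other criteria, is not stated here.

```python
from collections import Counter


def barcode_of(read_name, sep=":"):
    # Assumed read-name layout: barcode, separator, unique number.
    return read_name.split(sep, 1)[0]


def make_cell_id_list(read_names, min_reads=1):
    """Return sorted barcodes with at least `min_reads` reads."""
    counts = Counter(barcode_of(name) for name in read_names)
    return sorted(bc for bc, n in counts.items() if n >= min_reads)


def filter_to_cells(read_names, cell_ids):
    """Keep only reads whose barcode is in `cell_ids`."""
    keep = set(cell_ids)
    return [name for name in read_names if barcode_of(name) in keep]


names = ["AAA:1", "AAA:2", "CCC:3", "GGG:4", "GGG:5", "GGG:6"]
passing = make_cell_id_list(names, min_reads=2)
print(passing)                          # ['AAA', 'GGG']
print(filter_to_cells(names, ["CCC"]))  # ['CCC:3']
```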
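For adding RG header lines, the SAM format defines @RG header lines (whose ID tag is required) and a per-read RG:Z tag that refers to one of those IDs. A sketch that gives each cell its own read group is below. Using the barcode as both ID and default SM, and taking the barcode from the read name before a colon, are assumptions for illustration; SCIseq_AddRGtoBam.pl may use a different layout.

```python
def rg_header_lines(barcodes, sample=None):
    """One @RG header line per cell; ID is the barcode, SM defaults to the barcode."""
    return [f"@RG\tID:{bc}\tSM:{sample if sample else bc}" for bc in sorted(barcodes)]


def add_rg_tag(sam_line, sep=":"):
    """Append an RG:Z tag naming the read's cell to one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    barcode = fields[0].split(sep, 1)[0]  # QNAME holds the barcode (assumed layout)
    return "\t".join(fields + [f"RG:Z:{barcode}"])


header = rg_header_lines({"CCC", "AAA"})
record = add_rg_tag("AAA:1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII")
print(header)                   # ['@RG\tID:AAA\tSM:AAA', '@RG\tID:CCC\tSM:CCC']
print(record.split("\t")[-1])   # RG:Z:AAA
```

To apply something like this to a real file, one option is to stream `samtools view -h` output through it, add the header lines after the existing header, and re-encode with `samtools view -b`.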
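Renaming cells amounts to building a mapping from barcode to a new name and applying it to the barcode part of every read name. The naming scheme below (cell_1, cell_2, ... in sorted barcode order) and the colon separator are hypothetical choices, not necessarily those of SCIseq_RenameCellIDs.pl.

```python
def build_rename_map(barcodes, prefix="cell"):
    """Map each barcode to a short name such as cell_1, cell_2 (sorted barcode order)."""
    return {bc: f"{prefix}_{i}" for i, bc in enumerate(sorted(barcodes), start=1)}


def rename_read(read_name, mapping, sep=":"):
    """Swap the barcode part of a read name for its new cell name."""
    barcode, rest = read_name.split(sep, 1)
    return f"{mapping[barcode]}{sep}{rest}"


mapping = build_rename_map({"GGG", "AAA"})
print(mapping)                         # {'AAA': 'cell_1', 'GGG': 'cell_2'}
print(rename_read("GGG:42", mapping))  # cell_2:42
```

Keeping the mapping alongside the renamed file lets each new name be traced back to its original barcode.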