Column 1: [sample_ID]
Column 2: [sequence_library_ID]
Column 3: [run_ID]
Column 4: [read_length]
For example, sample A has RNA-Seq data from three runs: Run-a and Run-b are from the same library (Lib-a, insert size 300 nt), and Run-c is from another library (Lib-b, insert size 170 nt). The read length of Run-a and Run-c is PE90 (paired-end, 90 nt), while that of Run-b is PE100. The list file can be created like this:
A Lib-a Run-a 90
A Lib-a Run-b 100
A Lib-b Run-c 90
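As a minimal sketch, the list above could also be generated from a script. The helper name `format_sample_list` is hypothetical, and using a tab as the column separator is an assumption made for illustration; check which separator your SOAPfuse version expects.

```python
def format_sample_list(runs):
    """Return sample-list text: one line per run, four columns.

    Each item in `runs` is (sample_ID, sequence_library_ID, run_ID, read_length).
    The tab separator is an assumption, not taken from the SOAPfuse manual.
    """
    lines = []
    for sample_id, library_id, run_id, read_length in runs:
        lines.append("\t".join([sample_id, library_id, run_id, str(read_length)]))
    return "\n".join(lines) + "\n"


# The three runs of sample A described above. Insert size is not listed.
runs = [
    ("A", "Lib-a", "Run-a", 90),
    ("A", "Lib-a", "Run-b", 100),
    ("A", "Lib-b", "Run-c", 90),
]

sample_list_text = format_sample_list(runs)
print(sample_list_text, end="")
```

Redirecting the printed text to a file gives one list file for sample A.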
Note:
* Each line contains the information for one run.
If you have N runs for one sample, just write N lines. One run, one line.
* It is suggested to prepare one list for each sample if you want to analyze samples in parallel.
SOAPfuse needs one sample list for each operation, so if you have N samples, prepare N list files and run SOAPfuse N times to analyze all samples in parallel.
* Insert size is not required.
The insert size provided by the user may not be accurate, so it is not required in the sample list.
Instead, SOAPfuse uses its own algorithm to estimate the actual insert size in the pipeline.
* Different read lengths are allowed.
a. If the RNA-Seq data of one sample come from several runs with different read lengths, that is fine:
SOAPfuse has a complete set of algorithms to distinguish them for accurate calculation.
b. If, within one run, the read lengths of the /1 end and the /2 end differ (uncommon), write both
lengths separated by a slash. For example, sample A has another run (Run-d from Lib-a) in which
the /1 end is 80 nt and the /2 end is 90 nt. SOAPfuse allows users to write the sample list like this:
A Lib-a Run-a 90
A Lib-a Run-b 100
A Lib-b Run-c 90
A Lib-a Run-d 80/90
A Lib-a Run-a 90/90
A Lib-a Run-b 100/100
A Lib-b Run-c 90/90
A Lib-a Run-d 80/90
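The read-length notation above can be checked with a short script before starting a run. This is only a sketch under stated assumptions: the function names `parse_read_length` and `parse_sample_list` are hypothetical, columns are assumed to be separated by whitespace, and a single value such as `90` is assumed to mean the same as `90/90`, as the two lists above suggest. It does not reproduce SOAPfuse's own parsing.

```python
from collections import defaultdict


def parse_read_length(field):
    """Return (end1_length, end2_length) from '90' or '80/90'.

    Assumption: a single number applies to both the /1 and /2 ends.
    """
    parts = field.split("/")
    if len(parts) == 1:
        length = int(parts[0])
        return (length, length)
    if len(parts) == 2:
        return (int(parts[0]), int(parts[1]))
    raise ValueError(f"unexpected read_length field: {field!r}")


def parse_sample_list(text):
    """Group runs by sample_ID; each line must have exactly four columns."""
    runs_by_sample = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        columns = line.split()  # assumption: whitespace-separated columns
        if len(columns) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(columns)}")
        sample_id, library_id, run_id, length_field = columns
        runs_by_sample[sample_id].append(
            (library_id, run_id, parse_read_length(length_field))
        )
    return dict(runs_by_sample)


example = """A Lib-a Run-a 90
A Lib-a Run-b 100
A Lib-b Run-c 90
A Lib-a Run-d 80/90
"""

parsed = parse_sample_list(example)
for library_id, run_id, (end1, end2) in parsed["A"]:
    print(library_id, run_id, end1, end2)
```

Because runs are grouped by sample ID, the same structure can be used to write one list file per sample when several samples are analyzed in parallel.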
K101-T Lib-n Run-n 90
K101-N Lib-m Run-m 100
For example, the RNA-Seq data files (fastq) of sample A from the sample list example will be stored like this:
