Dear Brian,
I have a large DNA-seq dataset (fastq.gz) that I want to assemble de novo, but my computer doesn't have enough resources. I'd like to remove duplicates and reorder the reads with Clumpify, then split the one big file into many smaller files that the computer can handle. Can you give me some suggestions?
These are the commands I used:
1. clumpify.sh in=M27454_1.fq.gz in2=M27454_2.fq.gz out=M27454_1_clean.fq.gz out2=M27454_2_clean.fq.gz dedupe subs=0
2. clumpify.sh in=M27454_1_clean.fq.gz in2=M27454_2_clean.fq.gz out=M27454_1_reorder.fq.gz out2=M27454_2_reorder.fq.gz
3. zcat M27454_1_reorder.fq.gz | split -l 10000000 --additional-suffix=".fastq" --filter='gzip > $FILE.gz' - "XXX_"
The second command is: clumpify.sh in=M27454_1_clean.fq.gz in2=M27454_2_clean.fq.gz out=M27454_1_reorder.fq.gz out2=M27454_2_reorder.fq.gz reorder
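For the split step, here is a rough sketch of how both mate files might be split into matching chunks. It assumes GNU coreutils `split` (for `--filter` and `--additional-suffix`), unwrapped 4-line FASTQ records, and that the two files list the reads in the same order. The function name `split_fastq` and the `M27454_*_part_` prefixes are made up for illustration.

```shell
# Hypothetical helper: split one gzipped FASTQ into gzipped chunks
# holding a fixed number of reads each (assumes GNU split).
split_fastq() {
    in_gz=$1     # input file, e.g. M27454_1_reorder.fq.gz
    prefix=$2    # output prefix, e.g. M27454_1_part_
    reads=$3     # reads per chunk

    # One FASTQ record is 4 lines, so the line count is always a multiple of 4
    # and no record is cut in half. split sets $FILE for the filter command.
    gzip -dc "$in_gz" |
        split -l "$((reads * 4))" -d --additional-suffix=.fq \
              --filter='gzip > "$FILE.gz"' - "$prefix"
}

# Example calls (same chunk size for both mates so chunk N of file 1
# holds the mates of chunk N of file 2):
# split_fastq M27454_1_reorder.fq.gz M27454_1_part_ 2500000
# split_fastq M27454_2_reorder.fq.gz M27454_2_part_ 2500000
```

With 2500000 reads per chunk this gives the same 10000000 lines per file as in command 3, but applied to both the `_1` and `_2` files so the pairs stay together.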