Home

Welcome to your wiki!

This is the default page; edit it as you see fit. To add a new page, simply reference it within brackets, e.g. [SamplePage].

The wiki uses Markdown syntax.

Project Members:

Brian Bushnell (admin)

Discussion

shengweima - 2017-02-16

Dear Brian,
I have a large DNA sequencing file (fastq.gz) that I want to assemble de novo, but my computer does not have enough resources. I would like to remove duplicates and reorder the reads with Clumpify, then split the one big file into many smaller files that the computer can handle. Can you give me some suggestions?
Below are the commands I used, in order:
1. clumpify.sh in=M27454_1.fq.gz in2=M27454_2.fq.gz out=M27454_1_clean.fq.gz out2=M27454_2_clean.fq.gz dedupe subs=0
2. clumpify.sh in=M27454_1_clean.fq.gz in2=M27454_2_clean.fq.gz out=M27454_1_reorder.fq.gz out2=M27454_2_reorder.fq.gz
3. zcat M27454_1_reorder.fq.gz | split -l 10000000 --additional-suffix=".fastq" --filter='gzip > $FILE.gz' - "XXX_"
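Step 3 as posted splits only the read 1 file. A minimal sketch of splitting both mates with the same chunk size, so that chunk N of read 1 lines up with chunk N of read 2, might look like the following. The helper name `split_fastq_gz` and the `_part_` output prefixes are hypothetical, and the sketch assumes GNU split (for `--filter` and `--additional-suffix`), standard 4-line FASTQ records, and that both files hold the same reads in the same order. The chunk size of 2500000 reads corresponds to the 10000000 lines used in step 3.

```shell
# Hypothetical helper: split one gzipped FASTQ file into gzipped chunks
# holding a fixed number of reads. Assumes GNU split and 4-line FASTQ records.
split_fastq_gz() {
    in_file=$1          # input file, e.g. M27454_1_reorder.fq.gz
    prefix=$2           # output prefix, e.g. M27454_1_part_
    reads_per_chunk=$3  # number of reads in each chunk
    # Each FASTQ record is 4 lines, so split on a multiple of 4 lines
    # to avoid cutting a record in half.
    lines=$((reads_per_chunk * 4))
    gzip -dc "$in_file" |
        split -l "$lines" --additional-suffix=.fq \
              --filter='gzip > "$FILE.gz"' - "$prefix"
}

# Usage with the file names from the post (run once per mate, same chunk size):
# split_fastq_gz M27454_1_reorder.fq.gz M27454_1_part_ 2500000
# split_fastq_gz M27454_2_reorder.fq.gz M27454_2_part_ 2500000
```

With the same chunk size for both mates, files such as M27454_1_part_aa.fq.gz and M27454_2_part_aa.fq.gz should contain matching pairs, provided the two inputs are still in the same order.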

shengweima - 2017-02-16

Correction: the second command is: clumpify.sh in=M27454_1_clean.fq.gz in2=M27454_2_clean.fq.gz out=M27454_1_reorder.fq.gz out2=M27454_2_reorder.fq.gz reorder
