

Brian Bushnell

Welcome to your wiki!

This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses Markdown syntax.



Discussion

  • shengweima

    shengweima - 2017-02-16

    Dear Brian,
    I have a large paired-end DNA-seq dataset (fastq.gz) that I want to assemble de novo, but my computer does not have enough resources. I want to remove duplicates and reorder the reads with Clumpify, then split the one big file into many small files that the computer can handle. Can you give me some suggestions?
    These are the commands I used:
    1. clumpify.sh in=M27454_1.fq.gz in2=M27454_2.fq.gz out=M27454_1_clean.fq.gz out2=M27454_2_clean.fq.gz dedupe subs=0
    2. clumpify.sh in=M27454_1_clean.fq.gz in2=M27454_2_clean.fq.gz out=M27454_1_reorder.fq.gz out2=M27454_2_reorder.fq.gz
    3. zcat M27454_1_reorder.fq.gz | split -l 10000000 --additional-suffix=".fastq" --filter='gzip > $FILE.gz' - "XXX_"
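Step 3 only splits the read 1 file. A minimal sketch that splits both mates the same way is shown below. It assumes GNU coreutils `split` (needed for `--filter` and `--additional-suffix`) and that the two reordered files still hold the same pairs in the same order. The function name `split_pair` and the chunk prefixes `R1_`/`R2_` are hypothetical. Each FASTQ record is four lines, so the chunk size must be a multiple of 4 (10000000 lines is 2,500,000 reads); using the same size for both files keeps chunk N of read 1 paired with chunk N of read 2.

```shell
#!/usr/bin/env bash
# Sketch: split a pair of gzipped FASTQ files into matching gzipped chunks.
# Assumptions: GNU split (for --filter / --additional-suffix), and both
# mate files contain the same pairs in the same order.
# split_pair and the R1_/R2_ prefixes are hypothetical names.
set -euo pipefail

split_pair() {
  local r1="$1" r2="$2" lines="$3"
  # One FASTQ record is 4 lines; any other chunk size would cut records.
  if (( lines % 4 != 0 )); then
    echo "lines per chunk must be a multiple of 4" >&2
    return 1
  fi
  # Identical chunk sizes keep R1_aa paired with R2_aa, R1_ab with R2_ab, etc.
  # $FILE is set by split itself, so it stays inside single quotes.
  zcat "$r1" | split -l "$lines" --additional-suffix=.fq --filter='gzip > $FILE.gz' - R1_
  zcat "$r2" | split -l "$lines" --additional-suffix=.fq --filter='gzip > $FILE.gz' - R2_
}
```

For the files above this would be called as `split_pair M27454_1_reorder.fq.gz M27454_2_reorder.fq.gz 10000000`, producing `R1_aa.fq.gz` with `R2_aa.fq.gz`, `R1_ab.fq.gz` with `R2_ab.fq.gz`, and so on.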

  • shengweima

    shengweima - 2017-02-16

    Correction: the second command is: clumpify.sh in=M27454_1_clean.fq.gz in2=M27454_2_clean.fq.gz out=M27454_1_reorder.fq.gz out2=M27454_2_reorder.fq.gz reorder

