
#11 clumpify on paired reads

Milestone: 1.0
Status: open
Owner: nobody
Labels: clumpify
Updated: 2018-11-27
Created: 2018-11-23
Private: No

Hi Brian,

I am having trouble applying clumpify to HiSeq paired reads.

My command is:

clumpify.sh in=read1_1 in2=read1_2 out=c_read1_1 out2=c_read1_2 \
    dupedist=2500 \
    dedupe optical

After that, c_read1_1 comes out about half the size of c_read1_2.

I understand from the manpage that unpair and repair can be applied, but I do not see how:

Pairing/ordering parameters (for use with error-correction):
unpair=f            For paired reads, clump all of them rather than just
                    read 1.  Destroys pairing.  Without this flag, for paired
                    reads, only read 1 will be error-corrected.
repair=f            After clumping and error-correction, restore pairing.
                    If groups>1 this will sort by name which will destroy
                    clump ordering; with a single group, clumping will
                    be retained.

Should I add 'unpair=t' and/or 'repair=t'?

Also, can I run error correction on both reads of a pair based on the optical duplicate cluster and produce only one consensus read pair per cluster? (Command examples would be welcome here too.)

Could you please comment on how to clean both reads of a pair and still obtain paired output?

Thanks
Stephane

Discussion

  • Stephane Plaisance

    using v38.32

     
  • Stephane Plaisance

    Weird!
    In fact, the number of reads (counted with "zgrep -c '^@'") is the same in both files, but the compressed c_read1_1 is 50% of the size of c_read1_2.
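
    A side note on the count above: '@' is also a valid quality character (Phred+33 score 31), so counting lines that start with '@' can overcount when a quality line happens to begin with it. A minimal sketch of a line-based count follows, assuming strict 4-line FASTQ records (no wrapped sequence or quality lines); the function name count_fastq_reads is hypothetical and not part of BBTools.

```shell
# Count records in a gzipped FASTQ file.
# Assumption: every record occupies exactly 4 lines.
count_fastq_reads() {
    # Decompress to stdout and divide the total line count by 4.
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

    Running count_fastq_reads on c_read1_1 and on c_read1_2 (if they are gzipped) should print the same number when pairing is intact.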

     
  • Brian Bushnell

    Brian Bushnell - 2018-11-26

    Hi Stephane,

    This is not surprising; the reads are clumped by read 1, and read 2 is just along for the ride, getting put out in the same order as read 1. As such, when you have 2 files, file 1 will compress much better when you have variable insert size. I had not personally noticed this since I work with interleaved files.

    You do not want to use "unpair" and "repair" unless you are doing error-correction. If you ARE doing error-correction, then yes, add those flags.

    Honestly, I'm not entirely sure of the impact of doing certain operations in conjunction with each other, like "unpair" + "dedupe". That's probably a bad idea since then duplicates will be found based on single reads, and then some reads will be lost so they can't be re-paired, etc. If you want to do duplicate removal and error-correction, I'd run 2 passes, for example:

    clumpify.sh in=reads.fq out=deduped.fq dedupe optical
    clumpify.sh in=deduped.fq out=ecc.fq ecc unpair repair

    You cannot explicitly error-correct only duplicate clusters. But by default, the highest-quality pair should be retained when duplicates are found.
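
    The two-pass example above uses a single interleaved file, while the original question used two files. One way to bridge the two, sketched here under assumptions, is to interleave with reformat.sh, run the two clumpify.sh passes unchanged, and split the result again. The helper names run and dedupe_then_ecc_paired, the RUN variable, and the intermediate file names are hypothetical; the sketch only prints the commands unless RUN=1 is set, and it has not been checked against v38.32.

```shell
# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Usage: dedupe_then_ecc_paired <read1 file> <read2 file> <output prefix>
dedupe_then_ecc_paired() {
    r1=$1; r2=$2; prefix=$3
    # 1. Interleave the two input files into one.
    run reformat.sh in="$r1" in2="$r2" out="$prefix.interleaved.fq" &&
    # 2. First pass: optical duplicate removal, keeping pairs together.
    run clumpify.sh in="$prefix.interleaved.fq" out="$prefix.deduped.fq" dedupe optical &&
    # 3. Second pass: error-correct both reads, then restore pairing.
    run clumpify.sh in="$prefix.deduped.fq" out="$prefix.ecc.fq" ecc unpair repair &&
    # 4. Split the interleaved result back into two files.
    run reformat.sh in="$prefix.ecc.fq" out="${prefix}_1.fq" out2="${prefix}_2.fq" int=t
}
```

    For instance, "dedupe_then_ecc_paired read1_1 read1_2 c_read1" prints the four commands for inspection; options such as dupedist=2500 from the original command would go on the first clumpify.sh line.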

     
  • Stephane Plaisance

    Hi again Brian,
    Thanks for the explanations, I will try both and compare.
    best
    S

     
