[wgs-assembler-users] understanding PBcR error-correction


Dear Serge, 

Could you give me some information about how PBcR does the error correction (especially at low coverage)?
This might sound like a bold question, but I have to ask, since I could not find any detailed information about it.

I fed PBcR 22x of PacBio data for a 1.3 Gb genome (low-coverage settings), and it returned 15x of error-corrected reads. This result is amazing, even considering that the accuracy is "only" 97-98% rather than 99+%.
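To put those coverage figures in absolute terms, the short calculation below converts fold coverage to total bases. It assumes coverage is simply total bases divided by the 1.3 Gb genome size and ignores read-length effects. Under that assumption, roughly two thirds of the input bases survive correction.

```python
GENOME_SIZE_BP = 1.3e9  # 1.3 Gb genome, as stated above


def bases_at_coverage(coverage, genome_size=GENOME_SIZE_BP):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size


raw = bases_at_coverage(22)        # input PacBio data
corrected = bases_at_coverage(15)  # error-corrected output
retained = corrected / raw         # fraction of bases kept after correction

print(f"raw: {raw / 1e9:.1f} Gb, corrected: {corrected / 1e9:.1f} Gb, retained: {retained:.0%}")
```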

I know that overlaps are found using your MHAP aligner and that those overlaps are fed to PBDAGCON to build a consensus, which then yields high-confidence base calls along the whole sequence.

Does PBcR, like HGAP, use the longest reads as initial "references" (seeds) for the alignments, or does it do a brute-force all-against-all alignment and pile up the overlaps to get as much coverage per position as possible?

Is there a minimum coverage threshold for calling consensus at a given position of a read?
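The sketch below shows what a per-position coverage threshold would do, so the question is concrete. It is not how PBDAGCON works. PBDAGCON builds its consensus from a graph of the alignments, and here it is swapped for a much simpler majority vote over pre-computed alignment columns. The function name `pileup_consensus` and the parameter `min_coverage` are hypothetical and do not correspond to any PBcR setting. In this sketch, positions covered by fewer than `min_coverage` reads are dropped, which splits the corrected read into separate segments.

```python
from collections import Counter
from typing import List


def pileup_consensus(columns: List[List[str]], min_coverage: int = 4) -> List[str]:
    """Hypothetical majority-vote consensus (NOT PBDAGCON's graph-based method).

    columns[i] holds the bases ('-' for a deletion) that overlapping reads
    contribute at position i of the read being corrected. Positions with fewer
    than min_coverage supporting reads break the read into separate segments.
    """
    segments, current = [], []
    for col in columns:
        if len(col) < min_coverage:
            # Too little support: close the current segment and skip this position.
            if current:
                segments.append("".join(current))
                current = []
            continue
        base, _ = Counter(col).most_common(1)[0]  # most frequent base wins
        if base != "-":  # a majority deletion contributes no base
            current.append(base)
    if current:
        segments.append("".join(current))
    return segments


columns = [list("AAAA"), list("CCCT"), list("GG"), list("T-TT"), list("AAAA")]
print(pileup_consensus(columns, min_coverage=3))  # ['AC', 'TA']
print(pileup_consensus(columns, min_coverage=2))  # ['ACGTA']
```

Raising the threshold trades corrected-read length and yield for confidence, which is why the threshold matters most at low coverage.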

These questions relate more to PBDAGCON, for which I could not find much information. Could you point me to some documentation on PBDAGCON, or briefly explain its settings in PBcR?

Thank you, 

Michel