[wgs-assembler-users] pacBioToCA fails at estimate-mer-threshold

Hello everybody,

For a sequencing project on a genome of around 1 Gb, I have 3 Illumina lanes
and 8 SMRT cells of sequence data. I am trying to use the pacBioToCA
correction tool of the wgs assembler, using the 3 Illumina lanes.

So far, I have managed to run the pacBioToCA pipeline successfully up to
the k-mer counting step with meryl. I thus have a bunch (a lot, actually) of
files "asm-C-ms13.cm°.batch*.[mcdat|mcidx]". The next step is to estimate
the mer threshold, but it invariably fails with:

merylStreamReader()-- ERROR: 0-mercounts/asm-C-ms14-cm0.mcidx is an INCOMPLETE merylStream index file!
merylStreamReader()-- ERROR: 0-mercounts/asm-C-ms14-cm0.mcdat is an INCOMPLETE merylStream data file!

From what I understand, the pipeline expects only one mcdat file and one
mcidx file, but I have a whole batch of them. Shouldn't there be a step
where the batch files are merged, or shouldn't the meryl reader understand
that it's not a single file? Do you have any idea how to overcome this?

From a larger perspective, what kind of software would you recommend for
my data? I can't use ALLPATHS-LG because I don't have overlapping reads, I
can't use scaffolding with AHA because my genome is way too large, and I'm
failing at pacBioToCA. Do you think it's feasible/advisable to perform a
classic assembly with the wgs assembler without read correction, which
might be more likely to succeed? Should I try MIRA? (I think it's the last
alternative I have to get a whole-genome assembly with the data that I
have.)

Thank you so much for your help!

Sacha