From: Sacha L. <sac...@un...> - 2013-10-11 09:02:20
Hello everybody,

For a sequencing project on a genome of around 1 Gb, I have 3 Illumina lanes and 8 SMRT cells of sequence data. I am trying to use the pacBioToCA correction tool from the wgs assembler, using the 3 Illumina lanes.

So far, I have managed to run the pacBioToCA pipeline successfully up to the k-mer counting step with meryl. I thus have a bunch (a lot, actually) of files "asm-C-ms13.cm°.batch*.[mcdat|mcidx]".

The next step is to estimate the mer threshold, but it invariably fails with:

merylStreamReader()-- ERROR: 0-mercounts/asm-C-ms14-cm0.mcidx is an INCOMPLETE merylStream index file!
merylStreamReader()-- ERROR: 0-mercounts/asm-C-ms14-cm0.mcdat is an INCOMPLETE merylStream data file!

From what I understand, the pipeline is expecting only one mcdat file and one mcidx file, but I have a whole batch of them. Shouldn't there be a step where the batch files are merged, or shouldn't the meryl reader understand that it is not a single file? Do you have any idea how to overcome this?

From a larger perspective, what kind of software would you recommend for my data? I can't use ALLPATHS-LG because I don't have overlapping reads, I can't use scaffolding with AHA because my genome is way too large, and I'm failing at pacBioToCA. Do you think it is feasible/advisable to perform a classic assembly with the wgs assembler without read correction, which might be more likely to succeed? Should I try MIRA? (I think it's the last alternative I have to get a whole-genome assembly with the data that I have...)

Thank you so much for your help!

Sacha