From: Matthew C. <co...@gm...> - 2014-06-26 21:00:27
Hi,

I had tried adding the localStaging flag, but still got the same "java.io.FileNotFoundException" during the overlap step. I did try out the lambda phage sample data set and it ran fine so I don't think it is something with my installation.

We currently only have 16X but are thinking of going higher. I wanted to try a de novo assembly with this current dataset and MHAP finally seems like a reasonable way to do so =)

Thanks,
Matt

On Thu, Jun 26, 2014 at 2:34 PM, Serge Koren <ser...@gm...> wrote:

> Hi,
>
> Thanks, yes this looks like a bug in that the code recognized your genome
> is too big to do the precompute but didn't properly turn it off. Adding the
> localStaging="<path to local disk on node>" should let you work around the
> issue. We will make a new release candidate and fix this bug and the other
> one you encountered. I will say that with 16X you are probably not going to
> get a very good assembly because you'll likely have less than 10X after
> correction. I'd suggest trying ECTools as well
> (https://github.com/jgurtowski/ectools) as it is designed to work best
> with coverage in the 10-20X range in combination with short-read sequencing
> data.
>
> Sergey
>
> On Jun 25, 2014, at 2:33 PM, Matthew Conte <co...@gm...> wrote:
>
> Hi Serge,
>
> On Wed, Jun 25, 2014 at 11:36 AM, Serge Koren <ser...@gm...>
> wrote:
>
>> Hi,
>>
>> On Jun 24, 2014, at 5:40 PM, Matthew Conte <co...@gm...> wrote:
>>
>> Hi all,
>>
>> I'm trying out PBcR to make use of the new MHAP overlapper for self
>> correcting a set of PacBio reads and I'm running into an issue.
>>
>> I'm getting the following errors in the temp_dir/1-overlapper/1.err:
>> Exception in thread "main" java.io.FileNotFoundException:
>> /raid3/PBcR_CA_8.2_alpha/tempLibrary/1-overlapper/stream_1/correct_reads_part000002.dat
>> (No such file or directory)
>>
>> The dat file is a pre-computed index that is used to speed up the
>> computation for smaller genomes. For larger genomes or if you are using
>> local disk, it should not get created. Do you have the output of the
>> pipeline up to this step along with the command-line you used to start the
>> run? That will help diagnose why it is not properly recognizing that the
>> index is not built. As a workaround, you can add "localStaging=</path to
>> local disk>" to your PBcR command which will force the pipeline to never
>> pre-compute the index.
>>
>
> The command that I ran was:
> /sw/wgs-8.2alpha/Linux-amd64/bin/PBcR "mhap=-k 16 --num-hashes 1256
> --num-min-matches 3 --threshold 0.04
> localStaging=/path_to_working_dir/temp_staging" merSize=16 -length 500
> -partitions 200 -threads 27 -libraryname PBcR -s pacbio.spec
> fastqFile=filtered_subreads.bbmap.rm_adapters.split.fastq -genomeSize
> 1000000000
>
> I changed the MHAP settings according to the PBcR wiki since I only have
> about 16x coverage of PacBio data.
>
> I should mention that runCA continues to run until the '5-consensus' step,
> and errors out there. But I think the start of the problem is at this
> overlap step.
>
> The relevant output was:
> ### Reading options from 'pacbio.spec'
> ### Reading options from the command line.
>
> Warning: no frag files specified, assuming self-correction of pacbio sequences.
> Running with 27 threads and 200 partitions
> ********* Starting correction...
> ...
> ******** Configuration Summary ********
> bankPath =
> maxCoverage = 40
> ...
> mhap = -k 16 --num-hashes 1256 --num-min-matches 3 --threshold 0.04
> localStaging=/path_to_working_dir/temp_staging
> ovlRefBlockLength = 100000000000
> cnsErrorRate = 0.25
> ...
> ----------------------------------------START Wed Jun 25 11:24:30 2014
> mkdir tempPBcR
> ----------------------------------------END Wed Jun 25 11:24:30 2014 (0 seconds)
> ----------------------------------------START Wed Jun 25 11:24:30 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/fastqToCA -libraryname PBcR -type sanger
> -technology none -feature doConsensusCorrection 1 -reads
> /path_to_working_dir/filtered_subreads.bbmap.rm_adapters.split.fastq >
> /path_to_working_dir//tempPBcR/PBcR.frg
> ----------------------------------------END Wed Jun 25 11:24:30 2014 (0 seconds)
> ----------------------------------------START Wed Jun 25 11:24:30 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/runCA -s
> /path_to_working_dir//tempPBcR/PBcR.spec -p asm -d tempPBcR
> stopAfter=initialStoreBuilding /path_to_working_dir//tempPBcR/PBcR.frg
> ----------------------------------------START Wed Jun 25 11:24:30 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -o
> /path_to_working_dir/tempPBcR/asm.gkpStore.BUILDING -F
> /path_to_working_dir//tempPBcR/PBcR.frg >
> /path_to_working_dir/tempPBcR/asm.gkpStore.err 2>&1
> ----------------------------------------END Wed Jun 25 11:35:27 2014 (657 seconds)
> numFrags = 2995674
> Stop requested after 'initialstorebuilding'.
> ----------------------------------------END Wed Jun 25 11:35:27 2014 (657 seconds)
> Will be correcting PacBio library 1 with librarie[s] 1 - 1
> ----------------------------------------START Wed Jun 25 11:35:29 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -dumpfragments -invert
> -tabular -longestovermin 1 500 -longestlength 1 8268329152
> /path_to_working_dir//tempPBcR/asm.gkpStore 2>
> /path_to_working_dir//tempPBcR/asm.seedlength |awk '{if (!(match($1, "UID")
> != 0 && length($1) == 3)) { print "frg uid "$1" isdeleted 1"; } }' >
> /path_to_working_dir//tempPBcR/asm.toerase.uid
> ----------------------------------------END Wed Jun 25 11:35:38 2014 (9 seconds)
> ----------------------------------------START Wed Jun 25 11:35:38 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper --edit
> /path_to_working_dir//tempPBcR/asm.toerase.uid
> /path_to_working_dir//tempPBcR/asm.gkpStore >
> /path_to_working_dir//tempPBcR/asm.toerase.out 2>
> /path_to_working_dir//tempPBcR/asm.toerase.err
> ----------------------------------------END Wed Jun 25 11:35:44 2014 (6 seconds)
> Running with 8.268329256X (for genome size 1000000000) of PBcR sequences (8268329256 bp).
> Correcting with 16X sequences (16536658304 bp).
> Warning: performing self-correction with a total of 16. For best performance, at least 50 is recommended.
> ----------------------------------------START Wed Jun 25 11:35:44 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/jellyfish count -m 16 -s 120000000 -t
> 32 -o /path_to_working_dir//tempPBcR/asm.mers
> /path_to_working_dir/filtered_subreads.bbmap.rm_adapters.split.fastq
> ----------------------------------------END Wed Jun 25 12:05:11 2014 (1767 seconds)
> ----------------------------------------START Wed Jun 25 12:05:11 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/jellyfish histo -t 32 -f
> /path_to_working_dir//tempPBcR/asm.mers >
> /path_to_working_dir//tempPBcR/asm.hist
> ----------------------------------------END Wed Jun 25 12:09:10 2014 (239 seconds)
> ----------------------------------------START Wed Jun 25 12:09:10 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/jellyfish dump -c -t -L 34
> /path_to_working_dir//tempPBcR/asm.mers |awk -v TOTAL=3328265613
> '{printf("%s\t%0.10f\t%d\t%d\n", $1, $2/TOTAL, $2, TOTAL)}' |sort -T .
> -rnk2> /path_to_working_dir//tempPBcR/asm.ignore
> ----------------------------------------END Wed Jun 25 12:21:17 2014 (727 seconds)
> ----------------------------------------START Wed Jun 25 12:21:17 2014
> rm /path_to_working_dir//tempPBcR/asm.mers*
> ----------------------------------------END Wed Jun 25 12:21:23 2014 (6 seconds)
> ----------------------------------------START Wed Jun 25 12:21:23 2014
> mkdir /path_to_working_dir//tempPBcR/1-overlapper
> ----------------------------------------END Wed Jun 25 12:21:23 2014 (0 seconds)
> ----------------------------------------START Wed Jun 25 12:21:23 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -dumpfragments -tabular
> asm.gkpStore |awk '{print $1"\t"$2}' > asm.eidToIID
> ----------------------------------------END Wed Jun 25 12:21:28 2014 (5 seconds)
> ----------------------------------------START Wed Jun 25 12:21:28 2014
> /sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -dumpfragments -tabular
> asm.gkpStore |awk '{print $2"\t"$10}' > asm.iidToLen
> ----------------------------------------END Wed Jun 25 12:21:33 2014 (5 seconds)
> ----------------------------------------START CONCURRENT Wed Jun 25 12:21:33 2014
> /path_to_working_dir//tempPBcR/1-overlapper/ovlprep.sh 1
> Scanning store to find libraries used and reads to dump.
> Added 0 reads to maintain mate relationships.
> Dumping 0 fragments from unknown library (version 1 has these)
> Dumping 133125 fragments from library IID 1
> /path_to_working_dir//tempPBcR/1-overlapper/ovlprep.sh 2
> Scanning store to find libraries used and reads to dump.
> Added 0 reads to maintain mate relationships.
> ...
> /path_to_working_dir//tempPBcR/1-overlapper/ovlprep.sh 23
> Scanning store to find libraries used and reads to dump.
> Added 0 reads to maintain mate relationships.
> Dumping 0 fragments from unknown library (version 1 has these)
> Dumping 66924 fragments from library IID 1
> ----------------------------------------END CONCURRENT Wed Jun 25 12:27:16 2014 (343 seconds)
> ----------------------------------------START CONCURRENT Wed Jun 25 12:27:16 2014
> /path_to_working_dir//tempPBcR/1-overlapper/overlap.sh 1
> Running partition 000001 with options -h 1-133125 -r 133126-1597500 start
> 133125 end 1597500 total 1464375 zero job 0 and stride 1
> /path_to_working_dir//tempPBcR/1-overlapper/overlap.sh 2
> Running partition 000002 with options -h 1-133125 -r 1597501-2995674
> start 1597500 end 2995674 total 1398174 zero job 0 and stride 1
> ...
>
> Thanks,
> Matt
>
>>
>> There is no 'correct_reads_part000002.dat' file there, but there is a
>> 'correct_reads_part000002.fasta' file where the
>> 'stream_1/correct_reads_part000002.dat' points to. I'm not sure if it is
>> just an extension naming issue or if the .dat files weren't created
>> properly.
>>
>> Also, I've found another minor issue with the '-threads' option
>> supplied to PBcR on the command line. It doesn't seem to use the number of
>> threads supplied and simply uses the max number of cpus on the machine
>> available.
>>
>> Thanks, I'll check this and update the code.
>>
>>
>> Thanks,
>> Matt
>>
>> wgs-assembler-users mailing list
>> wgs...@li...
>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
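
A side note on the PBcR command quoted in this thread: the localStaging=... setting sits inside the double-quoted "mhap=..." string, and the Configuration Summary likewise prints it as part of the mhap value. The sketch below is only an illustration of how a POSIX shell would split that command line into arguments, using Python's standard shlex module. It says nothing about how PBcR itself parses its options, and the variable names are made up for the example.

```python
import shlex

# Command line as quoted in the thread (paths and arguments copied verbatim).
command = (
    '/sw/wgs-8.2alpha/Linux-amd64/bin/PBcR '
    '"mhap=-k 16 --num-hashes 1256 --num-min-matches 3 --threshold 0.04 '
    'localStaging=/path_to_working_dir/temp_staging" '
    'merSize=16 -length 500 -partitions 200 -threads 27 -libraryname PBcR '
    '-s pacbio.spec fastqFile=filtered_subreads.bbmap.rm_adapters.split.fastq '
    '-genomeSize 1000000000'
)

# shlex.split follows POSIX shell quoting rules, so everything inside
# one pair of double quotes becomes a single argument.
args = shlex.split(command)

# Arguments that start with "localStaging=" (i.e. passed as their own option).
standalone = [a for a in args if a.startswith("localStaging=")]

# Arguments that only contain "localStaging=" somewhere inside them.
embedded = [a for a in args if "localStaging=" in a and a not in standalone]

print("standalone:", standalone)
print("embedded:", embedded)
```

With the command as posted, the standalone list comes back empty and the only match is the argument beginning with mhap=. If the workaround is meant to be a separate option, it may be worth moving localStaging=... outside the quotes and rerunning; that is a guess based on the quoting alone, not a confirmed diagnosis.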