From: Matthew C. <co...@gm...> - 2014-06-25 18:33:41
Hi Serge,

On Wed, Jun 25, 2014 at 11:36 AM, Serge Koren <ser...@gm...> wrote:

> Hi,
>
> On Jun 24, 2014, at 5:40 PM, Matthew Conte <co...@gm...> wrote:
>
> Hi all,
>
> I'm trying out PBcR to make use of the new MHAP overlapper for self
> correcting a set of PacBio reads and I'm running into an issue.
>
> I'm getting the following errors in the temp_dir/1-overlapper/1.err:
> *Exception in thread "main" java.io.FileNotFoundException:
> /raid3/PBcR_CA_8.2_alpha/tempLibrary/1-overlapper/stream_1/correct_reads_part000002.dat
> (No such file or directory)*
>
> The dat file is a pre-computed index that is used to speed up the
> computation for smaller genomes. For larger genomes or if you are using
> local disk, it should not get created. Do you have the output of the
> pipeline up to this step along with the command-line you used to start the
> run? That will help diagnose why it is not properly recognizing that the
> index is not built. As a workaround, you can add "localStaging=</path to
> local disk>" to your PBcR command which will force the pipeline to never
> pre-compute the index.
>

The command that I ran was:

*/sw/wgs-8.2alpha/Linux-amd64/bin/PBcR "mhap=-k 16 --num-hashes 1256 --num-min-matches 3 --threshold 0.04 localStaging=/path_to_working_dir/temp_staging" merSize=16 -length 500 -partitions 200 -threads 27 -libraryname PBcR -s pacbio.spec fastqFile=filtered_subreads.bbmap.rm_adapters.split.fastq -genomeSize 1000000000*

I changed the MHAP settings according to the PBcR wiki since I only have about 16x coverage of PacBio data. I should mention that runCA continues to run until the '5-consensus' step, and errors out there. But I think the start of the problem is at this overlap step.
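A note on the quoting in the command above: a POSIX shell passes everything inside one pair of double quotes as a single argument, so localStaging reaches PBcR as part of the mhap value rather than as its own option. The following is a minimal sketch of that tokenization, using Python's shlex module to mimic shell splitting. The shortened option strings and the /tmp/staging path are placeholders, and the second command is a hypothetical variant; whether PBcR then treats localStaging as intended is an assumption, not something confirmed in this thread.

```python
import shlex


def split_like_shell(command):
    """Tokenize a command line using POSIX shell quoting rules."""
    return shlex.split(command)


# Shortened form of the command from this message (placeholder path).
as_run = 'PBcR "mhap=-k 16 --threshold 0.04 localStaging=/tmp/staging" merSize=16'

# Hypothetical variant with localStaging outside the quoted mhap string.
separate = 'PBcR "mhap=-k 16 --threshold 0.04" localStaging=/tmp/staging merSize=16'

for label, cmd in (("as run", as_run), ("separate", separate)):
    print(label)
    # Skip the program name and show each argument the program would receive.
    for arg in split_like_shell(cmd)[1:]:
        print("  ", repr(arg))
```

In the first form the program receives two arguments after its name; in the second it receives three, with localStaging as a separate key=value argument.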
The relevant output was:

*### Reading options from 'pacbio.spec'*
*### Reading options from the command line.*
*Warning: no frag files specified, assuming self-correction of pacbio sequences.*
*Running with 27 threads and 200 partitions*
********** Starting correction...*
*...*
********* Configuration Summary *********
*bankPath = *
*maxCoverage = 40*
*...*
*mhap = -k 16 --num-hashes 1256 --num-min-matches 3 --threshold 0.04 localStaging=/path_to_working_dir/temp_staging*
*ovlRefBlockLength = 100000000000*
*cnsErrorRate = 0.25*
*...*
*----------------------------------------START Wed Jun 25 11:24:30 2014*
*mkdir tempPBcR*
*----------------------------------------END Wed Jun 25 11:24:30 2014 (0 seconds)*
*----------------------------------------START Wed Jun 25 11:24:30 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/fastqToCA -libraryname PBcR -type sanger -technology none -feature doConsensusCorrection 1 -reads /path_to_working_dir/filtered_subreads.bbmap.rm_adapters.split.fastq > /path_to_working_dir//tempPBcR/PBcR.frg*
*----------------------------------------END Wed Jun 25 11:24:30 2014 (0 seconds)*
*----------------------------------------START Wed Jun 25 11:24:30 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/runCA -s /path_to_working_dir//tempPBcR/PBcR.spec -p asm -d tempPBcR stopAfter=initialStoreBuilding /path_to_working_dir//tempPBcR/PBcR.frg*
*----------------------------------------START Wed Jun 25 11:24:30 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -o /path_to_working_dir/tempPBcR/asm.gkpStore.BUILDING -F /path_to_working_dir//tempPBcR/PBcR.frg > /path_to_working_dir/tempPBcR/asm.gkpStore.err 2>&1*
*----------------------------------------END Wed Jun 25 11:35:27 2014 (657 seconds)*
*numFrags = 2995674*
*Stop requested after 'initialstorebuilding'.*
*----------------------------------------END Wed Jun 25 11:35:27 2014 (657 seconds)*
*Will be correcting PacBio library 1 with librarie[s] 1 - 1*
*----------------------------------------START Wed Jun 25 11:35:29 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -dumpfragments -invert -tabular -longestovermin 1 500 -longestlength 1 8268329152 /path_to_working_dir//tempPBcR/asm.gkpStore 2> /path_to_working_dir//tempPBcR/asm.seedlength |awk '{if (!(match($1, "UID") != 0 && length($1) == 3)) { print "frg uid "$1" isdeleted 1"; } }' > /path_to_working_dir//tempPBcR/asm.toerase.uid*
*----------------------------------------END Wed Jun 25 11:35:38 2014 (9 seconds)*
*----------------------------------------START Wed Jun 25 11:35:38 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper --edit /path_to_working_dir//tempPBcR/asm.toerase.uid /path_to_working_dir//tempPBcR/asm.gkpStore > /path_to_working_dir//tempPBcR/asm.toerase.out 2> /path_to_working_dir//tempPBcR/asm.toerase.err*
*----------------------------------------END Wed Jun 25 11:35:44 2014 (6 seconds)*
*Running with 8.268329256X (for genome size 1000000000) of PBcR sequences (8268329256 bp).*
*Correcting with 16X sequences (16536658304 bp).*
*Warning: performing self-correction with a total of 16. For best performance, at least 50 is recommended.*
*----------------------------------------START Wed Jun 25 11:35:44 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/jellyfish count -m 16 -s 120000000 -t 32 -o /path_to_working_dir//tempPBcR/asm.mers /path_to_working_dir/filtered_subreads.bbmap.rm_adapters.split.fastq*
*----------------------------------------END Wed Jun 25 12:05:11 2014 (1767 seconds)*
*----------------------------------------START Wed Jun 25 12:05:11 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/jellyfish histo -t 32 -f /path_to_working_dir//tempPBcR/asm.mers > /path_to_working_dir//tempPBcR/asm.hist*
*----------------------------------------END Wed Jun 25 12:09:10 2014 (239 seconds)*
*----------------------------------------START Wed Jun 25 12:09:10 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/jellyfish dump -c -t -L 34 /path_to_working_dir//tempPBcR/asm.mers |awk -v TOTAL=3328265613 '{printf("%s\t%0.10f\t%d\t%d\n", $1, $2/TOTAL, $2, TOTAL)}' |sort -T . -rnk2> /path_to_working_dir//tempPBcR/asm.ignore*
*----------------------------------------END Wed Jun 25 12:21:17 2014 (727 seconds)*
*----------------------------------------START Wed Jun 25 12:21:17 2014*
*rm /path_to_working_dir//tempPBcR/asm.mers**
*----------------------------------------END Wed Jun 25 12:21:23 2014 (6 seconds)*
*----------------------------------------START Wed Jun 25 12:21:23 2014*
*mkdir /path_to_working_dir//tempPBcR/1-overlapper*
*----------------------------------------END Wed Jun 25 12:21:23 2014 (0 seconds)*
*----------------------------------------START Wed Jun 25 12:21:23 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -dumpfragments -tabular asm.gkpStore |awk '{print $1"\t"$2}' > asm.eidToIID*
*----------------------------------------END Wed Jun 25 12:21:28 2014 (5 seconds)*
*----------------------------------------START Wed Jun 25 12:21:28 2014*
*/sw/wgs-8.2alpha/Linux-amd64/bin/gatekeeper -dumpfragments -tabular asm.gkpStore |awk '{print $2"\t"$10}' > asm.iidToLen*
*----------------------------------------END Wed Jun 25 12:21:33 2014 (5 seconds)*
*----------------------------------------START CONCURRENT Wed Jun 25 12:21:33 2014*
*/path_to_working_dir//tempPBcR/1-overlapper/ovlprep.sh 1*
*Scanning store to find libraries used and reads to dump.*
*Added 0 reads to maintain mate relationships.*
*Dumping 0 fragments from unknown library (version 1 has these)*
*Dumping 133125 fragments from library IID 1*
*/path_to_working_dir//tempPBcR/1-overlapper/ovlprep.sh 2*
*Scanning store to find libraries used and reads to dump.*
*Added 0 reads to maintain mate relationships.*
*...*
*/path_to_working_dir//tempPBcR/1-overlapper/ovlprep.sh 23*
*Scanning store to find libraries used and reads to dump.*
*Added 0 reads to maintain mate relationships.*
*Dumping 0 fragments from unknown library (version 1 has these)*
*Dumping 66924 fragments from library IID 1*
*----------------------------------------END CONCURRENT Wed Jun 25 12:27:16 2014 (343 seconds)*
*----------------------------------------START CONCURRENT Wed Jun 25 12:27:16 2014*
*/path_to_working_dir//tempPBcR/1-overlapper/overlap.sh 1*
*Running partition 000001 with options -h 1-133125 -r 133126-1597500 start 133125 end 1597500 total 1464375 zero job 0 and stride 1*
*/path_to_working_dir//tempPBcR/1-overlapper/overlap.sh 2*
*Running partition 000002 with options -h 1-133125 -r 1597501-2995674 start 1597500 end 2995674 total 1398174 zero job 0 and stride 1*
*...*

Thanks,
Matt

>
>
> There is no 'correct_reads_part000002.dat' file there, but there is a
> 'correct_reads_part000002.fasta' file where the
> 'stream_1/correct_reads_part000002.dat' points to. I'm not sure if it is
> just an extension naming issue or if the .dat files weren't created
> properly.
>
> Also, I've found another minor issue with the '*-threads*' option
> supplied to PBcR on the command line. It doesn't seem to use the number of
> threads supplied and simply uses the max number of cpus on the machine
> available.
>
> Thanks, I'll check this and update the code.
>
>
> Thanks,
> Matt
>
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
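
The symptom described in the quoted text (an entry under stream_1 that points to a file which does not exist) can be checked across the whole overlapper directory with a short script. This is a minimal sketch, assuming the stream_N entries are symbolic links; the function names and the example directory are illustrative and are not part of PBcR.

```python
import os


def find_dangling_links(root):
    """Return sorted (link_path, target) pairs for symlinks under root
    whose target does not exist."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Broken links show up in filenames; links to directories in dirnames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a broken one.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append((path, os.readlink(path)))
    return sorted(dangling)


def report(root):
    """Print one line per dangling link found under root."""
    for link, target in find_dangling_links(root):
        print(f"{link} -> {target} (missing)")
```

For example, report("tempPBcR/1-overlapper") would list every link whose target is absent, which helps tell a naming mismatch (.dat versus .fasta) apart from index files that were never written.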