First, thank you for providing this software. I am a new user of PBJelly. I installed it as instructed, and the test run with the dummy data passed. When I ran it on real data, everything went smoothly until the "assembly" step, which failed with the error message "No gaps to be assembled were found". I went back and checked the .err files of every previous step.
In the mapping step, there was an error message like this:
2014-07-02 17:07:36,018 [INFO] Running /usr/local/bioinf/PBSuite_14.6.24//bin//m4pie.py /home/jay/temp/JSC1/mapping/m140608_213146_42146_c100642192550000001823128610151427_s1_p0.1.subreads.fastq.m4 /home/jay/temp/JSC1/reads/m140608_213146_42146_c100642192550000001823128610151427_s1_p0.1.subreads.fastq /home/jay/temp/JSC1/reference/JSC1.fasta --nproc 8 -i
2014-07-02 17:07:38,877 [INFO] Extracting tails
Traceback (most recent call last):
File "/usr/local/bioinf/PBSuite_14.6.24//bin//m4pie.py", line 206, in <module>
File "/usr/local/bioinf/PBSuite_14.6.24//bin//m4pie.py", line 185, in run
r, t, m = extractTails(aligns, reads, outFq=tailfastq, minLength=args.minTail)
File "/usr/local/bioinf/PBSuite_14.6.24//bin//m4pie.py", line 50, in extractTails
seq = reads[read.qname][:pTail]
The 'm140608_213146_42146_c100642192550000001823128610151427_s1_p0/219/4863_9930' named in the error was a read name. The mapping step itself seemed to produce a normal m4 file.
The output of the "support" step seemed normal. In the "extraction" step, the log looks like this:
2014-07-02 19:44:44,080 [INFO] Parsing /home/jay/temp/JSC1/reads/m140611_101939_42146_c100642462550000001823129210151477_s1_p0.3.subreads.fastq
2014-07-02 19:44:58,811 [INFO] Loaded 61375 Reads
2014-07-02 19:44:59,317 [INFO] Parsed 0 Reads
2014-07-02 19:44:59,726 [INFO] Finished
masterSupport.bml in extraction/ also looks normal; it has a bunch of lines starting with 'extend' or 'evidence'.
By the way, my networkx version is 1.1 and my blasr version is 1.3.1.
Could you help me figure this out, please? Thank you very much.
This is likely due to your subreads.fastq file having spaces in the read names. Blasr ignores everything after the first space when reporting which read maps where, whereas PBJelly reads the entire entry as the name, so looking a read up by the name Blasr reported fails. Where Blasr reports "m130611_...p0/218/0_1000", the fastq likely has something like "m130611_...p0/218/0_1000 RQ=0.862"
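One quick way to confirm this before changing anything is to count how many read names in the fastq contain whitespace. The snippet below is only a sketch, not part of PBJelly: the function name is made up for illustration, and it assumes a plain four-line-per-record FASTQ (no wrapped sequence lines, no blank lines between records).

```python
def count_headers_with_spaces(fastq_path):
    """Return (total_reads, reads_whose_header_contains_whitespace).

    Assumes strict 4-line FASTQ records, so the header is every 4th line.
    This is an illustrative helper, not a PBJelly function.
    """
    total = 0
    with_space = 0
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            # Only lines 0, 4, 8, ... are headers; skip stray blank lines.
            if i % 4 != 0 or not line.strip():
                continue
            total += 1
            # More than one whitespace-separated token means extra text
            # (for example "RQ=0.862") follows the read name.
            if len(line.split()) > 1:
                with_space += 1
    return total, with_space
```

If the second number is non-zero, the names in the fastq will not match the names in the m4 file.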
If you remove the " RQ=*" from the fastq read names, you can resume processing at the 'extraction' stage without needing to remap/resupport. I'd recommend making a backup of the subreads.fastq just in case.
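If editing the file by hand is impractical, the trimming can be scripted. This is a minimal sketch under the same assumption of four-line FASTQ records; the function names are hypothetical and it writes to a new file rather than touching the original, so the original doubles as the backup.

```python
def strip_read_name(header_line):
    """Keep only the read name: the text before the first whitespace."""
    parts = header_line.split(None, 1)
    return parts[0] if parts else header_line.rstrip("\n")


def clean_fastq(in_path, out_path):
    """Copy a FASTQ file, trimming anything after the read name in headers.

    Assumes strict 4-line records. Sequence, '+', and quality lines are
    copied unchanged, which matters because quality lines may also
    start with '@'.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0 and line.strip():
                line = strip_read_name(line) + "\n"
            dst.write(line)
```

Calling clean_fastq on each subreads.fastq and then pointing the protocol at the cleaned files (or moving them into place once you have a backup) should leave names like "m130611_...p0/218/0_1000" with no trailing " RQ=..." text.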
Thank you. This solved my problem.