Again: No gaps to be assembled found in Jelly example dataset

  • Nikki

    Nikki - 2014-04-01


    I'm testing PBJelly on the example dataset provided with PBJelly. In the assembly stage I get the following error:

    2014-04-01 16:18:57,984 [INFO] Executing Stage: assembly
    2014-04-01 16:18:57,985 [WARNING] No gaps to be assembled were found in /data/PBJelly/Jelly_14.1.14/docs/jellyExample/assembly! Have you run 'extraction' yet?

    I know this problems has been addressed before on this forum, but as far as I can see the proposed solutions do not seem to work for me...

    -The correct version of networkx is installed:

    pip show networkx

    Name: networkx
    Version: 1.1
    Location: /usr/local/lib/python2.7/site-packages

    -The correct version of blasr is installed:

    blasr -version


    -Running the support step using the debug option ( support test_Protocol.xml --debug) gives:

    2014-04-01 16:33:49,887 [INFO] Running /data/PBJelly/Jelly_14.1.14/bin/ support test_Protocol.xml --debug
    2014-04-01 16:33:49,892 [INFO] Executing Stage: support
    2014-04-01 16:33:49,892 [DEBUG] /data/PBJelly/Jelly_14.1.14/docs/jellyExample/support already exists
    2014-04-01 16:33:49,893 [WARNING] Overwriting /data/PBJelly/Jelly_14.1.14/docs/jellyExample/support/filtered_subreads.fastq.gml
    2014-04-01 16:33:51,869 [DEBUG] CommandRunner Returned: [(0, '', None)]
    2014-04-01 16:33:51,870 [INFO] Finished Running Stage: support

    There are reads mapped, and filtered_subreads.fastq.err looks like:

    2014-04-01 16:33:00,663 [INFO] Running /data/PBJelly/Jelly_14.1.14/bin/ /data/PBJelly/Jelly_14.1.14/docs/jellyExample/mapping/filtered_subreads.fastq.m4 /data/PBJelly/Jelly_14.1.14/docs/jellyExample/data/reads/filtered_subreads.fastq /data/PBJelly/Jelly_14.1.14/docs/jellyExample/data/reference/lambda.fasta --nproc 4 -i
    2014-04-01 16:33:00,723 [INFO] Extracting tails
    2014-04-01 16:33:00,728 [INFO] Parsed 450 reads
    2014-04-01 16:33:00,728 [INFO] Found 311 tails
    2014-04-01 16:33:00,728 [INFO] 29 reads had double tails
    2014-04-01 16:33:00,728 [INFO] Mapping Tails
    2014-04-01 16:33:04,477 [INFO] [0, '[INFO] 2014-04-01T16:33:00 [blasr] started.\n[INFO] 2014-04-01T16:33:04 [blasr] ended.\n', None]
    2014-04-01 16:33:04,478 [INFO] Consolidating alignments
    2014-04-01 16:33:04,511 [INFO] 192 tails mapped

    -I also tried changing the minimal mapping qv: support test_Protocol.xml -x --minMapq=45

    But this resulted in the same warning of : [WARNING] No gaps to be assembled were found in /data/PBJelly/Jelly_14.1.14/docs/jellyExample/assembly (after running the extraction step).

    -The test_Protocol.xml file looks like:

    <blasr>-minMatch 8 -minPctIdentity 70 -bestn 1 -nCandidates 20 -maxScore -500 -nproc 4 -noSplitSubreads</blasr>
    <input baseDir="/data/PBJelly/Jelly_14.1.14/docs/jellyExample/data/reads/">

    Does anybody have some suggestions why no gaps are assembled using the PBJelly example dataset (and also not using "real data":-)? Thanks a lot for your help!


  • David

    David - 2014-05-19


    I'm having a similar issue, however when I ran the mapping stage, blasr never wrote the .m4 file used to extract gaps. Does anyone use the -m 4 option in their protocol? The information for setting up blasr is not easy to find.

  • David Mathog

    David Mathog - 2014-08-04

    Me too.

    Tried changing "-bestn 5" to "-bestn 1" as one other thread suggested. That changed the counts in mapping/filtered_subreads.fastq.err (made them smaller) but otherwise had no effect. Tried the patch for that does:

    <             inputGml = read_gml(i)
    >             inputGml = read_gml(i, relabel=True)

    and that also didn't help.

    networkx is 1.1
    python is 2.7.7
    blasr -version is 1.3.1 (this was built from sourceforge trunk on 6/18/2014.)

    This would be a whole lot easier to debug if the example included not just the input data but the expected results. With that "diff" would tell me exactly where the first problem arose, here we are just trading anecdotes. Can the authors please provide a "jellyExampleReference" directory showing a good run with exactly the Protocol.xml that was used?


    Last edit: David Mathog 2014-08-04
  • Adam English

    Adam English - 2014-08-12

    Good Points. I've added documentation to the Wiki:

    You're possibly still not getting through because relabel is an unexpected keyword argument in networkx v1.1. Inside of support/filtered_subreads.fastq.err you're possibly seeing an error like
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: read_gml() got an unexpected keyword argument 'relabel'

    Something that I'll try to clarify in the documentation is that there are generally two log files created. One to the stderr of wherever the command is being run, but there are also detailed logs saved inside of the <outputDir> directories (e.g. support/filtered_subreads.fastq.err) and you should check those, as well. The wiki page I provided has details on this.

    Let me know if you find anything with this information.

    • David Mathog

      David Mathog - 2014-08-19

      Found one thing, the file filtered_subreads.fastq is corrupt. The first entry starts with (regular expression) '^@' as it should, but all subsequent entries start with '^ @'. None of the other lines had a problem with leading spaces. Correcting this issue unfortunately did not resolve the "No gaps to be assembled" issue. Will dig through the log files in detail now.

      Ah, in assembly/extraction.err there was a Python error about not being able to import pyparsing. Installed that with

      easy_install py_parsing

      and started over. This finally got rid of the "No gaps" message. jelly.out.fasta

      looks similar to the one you posted except that all of the values are "48,464" instead of "48,505". I would wager that this is related to problems in either blasr or pbdagcon, because I saw exactly the same sort of variation when running the test data that came with PBcR.

      The log files diverge at assembly/extraction.err where my run had 108 reads and the one you posted had 111.

      assembly/ref0000001.0e3_ref0000001.1e5/assembly.err had a similar format but differed in most of the numeric values. "-p 961" instead of your "-p 50"; "13 reads, 15 tails" instead of "15 reads, 16 tails", etc.

      Ran it again to see if any of these values are stable. Nope. This time it was "48,471".

      Are the results numerically stable on your system? If they are, are you using blasr/pbdagcon from github or from SMRT Portal?

      Last edit: David Mathog 2014-08-19
  • David Mathog

    David Mathog - 2014-08-19

    Also, it seems that ./data/reference/lambda.fasta is not stable over runs. The sequence doesn't change but the headers have more and more "|ref0000001" appended to them. It isn't clear to my why anything would have a need to modify the reference, which should, one would normally assume, be a constant kept around for comparison, not something used by PB-Jelly's processing.

  • Adam English

    Adam English - 2014-08-19

    the ref\d{7} tag is an identifier that makes tracking contigs more stable since Jelly creates a number of temporary files. Fasta headers don't have the same restrictions as unix file names. Multiple |ref\d{7} tags means you've run setup multiple times. If you'd like to prevent the ref from being added you can use the --noRename parameter. Only do this if you've run setup at least once without the parameter. The differances in the -p parameter are likely due to this reftag rename. your ref0000001* gap is the 961 bp gap, not the 50bp gap. I'll look into what else might be causing the minor differences.

  • David Mathog

    David Mathog - 2014-08-19

    I'm curious why the decision was made to modify the headers in the input fasta rather than to construct a lookup table to hold the information, which since the ref's seem to be numbered in straight order, would have just been the fasta headers minus the ">", with lookups by line number. The intermediate files PB-Jelly creates can use whatever internal header names it wants, the originals only need to show up again, and maybe not even then if it all snaps together, in the final output files.

    PB-Jelly does try to keep a clean copy (in blah.fasta.original). Unfortunately that piece of code isn't very robust, since it seems to do:

    cp blah.fasta blah.fasta.original

    without regard for the existence of an earlier copy of blah.fasta.original.

    My excuse for running setup multiple times is that, since I had no idea what it did, it seemed safer to run it again than to omit it.


Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

No, thanks