questions on pbjelly output

Nikki
2014-07-15
2014-08-13
  • Nikki
    2014-07-15

    Hi,

    I'm using PBJelly to improve my genome assembly with long sequence reads.
    The run completes successfully, and I'm now trying to understand what has
    been changed in the assembly and why. This has resulted in a couple of
    questions on the output generated:

    1) How and why does PBJelly assign the new reference names? For example,
    I have 3 reference scaffolds that, after renaming by PBJelly, have names
    like:

    9250459_1162_24830|ref0158577|ref0122806|ref0096825
    4075816_1513_39396|ref0142504|ref0052326|ref0158577
    11012802_95505_2694272_561104-,...,10967441+|ref0226469|ref0158577|ref0182998

    • Why are 3 new names assigned to 1 scaffold? (e.g. ref0158577, ref0122806
      and ref0096825 to the first scaffold?)
    • Why are those "subnames" (e.g. ref0158577) not unique?
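The overlap between the three names can be checked mechanically. The sketch below is only an illustration: it assumes each renamed header is the original name followed by `|`-separated "subnames", which is what the examples above look like, and the helper `shared_subnames` is hypothetical, not part of PBJelly.

```python
from collections import defaultdict

# The three renamed headers, copied verbatim from the question above.
names = [
    "9250459_1162_24830|ref0158577|ref0122806|ref0096825",
    "4075816_1513_39396|ref0142504|ref0052326|ref0158577",
    "11012802_95505_2694272_561104-,...,10967441+|ref0226469|ref0158577|ref0182998",
]


def shared_subnames(headers):
    """Map each subname that occurs in more than one header to the
    original names (the text before the first '|') that carry it."""
    owners = defaultdict(list)
    for header in headers:
        original, *subnames = header.split("|")
        for sub in subnames:
            owners[sub].append(original)
    return {sub: origs for sub, origs in owners.items() if len(origs) > 1}


shared = shared_subnames(names)
for sub, origs in shared.items():
    print(sub, "appears in", len(origs), "headers")
```

For these three headers, only ref0158577 is shared, and it occurs in all three.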

    2) In the gap_fill_status.txt file I see that gaps get filled in
    ref0158577:

    ref0158577.1e3_ref0158577.2e5 filled
    ref0158577.2e3_ref0158577.3e5 filled

    • What do the .1 and .2 mean in ref0158577.1e3_ref0158577.2e5?

    • How do I figure out which of the three original scaffolds (that all
      have a new name containing 'ref0158577') has been filled?
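To make the structure of these identifiers easier to discuss, the sketch below splits a status line into its visible parts. It only describes the syntax seen in the two example lines; the field names (`idx`, `end`) are labels chosen here, and reading the number as a contig index and `e3`/`e5` as sequence ends is an unconfirmed guess, not something this thread establishes.

```python
import re

# Pattern for IDs shaped like "ref0158577.1e3_ref0158577.2e5".
# Group names are illustrative labels, not PBJelly terminology.
GAP_ID = re.compile(
    r"^(?P<left_ref>[^.]+)\.(?P<left_idx>\d+)e(?P<left_end>[35])"
    r"_(?P<right_ref>[^.]+)\.(?P<right_idx>\d+)e(?P<right_end>[35])$"
)


def parse_status_line(line):
    """Split a 'gap_id status' line into its parts.

    Returns None when the line does not have the two-node shape shown
    in the examples above (other shapes may exist in the file).
    """
    fields = line.split()
    if len(fields) != 2:
        return None
    gap_id, status = fields
    match = GAP_ID.match(gap_id)
    if match is None:
        return None
    parts = match.groupdict()
    parts["left_idx"] = int(parts["left_idx"])
    parts["right_idx"] = int(parts["right_idx"])
    parts["status"] = status
    return parts


parsed = parse_status_line("ref0158577.1e3_ref0158577.2e5 filled")
print(parsed)
```

Note that the parsed ID only carries the subname (here ref0158577), so on its own it cannot say which of the three renamed scaffolds is meant.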

    3) In jelly.out.fasta the scaffolds have new names like Contig0, etc. Is
    there somewhere I can look up which of the original scaffolds each
    contig is derived from?

    4) In gap_fill_status.txt I see the following types of changes:

    doubleextend
    filled
    nofillmetrics
    overfilled
    singleextend

    -What does 'overfilled' mean? Is this an inter-scaffold connection? If not,
    how can I see which inter-scaffold connections have been made (if any)?
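A quick way to see how often each status occurs is to tally the last column of gap_fill_status.txt. This is a minimal sketch that assumes whitespace-separated lines with the status as the final field, as in the two example lines above; `tally_statuses` is a hypothetical helper.

```python
from collections import Counter


def tally_statuses(lines):
    """Count how often each status (last whitespace-separated field) occurs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            counts[fields[-1]] += 1
    return counts


# The two example lines quoted earlier in this post.
sample = [
    "ref0158577.1e3_ref0158577.2e5 filled",
    "ref0158577.2e3_ref0158577.3e5 filled",
]
counts = tally_statuses(sample)
for status, n in counts.most_common():
    print(status, n)
```

On a real run, the same function can be fed an open file handle for gap_fill_status.txt instead of the sample list.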

    -In jelly.out.fasta the smallest contig is ~300 nt, while in my input file
    the smallest scaffold was 1000 nt. This suggests that PBJelly has split
    something up. Where can I find out which scaffolds have been split up,
    and why?
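The length observation can be reproduced with a few lines of plain Python. The sketch below assumes a standard FASTA layout (header lines starting with `>`, sequence possibly wrapped over several lines); the record names and sequences in `sample` are made up for illustration, and `shorter_than` is a hypothetical helper.

```python
def fasta_lengths(lines):
    """Yield (record_name, sequence_length) for each FASTA record."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first word of the header as the record name.
            name, length = (line[1:].split() or [""])[0], 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def shorter_than(lines, min_len=1000):
    """Return the records whose sequence is shorter than min_len."""
    return [(n, l) for n, l in fasta_lengths(lines) if l < min_len]


# Illustrative records only; not taken from a real jelly.out.fasta.
sample = [">Contig0", "ACGT" * 300, ">Contig1", "ACGT" * 75, "AC"]
short = shorter_than(sample)
print(short)
```

Passing an open handle for jelly.out.fasta instead of `sample` lists every output contig below the 1000 nt minimum of the input.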

    -Could you give a short explanation on what a line like this in the
    gapInfo.bed file means:

    10515137_3627_104321_1776636+,...,7058545+|ref0197248|ref0108226|ref0155877
    na na ref0155877_0_0 3

    Sorry for all the questions, and thanks a lot for your help!

    With kind regards,

    Nikkie

     
    • Nikki
      2014-08-13

      ---------- Forwarded message ----------
      From: Nikkie van bers nikkie.vanbers@gmail.com
      Date: 15 July 2014 17:11
      Subject: questions on pbjelly output
      To: pbjtiks@discussion.pb-jelly.p.re.sf.net
