fasta labeling in scaffold output

2016-06-27
2016-07-02
  • Carlos Llorens

    Carlos Llorens - 2016-06-27

    Hi again Wences,
    I am finishing a metassembly based on several assemblies and have taken a look at the scaffold output, and I have a small question about the FASTA labels assigned to some scaffolds. If I run grep ">" files.fasta > labels.txt, I obtain a
    summary of the FASTA sequence names and see that the first set of sequences is labeled as

    Q4D73.3D53.2D33.1D63.0REF_0
    .
    .
    .
    Q4D73.3D53.2D33.1D63.0REF_26071
    Q4D73.3D53.2D33.1D63.0REF_26072

    These are scaffolds 0 to 26072 (that is clear).

    But then I also see that several scaffolds are labelled as

    Q4D73.3D53.2D33.1D63.0REF_26074 Primary_Q3D53.2D33.1D63.0REF_31511
    Q4D73.3D53.2D33.1D63.0REF_26075 Primary_Q3D53.2D33.1D63.0REF_31106
    Q4D73.3D53.2D33.1D63.0REF_26076 Primary_Q3D53.2D33.1D63.0REF_19327

    I am not sure what this means. Does it refer to repeats or to some other kind of correspondence?

    I thought these were scaffolds provided only by assembly A and not present in assembly B, but relabeled according to the metassembly order.

    Is that correct?

    thank you in advance

    carlx

     
    • Alejandro Hernandez Wences

      Hi,

      These are scaffolds present in the primary assembly (or the secondary
      assembly, respectively) that were initially excluded from the metassembly
      sequence, either because they did not appear in any alignment in the
      whole-genome alignment performed with Nucmer or because they were filtered
      out in the delta-filter step (also from the MUMmer package), but that were
      then recovered according to the meta2fasta parameters --keepUnaligned and
      --keepDF, respectively.

      Wences
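
To see how many scaffolds in the output were recovered this way, the header lines can be split on whitespace and tallied by the prefix of the second field. The following is only a minimal sketch, not part of Metassembler: the function names and the "Metassembly" label for headers without a second field are made up for illustration, and the only prefix actually shown in this thread is Primary_ (any other prefix would simply be counted under its own name).

```python
from collections import Counter


def classify_header(header):
    """Split one FASTA header into (scaffold_id, origin, source_id).

    Headers with a second field such as 'Primary_<name>' are treated as
    recovered scaffolds; 'Metassembly' is a hypothetical label used here
    for headers that carry no second field.
    """
    fields = header.lstrip(">").split()
    if not fields:
        raise ValueError("empty FASTA header")
    scaffold_id = fields[0]
    if len(fields) > 1 and "_" in fields[1]:
        # Split only on the first underscore so the original name stays whole.
        origin, source_id = fields[1].split("_", 1)
        return scaffold_id, origin, source_id
    return scaffold_id, "Metassembly", None


def summarize(lines):
    """Count scaffolds per origin over an iterable of FASTA lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[classify_header(line.strip())[1]] += 1
    return counts


if __name__ == "__main__":
    # Two header lines copied from the question above.
    headers = [
        ">Q4D73.3D53.2D33.1D63.0REF_26072",
        ">Q4D73.3D53.2D33.1D63.0REF_26074 Primary_Q3D53.2D33.1D63.0REF_31511",
    ]
    print(summarize(headers))
```

For a real file, passing the open file object works the same way, for example summarize(open("files.fasta")), since only lines starting with ">" are inspected.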

