Blast results especially those flagged as "unknown" taxon

David
2013-03-23
2013-03-30
  • David

    David - 2013-03-23

    Dear Marc,

    Thanks for implementing the contig ID changes.

    In the new version (and also in the older versions) Blast is done based on nucleotides.
    I am using Metawatt to sort out a low-complexity community derived from an enrichment culture. In my analysis I have a target genome (Nitrosopumilus), and I have noticed that although the contigs are binned as belonging to the target, some of them are listed as "unknown" in the taxon column. However, when I run Blastx on the "unknown" contigs, it turns out that most are part of the target genome (which is good) but some contain "putative contaminants". Is there a way you could tweak your program to include BlastX-based taxonomic binning, especially for those contigs that have the same tetranucleotide pattern as the target organism but are flagged as unknown? This is probably redundant if one does IMM-based binning, but it would provide additional information for deciding what else to shortlist for the modelling.

    You also mention that taxa with less than 25 kb contig data are not displayed. May I ask what is the rationale for this?

    Thanks in advance.

    Best,

    David

     
    • Marc Strous

      Marc Strous - 2013-03-25

      Dear David,
      thanks for using metawatt...

      (1) Why blastn and not blastx?
      Blastn is faster and still gives a good profile. If you get only very
      few hits it may pay off to search for a draft genome of a suitable
      reference organism. Speed is important, for example, when you are
      optimizing your assembly or do not have a large computing cluster.

      Blastx would be more sensitive. It is so sensitive you could even use it
      as a classifier, like you suggest in your email. What you could do is
      first classify your contigs with a blastx based classifier like webcarma:

      http://www.biomedcentral.com/1471-2105/10/430

      Then, you import your contigs classified to your target organism as a
      separate sample in metawatt and use the contigs of this sample to create
      an IMM profile to bin your metagenome. That way you can discover those
      contigs that do not contain genes homologous to the reference organism
      and were not found by the classifier...
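
The first step of the workflow above (pulling out the contigs that the blastx-based classifier assigned to the target organism, so they can be imported as a separate sample) could be sketched roughly as follows. This is only an illustration under assumptions: the classifier output is taken to be a simple tab-separated file of contig ID and taxon name, which is a hypothetical format and not necessarily what webcarma writes, and all function names are made up for this example.

```python
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_assignments(handle: Iterable[str]) -> Dict[str, str]:
    """Read 'contig_id<TAB>taxon' lines (assumed, hypothetical format)."""
    assignments = {}
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip comments and malformed lines
        assignments[parts[0]] = parts[1]
    return assignments


def select_target_contigs(
    fasta_handle: Iterable[str], assignments: Dict[str, str], target: str
) -> List[Tuple[str, str]]:
    """Keep contigs whose assigned taxon name contains the target name."""
    selected = []
    for header, seq in read_fasta(fasta_handle):
        # The contig ID is taken to be the first word of the header.
        contig_id = header.split(None, 1)[0] if header else ""
        if target.lower() in assignments.get(contig_id, "").lower():
            selected.append((header, seq))
    return selected


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

The resulting FASTA file would then be the "separate sample" to import; the contigs left "unknown" by the classifier stay in the full metagenome and can be picked up by the IMM-based binning.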

      I will experiment with blastx and see how much slower it is compared to
      blastn. Maybe we could add it as an option in a later version.

      (2) Why are minor taxa (<25000 nt) not shown in the taxa table?
      This was suggested by other users, who did not find the very long
      tables useful. If the minor taxa define a meaningful bin, they will
      of course still be visible in the pie diagram of the bin.
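
To make the cutoff concrete, here is a minimal sketch of how such a filter might work: total the contig length per taxon and list only taxa with at least 25000 nt. This is an assumption about the general idea, not metawatt's actual code, and the names `taxon_totals` and `split_major_minor` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MIN_TAXON_NT = 25_000  # cutoff mentioned in this thread


def taxon_totals(contigs: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Sum contig lengths (nt) per taxon from (length, taxon) pairs."""
    totals = defaultdict(int)
    for length, taxon in contigs:
        totals[taxon] += length
    return dict(totals)


def split_major_minor(
    totals: Dict[str, int], min_nt: int = MIN_TAXON_NT
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split taxa into those shown in the table and those hidden (< min_nt)."""
    major = {t: n for t, n in totals.items() if n >= min_nt}
    minor = {t: n for t, n in totals.items() if n < min_nt}
    return major, minor
```

Applying the same function to the hidden taxa with a lower `min_nt` would be one way to inspect them separately, for example when looking for possible contaminants.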

      Best wishes!
      Marc


      --
      Prof. dr. ir. Marc Strous
      Head of the Microbial Fitness Group
      Max Planck Institute for Marine Microbiology
      Celsiusstrasse 1 - 28359 Bremen - Germany
      phone 0421 2028 822 | fax 0421 2028 580
      room 3241

       
  • David

    David - 2013-03-30

    Thanks, Marc! Also for the link to the Blastx-based classifier.

     
