
#29 making the count matrix from a fasta reference

Status: open
Owner: nobody
Created: 2013-12-21
Updated: 2014-01-04
Creator: JahnDavik

Dear Simon Anders,
I am working with a non-model organism and have used Trinity to build a de novo transcriptome (essentially a flat FASTA file with ~620,000 transcripts in this case). I now need the counts for downstream analyses. Trinity generates a count matrix using RSEM, but this is not a matrix of integers, which edgeR and DESeq require (if I understand this correctly).
So I would like to generate the count matrix from the experiment myself, and would like to ask whether this is a problem that htseq-count can easily solve given the files I have: 1) the FASTA 'reference' transcriptome and 2) the paired-end (PE) read files.

Thank you.
Regards

Discussion

  • Simon Anders

    Simon Anders - 2014-01-04

    If you have aligned to a transcriptome, then each gene is a different reference sequence, right? So you only need to look at the RNAME field in the SAM file and can ignore POS altogether. This does not look like a problem for htseq-count. You can, of course, use HTSeq to write your own counting script, but maybe some script-fu already does the job.

    BTW, I hope you have thought about how to resolve ambiguities due to isoforms.
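
A minimal sketch of the RNAME-based counting described above, in plain Python without HTSeq. It assumes the reads were aligned to the Trinity FASTA and written as an uncompressed SAM file. The function name `count_reads_per_reference`, the reference names, and the choice to count each paired-end fragment once via its first mate are illustrative assumptions, not part of HTSeq or of the original advice.

```python
import collections
import io


def count_reads_per_reference(sam_lines):
    """Count aligned fragments per reference sequence (RNAME) in SAM text.

    Assumption: a paired-end fragment is counted once, via its first mate.
    A fragment whose first mate is unmapped is therefore not counted.
    """
    counts = collections.Counter()
    for line in sam_lines:
        if line.startswith("@"):           # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:               # not a complete alignment record
            continue
        flag = int(fields[1])
        rname = fields[2]
        if (flag & 0x4) or rname == "*":   # read is unmapped
            continue
        if flag & 0x900:                   # secondary or supplementary alignment
            continue
        if (flag & 0x1) and not (flag & 0x40):
            continue                       # paired read that is not the first mate
        counts[rname] += 1                 # the position field is never consulted
    return counts


def _record(qname, flag, rname):
    # Build a toy SAM record; only FLAG and RNAME matter for counting.
    return "\t".join([qname, str(flag), rname, "1", "60", "4M",
                      "*", "0", "0", "ACGT", "IIII"]) + "\n"


# Hypothetical reference names, for illustration only.
example_sam = io.StringIO(
    "@HD\tVN:1.0\n"
    + _record("r1", 99, "transcript_A")    # paired, first mate: counted
    + _record("r1", 147, "transcript_A")   # paired, second mate: skipped
    + _record("r2", 0, "transcript_B")     # unpaired: counted
    + _record("r3", 4, "*")                # unmapped: skipped
    + _record("r4", 256, "transcript_A")   # secondary alignment: skipped
    + _record("r5", 83, "transcript_B")    # paired, first mate: counted
)
example_counts = count_reads_per_reference(example_sam)
```

Running this once per sample and joining the per-sample counters by reference name would give an integer count matrix. Note that counting only the primary alignment does nothing to resolve reads that fit several isoforms equally well; the aligner's choice among them is arbitrary, which is exactly the ambiguity raised above.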

     
    • JahnDavik

      JahnDavik - 2014-01-14

      Dear Simon,
      If I interpret you correctly, then this is not a problem that HTSeq would solve as it is now.
      I have been made aware that the Corset project has built a tool for my case.
      Anyway, thanks for the feedback.

      jahn

