transdecoder-users Mailing List for TranscriptDecoder
Extracting likely coding regions from transcript sequences
Brought to you by:
bhaas
From: Brian H. <bh...@br...> - 2015-01-27 00:54:57
Greetings all.

A new and revised version of TransDecoder is now available. Note that we've moved the project to GitHub, so please visit the new website:

http://transdecoder.github.io

and we have a new Google group for user support and discussions at:

https://groups.google.com/forum/#!forum/transdecoder-users

This new version of TransDecoder has the following features:

BLAST homology (new), in addition to Pfam domain hits (earlier), can be used to ensure that protein-homologous regions are retained among the coding region predictions.

The software now runs in two (or three) distinct phases:

1. TransDecoder.LongOrfs identifies all candidate ORFs.

(1b) Search these using blast or pfam searches.

2. TransDecoder.Predict does the final predictions, optionally using the search results from (1b).

The new web documentation provides pointers for how to do the blast or pfam searches, and directs users to the new http://hpcgridrunner.github.io/ project to facilitate the grid-level bioinformatics computes needed for this.

Also, for those who are Trinotate (http://trinotate.sf.net) users, the outputs from (1b) will be directly suitable for upload into Trinotate.

The software build process is hugely simplified, and the package itself has been trimmed down to include only the essentials (cd-hit).

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
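The phases described in the announcement can be sketched as a small dry-run script. This is only an illustration: the output names (blastp.outfmt6, pfam.domtblout), the `.transdecoder_dir/longest_orfs.pep` location, the database names and the `--retain_*_hits` options are assumptions based on the TransDecoder web documentation as best recalled, and should be checked against the installed release before use.

```python
import shlex


def transdecoder_commands(transcripts, blast_db=None, pfam_hmm=None, cpu=1):
    """Build the command lines for phases 1, (1b) and 2 as argv lists."""
    # Assumed location of the candidate ORF peptides written by phase 1.
    orf_pep = f"{transcripts}.transdecoder_dir/longest_orfs.pep"

    commands = [["TransDecoder.LongOrfs", "-t", transcripts]]  # phase 1
    predict = ["TransDecoder.Predict", "-t", transcripts]      # phase 2

    if blast_db:  # phase (1b): homology search with BLAST+
        commands.append([
            "blastp", "-query", orf_pep, "-db", blast_db,
            "-max_target_seqs", "1", "-outfmt", "6", "-evalue", "1e-5",
            "-num_threads", str(cpu), "-out", "blastp.outfmt6",
        ])
        predict += ["--retain_blastp_hits", "blastp.outfmt6"]

    if pfam_hmm:  # phase (1b): domain search with HMMER
        commands.append([
            "hmmscan", "--cpu", str(cpu),
            "--domtblout", "pfam.domtblout", pfam_hmm, orf_pep,
        ])
        predict += ["--retain_pfam_hits", "pfam.domtblout"]

    commands.append(predict)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in transdecoder_commands(
        "transcripts.fasta", blast_db="uniprot_sprot.fasta",
        pfam_hmm="Pfam-A.hmm", cpu=4,
    ):
        print(shlex.join(argv))
```

Building argv lists rather than shell strings keeps the sketch safe to hand to `subprocess.run` later; the searches in (1b) are optional, so with no database given only the two TransDecoder commands are produced.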
From: Brian H. <bh...@br...> - 2015-01-13 14:08:26
Hi,

For TransDecoder to work, it needs the full set of transcripts to operate on. From that set it builds a statistical model, and it then goes through the full set of transcripts and predicted ORFs to find those that look to be coding based on the model. My guess is that running it on just a couple of transcripts isn't going to produce useful results.

Also, the ORF must be sufficiently long (the minimum length is 100 aa, or 300 nt) and not frameshifted or otherwise interrupted.

Please use the transdecoder user support mailing list (CC'd) for user support.

best wishes,

~brian

On Tue, Jan 13, 2015 at 2:47 AM, Srilakshmy Lakshminarayanapuram <sri...@sl...> wrote:

> Hello Brian,
>
> I am a Master's student working on my thesis right now with RNA-seq data. I am stuck with the use of TransDecoder at this point.
>
> I did an assembly of the transcriptome with Trinity and transabyss. My goal is to identify paralogs in the assembly of a reported whole genome duplication event.
>
> I work on a non-model species.
>
> My workflow so far is:
>
> 1. Assembled the transcriptome (~200,000 contigs).
>
> 2. Filtered based on expression and length (FPKM: 1 and transcript length > 500).
>
> 3. Self reciprocal blast search to identify paralogs. Got the pairs of best hits from the list.
>
> 4. Just to test, applied a very simple nucleotide substitution model and calculated K (substitution rate) for the pairs of paralogs identified from the blast.
>
> 5. We got a very high single peak for the K rate and just a little confirmation that paralogs might at least exist in the assembly.
>
> But to get a clear picture, we wanted to look at the peptide level.
>
> So for each pair of paralogs we identified from blastn, I'm trying to convert the nucleotide sequence into peptide. But most of the pairs do not seem to convert; I just get an empty file.
>
> I even tried to convert the whole filtered assembly to peptide; even there, some of the sequences do not seem to get converted to peptide sequence. In a few, there seem to be two ORFs predicted.
>
> My problem is that I know they could be paralogs, so to test I need to change them into peptide sequence. I will post my query and also an example of a sequence which is not changed.
>
> Query:
>
> $TRINITY_RNASEQ_ROOT/trinity-plugins/transdecoder/TransDecoder -t $1pep.fa --CPU 32
>
> I will attach 1pep.fa for your reference.
>
> Your help will be greatly appreciated.
>
> Thanks for your time,
>
> Srilakshmy

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
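To make the length requirement in the reply above concrete, here is a minimal sketch of an ORF scan with a 100 aa cutoff. It is not TransDecoder's implementation: it only looks for complete ATG-to-stop ORFs, ignores partial ORFs at the transcript ends, and applies no coding-potential model. The function names are made up for this illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (other characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs(seq, min_aa=100):
    """Yield (strand, start, end, length_aa) for complete ATG..stop ORFs.

    Coordinates are 0-based on the scanned strand, end is exclusive and
    includes the stop codon; length_aa excludes the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:             # drop ORFs below the cutoff
                        yield strand, start, i + 3, n_aa
                    start = None                   # look for the next ORF
```

A transcript whose longest uninterrupted ORF is shorter than the cutoff yields nothing at all, which is one way to end up with an empty peptide file for a given input sequence.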
From: Brian H. <bh...@br...> - 2015-01-09 17:12:55
ok - I can pretty much guarantee you that we're not going to see eye-to-eye on this, and we'll just have to agree to disagree. Alexie and I will do what we can, but it'll be what it is in the end, and we're not planning to jump through hoops that we don't feel are essential or that contradict how we (or I) feel about academic software.

TransDecoder doesn't do any frameshift detection. You'd need some other process to correct errors before applying TransDecoder, as TransDecoder assumes that the sequence you're giving it is error-free. If you have a frameshift and that produces multiple ORFs that each look to have coding potential, and they meet the length requirement, then it'll report them as separate ORFs.

Note, the TransDecoder algorithm is extremely simple and will produce many false positives as the length of the ORF decreases. If we end up properly addressing that issue, then I think the system would be worth publishing. If anyone has ideas - let's pursue it!

Other than that, Pfam searches (and we'll include blastp or blastx searches in the next release) will yield improved sensitivity for those ORFs that are otherwise not meeting the coding metrics according to sequence composition.

cheers,

~b
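The effect of a frameshift described above can be shown on a toy sequence. The sketch below is an assumption-laden illustration, not TransDecoder code: it measures the longest stop-free stretch of codons in each forward frame, before and after deleting a single base, using an artificial repeat chosen so that the two shifted frames contain stop codons.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_open_run(seq, frame):
    """Longest run of consecutive non-stop codons in one forward frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def runs_by_frame(seq):
    """Longest open run for forward frames 0, 1 and 2."""
    return [longest_open_run(seq, frame) for frame in range(3)]


# Toy coding sequence: the repeat unit has no stop codon in frame 0 but
# contains TAG and TGA in the two shifted frames.
intact = "ATG" + "GTAGCTGAA" * 70 + "TAA"
# Simulate a sequencing error by deleting a single base mid-sequence.
shifted = intact[:318] + intact[319:]

if __name__ == "__main__":
    print("intact :", runs_by_frame(intact))
    print("shifted:", runs_by_frame(shifted))
```

In the intact sequence, frame 0 carries one open stretch of 211 codons. After the deletion that stretch falls apart into a 106-codon piece in frame 0 and a 105-codon piece in frame 2, each just above a 100 aa cutoff, which is the situation where two separate ORFs would be reported for one transcript.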
From: Martin M. <mmo...@gm...> - 2015-01-09 16:54:07
Hi Brian,

Brian Haas wrote:
> Hi all,
>
> In general, I prefer for the system to be retained as a fully self-contained unit. I really would *not* want any of the scripts to show up in /usr/bin, or for the perl libraries to contaminate (for lack of a better word) any site-perl libraries.

I do not mind if all of them, maybe except the main TransDecoder script, appear in /usr/share/TransDecoder/util and we point PATH to it. Or we can use an environment variable TRANSDECODER to point to /usr/share/TransDecoder. But the perl modules I would install under /usr/lib64/perl5/vendor_perl/5.16.3/TransDecoder, like most apps do.

> Keeping everything self-contained within the one package allows one to easily move it around or delete it in its entirety. In this case,

The packaging system enables the user to uninstall a package seamlessly as well. I infer you want the unpacked directory to be functional. That can be done by telling users to execute the following (or by making it a shell wrapper script, TransDecoder.sh):

    export PATH=$PATH:`pwd`:`pwd`/3rd_party/TransDecoder_r20140704:`pwd`/3rd_party/TransDecoder_r20140704/3rd_party/ffindex-0.9.9.3/src:`pwd`/3rd_party/TransDecoder_r20140704/3rd_party/parafly-r2013-01-21
    export TRANSDECODER=`pwd`
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`/3rd_party/TransDecoder_r20140704/3rd_party/ffindex-0.9.9.3/src
    ./TransDecoder "$@"

> they simply need to have Transdecoder in their PATH, and ideally the whole thing just works.

I attach two patches I needed because the installed apps looked for /usr/bin/util. With the above 3 export commands the unpacked distribution should still be functional and match your intent.

> The one key issue that I'm happy to consider is what dependencies to bundle with the system vs. what dependencies require users to supply externally. In this case, my personal opinion is that the very commonly installed bioinformatics tools and very large software packages (ie. samtools, hmmer, bowtie, etc.) need not be bundled, but for other ones that may be less commonly installed and dependent within the software package (ie. ffindex, parafly, and possibly cdhit) or those that have specific version compatibility issues (ie. rsem in Trinity), I'm inclined to simply bundle them into the package

I do not agree, especially because a distro will NOT allow a user to fetch the bundle unless he/she agrees with *all* LICENSEs in it. So you only complicate the situation, and the person who made a package for a distro will be blamed if he did not realize there are multiple licenses in the bundle. This is especially the case for trinity (because there is some code some people will not be comfortable with). I don't remember it off the top of my head, but it is in the email archives from 2014.

Provided the user agreed with a licensing scheme, the package maintainers will easily do something similar to what I did: just zap the 3rd_party directory contents and install "manually" or via a future Makefile doing the *install* step. The only thing which is necessary for us is that the tools will not look into "/usr/bin/util", which is currently derived as '/usr/bin' + '/util'.

> to ease installation and better ensure overall system integrity. As long as those bundled packages remain inside the self-contained software package, and don't contaminate or replace other already-installed software tools, and are specifically leveraged by the driving software within the self-contained package (by internally adjusting PATH env var, or looking for tools via relative paths to the package installation directory), I think it should be both acceptable and non-disruptive. I should also mention that we don't and won't plan to bundle any software that has restrictive licensing issues nor that is not open source.

I think the above TransDecoder.sh trick should meet your criteria, although I did not test it.

> Now - with that said - TransDecoder does need to be seriously overhauled with respect to its makefile, build mechanism, etc., and

I think the only "pressing" issue is either acceptance of the attached patches or of something functionally similar, and where to place the *.pm files so that they get found via the perl @INC path. I don't care about the Makefile anymore as I now know what layout is needed.

> separately from that, it needs some additional enhancements to make it even more useful to users in the next release. So, after this next generation of the Trinity software goes out, we'll tackle TransDecoder and whip it into shape. But note that, unless there's some major shift in my method of operation with respect to software packaging issues, it'll fit the general description that I outline above. Alexie and I will work on it together, and we'll try to do what we can to address any lingering issues that Martin might raise.
>
> cheers,

The Makefile should check whether a user has hmmpress; there is no check for the binary. I think it could also tell users whether they can stay with a version earlier than hmmer-3.0 (that is, whether there is something like hmmpress in the older version of hmmer). I do not know myself. I only know that hmmer-3.0 does not yet have all the functionality of the hmmer-2 series, so some users just cannot "upgrade". Basically, you could document what is needed and when.

I suspect that there is no real requirement for openmpi and that the perl scripts could call mpiexec (if discovered via $PATH) to launch children. The requirement is probably coming from parafly; I am not sure what in ffindex needs it. And is anything in the perl scripts using the MPI API?

I also do not like that 2 jobs are forked by default. The default should be CPU=1.

Finally, somewhere it is defined that only ORFs larger than 900nt are to be retained in the results. I think the default is too strict for current NGS-based assemblies full of sequencing errors causing frameshifts. BTW, TransDecoder does not include in the results "ORFs" broken by a frameshift, right? [Note: it is easier to ask than to test that myself. ;-)]

Martin
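The two requests at the end of this message (check for external binaries up front, and default to a single job) could look roughly like the sketch below. It is written in Python purely for illustration; the tool list is a hypothetical example, and only the `--CPU` option name is taken from the command line quoted earlier in this archive.

```python
import argparse
import shutil

# Hypothetical list of external programs; adjust to what the package really calls.
REQUIRED_TOOLS = ("hmmpress", "cd-hit")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def parse_args(argv=None):
    """Parse options; a single job is the conservative default."""
    parser = argparse.ArgumentParser(description="dependency check sketch")
    parser.add_argument("--CPU", type=int, default=1,
                        help="number of parallel jobs (default: 1)")
    return parser.parse_args(argv)
```

A driver script would call `missing_tools()` before doing any work and stop with a message naming each absent program, instead of failing later in the middle of a run.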
From: Brian H. <bh...@br...> - 2015-01-09 14:44:55
Hi all,

In general, I prefer for the system to be retained as a fully self-contained unit. I really would *not* want any of the scripts to show up in /usr/bin, or for the perl libraries to contaminate (for lack of a better word) any site-perl libraries.

Keeping everything self-contained within the one package allows one to easily move it around or delete it in its entirety. In this case, they simply need to have Transdecoder in their PATH, and ideally the whole thing just works.

The one key issue that I'm happy to consider is what dependencies to bundle with the system vs. what dependencies require users to supply externally. In this case, my personal opinion is that the very commonly installed bioinformatics tools and very large software packages (ie. samtools, hmmer, bowtie, etc.) need not be bundled, but for other ones that may be less commonly installed and dependent within the software package (ie. ffindex, parafly, and possibly cdhit) or those that have specific version compatibility issues (ie. rsem in Trinity), I'm inclined to simply bundle them into the package to ease installation and better ensure overall system integrity.

As long as those bundled packages remain inside the self-contained software package, and don't contaminate or replace other already-installed software tools, and are specifically leveraged by the driving software within the self-contained package (by internally adjusting PATH env var, or looking for tools via relative paths to the package installation directory), I think it should be both acceptable and non-disruptive. I should also mention that we don't and won't plan to bundle any software that has restrictive licensing issues nor that is not open source.

Now - with that said - TransDecoder does need to be seriously overhauled with respect to its makefile, build mechanism, etc., and separately from that, it needs some additional enhancements to make it even more useful to users in the next release. So, after this next generation of the Trinity software goes out, we'll tackle TransDecoder and whip it into shape. But note that, unless there's some major shift in my method of operation with respect to software packaging issues, it'll fit the general description that I outline above. Alexie and I will work on it together, and we'll try to do what we can to address any lingering issues that Martin might raise.

cheers,

~brian
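The lookup strategy mentioned above ("looking for tools via relative paths to the package installation directory") can be sketched as follows. This is a hypothetical helper, not code from TransDecoder; the `3rd_party/bin` subdirectory is an assumed layout chosen for the example.

```python
import os
import shutil


def find_tool(name, package_root, bundled_dirs=("3rd_party/bin",)):
    """Prefer a copy bundled inside the package; fall back to PATH.

    Returns the path to the executable, or None if it is found nowhere.
    """
    for rel in bundled_dirs:
        candidate = os.path.join(package_root, rel, name)
        # Only accept a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(name)
```

Resolving each tool explicitly, rather than prepending bundled directories to PATH, keeps the bundled copies from shadowing system-wide installs for anything else the user runs, and it also works unchanged when a packager removes the bundled copies.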
From: Martin M. <mmo...@gm...> - 2015-01-09 10:46:38
Hi Brian and Alexie,

thank you for your answers. I made the following packages for Gentoo Linux. Here are some comments on how the layout could be improved.

1. Gentoo picks the TransDecoder_r20140704/3rd_party/ffindex-0.9.9.3 subdirectory contents from TransDecoder_r20140704.tar.gz and installs them as a separate package. Ideally, once I figure out where the official source is, the source address will be changed. Somebody who knows transdecoder could document whether transdecoder requires a special version of this tool or not.

2. Gentoo picks the TransDecoder_r20140704/3rd_party/parafly-r2013-01-21 subdirectory contents from TransDecoder_r20140704.tar.gz and installs them as a separate package. Passing extra arguments to configure causes a broken Makefile or config.h to be created. In turn, gcc later on dies about unresolved MPI symbols because "-fopenmp" was not pulled into CFLAGS. I got around this by passing only '--prefix=/usr' and deliberately discarding the other arguments Gentoo applies in general (to specify, for example, the CPU architecture or the need for cross-compiling). I think it has to do with autoreconf being called for some reason, but I did not study this in more detail. Again, it would be nice if parafly was available to download separately, and if transdecoder documented which versions it requires; ideally it could check for that in configure.

3. Gentoo already has a cd-hit package, like other distros; again, it installs the binaries into /usr/bin, accessible via $PATH.

4. After the above changes, the main Makefile in TransDecoder_r20140704/ is not necessary. Fetching the 1.5 GB of Pfam data files and indexing them can be done by a standalone shell script, and the rest of the Makefile was just compiling and installing the 3rd_party tools. That is not needed for us now, so the automated system drops the Makefile altogether.

But how to install the thingie at all? What was missing was where and how to install the transdecoder files. I placed the 'TransDecoder *.pl util/*.pl util/*.sh' files into /usr/bin for now. I have no idea, but maybe some of them could be hidden in some other path, like /usr/share/TransDecoder, if they are not to be directly executed by the end user. Here is their current listing:

    /usr/bin/TransDecoder
    /usr/bin/cdna_alignment_orf_to_genome_orf.pl
    /usr/bin/compute_base_probs.pl
    /usr/bin/cufflinks_gtf_genome_to_cdna_fasta.pl
    /usr/bin/cufflinks_gtf_to_alignment_gff3.pl
    /usr/bin/cufflinks_gtf_to_bed.pl
    /usr/bin/ffindex_gather.sh
    /usr/bin/ffindex_resume.pl
    /usr/bin/gene_list_to_gff.pl
    /usr/bin/get_top_longest_fasta_entries.pl
    /usr/bin/gff3_file_to_bed.pl
    /usr/bin/gff3_file_to_proteins.pl
    /usr/bin/index_gff3_files_by_isoform.pl
    /usr/bin/nr_ORFs_gff3.pl
    /usr/bin/pfam_mpi.sh
    /usr/bin/pfam_runner.pl
    /usr/bin/remove_eclipsed_ORFs.pl
    /usr/bin/score_CDS_liklihood_all_6_frames.pl
    /usr/bin/seq_n_baseprobs_to_logliklihood_vals.pl

Finally, 'PerlLib/*.pm' gets placed now into e.g. /usr/lib64/perl5/vendor_perl/5.16.3/, but it would be great if the *.pl tools could look for PERL5INC/TransDecoder/ instead, that is, expect the modules in a subdirectory. Just to avoid filename clashes on the filesystem. The current layout is here:

    /usr/lib64/perl5/vendor_perl/5.16.3/Fasta_reader.pm
    /usr/lib64/perl5/vendor_perl/5.16.3/GFF3_utils.pm
    /usr/lib64/perl5/vendor_perl/5.16.3/Gene_obj.pm
    /usr/lib64/perl5/vendor_perl/5.16.3/Gene_obj_indexer.pm
    /usr/lib64/perl5/vendor_perl/5.16.3/Longest_orf.pm
    /usr/lib64/perl5/vendor_perl/5.16.3/Nuc_translator.pm
    /usr/lib64/perl5/vendor_perl/5.16.3/TiedHash.pm

In summary, Gentoo Linux users can install cd-hit, ffindex, parafly and transdecoder from the science overlay now. They may go into the official, main tree after some testing. Debian and other distros sometimes take over the logic from Gentoo to build their binary packages. Gentoo stays with compilation from source code.

Maybe other distro developers will contact you about the same: split out the 3rd-party packages, document whether transdecoder requires a special version of any of them, and make it possible to install TransDecoder into a custom path. That usually boils down to the Makefile containing a $(DESTDIR) variable placed at the beginning of every target installation path, which is derived from the configure.ac and configure.in files, respectively. That allows one to install files into a *temporary* location like /tmp/blah/usr via "make install DESTDIR=/tmp/blah", and the files are guaranteed to work once extracted from the package tarball and placed into /usr.

I occasionally contribute some packages to Gentoo, but I cannot fix every package; one needs to know how the tools call each other, what versions they require, etc. So it is easier to contact the authors and provide feedback on what could be adjusted to make the package more easily integrated into a distribution.

With best regards,

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies: 454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/
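The DESTDIR convention Martin refers to in his packaging message is simple path arithmetic: every final installation path gets the staging directory prepended at packaging time. A minimal sketch of that mapping, with a made-up helper name and POSIX paths assumed, might be:

```python
import posixpath


def staged_path(destdir, installed_path):
    """Map a final absolute path to its location under the staging root.

    With an empty destdir the file goes straight to its final location.
    """
    if not destdir:
        return installed_path
    # Strip the leading "/" first: posixpath.join would otherwise discard
    # destdir, because an absolute second component replaces the first.
    return posixpath.join(destdir, installed_path.lstrip("/"))
```

For example, with a staging root of /tmp/blah the file destined for /usr/bin/TransDecoder is written to /tmp/blah/usr/bin/TransDecoder, and the package manager later unpacks that tree onto the real filesystem. The installed scripts must therefore never record the staging path internally.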
From: <Ale...@cs...> - 2015-01-09 06:36:00
Hi Martin

Trinity and siblings are community open-source software. They are also not funded by any funding body to have professional developers on the payroll; we do it because we care ☺ So, with the exception of responding to requests from biologists who can't code, our interest is to invite clever computational people such as yourself to implement the fixes they suggest.

Soon we will be on github and you will be able to fork and do a pull request, but in the meantime just email us a patch from the current sourceforge HEAD. If Brian is happy he may just give you direct write access to the repository.

Thanks!
a

From: Brian Haas [mailto:bh...@br...]
Sent: Friday, January 09, 2015 07:48 AM
To: Martin MOKREJŠ
Cc: tri...@li...; tra...@li...
Subject: Re: [Transdecoder-users] Announcement: Transdecoder release r20140704

Hi Martin,

Not yet... after I get the next Trinity release out, I'll work on updating transdecoder. I don't think it's going to meet your expectations as far as how the build goes, but it'll be better than it is now. We can certainly continue discussions before the release goes out. I'll check with you beforehand.

best,

~b

On Thu, Jan 8, 2015 at 3:42 PM, Martin MOKREJŠ <mmo...@gm...> wrote:

Hi Brian,
I just returned to this and ... I don't see a newer transdecoder release at sourceforge. So, did you get to splitting the packages out or not yet?
Thank you,
Martin

Brian Haas wrote:
> Hi Martin,
>
> responses below
>
> On Fri, Jul 4, 2014 at 9:50 AM, Martin MOKREJŠ <mmo...@gm...> wrote:
>
> Brian Haas wrote:
> > Thanks, Martin. I've CC'd the trinity-developers list. We'll take all your comments into consideration. It'll be some time before it all meets your expectations (if that's even an option for us). For now, our (perhaps my) goals have been quite simple: keep everything self-contained and minimize dependencies. I entirely agree about that evil wget (which is one of the reasons why I put the 'make simple' in there). This version will be incorporated as a plugin in the upcoming Trinity, along with Jellyfish-2 (even if currently to your dismay), and Trinity will do the 'make simple' to avoid pulling down pfam via wget.
>
> Sure, take your time. Also, the packaging is an issue due to many different LICENSEs used in all the bundled tools. Just split it all into sub-packages, that's the only way.
>
> Right... that's definitely a cause for concern, which is why we keep the third-party code isolated wherever possible, and should have a note in there somewhere to follow the different licenses for the plugins. We only leverage code that has very lenient licensing, as we do for our code.
>
> So where does trinityrnaseq_r20140413p1/trinity-plugins/jellyfish-1.1.11 come from? It doesn't seem to be from http://www.genome.umd.edu/jellyfish.html or ftp://ftp.genome.umd.edu/pub/jellyfish/jellyfish-2.1.3.tar.gz
>
> It should have come from the jellyfish website. In the upcoming Trinity release, we're just going to bundle in the .tar.gz files for the code, and have the makefile do the unpack/build as part of the Trinity build. This will go for RSEM-2.15 as well.
>
> One of the key issues here is that there are some versions of the tools where the usage has changed significantly (both the latest rsem and jellyfish), so that our new version of Trinity will only be readily compatible with those specific versions - which is why they get bundled in, and Trinity will look in the plugins area to find what it needs.
>
> After we get the next Trinity, Trinotate, and PASA releases out, we'll rethink our bundling strategy.
>
> best,
>
> ~b
>
> Martin
>
> > cheers,
> >
> > ~brian
> >
> > On Fri, Jul 4, 2014 at 8:46 AM, Martin MOKREJŠ <mmo...@gm...> wrote:
> >
> > Brian Haas wrote:
> > > Greetings all,
> > >
> > > The latest release of TransDecoder is now available:
> > >
> > > http://sourceforge.net/projects/transdecoder/files/TransDecoder_r20140704.tar.gz/download
> > >
> > > including minor changes from the previous release to ensure better compatibility with other projects, including Trinity, PASA, and Trinotate
> > >
> > > Release notes:
> > >
> > > -added 'make simple' to build just the essential components involving parafly and cdhit
> > >
> > > -removed the 'cds.' prefix from the pep and cds sequence accessions.
> >
> > Hi Brian,
> > I just tested the new release and have some comments:
> >
> > 1. In the past the files were tar.bz2 instead of tar.gz as of now. It helps distro maintainers if the URLs and filenames remain stable. It is also a common habit that if one unpacks MyApp-2.4c.tar.gz, it extracts into a MyApp-2.4c/ subdirectory. Although it seems today's archive file complies with this, I think trinity does not, and I am not sure how long it will last. ;)
> >
> > 2. It is evil that the "make compile" step runs wget to download a 1.4 GB Pfam file. Please put it under a different "target" in your Makefiles. Not only that: I already have the files on my system and I certainly do not want to waste my bandwidth.
> >
> > 3. I would like to add this to Gentoo Linux, but that won't ever be allowed if the package is a huge glue of other tools. For example, I already have cd-hit installed, and installing TransDecoder would try to overwrite existing files, and will be denied. If you would like to get the package accepted into Linux distros and save developers time resolving the knotted layout, please introduce some configure- or Makefile-based checks and bail out if they are not installed. You can keep the crazy layout/setup as an alternative for users who think this is the right way to go (while it is not).
> >
> > 4. I wanted to post to the trinityrnaseq-users list about this but ... it is confusing that trinity and transdecoder place overlapping 3rd-party stuff under their own source trees. The http://trinityrnaseq.sourceforge.net/#installation page is totally quiet about all the hidden obstacles. I recommend you sum up a simple listing of required/optional tools, their versions and URLs. If possible, drop them from TransDecoder_r20140704/3rd_party and also from the plugins subdirectory somewhere under trinity*.
> >
> > 5. In the current setup, both transdecoder and trinity are unmanageable for a Linux distro. One cannot force some version dependencies, the tools download what they want on their own instead of just running a compiler ... and they overwrite other applications' files.
> >
> > 6. BTW, I realized the quorum package looks for jellyfish-1.1.11 while on the web I found only jellyfish-2.x. Incidentally, I see jellyfish-1.1.11 under trinity*. Huh. Would you please tell me whether trinity uses an "old" jellyfish version of the same package? Or is that incidentally the same name? Why can't trinity use the jellyfish-2.x installed already on the system?
> >
> > I hope this helps you and other devs to clean up the interesting package, though I did not get to install it yet.
> >
> > Thank you,
> > Martin

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
From: Brian H. <bh...@br...> - 2015-01-08 20:48:37
|
Hi Martin, Not yet... after I get the next Trinity release out, I'll work on updating transdecoder. I don't think it's going to meet your expectations as far as how the build goes, but it'll be better than it is now. We can certainly continue discussions before the release goes out. I'll check with you beforehand. best, ~b On Thu, Jan 8, 2015 at 3:42 PM, Martin MOKREJŠ <mmo...@gm...> wrote: > Hi Brian, > I just returned to this and ... I don't see a never transdecoder release > at sourceforge. So, did you get to splitting the packages out or not yet? > Thank you, > Martin > > Brian Haas wrote: > > Hi Martin, > > > > responses below > > > > > > On Fri, Jul 4, 2014 at 9:50 AM, Martin MOKREJŠ <mmo...@gm... > <mailto:mmo...@gm...>> wrote: > > > > > > > > Brian Haas wrote: > > > Thanks, Martin. I've CC'd the trinity-developers list. We'll > take all your comments into consideration. It'll be some time before it > all meets your expectations (if that's even an option for us). For now, > our (perhaps my) goals have been quite simple: keep everything > self-contained and minimize dependencies. I entirely agree about that evil > wget (which is one of the reasons why I put the 'make simple' in there). > This version will be incorporated as a plugin in the upcoming Trinity, > along with Jellyfish-2 (even if currently to your dismay), and Trinity will > do the 'make simple' to avoid pulling down pfam via wget. > > > > Sure, take your time. Also, the packaging is an issue due to many > different LICENSEs used in all the bundled tools. Just split it all into > sub-packages, that's the only way. > > > > > > Right... that's definitely a cause for concern, which is why we keep > the third-party code isolated wherever possible, and should have a note in > there somewhere to follow the different licenses for the plugins. We only > leverage code that has very lenient licensing, as we do for our code. 
> > > > > > > > So where does > trinityrnaseq_r20140413p1/trinity-plugins/jellyfish-1.1.11 come from? It > doesn't seem to be from http://www.genome.umd.edu/jellyfish.html or > ftp://ftp.genome.umd.edu/pub/jellyfish/jellyfish-2.1.3.tar.gz > > > > > > It should have come from the jellyfish website. In the upcoming > Trinity release, we're just going to bundle in the .tar.gz files for the > code, and have the makefile do the unpack/build as part of the Trinity > build. This will go for RSEM-2.15 as well. > > > > One of the key issues here is that there are some versions of the tools > where the usage has changed significantly (both the latest rsem and > jellyfish), so that our new version of Trinity will only be readily > compatible with those specific versions - which is why they get bundled in, > and Trinity will look in the plugins area to find what it needs. > > > > After we get the next Trinity, Trinotate, and PASA releases out, we'll > rethink our bundling strategy. > > > > best, > > > > ~b > > > > > > Martin > > > > > > > > > > cheers, > > > > > > ~brian > > > > > > > > > On Fri, Jul 4, 2014 at 8:46 AM, Martin MOKREJŠ <mmo...@gm... > <mailto:mmo...@gm...> <mailto:mmo...@gm... <mailto: > mmo...@gm...>>> wrote: > > > > > > Brian Haas wrote: > > > > Greetings all, > > > > > > > > The latest release of TransDecoder is now available: > > > > > > > > > http://sourceforge.net/projects/transdecoder/files/TransDecoder_r20140704.tar.gz/download > > > > > > > > including minor changes from the previous release to ensure > better compatibility with other projects, including Trinity, PASA, and > Trinotate > > > > > > > > Release notes: > > > > > > > > -added 'make simple' to build just the essential components > involving parafly and cdhit > > > > > > > > -removed the 'cds.' prefix from the pep and cds sequence > accessions. > > > > > > > > > Hi Brian, > > > I just tested the new and have some comments: > > > > > > 1. 
In the past the files were tar.bz2 instead of tar.gz as of > now. It helps distro maintainers if the URLs and filenames remain stable. > It is also a common habit that if one unpacks MyApp-2.4c.tar.gz that it > extracts into MyApp-2.4c/ subdirectory. Although it seems your today's > archive file complies with this I think trinity does not and not sure how > long will it last. ;) > > > > > > 2. It is evil that the "make compile" step runs wget to > download 1.4GB large PFAM file. Please put it under different "target" in > your Makefile's. Not only, I already have the files on my system and I > certainly do not want to waste my bandwidth. > > > > > > 3. I would like to add this to Gentoo Linux but that won't > ever be allowed if the package is huge glue of other tools. For example, I > have already cd-hit installed and installing TransDecoder would try to > overwrite existing files, and will be denied. If you would like to get the > package accepted into Linux distros and save developers time resolving the > knotted layout, please introduce some configure- or Makefile-based checks > and bail out if they are not installed. You can keep the crazy layout/setup > as an alternative for users who think this is the right way to go (while it > is not). > > > > > > 4. I wanted to post trinityrnaseq-users list about this but > ... it is confusing that trinity and transdecoders place overlapping > 3rd-party stuff under its own source tree. The > http://trinityrnaseq.sourceforge.net/#installation page it totally quiet > how all there hidden obstackles. I recommend you to sum up a simple listing > of required/optional tools, their versions and URLs. If possible, drop them > from the TransDecoder_r20140704/3rd_party and also from the plugins > subdirectory somewhere under trinity*. > > > > > > 5. In the current setup, both transdecoder, trinity are > un-manageable for a Linux distro. 
One cannot force some version > dependencies, the tools download what they want to on their own instead of > just running a compiler ... and tehy overwrite other applications files. > > > > > > 6. BTW, I realized quorum package looks for jellyfish-1.11 > while on the web I found only jellyfish-2.x. Incidentally, I see > jellyfish-1.11 under trinity*. Huh. Would you please tell me: whether > trinity uses an "old" jellyfish version of teh same package? Or is that > that incidentally same name? Why can't trinity use jellyfish-2.x installed > already on the system. > > > > > > I wish it helps you and other devs to cleanup the interesting > package, though I did not get to install it yet. > > > > > > Thank you, > > > Martin > -- -- Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas> |
From: Martin M. <mmo...@gm...> - 2015-01-08 20:43:14
|
Hi Brian, I just returned to this and ... I don't see a never transdecoder release at sourceforge. So, did you get to splitting the packages out or not yet? Thank you, Martin Brian Haas wrote: > Hi Martin, > > responses below > > > On Fri, Jul 4, 2014 at 9:50 AM, Martin MOKREJŠ <mmo...@gm... <mailto:mmo...@gm...>> wrote: > > > > Brian Haas wrote: > > Thanks, Martin. I've CC'd the trinity-developers list. We'll take all your comments into consideration. It'll be some time before it all meets your expectations (if that's even an option for us). For now, our (perhaps my) goals have been quite simple: keep everything self-contained and minimize dependencies. I entirely agree about that evil wget (which is one of the reasons why I put the 'make simple' in there). This version will be incorporated as a plugin in the upcoming Trinity, along with Jellyfish-2 (even if currently to your dismay), and Trinity will do the 'make simple' to avoid pulling down pfam via wget. > > Sure, take your time. Also, the packaging is an issue due to many different LICENSEs used in all the bundled tools. Just split it all into sub-packages, that's the only way. > > > Right... that's definitely a cause for concern, which is why we keep the third-party code isolated wherever possible, and should have a note in there somewhere to follow the different licenses for the plugins. We only leverage code that has very lenient licensing, as we do for our code. > > > > So where does trinityrnaseq_r20140413p1/trinity-plugins/jellyfish-1.1.11 come from? It doesn't seem to be from http://www.genome.umd.edu/jellyfish.html or ftp://ftp.genome.umd.edu/pub/jellyfish/jellyfish-2.1.3.tar.gz > > > It should have come from the jellyfish website. In the upcoming Trinity release, we're just going to bundle in the .tar.gz files for the code, and have the makefile do the unpack/build as part of the Trinity build. This will go for RSEM-2.15 as well. 
> > One of the key issues here is that there are some versions of the tools where the usage has changed significantly (both the latest rsem and jellyfish), so that our new version of Trinity will only be readily compatible with those specific versions - which is why they get bundled in, and Trinity will look in the plugins area to find what it needs. > > After we get the next Trinity, Trinotate, and PASA releases out, we'll rethink our bundling strategy. > > best, > > ~b > > > Martin > > > > > > cheers, > > > > ~brian > > > > > > On Fri, Jul 4, 2014 at 8:46 AM, Martin MOKREJŠ <mmo...@gm... <mailto:mmo...@gm...> <mailto:mmo...@gm... <mailto:mmo...@gm...>>> wrote: > > > > Brian Haas wrote: > > > Greetings all, > > > > > > The latest release of TransDecoder is now available: > > > > > > http://sourceforge.net/projects/transdecoder/files/TransDecoder_r20140704.tar.gz/download > > > > > > including minor changes from the previous release to ensure better compatibility with other projects, including Trinity, PASA, and Trinotate > > > > > > Release notes: > > > > > > -added 'make simple' to build just the essential components involving parafly and cdhit > > > > > > -removed the 'cds.' prefix from the pep and cds sequence accessions. > > > > > > Hi Brian, > > I just tested the new and have some comments: > > > > 1. In the past the files were tar.bz2 instead of tar.gz as of now. It helps distro maintainers if the URLs and filenames remain stable. It is also a common habit that if one unpacks MyApp-2.4c.tar.gz that it extracts into MyApp-2.4c/ subdirectory. Although it seems your today's archive file complies with this I think trinity does not and not sure how long will it last. ;) > > > > 2. It is evil that the "make compile" step runs wget to download 1.4GB large PFAM file. Please put it under different "target" in your Makefile's. Not only, I already have the files on my system and I certainly do not want to waste my bandwidth. > > > > 3. 
I would like to add this to Gentoo Linux but that won't ever be allowed if the package is huge glue of other tools. For example, I have already cd-hit installed and installing TransDecoder would try to overwrite existing files, and will be denied. If you would like to get the package accepted into Linux distros and save developers time resolving the knotted layout, please introduce some configure- or Makefile-based checks and bail out if they are not installed. You can keep the crazy layout/setup as an alternative for users who think this is the right way to go (while it is not). > > > > 4. I wanted to post trinityrnaseq-users list about this but ... it is confusing that trinity and transdecoders place overlapping 3rd-party stuff under its own source tree. The http://trinityrnaseq.sourceforge.net/#installation page it totally quiet how all there hidden obstackles. I recommend you to sum up a simple listing of required/optional tools, their versions and URLs. If possible, drop them from the TransDecoder_r20140704/3rd_party and also from the plugins subdirectory somewhere under trinity*. > > > > 5. In the current setup, both transdecoder, trinity are un-manageable for a Linux distro. One cannot force some version dependencies, the tools download what they want to on their own instead of just running a compiler ... and tehy overwrite other applications files. > > > > 6. BTW, I realized quorum package looks for jellyfish-1.11 while on the web I found only jellyfish-2.x. Incidentally, I see jellyfish-1.11 under trinity*. Huh. Would you please tell me: whether trinity uses an "old" jellyfish version of teh same package? Or is that that incidentally same name? Why can't trinity use jellyfish-2.x installed already on the system. > > > > I wish it helps you and other devs to cleanup the interesting package, though I did not get to install it yet. > > > > Thank you, > > Martin |
From: Irantzu A. <ira...@gm...> - 2014-12-16 11:12:52
|
Hi all,

I've recently started using TransDecoder and have a general question; sorry if it is very basic.

In the final TransDecoder output, the .pep file has a total of 200 peptides, while my initial transcripts.fasta file has 375 transcripts. This is probably because the remaining 175 transcripts have no ORFs, or because they have ORFs that do not reach the minimum open reading frame length.

*Questions:*

*1)* What is this minimum ORF length?

*2)* Is there any other reason for a transcript not getting a CDS/peptide prediction? Some of the transcripts in the transcripts.fasta file are small (100-200 bp); could this be affecting the result?

*3)* On the other hand, I produced the transcripts.fasta file from a GTF file using gffread. Is this OK? I ask because the TransDecoder webpage says "Starting from a genome-based transcript structure GTF file (eg. cufflinks)" and then describes converting GTF to GFF3, etc. But the transcript FASTA sequences should be the same whether you derive them from GTF or GFF3, shouldn't they?

Thanks in advance,
Irantzu

--
*Irantzu Anzar*
M.Sc. in Bioinformatics, Autonomous University of Barcelona, Spain
B.Sc. in Biotechnology, University of León, Spain |
From: Brian H. <bh...@br...> - 2014-11-01 01:07:26
|
Hi Gisele,

If you're running this on a Mac, then you might need to install a new version of GCC. I ended up building gcc 4.9 from source, and it works quite well (but it definitely takes some time and effort). If you have access to a Linux server, it would definitely be easier to just build it and run it there.

best,

~brian

On Fri, Oct 31, 2014 at 2:35 PM, Gisele Antoniazzi Cardoso <gi_...@ya...> wrote:

> Hi everyone!
>
> I need some help with TranscriptDecoder. I am trying to run 'make simple', but I always get this error:
>
> clang: warning: argument unused during compilation: '-fopenmp'
> ParaFly.cpp:6:10: fatal error: 'omp.h' file not found
> #include <omp.h>
>          ^
> 1 error generated.
> make: *** [ParaFly.o] Error 1
>
> Any ideas how to solve this?
>
> Best regards,
> Gisele

-- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas |
From: Gisele A. C. <gi_...@ya...> - 2014-10-31 18:49:12
|
Hi everyone!

I need some help with TranscriptDecoder. I am trying to run 'make simple', but I always get this error:

clang: warning: argument unused during compilation: '-fopenmp'
ParaFly.cpp:6:10: fatal error: 'omp.h' file not found
#include <omp.h>
         ^
1 error generated.
make: *** [ParaFly.o] Error 1

Any ideas how to solve this?

Best regards,
Gisele |
From: Brian H. <bh...@br...> - 2014-10-29 13:40:16
|
and the gtf or gff3 formats are described here: http://www.ensembl.org/info/website/upload/gff.html or http://www.sequenceontology.org/resources/gff3.html best, ~brian On Wed, Oct 29, 2014 at 12:40 AM, Elasady, Summer <sr...@st...> wrote: > Hi- > > I asked Brian to decipher the output, and here is what he told me: > > Hi Summer, > > I'm not sure it's described anywhere yet... > > There appears to be some redundant info in the header. The important parts > are: > > >cds.comp1000092_c0_seq1|m.177654 type:internal len:250 (+) > comp1000092_c0_seq1:1-747(+) > > which can be broken down as: > > protein accession: cds.comp1000092_c0_seq1|m.177654 > so, orf m.177654 on trinity transcript comp1000092_c0_seq1 > > type: internal indicates that the transcript can be translated from > beginning to end of the trinity transcript sequence (no start codon or stop > codon detected). Alternatively, this might indicate complete (full ORF), > 3prime partial (missing stop codon) or 5prime partial (missing start codon) > > The length of the orf is 250 amino acids > and the translation is done from the range of 1-747 on the trinity > transcript, in the '+' orientation. > > > Sent from my iPhone > > > On Oct 28, 2014, at 5:38 PM, Rebekah Ruth Starks < > rst...@em...> wrote: > > > > Good Afternoon, > > > > My name is Rebekah and I am a first year pH D student. This is my first > time using transDecoder and I am having a difficult time understanding the > output and what each column corresponds to. Is there a place on the website > that describes the output better? Thank you or your time. Have a great day! > > > > Bekah > > > ------------------------------------------------------------------------------ > > _______________________________________________ > > Transdecoder-users mailing list > > Tra...@li... 
> > https://lists.sourceforge.net/lists/listinfo/transdecoder-users > > > ------------------------------------------------------------------------------ > _______________________________________________ > Transdecoder-users mailing list > Tra...@li... > https://lists.sourceforge.net/lists/listinfo/transdecoder-users > -- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas |
From: Elasady, S. <sr...@st...> - 2014-10-29 04:57:32
|
Hi-

I asked Brian to decipher the output, and here is what he told me:

Hi Summer,

I'm not sure it's described anywhere yet...

There appears to be some redundant info in the header. The important parts are:

>cds.comp1000092_c0_seq1|m.177654 type:internal len:250 (+) comp1000092_c0_seq1:1-747(+)

which can be broken down as:

protein accession: cds.comp1000092_c0_seq1|m.177654
so, ORF m.177654 on Trinity transcript comp1000092_c0_seq1

type:internal indicates that the transcript can be translated from beginning to end of the Trinity transcript sequence (no start codon or stop codon detected). Alternatively, the type might be complete (full ORF), 3prime partial (missing stop codon), or 5prime partial (missing start codon).

The length of the ORF is 250 amino acids, and the translation is done from the range 1-747 on the Trinity transcript, in the '+' orientation.

Sent from my iPhone

> On Oct 28, 2014, at 5:38 PM, Rebekah Ruth Starks <rst...@em...> wrote:
>
> Good Afternoon,
>
> My name is Rebekah and I am a first-year PhD student. This is my first time using TransDecoder, and I am having a difficult time understanding the output and what each column corresponds to. Is there a place on the website that describes the output better? Thank you for your time. Have a great day!
>
> Bekah |
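The header breakdown in this message can be sketched as a small parser. This is only an illustration: the regular expression is inferred from the single example header quoted above, so the exact layout is an assumption and may differ between TransDecoder releases. The function name `parse_orf_header` and the dictionary keys are hypothetical, not part of TransDecoder.

```python
import re

# Layout assumed from the one example header quoted on this list:
#   >ACCESSION type:TYPE len:LEN (STRAND) TRANSCRIPT:START-END(STRAND)
HEADER_RE = re.compile(
    r"^>?(?P<accession>\S+)\s+"
    r"type:(?P<type>\S+)\s+"
    r"len:(?P<length>\d+)\s+"
    r"\((?P<orf_strand>[+-])\)\s+"
    r"(?P<transcript>\S+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)\s*$"
)


def parse_orf_header(line):
    """Split one FASTA header line into its named fields (hypothetical helper)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("header does not match the assumed layout: %r" % line)
    fields = match.groupdict()
    # Convert the numeric fields from text to integers.
    for key in ("length", "start", "end"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    example = (">cds.comp1000092_c0_seq1|m.177654 type:internal len:250 (+) "
               "comp1000092_c0_seq1:1-747(+)")
    for name, value in parse_orf_header(example).items():
        print(name, value)
```

A header that does not fit the assumed layout raises `ValueError` rather than returning partial fields, which makes layout changes in other releases easy to spot.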
From: Rebekah R. S. <rst...@em...> - 2014-10-28 22:38:30
|
Good Afternoon, My name is Rebekah and I am a first-year PhD student. This is my first time using TransDecoder, and I am having a difficult time understanding the output and what each column corresponds to. Is there a place on the website that describes the output better? Thank you for your time. Have a great day! Bekah |
From: Brian H. <bh...@br...> - 2014-10-03 09:26:11
|
Hi Suhaila,

It appears that our gtf-to-gff3 converter isn't compatible with the unknown strand '.' designation and simply converts all such entries to '-'. So, for now, the cufflinks compatibility is restricted to strand-specific RNA-seq, where all transcripts have a defined orientation. We'll aim to address this in a future release.

best,

~brian

On Fri, Oct 3, 2014 at 2:59 AM, Suhaila Sulaiman <suh...@gm...> wrote:

> Hi Brian,
>
> Good then. I've tried using the cufflinks output (.gtf) and converting it to gff3. In the gtf file, the strand column shows a '.' for all of the transcripts. When I converted it to gff3, most of them show '-' in the strand column. How does the script know which strand the transcripts are on? Can you help me with that?
>
> Regards, |
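Before running the conversion, it may help to check how many transcripts in the GTF file carry the unknown-strand '.' that Brian describes. The sketch below relies only on the standard GTF layout (nine tab-separated columns, feature type in column 3, strand in column 7). The function name `count_strands` is hypothetical, and filtering on a `transcript` feature type is an assumption about the input file.

```python
from collections import Counter


def count_strands(gtf_lines, feature="transcript"):
    """Count strand symbols ('+', '-', '.') for one feature type in GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        if fields[2] != feature:
            continue
        counts[fields[6]] += 1  # column 7 holds the strand
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_strands.py transcripts.gtf
    with open(sys.argv[1]) as handle:
        for strand, n in sorted(count_strands(handle).items()):
            print(strand, n)
```

If the count for '.' is non-zero, those transcripts are the ones whose orientation would be rewritten to '-' by the converter behaviour described above.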
From: Brian H. <bh...@br...> - 2014-10-02 14:30:42
|
Hi Suhaila, I don't think there's a hard limit for what you can select as a parameter value, but note the false discovery rate is bound to go up exponentially as you start lowering the min length. best, ~brian On Thu, Oct 2, 2014 at 10:27 AM, Suhaila Sulaiman < suh...@gm...> wrote: > Hi Brian, > > Thanks for your prompt reply. Really helpful. > > One more question here. In TransDecode, default length of ORFs predicted > is 100 amino acids (-m parameter right?). Is there any minimum number of > length that the software can accept? In some cases, I need to search for > small ORF which is less than 80 residues. Can it goes until 10 residues or > 20 residues? > > Regards, > > Suhaila. > > On Wed, Oct 1, 2014 at 6:04 PM, Brian Haas <bh...@br...> > wrote: > >> Hi Suhaila, >> >> responses below: >> >> On Tue, Sep 30, 2014 at 11:09 PM, Suhaila Sulaiman < >> suh...@gm...> wrote: >> >>> Hi, >>> >>> I am Suhaila, a PhD student from Malaysia. I am writing this email to >>> ask something regarding TransDecoder program. Sorry as I just found the >>> program, so the questions would be very basic though. >>> >>> 1. Let say I have RNA-seq data, I have assembled using Tophat and >>> Cufflinks, so I can directly use TransDecoder to predict for genes in the >>> assembled transcripts, right? Then using the result from TransDecoder, I >>> can mapped back the predicted genes against the annotated genes in the ref >>> genome, am I right? >>> >>> Yes, transdecoder will predict coding regions within the >> cufflinks-defined transcripts. This is described in the web documentation >> at http://transdecoder.sf.net >> >> >> >>> >>> 1. >>> 2. Is there any publication of TransDecoder that I can cited in my >>> research? >>> >>> >>> >> It's best to cite the website, but you can also cite: >> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132/ >> where the algorithm is described in the supp. materials. >> >> best, >> >> ~brian >> >> >>> Looking forward for your reply, and much appreciated! 
>>> >>> Regards, >>> Suhaila S. >>> >> >> >> >> -- >> -- >> Brian J. Haas >> The Broad Institute >> http://broad.mit.edu/~bhaas >> >> >> > > -- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas |
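Brian's point about false discoveries growing as the minimum length drops can be illustrated with a back-of-the-envelope calculation. The sketch assumes uniformly random nucleotides, so that 3 of the 64 codons are stops and a frame stays open for n codons with probability (61/64)^n. Real transcripts are not random, and this is not how TransDecoder scores ORFs; the function name `stop_free_probability` is hypothetical.

```python
def stop_free_probability(n_codons, stop_fraction=3 / 64):
    """Chance that n consecutive random codons contain no stop codon.

    Assumes independent, uniformly random codons (a simplification).
    """
    return (1 - stop_fraction) ** n_codons


if __name__ == "__main__":
    for n in (10, 20, 80, 100):
        print("%3d codons: %.4f" % (n, stop_free_probability(n)))
```

Under this simplified model, an open stretch of 10 or 20 codons is common by chance alone, while a stretch of 100 codons is rare, which is the intuition behind keeping the minimum length high unless there is independent evidence for short ORFs.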
From: Suhaila S. <suh...@gm...> - 2014-10-02 14:28:27
|
Hi Brian,

Thanks for your prompt reply. Really helpful.

One more question here. In TransDecoder, the default minimum length of predicted ORFs is 100 amino acids (the -m parameter, right?). Is there a minimum length that the software can accept? In some cases I need to search for small ORFs of fewer than 80 residues. Can it go down to 10 or 20 residues?

Regards,

Suhaila.

On Wed, Oct 1, 2014 at 6:04 PM, Brian Haas <bh...@br...> wrote:

> Hi Suhaila,
>
> responses below:
>
> On Tue, Sep 30, 2014 at 11:09 PM, Suhaila Sulaiman <suh...@gm...> wrote:
>
>> Hi,
>>
>> I am Suhaila, a PhD student from Malaysia. I am writing this email to ask something regarding the TransDecoder program. Sorry, as I just found the program, the questions will be very basic.
>>
>> 1. Let's say I have RNA-seq data that I have assembled using Tophat and Cufflinks, so I can directly use TransDecoder to predict genes in the assembled transcripts, right? Then, using the result from TransDecoder, I can map the predicted genes back against the annotated genes in the reference genome, am I right?
>
> Yes, transdecoder will predict coding regions within the cufflinks-defined transcripts. This is described in the web documentation at http://transdecoder.sf.net
>
>> 2. Is there any publication on TransDecoder that I can cite in my research?
>
> It's best to cite the website, but you can also cite: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132/ where the algorithm is described in the supp. materials.
>
> best,
>
> ~brian
>
>> Looking forward to your reply, and much appreciated!
>>
>> Regards,
>> Suhaila S.
>
> --
> --
> Brian J. Haas
> The Broad Institute
> http://broad.mit.edu/~bhaas |
From: Brian H. <bh...@br...> - 2014-10-01 10:04:44
|
Hi Suhaila,

responses below:

On Tue, Sep 30, 2014 at 11:09 PM, Suhaila Sulaiman <suh...@gm...> wrote:

> Hi,
>
> I am Suhaila, a PhD student from Malaysia. I am writing this email to ask something regarding the TransDecoder program. Sorry, as I just found the program, the questions will be very basic.
>
> 1. Let's say I have RNA-seq data that I have assembled using Tophat and Cufflinks, so I can directly use TransDecoder to predict genes in the assembled transcripts, right? Then, using the result from TransDecoder, I can map the predicted genes back against the annotated genes in the reference genome, am I right?

Yes, transdecoder will predict coding regions within the cufflinks-defined transcripts. This is described in the web documentation at http://transdecoder.sf.net

> 2. Is there any publication on TransDecoder that I can cite in my research?

It's best to cite the website, but you can also cite: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132/ where the algorithm is described in the supp. materials.

best,

~brian

> Looking forward to your reply, and much appreciated!
>
> Regards,
> Suhaila S.

-- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas |
From: Grusz, A. <Gr...@si...> - 2014-09-29 13:38:27
|
Hi there,

I attempted to run TransDecoder and received the following error message:

Error, path to a required program (cd-hit-est) cannot be found

To my knowledge, I have properly downloaded all of the relevant dependencies. Any suggestions would be helpful.

Thanks,
Amanda Grusz

-----------------------------------------
Amanda L. Grusz, Ph.D. Postdoctoral Fellow MRC-166/Botany Smithsonian Institution P.O. Box 37012 Washington, D.C. 20013-7012 Web: www.duke.edu/~alg3<http://www.duke.edu/~alg3> |
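A first diagnostic for an error like this is to check whether the program is visible on the shell's PATH at all. The sketch below uses Python's standard `shutil.which`; the function name `find_missing_programs` is hypothetical. Note the assumption: TransDecoder may look for cd-hit-est in its own bundled directory rather than on PATH (the release notes on this list say 'make simple' builds the parafly and cdhit components), so a missing PATH entry narrows the problem down without proving the cause.

```python
import shutil


def find_missing_programs(programs):
    """Return the names from `programs` that cannot be found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_programs(["cd-hit-est"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all required programs found on PATH")
```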
From: Erik L. <eri...@bi...> - 2014-09-09 08:37:19
|
Thank you, Alexie and Brian, for your answers.

Erik

From: Ale...@cs... [mailto:Ale...@cs...]
Sent: 9 September 2014 03:56
To: bh...@br...; Erik Lysøe
Cc: tra...@li...
Subject: RE: [Transdecoder-users] Transcoder question

And I do the reverse (conduct the DE analysis on the ORFs), but that's because my approach is the same whether the genes come from Trinity or from genome annotations (PASA), so... Erik, it's up to you; either way your blast2go graphs will look the same :-)

________________________________

well, if a cDNA contig has two ORFs, that means they are merged genes, no? Either as a computational artefact or due to a biological reason. Personally, I don't work with contigs. After Transdecoder finishes I use the mRNA and peptides (c.f. http://jamps.sourceforge.net which kinda replaces blast2go but is less well funded :-))

________________________________
From: Erik Lysøe [eri...@bi...]
Sent: Monday, 8 September 2014 9:21 PM
To: tra...@li...
Subject: [Transdecoder-users] Transcoder question

________________________________
From: Brian Haas [bh...@br...]
Sent: Tuesday, 9 September 2014 11:25 AM
To: Erik Lysøe
Cc: tra...@li...
Subject: Re: [Transdecoder-users] Transcoder question

Hi Erik,

I find it useful to combine all ORF annotations to annotate a given transcript (i.e. the transcript inherits the annotations from all its candidate peptides). Then, if a transcript is found to be interesting (e.g. in DE analysis), I would dig in further to determine which ORF is most relevant in that context.

best,

~brian

On Mon, Sep 8, 2014 at 7:21 AM, Erik Lysøe <eri...@bi...> wrote:

Dear developers

About de novo transcriptomes: I know blastx is very time consuming, and I got a suggestion to first use CD-HIT EST (to reduce redundancy), then transdecoder to convert to peptides, and then blastp against the RefSeq protein database to speed up transcriptome annotation. So far so good, but I have some problems. How do you normally proceed after transdecoder when many cDNA contigs have several ORFs? Each cDNA contig could then have several annotations. I usually import the blast xml into Blast2go, and would like to use only one (best hit) annotation per cDNA contig for further analysis of the transcriptome.

Any suggestions?

Cheers Erik

-- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas |
From: <Ale...@cs...> - 2014-09-09 02:08:41
|
And I do the reverse (conduct the DE analysis on the ORFs), but that's because my approach is the same whether the genes come from Trinity or from genome annotations (PASA), so... Erik, it's up to you; either way your blast2go graphs will look the same :-)

________________________________
From: Brian Haas [bh...@br...]
Sent: Tuesday, 9 September 2014 11:25 AM
To: Erik Lysøe
Cc: tra...@li...
Subject: Re: [Transdecoder-users] Transcoder question

Hi Erik,

I find it useful to combine all ORF annotations to annotate a given transcript (i.e. the transcript inherits the annotations from all its candidate peptides). Then, if a transcript is found to be interesting (e.g. in DE analysis), I would dig in further to determine which ORF is most relevant in that context.

best,

~brian

On Mon, Sep 8, 2014 at 7:21 AM, Erik Lysøe <eri...@bi...> wrote:

Dear developers

About de novo transcriptomes: I know blastx is very time consuming, and I got a suggestion to first use CD-HIT EST (to reduce redundancy), then transdecoder to convert to peptides, and then blastp against the RefSeq protein database to speed up transcriptome annotation. So far so good, but I have some problems. How do you normally proceed after transdecoder when many cDNA contigs have several ORFs? Each cDNA contig could then have several annotations. I usually import the blast xml into Blast2go, and would like to use only one (best hit) annotation per cDNA contig for further analysis of the transcriptome.

Any suggestions?

Cheers Erik

-- -- Brian J. Haas The Broad Institute http://broad.mit.edu/~bhaas |
From: Brian H. <bh...@br...> - 2014-09-09 01:25:27
|
Hi Erik,

I find it useful to combine all ORF annotations to annotate a given transcript (i.e. the transcript inherits the annotations from all its candidate peptides). Then, if a transcript is found to be interesting (e.g. by DE analysis), I would dig in further to determine which ORF is most relevant in that context.

best,
~brian

On Mon, Sep 8, 2014 at 7:21 AM, Erik Lysøe <eri...@bi...> wrote:

> Dear developers
>
> About de novo transcriptome. I know blastx is very time consuming, and got
> a suggestion to first use CD-HIT EST (to reduce redundancy), then
> transdecoder to convert to peptides, and then blastp against the ref-seq
> protein database to speed up transcriptome annotation. So far so good, but
> I have some problems. How do you normally proceed after transdecoder, when
> many cDNA contigs have several ORFs? Then each cDNA contig could have
> several annotations. I usually import the blast xml to Blast2go, and would
> like to only use one (best hit) annotation per cDNA contig for further
> analysis of the transcriptome.
>
> Any suggestions?
>
> Cheers Erik

--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas |
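A minimal sketch of the approach Brian describes, for readers who want to try it on tabular BLAST output rather than in Blast2GO. It assumes blastp was run with `-outfmt 6` (12 tab-separated columns, bit score last) and that each ORF ID has the form `<transcript_id>|m.<number>`; that ID rule is an assumption, so check the headers of your own peptide file and adjust `transcript_of`. The function names and the demo rows are hypothetical and only illustrate the grouping.

```python
import csv
import io
from collections import defaultdict


def transcript_of(orf_id):
    # ASSUMPTION: ORF IDs have the form "<transcript_id>|m.<number>".
    # Check the headers of your own .pep file and adjust this rule.
    return orf_id.rsplit("|", 1)[0]


def annotate_transcripts(blast_tab):
    """Read BLAST tabular output (-outfmt 6) and group hits by transcript."""
    # Keep the highest-scoring hit for each ORF (column 12 is the bit score).
    best_per_orf = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue
        orf_id, subject, bitscore = row[0], row[1], float(row[11])
        if orf_id not in best_per_orf or bitscore > best_per_orf[orf_id][1]:
            best_per_orf[orf_id] = (subject, bitscore)

    # The transcript inherits the annotations of all its candidate ORFs.
    combined = defaultdict(list)
    for orf_id, (subject, bitscore) in best_per_orf.items():
        combined[transcript_of(orf_id)].append((orf_id, subject, bitscore))

    # Optionally reduce to a single best hit per transcript.
    best = {t: max(hits, key=lambda h: h[2]) for t, hits in combined.items()}
    return dict(combined), best


if __name__ == "__main__":
    # Made-up rows for illustration only; not real BLAST results.
    demo = io.StringIO(
        "contig1|m.1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "contig1|m.2\tprotB\t95.0\t150\t7\t0\t1\t150\t1\t150\t1e-80\t300\n"
        "contig2|m.3\tprotC\t80.0\t90\t18\t0\t1\t90\t1\t90\t1e-30\t120\n"
    )
    combined, best = annotate_transcripts(demo)
    for transcript, hit in sorted(best.items()):
        print(transcript, len(combined[transcript]), hit)
```

`combined` keeps every ORF's top hit under its parent transcript, which matches the "inherit all annotations" suggestion; `best` is the one-annotation-per-contig view Erik asked about. Ranking by bit score is one possible choice, and an e-value or coverage filter could be swapped in.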
From: Erik L. <eri...@bi...> - 2014-09-08 11:34:56
|
Dear developers

About de novo transcriptome. I know blastx is very time consuming, and got a suggestion to first use CD-HIT EST (to reduce redundancy), then transdecoder to convert to peptides, and then blastp against the ref-seq protein database to speed up transcriptome annotation. So far so good, but I have some problems. How do you normally proceed after transdecoder, when many cDNA contigs have several ORFs? Then each cDNA contig could have several annotations. I usually import the blast xml to Blast2go, and would like to only use one (best hit) annotation per cDNA contig for further analysis of the transcriptome.

Any suggestions?

Cheers Erik |
From: Brian H. <bh...@br...> - 2014-08-20 11:42:28
|
Thanks for the update! Glad to hear it's working

Best,
-Brian
(by iPhone)

> On Aug 20, 2014, at 5:29 AM, Jon Lees <jon...@gm...> wrote:
>
> Hi Brian
>
> Actually, apologies, it looks like Transdecoder is working fine;
> a trans-spliced gene had caused some problems in the transcriptome build,
> generating a very large erroneous transcript sequence.
>
> Thanks and best wishes
>
> Jon
>
>> On Tue, Aug 19, 2014 at 12:48 PM, Brian Haas <bh...@br...> wrote:
>> OK - I look forward to learning more.
>>
>> thx,
>>
>> ~brian
>>
>>> On Tue, Aug 19, 2014 at 7:45 AM, Jon Lees <jon...@gm...> wrote:
>>> Hi Brian
>>> Yes, I just tried that, and it looks like the documentation is up to date,
>>> but it's silently crashing at some point.
>>>
>>> I've just started trying to debug to find out at which point it stops running;
>>> will let you know if I find anything
>>>
>>> thanks and
>>> best wishes
>>>
>>> Jon
>>>
>>>> On Tue, Aug 19, 2014 at 12:35 PM, Brian Haas <bh...@br...> wrote:
>>>> Hi Jon,
>>>>
>>>> The documentation could be out of date. Can you try running the sample data set through and see if it generates the expected output files?
>>>>
>>>> cd sample_data/
>>>> ./runMe.sh
>>>>
>>>> best,
>>>>
>>>> ~brian
>>>>
>>>>> On Tue, Aug 19, 2014 at 6:27 AM, Jon Lees <jon...@gm...> wrote:
>>>>> Hi
>>>>>
>>>>> I've run transdecoder a couple of times now.
>>>>>
>>>>> However, it only generates the temporary folder
>>>>> with the three files:
>>>>> longest_orfs.pep : all ORFs meeting the minimum length criteria, regardless of coding potential.
>>>>> longest_orfs.gff3 : positions of all ORFs as found in the target transcripts
>>>>> longest_orfs.cds : the nucleotide coding sequence for all detected ORFs
>>>>>
>>>>> No other files are generated, e.g.:
>>>>>
>>>>> """longest_orfs.cds.top_500_longest"""
>>>>>
>>>>> or the final output files in the current working directory, e.g.
>>>>>
>>>>> """transcripts.fasta.transdecoder.pep"""
>>>>>
>>>>> Is the documentation (http://transdecoder.sourceforge.net/) out of date, or is transdecoder failing silently? I couldn't see any issues with memory usage etc.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jon
>>>>
>>>> --
>>>> --
>>>> Brian J. Haas
>>>> The Broad Institute
>>>> http://broad.mit.edu/~bhaas
>>
>> --
>> --
>> Brian J. Haas
>> The Broad Institute
>> http://broad.mit.edu/~bhaas |