inchworm-users Mailing List for inchworm
Brought to you by:
bhaas
From: Brian H. <bh...@br...> - 2011-10-14 19:52:17
|
Oh, I should also mention that I haven't had good luck getting cufflinks to run on non-TopHat-generated alignments. There are some additional fields that also seem to be required in the SAM output, such as NM:i:\d+ and NH:i:\d+ (something like that... pulling from my inaccurate memory), and they appear to be essential for cufflinks to run on these files.

If you want to use cufflinks for expression analysis, I definitely recommend sticking with TopHat as the aligner for now. If you want to try out RSEM (one of my personal favorites right now), you could give it a whirl using the built-in bowtie alignments to unspliced transcript sequences.

best,

-brian

--
Brian J. Haas
Manager, Genome Annotation Research and Development
The Broad Institute
http://broad.mit.edu/~bhaas
From: Brian H. <bh...@br...> - 2011-10-14 19:47:52
|
Hi Thomas,

I'm glad you're having mostly good luck with these tools, and able to tweak them as needed.

I've been slowly moving the inchworm and related developments over to the

http://trinity.sf.net

site, and there's a new alignment wrapper under

util/alignReads.pl

that now supersedes the old blat alignment wrapper. With the above, you'd use

--aligner BLAT

If you want to give it a whirl, pull all the code from SVN directly, since this stuff has been pretty fluid lately.

svn co https://trinityrnaseq.svn.sourceforge.net/svnroot/trinityrnaseq/trunk trinityrnaseq

I don't remember what the blat system does in the context of non-strand-specific data, but it sounds like we'll need to add the XS attribute there. I'll have to investigate this further. I'll check into the other issues you describe, since they sound very familiar.

thanks!

-brian

--
Brian J. Haas
Manager, Genome Annotation Research and Development
The Broad Institute
http://broad.mit.edu/~bhaas
From: Thomas S. <tom...@go...> - 2011-10-14 19:22:13
|
Dear Brian,

I have used your blat alignment pipeline to map non-strand-specific 2x100 bp RNASeq HiSeq reads to a reference genome.

The sequence headers contained spaces, so I modified your perl scripts "fastQ_to_fastA.pl" and "fastQ_to_tab.pl" to split the headers on the first space and use only the first element as the FASTA header.

Now, I would like to use cufflinks to assemble transcripts. Unfortunately, the "XS" tags seem to be missing from my .bam file. Is this expected? Does the blat pipeline only add these for stranded reads?

Also, I would like to combine two .bam output files that were mapped to the same reference genome in separate blat alignment runs. Is the "samtools sort" command suitable for use with the output from blat alignments / input into cufflinks?

Thanks a ton for providing such amazing tools to the community!

Thomas
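The header fix Thomas describes (keep only the text before the first space) can be sketched in a few lines. This is a minimal Python illustration, not the code from fastQ_to_fastA.pl; the function name `fastq_to_fasta` and the sample read name are hypothetical, and it assumes plain 4-line FASTQ records.

```python
import io


def fastq_to_fasta(handle):
    """Yield FASTA lines from 4-line FASTQ records.

    Only the part of each header before the first whitespace is kept,
    which mirrors the workaround described in the message above.
    """
    lines = iter(handle)
    for header in lines:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected FASTQ header, got: {header!r}")
        sequence = next(lines, "").rstrip("\n")
        next(lines, "")  # '+' separator line, ignored
        next(lines, "")  # quality line, ignored
        fields = header[1:].split(None, 1)
        if not fields:
            raise ValueError("empty read name in FASTQ header")
        yield ">" + fields[0]
        yield sequence


# Hypothetical record with a space in the header line.
example = io.StringIO(
    "@READ_0001 1:N:0:ATCACG\n"
    "ACGTACGT\n"
    "+\n"
    "IIIIIIII\n"
)
fasta_lines = list(fastq_to_fasta(example))
```

Reading record by record keeps memory use flat, which matters for HiSeq-sized input files.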
From: Brian H. <bh...@br...> - 2011-09-12 22:07:32
|
Hi Nicholas,

Try giving the full path to the inchworm utility.

Also, I encourage you to use Trinity as opposed to just inchworm. Get Trinity at trinityrnaseq.sf.net

The most recent version of inchworm is in Trinity.

Best,

-brian (via iPhone)
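A "command not found" message generally means the shell could not locate the program in any directory on PATH, which is why giving the full path helps. As a rough illustration, the Python sketch below checks whether a program name resolves; the helper name `resolve_executable` is made up for this example.

```python
import os
import shutil


def resolve_executable(name_or_path):
    """Return the absolute path of an executable, or raise if it cannot be found.

    shutil.which() searches PATH for bare names and checks the file directly
    when a path with a directory component is given.
    """
    found = shutil.which(name_or_path)
    if found is None:
        raise FileNotFoundError(
            f"{name_or_path!r} is not on PATH; try the full path to the binary"
        )
    return os.path.abspath(found)
```

Calling `resolve_executable("inchworm")` on a machine where the install directory is not on PATH would raise, while passing the full path to the installed binary would succeed.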
From: Sanford, N. <nic...@tt...> - 2011-09-12 21:17:12
|
Hello,

I have installed inchworm on a unix server but I am having trouble running the program. I am trying to do de novo transcriptome assembly so I am using the command:

inchworm --reads Bayer5.fa --run_inchworm --DS >B5assembly.fasta

When I enter this command I get the following result:

-bash: inchworm: command not found

Can you tell me what I am doing wrong?

Thank you,

Nicholas Sanford
Research Assistant
Texas Tech University
Department of Plant and Soil Science
nic...@tt...
972 837 7536
From: Brian J H. <bh...@br...> - 2011-05-24 20:27:31
|
Hi Kurt,

You'll need a more modern version of the GCC compiler suite to build it. Older versions do not support OpenMP, which is why the -fopenmp option is not recognized.

Best,

-b

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
From: Kurt S. <kc...@ms...> - 2011-05-24 20:14:58
|
Hello,

I am trying to install inchworm, and do not think it is installing correctly. When I run the "make" command I get the errors below.

make all-recursive
make[1]: Entering directory `/hpc/compbio/Kurt/inchworm/inchworm-03132011'
Making all in src
make[2]: Entering directory `/hpc/compbio/Kurt/inchworm/inchworm-03132011/src'
g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Wno-deprecated -fopenmp -MT Fasta_entry.o -MD -MP -MF .deps/Fasta_entry.Tpo -c -o Fasta_entry.o Fasta_entry.cpp
cc1plus: error: unrecognized command line option "-fopenmp"
make[2]: *** [Fasta_entry.o] Error 1
make[2]: Leaving directory `/hpc/compbio/Kurt/inchworm/inchworm-03132011/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/hpc/compbio/Kurt/inchworm/inchworm-03132011'
make: *** [all] Error 2

When I subsequently run inchworm I get the following error:

"-bash: .inchworm: cannot execute binary file."

Thank you for your consideration, and please advise.

Kurt Showmaker

--
Kurt Showmaker
Institute for Genomics, Biocomputing, and Biotechnology
Graduate Research Assistant
Mississippi State University
Email: kc...@ms...
Cell: 573-427-0060
From: Brian J H. <bh...@br...> - 2011-03-22 20:24:39
|
Hi Dana,

Sounds like so far, so good. In the trinity/ folder, you can see if inchworm has started reporting any contigs yet. There's also a monitor.out file there that will track the current status, in case it's not reporting contigs yet.

In the future, you can set the OMP_NUM_THREADS environment variable to 4 to use only 4 threads in the initial phase, rather than consuming whatever it can grab on to. In the next version of Trinity, I'll make this a command-line option.

Let me know how it goes.

Best,

-brian

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
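OMP_NUM_THREADS is the standard OpenMP environment variable for capping the thread count, so it has to be set in the environment of the process that runs inchworm. The sketch below shows one way to do that from a Python wrapper; `run_with_omp_threads` is a hypothetical helper, and the demo launches a small Python child process as a stand-in for the real inchworm command.

```python
import os
import subprocess
import sys


def run_with_omp_threads(command, num_threads, **run_kwargs):
    """Run a command with OMP_NUM_THREADS set in the child's environment only."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    return subprocess.run(command, env=env, check=True, **run_kwargs)


# Stand-in child process that just reports the value it inherited.
demo = run_with_omp_threads(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    4,
    capture_output=True,
    text=True,
)
seen_by_child = demo.stdout.strip()
```

For a real run, the command list would hold the inchworm invocation quoted in this thread, with stdout and stderr redirected to files rather than captured in memory.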
From: Dana P. <dan...@gm...> - 2011-03-22 20:17:05
|
I've got 42 million Illumina paired end reads (84 million total) of 125x125bp, and ran the trinity pipeline via

/usr/local/bin/Trinity.pl --seqType fq --left left.fastq --right right.fastq --output output --run_butterfly --num_butterfly_CPU 48 --min_contig_length 100

It's running inchworm, via

/usr/local/bin/Inchworm/bin/inchworm --reads both.fa --run_inchworm -K 25 -L 48 --monitor 1 --DS 2>monitor.out > inchworm.K25.L48.DS.fa

That's been running for 3 days on a 48 cpu AMD Opteron 2.1GHz SMP machine. It spawned 48 threads for the first two days. Today I see that the number of threads has dropped to 1. It's got a 74 gig memory footprint. I'm wondering if it's still doing anything? Is a three day assembly par-for-the-course with this much data?

Thanks!

--
Dana Price
Laboratory Researcher in Bioinformatics
Rutgers, The State University
Bhattacharya Lab
http://dblab.rutgers.edu
From: Brian J H. <bh...@br...> - 2011-03-21 18:26:29
|
Hi Jing,

If you run inchworm with '--monitor 1', you can follow its progress. It'll report how many sequences it has parsed at certain intervals and also indicate what stage of the process it's in before it starts outputting assemblies.

If you want to capture the output to a file, be sure to direct the output to a file:

inchworm --reads $file.reads --run_inchworm --monitor 1 > inchworm.fasta

The 5G of data shouldn't be a problem, but it might take a while. By monitoring the progress (with the option above) and keeping an eye on your resources (memory available via 'top'), you'll get a feel for what your progress and any limitations are. Hopefully, you have enough physical RAM, which we estimate at around 1G of RAM per 1 M reads (~100 base), which in your case might require ~50G of RAM.

Also, I should mention that our Trinity software is now available at http://TrinityRNASeq.sf.net, and provides a more rigorous solution to RNA-Seq assembly. Inchworm is used as the first part of the process.

Best,

-brian

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
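The rule of thumb in this reply (about 1G of RAM per 1 M reads of ~100 bases) can be written out as arithmetic. The sketch below assumes that "5G of data" means roughly 5 billion sequenced bases, which is one reading consistent with the ~50G figure quoted; the function names are illustrative only.

```python
def estimate_read_count(total_bases, read_length=100):
    """Approximate number of reads for a given number of sequenced bases."""
    return total_bases // read_length


def estimate_ram_gb(num_reads, gb_per_million_reads=1.0):
    """Rule-of-thumb RAM estimate: about 1 GB per million ~100-base reads."""
    return num_reads / 1_000_000 * gb_per_million_reads


# Assumption: 5G of data is about 5 billion bases of ~100-base reads.
reads = estimate_read_count(5_000_000_000)
ram_gb = estimate_ram_gb(reads)
```

Under these assumptions the data set works out to about 50 million reads and about 50 GB of RAM. A file size in bytes would overstate the base count, since FASTQ also stores headers and quality strings.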
From: Jing L. <jl...@mc...> - 2011-03-21 17:44:18
|
Hi Inchworm group,

I am a graduate student at UMass Amherst. I am using Inchworm right now to assemble some mRNA data. I have run into two problems and wonder if you can give me some help.

1. The data set I have is huge (over 5G), and it seems to take a long time to run. In fact, I ran it on our server for over 10 hours and nothing changed. Is there a way for inchworm to deal with such a huge amount of data?

2. I found that the program prints its results to standard output. Is it possible to print the results to a file instead?

Thank you very much,

Jing Liu
From: Brian J H. <bh...@br...> - 2011-03-13 15:29:04
|
Greetings all.

Two important messages:

1. A new version of Inchworm is now available:

http://sourceforge.net/projects/inchworm/files/inchworm-03132011.tgz/download

It includes contributions from Michael Ott and Alexie Papanicolaou that (a) leverage OpenMP for faster, parallel multi-core parsing of input read sequences, and (b) provide faster bit-level reverse complementing of sequences, which speeds up processing in double-stranded RNA-Seq assembly mode.

2. Our initial release of Trinity is now available at

http://TrinityRNASeq.sf.net

Trinity uses Inchworm followed by two additional new tools: Chrysalis and Butterfly, which provide for full-length reconstruction of alternatively spliced isoforms and improved assembly of transcripts derived from paralogous genes. Code, documentation, and sample data are available from the Trinity website.

The Trinity download includes the most recent version of Inchworm. We'll continue to maintain both sites, since there are additional Inchworm-based applications that haven't been migrated over to Trinity just yet, such as genome-guided de novo transcript assembly and integration with PASA for genome annotation.

Best wishes,

-Brian

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
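To make "bit-level reverse complementing" concrete, here is a small Python sketch of the general technique on a 2-bit-per-base encoding. It is not the Inchworm implementation (which is C++ and may be organized quite differently); the mapping A=0, C=1, G=2, T=3 and the function names are assumptions chosen for the example.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding, 2 bits per base


def encode_kmer(kmer):
    """Pack a k-mer into an integer, first base in the highest-order bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def revcomp_bits(value, k):
    """Reverse complement a packed k-mer without going back to a string.

    With this encoding, complementing a base is the same as flipping its two
    bits (A<->T is 00<->11, C<->G is 01<->10), so the whole k-mer is
    complemented by one bitwise NOT. The 2-bit groups are then reversed.
    """
    mask = (1 << (2 * k)) - 1
    value = ~value & mask  # complement every base at once
    result = 0
    for _ in range(k):
        result = (result << 2) | (value & 3)  # move lowest base to the front
        value >>= 2
    return result
```

For example, `revcomp_bits(encode_kmer("AACG"), 4)` gives the same integer as `encode_kmer("CGTT")`. Production code would typically replace the loop with a fixed sequence of shift-and-mask swaps.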
From: Greg C. <gco...@gm...> - 2011-02-25 21:53:20
|
Hi Brian,

Thanks for the quick response!

>> My strategy so far has been to run inchworm on the raw data with a range of
>> Kmer sizes, from 25-37, concatenate those outputs and reassemble using the
>> '--reassembleIworm' option.
>> This brings me to my two questions so far:
>> 1. I had to modify the source to allow a max Kmer size >31. Is there a
>> particular reason for this limit?
>
> The true maximal limit should be 32, since the kmers are stored as 64 bit
> unsigned integers (with 2 bits per base encoding). I made the max 31
> because I was hoping to reserve the last couple of bits to store additional
> info at some point... but I haven't used it. If you try to go beyond 32,
> you're still only storing a 32mer worth of sequence and tossing out the other
> bits. A different storage strategy would need to be built into inchworm to
> go higher than the 32 bits and require some serious reengineering.

Ahhhh, I figured there would be a reason for the max limit, but figured I would try to run it anyway since it still compiled. Guess I shouldn't trust the results from the K > 31 runs...

> In my various tests, I've found that 25mers work very well for
> transcriptome data and going higher (such as beyond 29mers) can end up
> fragmenting some otherwise nice full-length transcripts. Also, the
> strategy of assembling using a bunch of different kmer lengths and combining
> the data (which I borrowed conceptually from trans-ABySS though it works
> very differently here) doesn't buy much, at least in the various tests I've
> done. In many cases, we end up getting an ever so slight increase in the
> number of full-length transcripts.

I noticed minimal differences in final assembly characteristics between runs with varying kmers. At least not enough to convince me that it's absolutely necessary for de novo reconstruction of a transcriptome.

>> 2. I doubt the strategy I'm using is "the best" one, however from one lane
>> of a flow cell (~5 Gb raw illumina data (2x76bp * 34E06 reads)) I was able
>> to generate ~120 Mb of consensus sequence representing >500,000
>> contigs/transcripts (above a 100bp threshold). Should I be doing anything
>> differently to maximize Inchworm's potential to assemble transcripts? So far
>> the lengths are pretty good, with >8,000 transcripts longer than 1,000bps,
>> and a few in the 10,000bp range.
>
> Is this strand-specific data?

Unfortunately the data is not strand-specific. Any plans to incorporate read pair information?

> Note, I made some important improvements to inchworm recently: both removing
> error-containing kmers and adjustments for non-strand-specific data. I
> definitely encourage you to give it a whirl.. just run it once with the
> default settings (k=25) and see how it looks. The total number of contigs
> (and amount of complete garbage that result) should be minimal, and the
> contigs that are reported should be heavily enriched for quality assemblies.
>
> Also note, I'm hoping to release the full Trinity package sometime next
> week (http://trinityrnaseq.sf.net) which takes Inchworm results to another
> level of utility, especially where alternative splicing is concerned.

I have the latest version of inchworm (2-21-2011) crunching the data right now at K25 & K29. It will be interesting to see how the results compare to the previous assemblies made with the older version (1-20-2011).

I'm excited to run the data through the rest of the Trinity pipeline. I have several de novo transcriptomes in the works and would love an all in one package to take me from raw reads (adaptor/vector trimmed) to accurately identified alternative splice variants. Needless to say, I'll be watching the mailing list intently until the packages are released!

Thanks!
From: Brian J H. <bh...@br...> - 2011-02-25 17:50:10
|
Also, I forgot to mention, the Trinity paper (describing Inchworm and the other tools to be released) is currently under review.

Best,

-b

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
From: Brian J H. <bh...@br...> - 2011-02-25 17:49:01
|
Hi Greg,

I've responded to your questions below:

On Fri, Feb 25, 2011 at 12:23 PM, Greg Concepcion <gco...@gm...> wrote:

> Hi Brian,
>
> Thanks for the awesome resource!
>
> I've been using the previous version of inchworm (inchworm_01-20-2011<http://sourceforge.net/projects/inchworm/files/OLD_VERSIONS/inchworm_01-20-2011.tgz/download>)
> for de novo assembly of transcriptomic data from a non-model organism with
> no reference genome available. So far my success has been great, both in
> terms of transcript length and maximum memory requirements (which crippled
> my Velvet/Oases assembly)

Excellent news!

> My strategy so far has been to run inchworm on the raw data with a range of
> Kmer sizes, from 25-37, concatenate those outputs and reassemble using the
> '--reassembleIworm' option.
> This brings me to my two questions so far:
> 1. I had to modify the source to allow a max Kmer size >31. Is there a
> particular reason for this limit?

The true maximal limit should be 32, since the kmers are stored as 64-bit unsigned integers (with 2 bits per base encoding). I made the max 31 because I was hoping to reserve the last couple of bits to store additional info at some point... but I haven't used it. If you try to go beyond 32, you're still only storing a 32mer worth of sequence and tossing out the other bits. A different storage strategy would need to be built into inchworm to go higher than 32 bases, and that would require some serious reengineering.

In my various tests, I've found that 25mers work very well for transcriptome data, and going higher (such as beyond 29mers) can end up fragmenting some otherwise nice full-length transcripts. Also, the strategy of assembling using a bunch of different kmer lengths and combining the data (which I borrowed conceptually from trans-ABySS, though it works very differently here) doesn't buy much, at least in the various tests I've done. In many cases, we end up getting an ever so slight increase in the number of full-length transcripts.

> 2. I doubt the strategy I'm using is "the best" one, however from one lane
> of a flow cell (~5 Gb raw illumina data (2x76bp * 34E06 reads)) I was able
> to generate ~120 Mb of consensus sequence representing >500,000
> contigs/transcripts (above a 100bp threshold). Should I be doing anything
> differently to maximize Inchworm's potential to assemble transcripts? So far
> the lengths are pretty good, with >8,000 transcripts longer than 1,000bps,
> and a few in the 10,000bp range.

Is this strand-specific data? Note, I made some important improvements to inchworm recently: both removing error-containing kmers and adjustments for non-strand-specific data. I definitely encourage you to give it a whirl: just run it once with the default settings (k=25) and see how it looks. The total number of contigs (and the amount of complete garbage that results) should be minimal, and the contigs that are reported should be heavily enriched for quality assemblies.

Also note, I'm hoping to release the full Trinity package sometime next week (http://trinityrnaseq.sf.net), which takes Inchworm results to another level of utility, especially where alternative splicing is concerned.

Best,

-brian

> Also, I'm hoping to submit this data for publication in the near future, is
> there an ETA on a date for a publication that I can cite?
>
> Aloha!
>
> Gregory T. Concepcion, PhD
> Cell Biology and Molecular Genetics
> 2107 Biosciences Research Building
> University of Maryland
> College Park, MD 20742
>
> w:301.405.8300
> c:301.828.8210

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
|
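The 32-base ceiling described in the reply above follows directly from the arithmetic: 64 bits at 2 bits per base gives 64 / 2 = 32 bases. The following is a minimal sketch of that encoding, not Inchworm's actual code; the function names (`base_code`, `pack_kmer`, `unpack_kmer`) and the A=0, C=1, G=2, T=3 mapping are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map a base to a 2-bit code (assumed mapping: A=0, C=1, G=2, T=3).
// Returns -1 for any character that cannot be encoded in 2 bits.
inline int base_code(char b) {
    switch (b) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

// Pack a k-mer into one 64-bit unsigned integer, 2 bits per base.
// 64 bits / 2 bits per base = 32 bases, hence the hard upper limit.
inline std::uint64_t pack_kmer(const std::string& kmer) {
    assert(kmer.size() <= 32 && "64 bits hold at most 32 bases");
    std::uint64_t packed = 0;
    for (char b : kmer) {
        int code = base_code(b);
        assert(code >= 0 && "only A, C, G, T are encodable");
        // Shift earlier bases up by 2 bits and append the new base.
        // With more than 32 bases, the oldest ones would fall off the top.
        packed = (packed << 2) | static_cast<std::uint64_t>(code);
    }
    return packed;
}

// Recover the k-mer string from its packed form; k must be supplied
// because leading 'A' bases are indistinguishable from unused zero bits.
inline std::string unpack_kmer(std::uint64_t packed, unsigned k) {
    assert(k <= 32);
    static const char bases[] = "ACGT";
    std::string kmer(k, 'A');
    for (unsigned i = 0; i < k; ++i) {
        kmer[k - 1 - i] = bases[packed & 3u];  // lowest 2 bits = last base
        packed >>= 2;
    }
    return kmer;
}
```

In this sketch a 31-mer leaves the top 2 bits of the integer unused, which is the kind of spare space the reply mentions wanting to reserve for extra information.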
From: Greg C. <gco...@gm...> - 2011-02-25 17:23:25
|
Hi Brian,

Thanks for the awesome resource!

I've been using the previous version of inchworm (inchworm_01-20-2011<http://sourceforge.net/projects/inchworm/files/OLD_VERSIONS/inchworm_01-20-2011.tgz/download>) for de novo assembly of transcriptomic data from a non-model organism with no reference genome available. So far my success has been great, both in terms of transcript length and maximum memory requirements (which crippled my Velvet/Oases assembly).

My strategy so far has been to run inchworm on the raw data with a range of Kmer sizes, from 25-37, concatenate those outputs and reassemble using the '--reassembleIworm' option.

This brings me to my two questions so far:

1. I had to modify the source to allow a max Kmer size >31. Is there a particular reason for this limit?

2. I doubt the strategy I'm using is "the best" one, however from one lane of a flow cell (~5 Gb raw illumina data (2x76bp * 34E06 reads)) I was able to generate ~120 Mb of consensus sequence representing >500,000 contigs/transcripts (above a 100bp threshold). Should I be doing anything differently to maximize Inchworm's potential to assemble transcripts? So far the lengths are pretty good, with >8,000 transcripts longer than 1,000bps, and a few in the 10,000bp range.

Also, I'm hoping to submit this data for publication in the near future. Is there an ETA on a date for a publication that I can cite?

Aloha!

Gregory T. Concepcion, PhD
Cell Biology and Molecular Genetics
2107 Biosciences Research Building
University of Maryland
College Park, MD 20742

w:301.405.8300
c:301.828.8210
|
From: Brian J H. <bh...@br...> - 2011-02-21 17:24:55
|
Greetings all.

A new version of Inchworm is now available here:

http://sourceforge.net/projects/inchworm/files/inchworm_r02-21-2011.tgz/download

It includes the following changes:

- cannot seed contigs from palindromic kmers

- in double-stranded mode, the reverse-complement kmers are disabled during path extension (prevents artifactual 'fold-back' contigs resulting from inverted-repeat-containing or palindrome-containing sequences).

- by default, prunes likely error-containing kmers, defined as candidate extension-kmers that have an abundance less than 5% of a dominant kmer extension. Much cleaner contig sets result, mitigating what we had called 'echo' contigs that were output based on error-containing kmers from highly expressed transcripts. Command-line options are available to manipulate this or turn it off altogether.

Finally, note that the full Trinity RNA-Seq assembly suite (http://trinityrnaseq.sf.net) should be available within the next week. Inchworm is leveraged as the front-end process in the Trinity suite, and future Inchworm updates will be provided as part of the larger Trinity package. Stay tuned for details.

Best regards,

-Brian

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
|
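The pruning rule in the announcement above (drop candidate extension kmers whose abundance is below 5% of the dominant extension) can be sketched as follows. This is an illustration of the stated rule only, not Inchworm's implementation: the `Extension` struct, the `prune_low_abundance` function, and the `min_ratio` parameter name are all hypothetical, and the real command-line option names are not given in the announcement.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One candidate way to extend a contig by a single base (hypothetical type):
// the base appended and how often the resulting k-mer was observed.
struct Extension {
    char base;
    unsigned count;
};

// Keep only candidates whose abundance is at least min_ratio times that of
// the most abundant candidate. With the default of 0.05, a candidate seen
// fewer than 5% as often as the dominant one is treated as a likely
// sequencing error and dropped. Unobserved candidates (count 0) are dropped.
inline std::vector<Extension> prune_low_abundance(
        const std::vector<Extension>& candidates, double min_ratio = 0.05) {
    assert(min_ratio >= 0.0 && min_ratio <= 1.0);

    // Find the abundance of the dominant extension.
    unsigned dominant = 0;
    for (const Extension& e : candidates) {
        dominant = std::max(dominant, e.count);
    }

    std::vector<Extension> kept;
    for (const Extension& e : candidates) {
        const double threshold = min_ratio * static_cast<double>(dominant);
        if (e.count > 0 && static_cast<double>(e.count) >= threshold) {
            kept.push_back(e);
        }
    }
    return kept;
}
```

For a highly expressed transcript where the true extension is seen 1000 times, an error-derived extension seen 30 times falls below the cutoff of 50 and is discarded, which is how the 'echo' contigs described above would be suppressed. Passing `min_ratio = 0.0` keeps every observed candidate, which corresponds to turning the filter off.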
From: Navish D. <nav...@co...> - 2011-02-18 15:24:58
|
Hi Brian,

We got the -Werror failure on our CentOS machine but not on an Ubuntu installation.

_Nav

Navish Dadighat
Computational Scientist

+1 (573) 569-3305
+1 (888) 8- COFACTOR

On Feb 18, 2011, at 8:21 AM, Brian J Haas wrote:

> Hi Fabrice,
>
> Thanks for the info. I hadn't experienced that error on our CentOS Linux nor Mac OS X. I did update the code to rectify the error, and I took the -Werror out of the compilation.
>
> Could you do me a big favor and see if the latest code under SVN builds properly? If so, I'll plan for a new release shortly. You can pull the code like so:
>
> svn co https://inchworm.svn.sourceforge.net/svnroot/inchworm inchworm
>
> and hopefully, just do the configure / make / make install
>
> Thanks!
>
> -brian
>
> On Fri, Feb 18, 2011 at 4:38 AM, Fabrice Legeai <fab...@ir...> wrote:
> Hi,
>
> I've installed the 01-20-2010 version of inchworm
> on a linux Redhat (2.6.18-194.11.4.el5 x86_64).
> But while compiling I got the following error:
>
> g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Werror -Wno-deprecated -MT IRKE.o -MD -MP -MF .deps/IRKE.Tpo -c -o IRKE.o IRKE.cpp
> cc1plus: warnings being treated as errors
> IRKE.cpp: In member function ‘void IRKE::populate_Kmers_from_fasta(const std::string&, bool)’:
> IRKE.cpp:126: warning: converting to ‘unsigned int’ from ‘double’
> IRKE.cpp: In member function ‘void IRKE::compute_sequence_assemblies(KmerCounter&, float, unsigned int, unsigned int, bool, std::string)’:
> IRKE.cpp:492: warning: converting to ‘unsigned int’ from ‘double’
> make[2]: *** [IRKE.o] Error 1
> make[2]: Leaving directory `/partage/tmp/inchworm_01-20-2010/src'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/partage/tmp/inchworm_01-20-2010'
> make: *** [all] Error 2
>
> I managed to compile by removing -Werror from configure.ac and src/Makefile.in, but I am not sure that is the right fix. Would you please check?
>
> Best regards,
>
> Fabrice Legeai
> AphidBase
>
> --
> Brian J. Haas
> Manager, Bioinformatics Outreach, Genome Annotation and Analysis
> The Broad Institute
> http://broad.mit.edu/~bhaas
|
From: Brian J H. <bh...@br...> - 2011-02-18 15:04:01
|
Thanks for the info. I'm taking the -Werror out altogether, but will leave in the -Wall. We'll keep patching the code to resolve any warnings that arise.

Best,

-brian

On Fri, Feb 18, 2011 at 9:58 AM, Navish Dadighat <nav...@co...> wrote:

> Hi Brian,
>
> We got the -Werror failure on our CentOS machine but not on an Ubuntu installation.
>
> _Nav
>
> Navish Dadighat
> Computational Scientist
>
> +1 (573) 569-3305
> +1 (888) 8- COFACTOR
>
> On Feb 18, 2011, at 8:21 AM, Brian J Haas wrote:
>
> Hi Fabrice,
>
> Thanks for the info. I hadn't experienced that error on our CentOS Linux
> nor Mac OS X. I did update the code to rectify the error, and I took the
> -Werror out of the compilation.
>
> Could you do me a big favor and see if the latest code under SVN builds
> properly? If so, I'll plan for a new release shortly. You can pull the
> code like so:
>
> svn co https://inchworm.svn.sourceforge.net/svnroot/inchworm inchworm
>
> and hopefully, just do the configure / make / make install
>
> Thanks!
>
> -brian
>
> On Fri, Feb 18, 2011 at 4:38 AM, Fabrice Legeai <fab...@ir...> wrote:
>
>> Hi,
>>
>> I've installed the 01-20-2010 version of inchworm
>> on a linux Redhat (2.6.18-194.11.4.el5 x86_64).
>> But while compiling I got the following error:
>>
>> g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Werror -Wno-deprecated -MT IRKE.o -MD -MP -MF .deps/IRKE.Tpo -c -o IRKE.o IRKE.cpp
>> cc1plus: warnings being treated as errors
>> IRKE.cpp: In member function ‘void IRKE::populate_Kmers_from_fasta(const std::string&, bool)’:
>> IRKE.cpp:126: warning: converting to ‘unsigned int’ from ‘double’
>> IRKE.cpp: In member function ‘void IRKE::compute_sequence_assemblies(KmerCounter&, float, unsigned int, unsigned int, bool, std::string)’:
>> IRKE.cpp:492: warning: converting to ‘unsigned int’ from ‘double’
>> make[2]: *** [IRKE.o] Error 1
>> make[2]: Leaving directory `/partage/tmp/inchworm_01-20-2010/src'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory `/partage/tmp/inchworm_01-20-2010'
>> make: *** [all] Error 2
>>
>> I managed to compile by removing -Werror from configure.ac and src/Makefile.in, but I am not sure that is the right fix. Would you please check?
>>
>> Best regards,
>>
>> Fabrice Legeai
>> AphidBase
>
> --
> Brian J. Haas
> Manager, Bioinformatics Outreach, Genome Annotation and Analysis
> The Broad Institute
> http://broad.mit.edu/~bhaas

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
|
From: Brian J H. <bh...@br...> - 2011-02-18 14:48:47
|
Hi Fabrice,

Thanks for the info. I hadn't experienced that error on our CentOS Linux nor Mac OS X. I did update the code to rectify the error, and I took the -Werror out of the compilation.

Could you do me a big favor and see if the latest code under SVN builds properly? If so, I'll plan for a new release shortly. You can pull the code like so:

svn co https://inchworm.svn.sourceforge.net/svnroot/inchworm inchworm

and hopefully, just do the configure / make / make install

Thanks!

-brian

On Fri, Feb 18, 2011 at 4:38 AM, Fabrice Legeai <fab...@ir...> wrote:

> Hi,
>
> I've installed the 01-20-2010 version of inchworm
> on a linux Redhat (2.6.18-194.11.4.el5 x86_64).
> But while compiling I got the following error:
>
> g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Werror -Wno-deprecated -MT IRKE.o -MD -MP -MF .deps/IRKE.Tpo -c -o IRKE.o IRKE.cpp
> cc1plus: warnings being treated as errors
> IRKE.cpp: In member function ‘void IRKE::populate_Kmers_from_fasta(const std::string&, bool)’:
> IRKE.cpp:126: warning: converting to ‘unsigned int’ from ‘double’
> IRKE.cpp: In member function ‘void IRKE::compute_sequence_assemblies(KmerCounter&, float, unsigned int, unsigned int, bool, std::string)’:
> IRKE.cpp:492: warning: converting to ‘unsigned int’ from ‘double’
> make[2]: *** [IRKE.o] Error 1
> make[2]: Leaving directory `/partage/tmp/inchworm_01-20-2010/src'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/partage/tmp/inchworm_01-20-2010'
> make: *** [all] Error 2
>
> I managed to compile by removing -Werror from configure.ac and src/Makefile.in, but I am not sure that is the right fix. Would you please check?
>
> Best regards,
>
> Fabrice Legeai
> AphidBase

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
|
From: Brian J H. <bh...@br...> - 2011-02-18 14:36:25
|
Fantastic! Thanks for checking!

I'll run some tests on this new version and hopefully release it later today or tomorrow. There are a couple of things this new version does, such as:

- removes error-containing kmers for cleaner output
- defaults to a kmer of 26 to avoid artifacts related to palindromic kmers (should give improved results with non-strand-specific RNA-Seq data)

Best,

-b

On Fri, Feb 18, 2011 at 9:32 AM, Fabrice Legeai <fab...@ir...> wrote:

> Hi Brian,
>
> It worked perfectly. Thanks!
>
> Fabrice
>
> On 18/02/2011 15:21, Brian J Haas wrote:
>
>> Hi Fabrice,
>>
>> Thanks for the info. I hadn't experienced that error on our CentOS Linux
>> nor Mac OS X. I did update the code to rectify the error, and I took the
>> -Werror out of the compilation.
>>
>> Could you do me a big favor and see if the latest code under SVN builds
>> properly? If so, I'll plan for a new release shortly. You can pull the
>> code like so:
>>
>> svn co https://inchworm.svn.sourceforge.net/svnroot/inchworm inchworm
>>
>> and hopefully, just do the configure / make / make install
>>
>> Thanks!
>>
>> -brian
>>
>> On Fri, Feb 18, 2011 at 4:38 AM, Fabrice Legeai <fab...@ir...> wrote:
>>
>>> Hi,
>>>
>>> I've installed the 01-20-2010 version of inchworm
>>> on a linux Redhat (2.6.18-194.11.4.el5 x86_64).
>>> But while compiling I got the following error:
>>>
>>> g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Werror -Wno-deprecated -MT IRKE.o -MD -MP -MF .deps/IRKE.Tpo -c -o IRKE.o IRKE.cpp
>>> cc1plus: warnings being treated as errors
>>> IRKE.cpp: In member function ‘void IRKE::populate_Kmers_from_fasta(const std::string&, bool)’:
>>> IRKE.cpp:126: warning: converting to ‘unsigned int’ from ‘double’
>>> IRKE.cpp: In member function ‘void IRKE::compute_sequence_assemblies(KmerCounter&, float, unsigned int, unsigned int, bool, std::string)’:
>>> IRKE.cpp:492: warning: converting to ‘unsigned int’ from ‘double’
>>> make[2]: *** [IRKE.o] Error 1
>>> make[2]: Leaving directory `/partage/tmp/inchworm_01-20-2010/src'
>>> make[1]: *** [all-recursive] Error 1
>>> make[1]: Leaving directory `/partage/tmp/inchworm_01-20-2010'
>>> make: *** [all] Error 2
>>>
>>> I managed to compile by removing -Werror from configure.ac and src/Makefile.in, but I am not sure that is the right fix. Would you please check?
>>>
>>> Best regards,
>>>
>>> Fabrice Legeai
>>> AphidBase

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
|
From: Fabrice L. <fab...@ir...> - 2011-02-18 14:32:31
|
Hi Brian,

It worked perfectly. Thanks!

Fabrice

On 18/02/2011 15:21, Brian J Haas wrote:

> Hi Fabrice,
>
> Thanks for the info. I hadn't experienced that error on our CentOS Linux
> nor Mac OS X. I did update the code to rectify the error, and I took the
> -Werror out of the compilation.
>
> Could you do me a big favor and see if the latest code under SVN builds
> properly? If so, I'll plan for a new release shortly. You can pull the
> code like so:
>
> svn co https://inchworm.svn.sourceforge.net/svnroot/inchworm inchworm
>
> and hopefully, just do the configure / make / make install
>
> Thanks!
>
> -brian
>
> On Fri, Feb 18, 2011 at 4:38 AM, Fabrice Legeai <fab...@ir...> wrote:
>
>> Hi,
>>
>> I've installed the 01-20-2010 version of inchworm
>> on a linux Redhat (2.6.18-194.11.4.el5 x86_64).
>> But while compiling I got the following error:
>>
>> g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Werror -Wno-deprecated -MT IRKE.o -MD -MP -MF .deps/IRKE.Tpo -c -o IRKE.o IRKE.cpp
>> cc1plus: warnings being treated as errors
>> IRKE.cpp: In member function ‘void IRKE::populate_Kmers_from_fasta(const std::string&, bool)’:
>> IRKE.cpp:126: warning: converting to ‘unsigned int’ from ‘double’
>> IRKE.cpp: In member function ‘void IRKE::compute_sequence_assemblies(KmerCounter&, float, unsigned int, unsigned int, bool, std::string)’:
>> IRKE.cpp:492: warning: converting to ‘unsigned int’ from ‘double’
>> make[2]: *** [IRKE.o] Error 1
>> make[2]: Leaving directory `/partage/tmp/inchworm_01-20-2010/src'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory `/partage/tmp/inchworm_01-20-2010'
>> make: *** [all] Error 2
>>
>> I managed to compile by removing -Werror from configure.ac and src/Makefile.in, but I am not sure that is the right fix. Would you please check?
>>
>> Best regards,
>>
>> Fabrice Legeai
>> AphidBase
|
From: Fabrice L. <fab...@ir...> - 2011-02-18 10:13:20
|
Hi,

I've installed the 01-20-2010 version of inchworm on a linux Redhat (2.6.18-194.11.4.el5 x86_64). But while compiling I got the following error:

g++ -DHAVE_CONFIG_H -I. -I.. -Wall -Werror -Wno-deprecated -MT IRKE.o -MD -MP -MF .deps/IRKE.Tpo -c -o IRKE.o IRKE.cpp
cc1plus: warnings being treated as errors
IRKE.cpp: In member function ‘void IRKE::populate_Kmers_from_fasta(const std::string&, bool)’:
IRKE.cpp:126: warning: converting to ‘unsigned int’ from ‘double’
IRKE.cpp: In member function ‘void IRKE::compute_sequence_assemblies(KmerCounter&, float, unsigned int, unsigned int, bool, std::string)’:
IRKE.cpp:492: warning: converting to ‘unsigned int’ from ‘double’
make[2]: *** [IRKE.o] Error 1
make[2]: Leaving directory `/partage/tmp/inchworm_01-20-2010/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/partage/tmp/inchworm_01-20-2010'
make: *** [all] Error 2

I managed to compile by removing -Werror from configure.ac and src/Makefile.in, but I am not sure that is the right fix. Would you please check?

Best regards,

Fabrice Legeai
AphidBase
|
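The build log above fails because two implicit `double` to `unsigned int` conversions are reported as warnings and -Werror promotes them to errors. The thread does not show the offending source lines or the eventual patch, so the following is only a generic sketch of how such a warning is usually silenced at the source level: make the narrowing explicit. The function `scaled_count` and its parameters are hypothetical and are not taken from IRKE.cpp.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: scale an integer count by a fractional factor.
// Writing `unsigned int n = total * fraction;` relies on an implicit
// double -> unsigned int conversion, which is the kind of statement that
// produces "converting to 'unsigned int' from 'double'".
inline unsigned int scaled_count(unsigned int total, double fraction) {
    assert(fraction >= 0.0 && "a negative result cannot be stored as unsigned");
    // Round down explicitly, then state the narrowing with static_cast so
    // the truncation is visible in the code rather than implied.
    return static_cast<unsigned int>(std::floor(total * fraction));
}
```

Removing -Werror, as done in this thread, gets the build through without touching the code; an explicit cast of this kind addresses the warning itself, so the two approaches are complementary rather than alternatives.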
From: Brian J H. <bh...@br...> - 2010-11-18 12:37:02
|
Hi Andrew,

Inchworm should work very well on squid, and strand-specific data should be slightly better in this case than non-strand-specific data. The strand-specific data helps mostly in the cases where you have high gene density (some fungal genomes) and when transcripts slightly overlap from opposite strands.

Since you don't have a genome, I suggest running the Inchworm assembly output through CD-HIT with a 90% identity filter to pull out those transcripts that are the highest quality assemblies. This will minimize the number of artifacts that fall through (assemblies based on error-containing reads) and concentrate the results on the highest quality transcripts.

In any case, if you have the option to go strand-specific, I highly encourage you to do so. Being able to properly differentiate between antisense transcripts and sense transcripts for genes is important and can only be done properly with the strand-specific data.

Best,

-brian

On Wed, Nov 17, 2010 at 11:31 AM, <and...@hu...> wrote:

> Hello-
>
> I'm a PhD student in Dr. Spencer Nyholm's lab at UConn. We're thinking of
> doing some strand specific sequencing and are interested in using inchworm
> for the assembly.
>
> Our animal model (a squid) has no sequenced genome, so I'm curious about
> how well inchworm assembles contigs without mapping the illumina reads to
> a genome. How do the resulting contigs compare with contigs assembled from
> unspecific strand reads? Can you give me some comparative statistics?
>
> Thank you.
> -Andrew

--
Brian J. Haas
Manager, Bioinformatics Outreach, Genome Annotation and Analysis
The Broad Institute
http://broad.mit.edu/~bhaas
|
From: <and...@hu...> - 2010-11-17 16:47:18
|
Hello-

I'm a PhD student in Dr. Spencer Nyholm's lab at UConn. We're thinking of doing some strand specific sequencing and are interested in using inchworm for the assembly.

Our animal model (a squid) has no sequenced genome, so I'm curious about how well inchworm assembles contigs without mapping the illumina reads to a genome. How do the resulting contigs compare with contigs assembled from unspecific strand reads? Can you give me some comparative statistics?

Thank you.
-Andrew
|