From: Giuseppe N. <gna...@gm...> - 2009-08-05 10:48:45
Dear Dan,

I will play with the overlap size in the hash-overlap step.
Let me know how the re-assembling goes...

Thanks a lot,
--Giuseppe

On Tue, Aug 4, 2009 at 1:16 PM, Dan Sommer<ds...@um...> wrote:
> The clear ranges look right.
> There have been some minor changes to the code since the paper, so I guess
> there is a chance something is wrong.
> If you want to try changing the overlap size, you can modify the minimus
> pipeline script. The step to change is the hash-overlap step. Run
> 'overlap -h' to see the options. I believe the -o option sets the min.
> overlap. If you have AMOS installed in your path, you should be able to run
> hash-overlap at the command line.
> I will try re-assembling the benchmark here once I get a chance.
> -Dan
>
> On Thu, Jul 30, 2009 at 6:28 AM, Giuseppe Narzisi <gna...@gm...>
> wrote:
>>
>> Just for your information, with Brucella, Wolbachia and Staphylococcus
>> epidermidis I do not get a fragmented assembly.
>> It only happens with the Staphylococcus aureus and Shewanella
>> oneidensis datasets.
>>
>> --Giuseppe.
>>
>>
>>
>> On Thu, Jul 30, 2009 at 6:16 AM, Giuseppe Narzisi<gna...@gm...>
>> wrote:
>> > Dear Dan,
>> >
>> > I used the tarchive2amos utility to convert the fasta/qual/xml files
>> > into the AMOS format.
>> > Here is an example RED message from the afg file; it looks like the
>> > clear range is properly handled.
>> > I wanted to send you the full afg file but it is too big to send as an
>> > attachment...
>> >
>> > Thanks,
>> > --Giuseppe
>> >
>> >
>> > {RED
>> > iid:28498
>> > eid:GSALE31TF
>> > seq:
>> > NGCCAAGCTTGCATGCCTGCAGGTCGACACTAGAGGATCCCCTCGAATGTTTAATCATTT
>> > AGAAGCGCCTACATCAGGTGAAGTTATTATAGATGGAGACCATATAGGTCAATTGTCCAA
>> > AAATGGATTAAGAGCAAAAAGACAAAAAGTAAGTATGATCTTCCAACATTTTAATTTGTT
>> > ATGGTCAAGGACTGTGTTAAAAAATATTATGTTTCCGCTTGAAATTGCAGGTGTCCCTAG
>> > AAGGAGAGCTAAGCAAAAAGCATTAGAACTTGTCGAACTCGTCGGTTTAAAAGGTAGAGA
>> > AAAGGCTTATCCATCAGAGTTATCAGGTGGACAAAAGCAACGTGTTGGGATTGCACGAGC
>> > GTTAGCTAATGATCCAACGGTCTTGCTTTGTGATGAGGCAACAAGTGCACTTGATCCGCA
>> > AACAACAGATGAAATTTTAGATCTACTACTAAAAATTAGAGAACAACAAAATTTAACAAT
>> > TGTACTAATTACGCATGAAATGCATGTCATTCGTCGTATTTGTGATGAANTTGCAGTTAT
>> > GGAAAGTGGTAAAGTGATAGAACAAGGACCGGTGACACAGGTTTTTGAAAATCCGCAACA
>> > CACTGTGACAAAAACATTTGTGAAAGACGATTTAGATGAATATTTCGAAACATCTTTTAC
>> > AGAATTAGAGCCATTAGAAAAAGATGCATATATCGGTTAGATTAGTTTCCGCTGGGTCAC
>> > CAANCAACGGAGCCTATTGGTATCGAGTCTAC
>> > .
>> > qlt:
>> > 0000FC00000ABB0000CLKRTTVLI000DGPRRR]]]XXPPLUUXX]X]RRRNNQQQQ
>> > RQQQQQQV]]]]]]]UUU]]XXXXUUXUUUUX]]]XXXXUUUUUXUUUUUXX]UUUYYYY
>> > ]]]Y[YYYRRRUUU]]]]]a]cccc]ZZUUUU]UUUUUUUUYYYYYYNNNUQQQQZUUNN
>> > NNQRUUYUYUYUUQQQQQQUZ]]]]URRUUUR]YYRRRRRRUUYYYYYPPUVVUUXPPLL
>> > UV]UURUUUOORUMUU]]]UUUGI00PPPUPFFKLU]]RRMMMMRUURRPUPP0B0GGRU
>> > ]]SSEEELIPLOOOOIIPNMJHBD00?GGOOTTTRQNIHHLNQOOOOORQQQQPPKKPPO
>> > QRROGLGIMOGG?00AAEHFHGAAJJOOOMFLLKJ@0000AIHHMHJRID?000000CIH
>> > HTMMIJJJJLLLPGHEF000C000IGOOTTTQQQM?000GGORTH0000000GDAAHDDE
>> > AGEIHHA@B0B000??KKNNNKLLLKGDC00000000@00@?0@00000000AG000000
>> > 00000B0000000BBE000000000000000000000000000@@000@00000000000
>> > 000000000000000000000000D00000000000000000000000000ABC0A0000
>> > 0000000000000000000000BBBB?000000000000000000000000000000000
>> > 00000000000000000000000000000000
>> > .
>> > frg:1
>> > clr:42,529
>> > }
>> >
>> >
>> >
>> >
>> > On Wed, Jul 29, 2009 at 11:02 AM, Dan Sommer<ds...@um...>
>> > wrote:
>> >> The clear ranges (CLEARL & CLEARR) are given in the fasta header line
>> >> for each read in the benchmark but I am not sure they are being put
>> >> into the amos message file (.afg). How did you convert the fasta file?
>> >> Can you look to see if the .afg file has the clear ranges in it?
>> >> Dan
>> >>
>> >> On Tue, Jul 28, 2009 at 6:47 AM, Giuseppe Narzisi <gna...@gm...>
>> >> wrote:
>> >>>
>> >>> Dear Dan,
>> >>>
>> >>> the README of the assembly benchmark says that all the sequences have
>> >>> been already trimmed to remove vector and low-quality basecalls.
>> >>> For each read a clear-range is specified using CLEARL and CLEARR and I
>> >>> assume that minimus uses this information during the assembly.
>> >>> Should I trim the reads again using LUCY ?
>> >>> Could it be that I have to change the default parameter of the
>> >>> hash-overlap ?
>> >>>
>> >>> Thanks,
>> >>> --Giuseppe
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Jul 27, 2009 at 1:06 PM, Dan Sommer<ds...@um...>
>> >>> wrote:
>> >>> > Did you trim the reads first before assembling them? If you don't
>> >>> > trim first using the LUCY trimming software, it will fragment the
>> >>> > assembly.
>> >>> > http://lucy.sourceforge.net/
>> >>> > Dan
>> >>> >
>> >>> > On Sat, Jul 25, 2009 at 9:43 AM, Giuseppe Narzisi
>> >>> > <gna...@gm...>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi everyone,
>> >>> >>
>> >>> >> I have been testing Minimus on the Staphylococcus aureus genome
>> >>> >> from the benchmark data available at:
>> >>> >> http://www.cbcb.umd.edu/research/benchmark.shtml
>> >>> >> In particular, to simulate the assembly of the original shotgun
>> >>> >> project, I have concatenated the data in random.seq and
>> >>> >> random_nonmatching.seq
>> >>> >> According to the results reported in the BMC Bioinformatics paper,
>> >>> >> Minimus creates only 85 contigs; however, I get 5445 contigs.
>> >>> >> So I was wondering what I am doing wrong.
>> >>> >> I am using the standard minimus pipeline of the amos package.
>> >>> >>
>> >>> >>
>> >>> >> Thanks,
>> >>> >> --Giuseppe
>> >>> >>
>> >>> >>
>> >>> >> ------------------------------------------------------------------------------
>> >>> >> _______________________________________________
>> >>> >> AMOS-help mailing list
>> >>> >> AMO...@li...
>> >>> >> https://lists.sourceforge.net/lists/listinfo/amos-help
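
For anyone who wants to repeat the clear-range check discussed in this thread on their own .afg file, here is a minimal sketch in Python. It is not part of AMOS; the function names parse_red and clear_range_summary are made up for illustration. It assumes a RED message laid out like the one quoted above (single-line "key:value" fields, multi-line seq/qlt fields terminated by a lone "."), that clr holds "begin,end" with the clear length equal to end minus begin, and that quality characters are offset from '0'. Please verify those assumptions against the AMOS documentation before relying on the numbers.

```python
def parse_red(text):
    """Parse one {RED ...} message into a dict of field name -> string.

    Simplified sketch: handles single-line "key:value" fields and
    multi-line fields ("seq:", "qlt:") terminated by a line holding ".".
    """
    fields = {}
    key = None   # name of the multi-line field being read, if any
    buf = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line in ("{RED", "}"):
            continue
        if key is not None:
            if line == ".":              # end of the multi-line field
                fields[key] = "".join(buf)
                key, buf = None, []
            else:
                buf.append(line)
            continue
        name, _, value = line.partition(":")
        if value == "":
            key = name                   # a multi-line field starts here
        else:
            fields[name] = value
    return fields


def clear_range_summary(red, qual_offset=ord("0")):
    """Check that clr fits the read and summarise the clear region.

    Assumption: clr is "begin,end" and the clear length is end - begin.
    Assumption: quality value = ord(char) - qual_offset.
    """
    begin, end = (int(x) for x in red["clr"].split(","))
    seq, qlt = red["seq"], red["qlt"]
    if len(seq) != len(qlt):
        raise ValueError("seq and qlt lengths differ")
    if not 0 <= begin <= end <= len(seq):
        raise ValueError("clear range falls outside the read")
    quals = [ord(c) - qual_offset for c in qlt[begin:end]]
    mean_q = sum(quals) / len(quals) if quals else 0.0
    return {
        "read_length": len(seq),
        "clear_length": end - begin,
        "mean_clear_quality": mean_q,
    }


# Toy record (made-up data, not taken from the benchmark).
toy = """{RED
iid:1
eid:toy
seq:
ACGTACGT
AC
.
qlt:
00::::::
00
.
frg:1
clr:2,8
}"""

summary = clear_range_summary(parse_red(toy))
print(summary)
```

Running the same two functions on a message copied out of the real .afg file (with the mail quote prefixes removed) shows whether the clear range lies inside the read and whether the low-quality ends fall outside it.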