dart-help Mailing List for DART: DNA, amino acid and RNA tests
Brought to you by:
ihh
You can subscribe to this list here.
2008 |
Jan
(1) |
Feb
(1) |
Mar
|
Apr
(12) |
May
(6) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
---|---|---|---|---|---|---|---|---|---|---|---|---|
2009 |
Jan
(1) |
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(1) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: Ian H. <ih...@be...> - 2009-07-14 19:55:53
|
Vincent Yan Fu Tan wrote: > Dear Prof. Holmes, > > I read your paper on an EM algorithm for Training Hidden Substitution > Models. In it you mentioned that there's freely available code under > the GNU Public License at www.biowiki.org/hsm. However, this link > appears to be broken. I was wondering whether you can fix the link or > send me a copy of the code for research purposes. > > Thanks and best regards, > Vincent > Hi Vincent, The biowiki.org website appears to be very slow at the moment -- this occurs periodically, I think because I haven't excluded web robots correctly so it's currently being pounded by Google/Yahoo/Bing... sigh. It seems to recover in an hour or two. Anyway, the link (biowiki.org/hsm) should redirect to biowiki.org/DART, which is where all my molecular evolution software is distributed at the moment. The particular program you want is XRATE, which is described at biowiki.org/XRATE Even if biowiki.org is down, you can get the code itself from here: http://github.com/ihh/dart/tree/master Do let me know if you have other questions... Best wishes, Ian |
From: Farzad R. <far...@gm...> - 2009-01-07 23:41:07
|
Hi, I just downloaded Dart. I would be grateful if somebody could help me with the following questions. 1. Given a DNA sequence pair, how can I achieve the alignment and the likelihood of the alignment by using Dart? 2. I have a dataset containing 500 DNA sequence pairs in FASTA format. Is it possible that I run Dart on my dataset as a whole or I should run it on one sequence pair at a time? Thanks, Farzad |
From: marcin j. <mar...@gm...> - 2008-05-19 22:55:53
|
No problem, thanks again for all the hand-holding. I've started up a new job with the gene-finding grammar which will take at least a few hours based on the last (default grammar) run. So what is the default 'rev' grammar useful for? Just so I have a better context, what can I expect to get out of this given that the genomes are from fairly uncharacterized and potentially unusual sulfate-reducing anaerobic bacteria? Also, the three genomes have one close pair and two distant pairs. thanks. marcin On Mon, May 19, 2008 at 3:44 PM, Ian Holmes <ih...@be...> wrote: > marcin joachimiak wrote: >> hey >> >> Looks like I now have windowlicker output with 49500 alignment and trees like: >> // >> # STOCKHOLM 1.0 >> #=GF NH (CP000527:0.1287975598,(CP000112:0.0000000010,AE017286:0.0000000 >> 010)node_3:0.1287875646)root; >> #=GF SC_max_rev -643.47 >> CP000112 ACAAAGTTTGCTATCCCTTCTTTGAGCGGAGTTCCGGGCTTGAAGCCGGTGTCGGCAATCAGGT >> CP000527 ACGAAGGCGGCTACGCCCTCTTCGACCGTGGTCGCGGGGCGGAAGCCCGTGTCAGCGATGAGGT >> AE017286 ---------------------------------------------------------------- >> >> CP000112 CGTCCACATTGGCATATGTTGCTTCCACATCGCCGGGCTGCATGGGCATCATGTTTTTAACGGC >> CP000527 CGTCGACGTCGGCGTAGGTCGCTGCCACGTCGCCGGGCTGCATGGGCAGCATGTTCTTGACCGC >> AE017286 ---------------------------------------------------------------- >> >> CP000112 CTTTTTGCCCAGGCATTCTTCCAGAGTCGCGATGAACTCGCCAAGCTCTACCGTGTTGTTGTTG >> CP000527 CTTCTTGCCGAGGCACTCTTCGAGTACTTCGATGAACCTGCCGAGTTCGACGGTGTTGTTGTTC >> AE017286 ---------------------------------------------------------------- >> >> CP000112 CCAATGTT >> CP000527 CCGATGTT >> AE017286 -------- >> >> >> But I'll need a little guidance as to what these are are and how to >> interpret them. Arre these ~50,000 ncRNA predictions? >> >> >> For completeness, here's the command I used to generate the output: >> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >> DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg >> &> windowlicker.out > > > Oops. It does not look like you have actually specified a genefinding > grammar. The score "SC_max_rev" that's reported indicates that > windowlicker was using xrate's default "rev" grammar which is not a > genefinder... this is consistent with the command line above. > > You need to download a genefinding grammar e.g. from here > > http://biowiki.org/results/ncRNA_predictions/12fly/ncRna_v22.eg > > ...then change your command line to include this grammar... > > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg > -g ncRna_v22.eg &> windowlicker.out > > > I am sorry that with all the discussion of how to get the trees right, > I/we somehow lost sight of this fairly important point... I think we did > mention it initially but while trying to figure out options for the tree > estimation, it must have got dropped at some point... > > I. > > > > > >> >> thanks, >> marcin >> >> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>> Hi Marcin... >>> >>> So, I never ran xrate on an alignment this big before (6 million >>> columns!) and it turned out there was a drastic inefficiency in the >>> distance matrix code, that was causing it to scale as the square of the >>> number of columns. Unnoticeable for most alignments (since the code >>> inside the loop was pretty quick), but your example caused it to hang... >>> >>> I think it was quite possible that your xrate job was getting killed >>> because it was running too long. Are you running queue management >>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>> long-running jobs. I couldn't find a specific bug that was crashing on >>> this data, although I didn't finish running it because it was taking >>> sooooo long. >>> >>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>> neighbor-joining now finishes pretty fast. However, it now takes a long >>> time to complete the second step of the tree-estimation procedure, which >>> is to refine the branch lengths of the tree using the EM algorithm. >>> >>> There is less that I can do to optimize this. The neighbor-joining code >>> is not heavily dependent on the alignment length, but the branch-length >>> refinement is. I could optimize the branch-length code in the same way >>> that the neighbor-joining is optimized, so it was less dependent on >>> alignment length, but that would take a little while, and unfortunately >>> I don't have the time right now. >>> >>> Instead, what I have done is introduced a subtle hack. If you specify >>> the command-line option "--maxrounds 0", then this will bypass the >>> branch-length refinement code (specifically, what it does is tell xrate >>> that you want the EM algorithm to run for a maximum of zero iterations, >>> or "rounds"). >>> >>> Thus, with "--maxrounds 0" the tree will be estimated using >>> neighbor-joining only. >>> >>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>> an alignment this big before (for our screens, we pre-specified the >>> tree, so we went directly to windowlicker). >>> >>> Hopefully, this will now work for you, but let me know if you have >>> further issues. >>> >>> Cheers, >>> Ian >>> >>> >>> marcin joachimiak wrote: >>>> hey, >>>> >>>> Thanks for the help! >>>> The short example worked just fine. >>>> >>>> log3 with my previous command so far produced this: >>>> Finding eigenvectors using MatrixExpEigenPrepare >>>> Largest eigenvalue was 0; set to 0 >>>> Updating cached M() & J() >>>> Finding eigenvectors using MatrixExpEigenPrepare >>>> Largest eigenvalue was 0; set to 0 >>>> Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>> Updating cached M() & J() >>>> Sequence_database: converting ASCII sequences to score profiles >>>> Processed 3 sequences >>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> Estimating distances of sequences 0 and 1 >>>> >>>> You can download the FASTA and Stock files here: >>>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>> >>>> I'll have more updates on Monday -- enjoy the weekend, >>>> marcin >>>> - Show quoted text - >>>> >>>> On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>>> Thanks Marcin. >>>>> >>>>> Something has definitely gone wrong if no alignment is output. Can you >>>>> check that the following outputs an alignment+tree? >>>>> >>>>> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>>> ~/dart/src/ecfg/t/notree.stk >>>>> >>>>> >>>>> Looking at your log messages it seems like the program has crashed while >>>>> trying to do neighbor-joining to estimate the tree. For the above command, I >>>>> get log messages after the "Estimating tree..." message. See below (BTW many >>>>> of these are on stderr, not stdout) >>>>> >>>>> If you can reproduce this on the short example pairwise alignment but not >>>>> with the 3M alignments, there's something else wrong that needs >>>>> investigation. In that case, can you first try running "-log 3" instead of >>>>> "-log 5" (this will give more verbose log messages during the >>>>> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >>>>> please post a URL to your large file so I can download it and attempt to >>>>> reproduce the error. >>>>> >>>>> Sorry about this... >>>>> Ian >>>>> >>>>> >>>>> >>>>> >>>>> Warning -- used random number generator during initialisation >>>>> Read 1 alignments >>>>> Sequence_database: converting ASCII sequences to score profiles >>>>> Estimating tree for the following sequences: X Y >>>>> Optimizing branch lengths of all trees in alignment database, using EM >>>>> Optimizing tree branch lengths by EM. >>>>> Tree before optimization: >>>>> (X:5.0000000000,Y:5.0000000000)root; >>>>> EM iteration #1: log-likelihood = -56.0101 bits >>>>> Optimized tree after step #1: >>>>> (X:5.1160179765,Y:5.1160180140)root; >>>>> EM iteration #2: log-likelihood = -56.0118 bits >>>>> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>>>> Failed EM improvement threshold for the 1th time; stopping >>>>> Checking post-iteration log-likelihood >>>>> Post-iteration log-likelihood = -56.0118 bits >>>>> Restoring previous best branch lengths >>>>> Tree after optimization: >>>>> (X:5.0000000000,Y:5.0000000000)root; >>>>> Alignment database log-likelihood: -56.0101 bits >>>>> Processing alignment Alignment1 (1 of 1): 14 columns >>>>> # STOCKHOLM 1.0 >>>>> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>>> X AAAAAAaaUUUUUU >>>>> Y GGGGGGaaCCCCCC >>>>> // >>>>> >>>>> >>>>> >>>>> >>>>> marcin joachimiak wrote: >>>>> >>>>>> Hey there, >>>>>> >>>>>> The xrate tree job took a while but it seemed to finish without >>>>>> errors. Alas there isn't any tree output ... >>>>>> >>>>>> Here's what I get from stdout: >>>>>> >>>>>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>>> Warning -- used random number generator during initialisation >>>>>> Read 1 alignments >>>>>> Sequence_database: converting ASCII sequences to score profiles >>>>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>>> >>>>>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>>> problem right? And this was run on a hefty machine ... >>>>>> >>>>>> >>>>>> marcin >>>>>> >>>>>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>>> <mar...@gm...> wrote: >>>>>> >>>>>>> much better, thanks! >>>>>>> I did notice the trees for each alignment in the output ... >>>>>>> >>>>>>> The tree estimation is running, I'll keep reporting ... >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>>>>> > marcin joachimiak wrote: >>>>>>> > > Excellent, its working now! >>>>>>> > >>>>>>> > Cool! >>>>>>> > >>>>>>> > >>>>>>> > > I'll have the tree data soon and may try a run with as well, but >>>>> best >>>>>>> > > if I could know that the other tree is different from the xrate >>>>> one >>>>>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>>>> > > tree/distances from xrate? >>>>>>> > >>>>>>> > Yes, it will insert the trees into the alignment files it outputs, >>>>> using >>>>>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>>> email. >>>>>>> > >>>>>>> > One caveat. The way you're running it now, it is calculating a >>>>> separate >>>>>>> > tree *for every window* in the alignment. This may not be what you >>>>> want, >>>>>>> > and it's certainly going to slow things down. >>>>>>> > >>>>>>> > If you want to calculate the tree just once for the whole alignment, >>>>> type >>>>>>> > >>>>>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>>>> > >>>>>>> > >>>>>>> > The "--noannotate" just means "don't waste time trying to annotate >>>>> this >>>>>>> > alignment, I just want the tree". The "-log 5" will just print some >>>>> log >>>>>>> > messages on stderr, so you can watch the progress of the >>>>> tree-building >>>>>>> > algorithms. >>>>>>> > >>>>>>> > hth, >>>>>>> > >>>>>>> > I. >>>>>>> > >>>>>>> >>>>>>> > |
From: Ian H. <ih...@be...> - 2008-05-19 22:45:43
|
marcin joachimiak wrote: > hey > > Looks like I now have windowlicker output with 49500 alignment and trees like: > // > # STOCKHOLM 1.0 > #=GF NH (CP000527:0.1287975598,(CP000112:0.0000000010,AE017286:0.0000000 > 010)node_3:0.1287875646)root; > #=GF SC_max_rev -643.47 > CP000112 ACAAAGTTTGCTATCCCTTCTTTGAGCGGAGTTCCGGGCTTGAAGCCGGTGTCGGCAATCAGGT > CP000527 ACGAAGGCGGCTACGCCCTCTTCGACCGTGGTCGCGGGGCGGAAGCCCGTGTCAGCGATGAGGT > AE017286 ---------------------------------------------------------------- > > CP000112 CGTCCACATTGGCATATGTTGCTTCCACATCGCCGGGCTGCATGGGCATCATGTTTTTAACGGC > CP000527 CGTCGACGTCGGCGTAGGTCGCTGCCACGTCGCCGGGCTGCATGGGCAGCATGTTCTTGACCGC > AE017286 ---------------------------------------------------------------- > > CP000112 CTTTTTGCCCAGGCATTCTTCCAGAGTCGCGATGAACTCGCCAAGCTCTACCGTGTTGTTGTTG > CP000527 CTTCTTGCCGAGGCACTCTTCGAGTACTTCGATGAACCTGCCGAGTTCGACGGTGTTGTTGTTC > AE017286 ---------------------------------------------------------------- > > CP000112 CCAATGTT > CP000527 CCGATGTT > AE017286 -------- > > > But I'll need a little guidance as to what these are are and how to > interpret them. Arre these ~50,000 ncRNA predictions? > > > For completeness, here's the command I used to generate the output: > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg > &> windowlicker.out Oops. It does not look like you have actually specified a genefinding grammar. The score "SC_max_rev" that's reported indicates that windowlicker was using xrate's default "rev" grammar which is not a genefinder... this is consistent with the command line above. You need to download a genefinding grammar e.g. from here http://biowiki.org/results/ncRNA_predictions/12fly/ncRna_v22.eg ...then change your command line to include this grammar... perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg -g ncRna_v22.eg &> windowlicker.out I am sorry that with all the discussion of how to get the trees right, I/we somehow lost sight of this fairly important point... I think we did mention it initially but while trying to figure out options for the tree estimation, it must have got dropped at some point... I. > > thanks, > marcin > > On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >> Hi Marcin... >> >> So, I never ran xrate on an alignment this big before (6 million >> columns!) and it turned out there was a drastic inefficiency in the >> distance matrix code, that was causing it to scale as the square of the >> number of columns. Unnoticeable for most alignments (since the code >> inside the loop was pretty quick), but your example caused it to hang... >> >> I think it was quite possible that your xrate job was getting killed >> because it was running too long. Are you running queue management >> software of some kind (e.g. Sun Grid Engine)? That can often kill >> long-running jobs. I couldn't find a specific bug that was crashing on >> this data, although I didn't finish running it because it was taking >> sooooo long. >> >> Anyway, I've now fixed the inefficiency bug. The tree estimation by >> neighbor-joining now finishes pretty fast. However, it now takes a long >> time to complete the second step of the tree-estimation procedure, which >> is to refine the branch lengths of the tree using the EM algorithm. >> >> There is less that I can do to optimize this. The neighbor-joining code >> is not heavily dependent on the alignment length, but the branch-length >> refinement is. I could optimize the branch-length code in the same way >> that the neighbor-joining is optimized, so it was less dependent on >> alignment length, but that would take a little while, and unfortunately >> I don't have the time right now. >> >> Instead, what I have done is introduced a subtle hack. If you specify >> the command-line option "--maxrounds 0", then this will bypass the >> branch-length refinement code (specifically, what it does is tell xrate >> that you want the EM algorithm to run for a maximum of zero iterations, >> or "rounds"). >> >> Thus, with "--maxrounds 0" the tree will be estimated using >> neighbor-joining only. >> >> Sorry this has taken so long to fix! Like I said, I never used xrate on >> an alignment this big before (for our screens, we pre-specified the >> tree, so we went directly to windowlicker). >> >> Hopefully, this will now work for you, but let me know if you have >> further issues. >> >> Cheers, >> Ian >> >> >> marcin joachimiak wrote: >>> hey, >>> >>> Thanks for the help! >>> The short example worked just fine. >>> >>> log3 with my previous command so far produced this: >>> Finding eigenvectors using MatrixExpEigenPrepare >>> Largest eigenvalue was 0; set to 0 >>> Updating cached M() & J() >>> Finding eigenvectors using MatrixExpEigenPrepare >>> Largest eigenvalue was 0; set to 0 >>> Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>> Updating cached M() & J() >>> Sequence_database: converting ASCII sequences to score profiles >>> Processed 3 sequences >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> Estimating distances of sequences 0 and 1 >>> >>> You can download the FASTA and Stock files here: >>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>> >>> I'll have more updates on Monday -- enjoy the weekend, >>> marcin >>> - Show quoted text - >>> >>> On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>> Thanks Marcin. >>>> >>>> Something has definitely gone wrong if no alignment is output. Can you >>>> check that the following outputs an alignment+tree? >>>> >>>> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>> ~/dart/src/ecfg/t/notree.stk >>>> >>>> >>>> Looking at your log messages it seems like the program has crashed while >>>> trying to do neighbor-joining to estimate the tree. For the above command, I >>>> get log messages after the "Estimating tree..." message. See below (BTW many >>>> of these are on stderr, not stdout) >>>> >>>> If you can reproduce this on the short example pairwise alignment but not >>>> with the 3M alignments, there's something else wrong that needs >>>> investigation. In that case, can you first try running "-log 3" instead of >>>> "-log 5" (this will give more verbose log messages during the >>>> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >>>> please post a URL to your large file so I can download it and attempt to >>>> reproduce the error. >>>> >>>> Sorry about this... >>>> Ian >>>> >>>> >>>> >>>> >>>> Warning -- used random number generator during initialisation >>>> Read 1 alignments >>>> Sequence_database: converting ASCII sequences to score profiles >>>> Estimating tree for the following sequences: X Y >>>> Optimizing branch lengths of all trees in alignment database, using EM >>>> Optimizing tree branch lengths by EM. >>>> Tree before optimization: >>>> (X:5.0000000000,Y:5.0000000000)root; >>>> EM iteration #1: log-likelihood = -56.0101 bits >>>> Optimized tree after step #1: >>>> (X:5.1160179765,Y:5.1160180140)root; >>>> EM iteration #2: log-likelihood = -56.0118 bits >>>> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>>> Failed EM improvement threshold for the 1th time; stopping >>>> Checking post-iteration log-likelihood >>>> Post-iteration log-likelihood = -56.0118 bits >>>> Restoring previous best branch lengths >>>> Tree after optimization: >>>> (X:5.0000000000,Y:5.0000000000)root; >>>> Alignment database log-likelihood: -56.0101 bits >>>> Processing alignment Alignment1 (1 of 1): 14 columns >>>> # STOCKHOLM 1.0 >>>> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>> X AAAAAAaaUUUUUU >>>> Y GGGGGGaaCCCCCC >>>> // >>>> >>>> >>>> >>>> >>>> marcin joachimiak wrote: >>>> >>>>> Hey there, >>>>> >>>>> The xrate tree job took a while but it seemed to finish without >>>>> errors. Alas there isn't any tree output ... >>>>> >>>>> Here's what I get from stdout: >>>>> >>>>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>> Warning -- used random number generator during initialisation >>>>> Read 1 alignments >>>>> Sequence_database: converting ASCII sequences to score profiles >>>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>> >>>>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>> problem right? And this was run on a hefty machine ... >>>>> >>>>> >>>>> marcin >>>>> >>>>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>> <mar...@gm...> wrote: >>>>> >>>>>> much better, thanks! >>>>>> I did notice the trees for each alignment in the output ... >>>>>> >>>>>> The tree estimation is running, I'll keep reporting ... >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>>>> > marcin joachimiak wrote: >>>>>> > > Excellent, its working now! >>>>>> > >>>>>> > Cool! >>>>>> > >>>>>> > >>>>>> > > I'll have the tree data soon and may try a run with as well, but >>>> best >>>>>> > > if I could know that the other tree is different from the xrate >>>> one >>>>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>>> > > tree/distances from xrate? >>>>>> > >>>>>> > Yes, it will insert the trees into the alignment files it outputs, >>>> using >>>>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>> email. >>>>>> > >>>>>> > One caveat. The way you're running it now, it is calculating a >>>> separate >>>>>> > tree *for every window* in the alignment. This may not be what you >>>> want, >>>>>> > and it's certainly going to slow things down. >>>>>> > >>>>>> > If you want to calculate the tree just once for the whole alignment, >>>> type >>>>>> > >>>>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>>> > >>>>>> > >>>>>> > The "--noannotate" just means "don't waste time trying to annotate >>>> this >>>>>> > alignment, I just want the tree". The "-log 5" will just print some >>>> log >>>>>> > messages on stderr, so you can watch the progress of the >>>> tree-building >>>>>> > algorithms. >>>>>> > >>>>>> > hth, >>>>>> > >>>>>> > I. >>>>>> > >>>>>> >>>>>> |
From: marcin j. <mar...@gm...> - 2008-05-19 21:09:53
|
hey Looks like I now have windowlicker output with 49500 alignment and trees like: // # STOCKHOLM 1.0 #=GF NH (CP000527:0.1287975598,(CP000112:0.0000000010,AE017286:0.0000000 010)node_3:0.1287875646)root; #=GF SC_max_rev -643.47 CP000112 ACAAAGTTTGCTATCCCTTCTTTGAGCGGAGTTCCGGGCTTGAAGCCGGTGTCGGCAATCAGGT CP000527 ACGAAGGCGGCTACGCCCTCTTCGACCGTGGTCGCGGGGCGGAAGCCCGTGTCAGCGATGAGGT AE017286 ---------------------------------------------------------------- CP000112 CGTCCACATTGGCATATGTTGCTTCCACATCGCCGGGCTGCATGGGCATCATGTTTTTAACGGC CP000527 CGTCGACGTCGGCGTAGGTCGCTGCCACGTCGCCGGGCTGCATGGGCAGCATGTTCTTGACCGC AE017286 ---------------------------------------------------------------- CP000112 CTTTTTGCCCAGGCATTCTTCCAGAGTCGCGATGAACTCGCCAAGCTCTACCGTGTTGTTGTTG CP000527 CTTCTTGCCGAGGCACTCTTCGAGTACTTCGATGAACCTGCCGAGTTCGACGGTGTTGTTGTTC AE017286 ---------------------------------------------------------------- CP000112 CCAATGTT CP000527 CCGATGTT AE017286 -------- But I'll need a little guidance as to what these are are and how to interpret them. Arre these ~50,000 ncRNA predictions? For completeness, here's the command I used to generate the output: perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg &> windowlicker.out thanks, marcin On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: > Hi Marcin... > > So, I never ran xrate on an alignment this big before (6 million > columns!) and it turned out there was a drastic inefficiency in the > distance matrix code, that was causing it to scale as the square of the > number of columns. Unnoticeable for most alignments (since the code > inside the loop was pretty quick), but your example caused it to hang... > > I think it was quite possible that your xrate job was getting killed > because it was running too long. Are you running queue management > software of some kind (e.g. Sun Grid Engine)? That can often kill > long-running jobs. I couldn't find a specific bug that was crashing on > this data, although I didn't finish running it because it was taking > sooooo long. > > Anyway, I've now fixed the inefficiency bug. The tree estimation by > neighbor-joining now finishes pretty fast. However, it now takes a long > time to complete the second step of the tree-estimation procedure, which > is to refine the branch lengths of the tree using the EM algorithm. > > There is less that I can do to optimize this. The neighbor-joining code > is not heavily dependent on the alignment length, but the branch-length > refinement is. I could optimize the branch-length code in the same way > that the neighbor-joining is optimized, so it was less dependent on > alignment length, but that would take a little while, and unfortunately > I don't have the time right now. > > Instead, what I have done is introduced a subtle hack. If you specify > the command-line option "--maxrounds 0", then this will bypass the > branch-length refinement code (specifically, what it does is tell xrate > that you want the EM algorithm to run for a maximum of zero iterations, > or "rounds"). > > Thus, with "--maxrounds 0" the tree will be estimated using > neighbor-joining only. > > Sorry this has taken so long to fix! Like I said, I never used xrate on > an alignment this big before (for our screens, we pre-specified the > tree, so we went directly to windowlicker). > > Hopefully, this will now work for you, but let me know if you have > further issues. > > Cheers, > Ian > > > marcin joachimiak wrote: >> hey, >> >> Thanks for the help! >> The short example worked just fine. >> >> log3 with my previous command so far produced this: >> Finding eigenvectors using MatrixExpEigenPrepare >> Largest eigenvalue was 0; set to 0 >> Updating cached M() & J() >> Finding eigenvectors using MatrixExpEigenPrepare >> Largest eigenvalue was 0; set to 0 >> Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >> Updating cached M() & J() >> Sequence_database: converting ASCII sequences to score profiles >> Processed 3 sequences >> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >> Estimating distances of sequences 0 and 1 >> >> You can download the FASTA and Stock files here: >> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >> >> I'll have more updates on Monday -- enjoy the weekend, >> marcin >> - Show quoted text - >> >> On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>> Thanks Marcin. >>> >>> Something has definitely gone wrong if no alignment is output. Can you >>> check that the following outputs an alignment+tree? >>> >>> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>> ~/dart/src/ecfg/t/notree.stk >>> >>> >>> Looking at your log messages it seems like the program has crashed while >>> trying to do neighbor-joining to estimate the tree. For the above command, I >>> get log messages after the "Estimating tree..." message. See below (BTW many >>> of these are on stderr, not stdout) >>> >>> If you can reproduce this on the short example pairwise alignment but not >>> with the 3M alignments, there's something else wrong that needs >>> investigation. In that case, can you first try running "-log 3" instead of >>> "-log 5" (this will give more verbose log messages during the >>> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >>> please post a URL to your large file so I can download it and attempt to >>> reproduce the error. >>> >>> Sorry about this... >>> Ian >>> >>> >>> >>> >>> Warning -- used random number generator during initialisation >>> Read 1 alignments >>> Sequence_database: converting ASCII sequences to score profiles >>> Estimating tree for the following sequences: X Y >>> Optimizing branch lengths of all trees in alignment database, using EM >>> Optimizing tree branch lengths by EM. >>> Tree before optimization: >>> (X:5.0000000000,Y:5.0000000000)root; >>> EM iteration #1: log-likelihood = -56.0101 bits >>> Optimized tree after step #1: >>> (X:5.1160179765,Y:5.1160180140)root; >>> EM iteration #2: log-likelihood = -56.0118 bits >>> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>> Failed EM improvement threshold for the 1th time; stopping >>> Checking post-iteration log-likelihood >>> Post-iteration log-likelihood = -56.0118 bits >>> Restoring previous best branch lengths >>> Tree after optimization: >>> (X:5.0000000000,Y:5.0000000000)root; >>> Alignment database log-likelihood: -56.0101 bits >>> Processing alignment Alignment1 (1 of 1): 14 columns >>> # STOCKHOLM 1.0 >>> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>> X AAAAAAaaUUUUUU >>> Y GGGGGGaaCCCCCC >>> // >>> >>> >>> >>> >>> marcin joachimiak wrote: >>> >>>> Hey there, >>>> >>>> The xrate tree job took a while but it seemed to finish without >>>> errors. Alas there isn't any tree output ... >>>> >>>> Here's what I get from stdout: >>>> >>>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>> Warning -- used random number generator during initialisation >>>> Read 1 alignments >>>> Sequence_database: converting ASCII sequences to score profiles >>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> >>>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>> problem right? And this was run on a hefty machine ... >>>> >>>> >>>> marcin >>>> >>>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>> <mar...@gm...> wrote: >>>> >>>>> much better, thanks! >>>>> I did notice the trees for each alignment in the output ... >>>>> >>>>> The tree estimation is running, I'll keep reporting ... >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>>> > marcin joachimiak wrote: >>>>> > > Excellent, its working now! >>>>> > >>>>> > Cool! >>>>> > >>>>> > >>>>> > > I'll have the tree data soon and may try a run with as well, but >>> best >>>>> > > if I could know that the other tree is different from the xrate >>> one >>>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>> > > tree/distances from xrate? >>>>> > >>>>> > Yes, it will insert the trees into the alignment files it outputs, >>> using >>>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>> email. >>>>> > >>>>> > One caveat. The way you're running it now, it is calculating a >>> separate >>>>> > tree *for every window* in the alignment. This may not be what you >>> want, >>>>> > and it's certainly going to slow things down. >>>>> > >>>>> > If you want to calculate the tree just once for the whole alignment, >>> type >>>>> > >>>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>> > >>>>> > >>>>> > The "--noannotate" just means "don't waste time trying to annotate >>> this >>>>> > alignment, I just want the tree". The "-log 5" will just print some >>> log >>>>> > messages on stderr, so you can watch the progress of the >>> tree-building >>>>> > algorithms. >>>>> > >>>>> > hth, >>>>> > >>>>> > I. >>>>> > >>>>> >>>>> > |
From: Ian H. <ih...@be...> - 2008-05-13 21:13:59
|
You only really need that (and the "-e $DARTDIR/grammars/jukescantor.eg" option) for the initial tree. You can pass the "-e $DARTDIR/grammars/jukescantor.eg" to windowlicker too if you like, but with the "--maxrounds 0" there it won't actually do anything, and you probably don't need it anyway. marcin joachimiak wrote: > OK, its running now! > Thanks -- originally I couldn't get output from standard redirect '>' > but it worked the second time. > > Oh yes, where does the "--maxrounds 0" argument go? Is that for > windowlicker or just the initial tree? > > m > > On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote: >> Yes -- your "&>" shell redirection is redirecting all xrate's error & >> logging output to your "out" file. That is what the second xrate run >> (called from windowlicker) is complaining about. If you use ">" instead >> of "&>" then you will not mix stderr and stdout, and what you're doing >> should work (fingers crossed) >> >> Ian >> >> marcin joachimiak wrote: >>> hey there, >>> >>> So, I got the tree output after the cvs update, see attached. >>> >>> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg >>> --noannotate -log 5 &> out >>> >>> I was only about to capture the xrate tree output using &> so there >>> are likely extraneous lines there breaking things? >>> >>> I ran windowlicker as follows: >>> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >>> DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e >>> $DARTDIR/grammars/jukescantor.eg &> windowlicker.out >>> >>> And got this, seemingly complaining that xrate failed: >>> >>> Ignoring line: Checking post-iteration log-likelihood >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Tree after optimization: >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> xrate failed (check command-line options? they were: >>> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >>> >>> I also tried w/o the -maxrounds argument and got: >>> >>> Ignoring line: Checking post-iteration log-likelihood >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Tree after optimization: >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> [stderr] ERROR: Bad node specifier: >>> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' >>> xrate returned with nonzero exit status (invoked using: >>> /usr2/people/gtl/bin/dart//bin/xrate -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >>> >>> cheers, >>> marcin >>> >>> >>> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak >>> <mar...@gm...> wrote: >>>> Hey hey, >>>> >>>> Indeed, the alignment size limitation makes sense and was one of the >>>> things I was suspecting. >>>> >>>> The job was run on a machine w.o queue management but I will >>>> investigate whether it could have been killed by other means. >>>> >>>> The 0 iteration option sounds like it will do the trick -- so I just >>>> need to checkout the DART code from cvs again, right? Thanks for >>>> adding this in and I definitely understand the time constraints etc., >>>> this should suffice for my purposes. I could of course try to obtain a >>>> tree in other ways, but actually I'm trying to turn this into a >>>> pipeline that can run on multi-genome alignments, its almost there. >>>> >>>> cheers, >>>> marcin >>>> >>>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>>>> Hi Marcin... >>>>> >>>>> So, I never ran xrate on an alignment this big before (6 million >>>>> columns!) and it turned out there was a drastic inefficiency in the >>>>> distance matrix code, that was causing it to scale as the square of the >>>>> number of columns. Unnoticeable for most alignments (since the code >>>>> inside the loop was pretty quick), but your example caused it to hang... >>>>> >>>>> I think it was quite possible that your xrate job was getting killed >>>>> because it was running too long. Are you running queue management >>>>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>>>> long-running jobs. I couldn't find a specific bug that was crashing on >>>>> this data, although I didn't finish running it because it was taking >>>>> sooooo long. >>>>> >>>>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>>>> neighbor-joining now finishes pretty fast. However, it now takes a long >>>>> time to complete the second step of the tree-estimation procedure, which >>>>> is to refine the branch lengths of the tree using the EM algorithm. >>>>> >>>>> There is less that I can do to optimize this. The neighbor-joining code >>>>> is not heavily dependent on the alignment length, but the branch-length >>>>> refinement is. I could optimize the branch-length code in the same way >>>>> that the neighbor-joining is optimized, so it was less dependent on >>>>> alignment length, but that would take a little while, and unfortunately >>>>> I don't have the time right now. >>>>> >>>>> Instead, what I have done is introduced a subtle hack. If you specify >>>>> the command-line option "--maxrounds 0", then this will bypass the >>>>> branch-length refinement code (specifically, what it does is tell xrate >>>>> that you want the EM algorithm to run for a maximum of zero iterations, >>>>> or "rounds"). >>>>> >>>>> Thus, with "--maxrounds 0" the tree will be estimated using >>>>> neighbor-joining only. >>>>> >>>>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>>>> an alignment this big before (for our screens, we pre-specified the >>>>> tree, so we went directly to windowlicker). >>>>> >>>>> Hopefully, this will now work for you, but let me know if you have >>>>> further issues. >>>>> >>>>> Cheers, >>>>> Ian >>>>> >>>>> >>>>> >>>>> >>>>> marcin joachimiak wrote: >>>>> > hey, >>>>> > >>>>> > Thanks for the help! >>>>> > The short example worked just fine. >>>>> > >>>>> > log3 with my previous command so far produced this: >>>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>>> > Largest eigenvalue was 0; set to 0 >>>>> > Updating cached M() & J() >>>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>>> > Largest eigenvalue was 0; set to 0 >>>>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>>> > Updating cached M() & J() >>>>> > Sequence_database: converting ASCII sequences to score profiles >>>>> > Processed 3 sequences >>>>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>> > Estimating distances of sequences 0 and 1 >>>>> > >>>>> > You can download the FASTA and Stock files here: >>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>>> > >>>>> > I'll have more updates on Monday -- enjoy the weekend, >>>>> > marcin >>>>> > - Show quoted text - >>>>> > >>>>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>>> >> Thanks Marcin. >>>>> >> >>>>> >> Something has definitely gone wrong if no alignment is output. Can you >>>>> >> check that the following outputs an alignment+tree? >>>>> >> >>>>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>>> >> ~/dart/src/ecfg/t/notree.stk >>>>> >> >>>>> >> >>>>> >> Looking at your log messages it seems like the program has > crashed while >>>>> >> trying to do neighbor-joining to estimate the tree. For the >>> above command, I >>>>> >> get log messages after the "Estimating tree..." message. See >>> below (BTW many >>>>> >> of these are on stderr, not stdout) >>>>> >> >>>>> >> If you can reproduce this on the short example pairwise >>> alignment but not >>>>> >> with the 3M alignments, there's something else wrong that needs >>>>> >> investigation. In that case, can you first try running "-log 3" >>> instead of >>>>> >> "-log 5" (this will give more verbose log messages during the >>>>> >> neighbor-joining algorithm), then mail the stderr log messages >>> to me, and/or >>>>> >> please post a URL to your large file so I can download it and > attempt to >>>>> >> reproduce the error. >>>>> >> >>>>> >> Sorry about this... >>>>> >> Ian >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> Warning -- used random number generator during initialisation >>>>> >> Read 1 alignments >>>>> >> Sequence_database: converting ASCII sequences to score profiles >>>>> >> Estimating tree for the following sequences: X Y >>>>> >> Optimizing branch lengths of all trees in alignment database, using EM >>>>> >> Optimizing tree branch lengths by EM. >>>>> >> Tree before optimization: >>>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>>> >> EM iteration #1: log-likelihood = -56.0101 bits >>>>> >> Optimized tree after step #1: >>>>> >> (X:5.1160179765,Y:5.1160180140)root; >>>>> >> EM iteration #2: log-likelihood = -56.0118 bits >>>>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 > bits during EM >>>>> >> Failed EM improvement threshold for the 1th time; stopping >>>>> >> Checking post-iteration log-likelihood >>>>> >> Post-iteration log-likelihood = -56.0118 bits >>>>> >> Restoring previous best branch lengths >>>>> >> Tree after optimization: >>>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>>> >> Alignment database log-likelihood: -56.0101 bits >>>>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>>>> >> # STOCKHOLM 1.0 >>>>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>>> >> X AAAAAAaaUUUUUU >>>>> >> Y GGGGGGaaCCCCCC >>>>> >> // >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> marcin joachimiak wrote: >>>>> >> >>>>> >>> Hey there, >>>>> >>> >>>>> >>> The xrate tree job took a while but it seemed to finish without >>>>> >>> errors. Alas there isn't any tree output ... >>>>> >>> >>>>> >>> Here's what I get from stdout: >>>>> >>> >>>>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>> >>> Warning -- used random number generator during initialisation >>>>> >>> Read 1 alignments >>>>> >>> Sequence_database: converting ASCII sequences to score profiles >>>>> >>> Estimating tree for the following sequences: CP000112 > CP000527 AE017286 >>>>> >>> >>>>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>> >>> problem right? And this was run on a hefty machine ... >>>>> >>> >>>>> >>> >>>>> >>> marcin >>>>> >>> >>>>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>> >>> <mar...@gm...> wrote: >>>>> >>> >>>>> >>>> much better, thanks! >>>>> >>>> I did notice the trees for each alignment in the output ... >>>>> >>>> >>>>> >>>> The tree estimation is running, I'll keep reporting ... >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes > <ih...@be...> wrote: >>>>> >>>> > marcin joachimiak wrote: >>>>> >>>> > > Excellent, its working now! >>>>> >>>> > >>>>> >>>> > Cool! >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > > I'll have the tree data soon and may try a run with > as well, but >>>>> >> best >>>>> >>>> > > if I could know that the other tree is different from the xrate >>>>> >> one >>>>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>> >>>> > > tree/distances from xrate? >>>>> >>>> > >>>>> >>>> > Yes, it will insert the trees into the alignment files > it outputs, >>>>> >> using >>>>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>>> >> email. >>>>> >>>> > >>>>> >>>> > One caveat. The way you're running it now, it is calculating a >>>>> >> separate >>>>> >>>> > tree *for every window* in the alignment. This may not > be what you >>>>> >> want, >>>>> >>>> > and it's certainly going to slow things down. >>>>> >>>> > >>>>> >>>> > If you want to calculate the tree just once for the whole >>> alignment, >>>>> >> type >>>>> >>>> > >>>>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > The "--noannotate" just means "don't waste time trying > to annotate >>>>> >> this >>>>> >>>> > alignment, I just want the tree". The "-log 5" will > just print some >>>>> >> log >>>>> >>>> > messages on stderr, so you can watch the progress of the >>>>> >> tree-building >>>>> >>>> > algorithms. >>>>> >>>> > >>>>> >>>> > hth, >>>>> >>>> > >>>>> >>>> > I. >>>>> >>>> > >>>>> >>>> >>>>> >>>> >>>>> |
From: marcin j. <mar...@gm...> - 2008-05-13 21:10:33
|
OK, its running now! Thanks -- originally I couldn't get output from standard redirect '>' but it worked the second time. Oh yes, where does the "--maxrounds 0" argument go? Is that for windowlicker or just the initial tree? m On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote: > Yes -- your "&>" shell redirection is redirecting all xrate's error & > logging output to your "out" file. That is what the second xrate run > (called from windowlicker) is complaining about. If you use ">" instead > of "&>" then you will not mix stderr and stdout, and what you're doing > should work (fingers crossed) > > Ian > > marcin joachimiak wrote: >> hey there, >> >> So, I got the tree output after the cvs update, see attached. >> >> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg >> --noannotate -log 5 &> out >> >> I was only about to capture the xrate tree output using &> so there >> are likely extraneous lines there breaking things? >> >> I ran windowlicker as follows: >> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >> DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e >> $DARTDIR/grammars/jukescantor.eg &> windowlicker.out >> >> And got this, seemingly complaining that xrate failed: >> >> Ignoring line: Checking post-iteration log-likelihood >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Tree after optimization: >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> xrate failed (check command-line options? they were: >> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e >> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >> >> I also tried w/o the -maxrounds argument and got: >> >> Ignoring line: Checking post-iteration log-likelihood >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Tree after optimization: >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> [stderr] ERROR: Bad node specifier: >> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' >> xrate returned with nonzero exit status (invoked using: >> /usr2/people/gtl/bin/dart//bin/xrate -e >> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >> >> cheers, >> marcin >> >> >> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak >> <mar...@gm...> wrote: >>> Hey hey, >>> >>> Indeed, the alignment size limitation makes sense and was one of the >>> things I was suspecting. >>> >>> The job was run on a machine w.o queue management but I will >>> investigate whether it could have been killed by other means. >>> >>> The 0 iteration option sounds like it will do the trick -- so I just >>> need to checkout the DART code from cvs again, right? Thanks for >>> adding this in and I definitely understand the time constraints etc., >>> this should suffice for my purposes. I could of course try to obtain a >>> tree in other ways, but actually I'm trying to turn this into a >>> pipeline that can run on multi-genome alignments, its almost there. >>> >>> cheers, >>> marcin >>> >>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>>> Hi Marcin... >>>> >>>> So, I never ran xrate on an alignment this big before (6 million >>>> columns!) and it turned out there was a drastic inefficiency in the >>>> distance matrix code, that was causing it to scale as the square of the >>>> number of columns. Unnoticeable for most alignments (since the code >>>> inside the loop was pretty quick), but your example caused it to hang... >>>> >>>> I think it was quite possible that your xrate job was getting killed >>>> because it was running too long. Are you running queue management >>>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>>> long-running jobs. I couldn't find a specific bug that was crashing on >>>> this data, although I didn't finish running it because it was taking >>>> sooooo long. >>>> >>>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>>> neighbor-joining now finishes pretty fast. However, it now takes a long >>>> time to complete the second step of the tree-estimation procedure, which >>>> is to refine the branch lengths of the tree using the EM algorithm. >>>> >>>> There is less that I can do to optimize this. The neighbor-joining code >>>> is not heavily dependent on the alignment length, but the branch-length >>>> refinement is. I could optimize the branch-length code in the same way >>>> that the neighbor-joining is optimized, so it was less dependent on >>>> alignment length, but that would take a little while, and unfortunately >>>> I don't have the time right now. >>>> >>>> Instead, what I have done is introduced a subtle hack. If you specify >>>> the command-line option "--maxrounds 0", then this will bypass the >>>> branch-length refinement code (specifically, what it does is tell xrate >>>> that you want the EM algorithm to run for a maximum of zero iterations, >>>> or "rounds"). >>>> >>>> Thus, with "--maxrounds 0" the tree will be estimated using >>>> neighbor-joining only. >>>> >>>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>>> an alignment this big before (for our screens, we pre-specified the >>>> tree, so we went directly to windowlicker). >>>> >>>> Hopefully, this will now work for you, but let me know if you have >>>> further issues. >>>> >>>> Cheers, >>>> Ian >>>> >>>> >>>> >>>> >>>> marcin joachimiak wrote: >>>> > hey, >>>> > >>>> > Thanks for the help! >>>> > The short example worked just fine. >>>> > >>>> > log3 with my previous command so far produced this: >>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>> > Largest eigenvalue was 0; set to 0 >>>> > Updating cached M() & J() >>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>> > Largest eigenvalue was 0; set to 0 >>>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>> > Updating cached M() & J() >>>> > Sequence_database: converting ASCII sequences to score profiles >>>> > Processed 3 sequences >>>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> > Estimating distances of sequences 0 and 1 >>>> > >>>> > You can download the FASTA and Stock files here: >>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>> > >>>> > I'll have more updates on Monday -- enjoy the weekend, >>>> > marcin >>>> > - Show quoted text - >>>> > >>>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>> >> Thanks Marcin. >>>> >> >>>> >> Something has definitely gone wrong if no alignment is output. Can you >>>> >> check that the following outputs an alignment+tree? >>>> >> >>>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>> >> ~/dart/src/ecfg/t/notree.stk >>>> >> >>>> >> >>>> >> Looking at your log messages it seems like the program has crashed while >>>> >> trying to do neighbor-joining to estimate the tree. For the >> above command, I >>>> >> get log messages after the "Estimating tree..." message. See >> below (BTW many >>>> >> of these are on stderr, not stdout) >>>> >> >>>> >> If you can reproduce this on the short example pairwise >> alignment but not >>>> >> with the 3M alignments, there's something else wrong that needs >>>> >> investigation. In that case, can you first try running "-log 3" >> instead of >>>> >> "-log 5" (this will give more verbose log messages during the >>>> >> neighbor-joining algorithm), then mail the stderr log messages >> to me, and/or >>>> >> please post a URL to your large file so I can download it and attempt to >>>> >> reproduce the error. >>>> >> >>>> >> Sorry about this... >>>> >> Ian >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> Warning -- used random number generator during initialisation >>>> >> Read 1 alignments >>>> >> Sequence_database: converting ASCII sequences to score profiles >>>> >> Estimating tree for the following sequences: X Y >>>> >> Optimizing branch lengths of all trees in alignment database, using EM >>>> >> Optimizing tree branch lengths by EM. >>>> >> Tree before optimization: >>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>> >> EM iteration #1: log-likelihood = -56.0101 bits >>>> >> Optimized tree after step #1: >>>> >> (X:5.1160179765,Y:5.1160180140)root; >>>> >> EM iteration #2: log-likelihood = -56.0118 bits >>>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>>> >> Failed EM improvement threshold for the 1th time; stopping >>>> >> Checking post-iteration log-likelihood >>>> >> Post-iteration log-likelihood = -56.0118 bits >>>> >> Restoring previous best branch lengths >>>> >> Tree after optimization: >>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>> >> Alignment database log-likelihood: -56.0101 bits >>>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>>> >> # STOCKHOLM 1.0 >>>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>> >> X AAAAAAaaUUUUUU >>>> >> Y GGGGGGaaCCCCCC >>>> >> // >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> marcin joachimiak wrote: >>>> >> >>>> >>> Hey there, >>>> >>> >>>> >>> The xrate tree job took a while but it seemed to finish without >>>> >>> errors. Alas there isn't any tree output ... >>>> >>> >>>> >>> Here's what I get from stdout: >>>> >>> >>>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>> >>> Warning -- used random number generator during initialisation >>>> >>> Read 1 alignments >>>> >>> Sequence_database: converting ASCII sequences to score profiles >>>> >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> >>> >>>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>> >>> problem right? And this was run on a hefty machine ... >>>> >>> >>>> >>> >>>> >>> marcin >>>> >>> >>>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>> >>> <mar...@gm...> wrote: >>>> >>> >>>> >>>> much better, thanks! >>>> >>>> I did notice the trees for each alignment in the output ... >>>> >>>> >>>> >>>> The tree estimation is running, I'll keep reporting ... >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>> >>>> > marcin joachimiak wrote: >>>> >>>> > > Excellent, its working now! >>>> >>>> > >>>> >>>> > Cool! >>>> >>>> > >>>> >>>> > >>>> >>>> > > I'll have the tree data soon and may try a run with as well, but >>>> >> best >>>> >>>> > > if I could know that the other tree is different from the xrate >>>> >> one >>>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>> >>>> > > tree/distances from xrate? >>>> >>>> > >>>> >>>> > Yes, it will insert the trees into the alignment files it outputs, >>>> >> using >>>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>> >> email. >>>> >>>> > >>>> >>>> > One caveat. The way you're running it now, it is calculating a >>>> >> separate >>>> >>>> > tree *for every window* in the alignment. This may not be what you >>>> >> want, >>>> >>>> > and it's certainly going to slow things down. >>>> >>>> > >>>> >>>> > If you want to calculate the tree just once for the whole >> alignment, >>>> >> type >>>> >>>> > >>>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>> >>>> > >>>> >>>> > >>>> >>>> > The "--noannotate" just means "don't waste time trying to annotate >>>> >> this >>>> >>>> > alignment, I just want the tree". The "-log 5" will just print some >>>> >> log >>>> >>>> > messages on stderr, so you can watch the progress of the >>>> >> tree-building >>>> >>>> > algorithms. >>>> >>>> > >>>> >>>> > hth, >>>> >>>> > >>>> >>>> > I. >>>> >>>> > >>>> >>>> >>>> >>>> >>>> > |
From: Ian H. <ih...@be...> - 2008-05-13 21:03:45
|
Yes -- your "&>" shell redirection is redirecting all xrate's error & logging output to your "out" file. That is what the second xrate run (called from windowlicker) is complaining about. If you use ">" instead of "&>" then you will not mix stderr and stdout, and what you're doing should work (fingers crossed) Ian marcin joachimiak wrote: > hey there, > > So, I got the tree output after the cvs update, see attached. > > xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg > --noannotate -log 5 &> out > > I was only about to capture the xrate tree output using &> so there > are likely extraneous lines there breaking things? > > I ran windowlicker as follows: > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e > $DARTDIR/grammars/jukescantor.eg &> windowlicker.out > > And got this, seemingly complaining that xrate failed: > > Ignoring line: Checking post-iteration log-likelihood > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Tree after optimization: > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > xrate failed (check command-line options? they were: > /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) > > I also tried w/o the -maxrounds argument and got: > > Ignoring line: Checking post-iteration log-likelihood > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Tree after optimization: > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > [stderr] ERROR: Bad node specifier: > '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) > > cheers, > marcin > > > On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak > <mar...@gm...> wrote: >> Hey hey, >> >> Indeed, the alignment size limitation makes sense and was one of the >> things I was suspecting. >> >> The job was run on a machine w.o queue management but I will >> investigate whether it could have been killed by other means. >> >> The 0 iteration option sounds like it will do the trick -- so I just >> need to checkout the DART code from cvs again, right? Thanks for >> adding this in and I definitely understand the time constraints etc., >> this should suffice for my purposes. I could of course try to obtain a >> tree in other ways, but actually I'm trying to turn this into a >> pipeline that can run on multi-genome alignments, its almost there. >> >> cheers, >> marcin >> >> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>> Hi Marcin... >>> >>> So, I never ran xrate on an alignment this big before (6 million >>> columns!) and it turned out there was a drastic inefficiency in the >>> distance matrix code, that was causing it to scale as the square of the >>> number of columns. Unnoticeable for most alignments (since the code >>> inside the loop was pretty quick), but your example caused it to hang... >>> >>> I think it was quite possible that your xrate job was getting killed >>> because it was running too long. Are you running queue management >>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>> long-running jobs. I couldn't find a specific bug that was crashing on >>> this data, although I didn't finish running it because it was taking >>> sooooo long. >>> >>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>> neighbor-joining now finishes pretty fast. However, it now takes a long >>> time to complete the second step of the tree-estimation procedure, which >>> is to refine the branch lengths of the tree using the EM algorithm. >>> >>> There is less that I can do to optimize this. The neighbor-joining code >>> is not heavily dependent on the alignment length, but the branch-length >>> refinement is. I could optimize the branch-length code in the same way >>> that the neighbor-joining is optimized, so it was less dependent on >>> alignment length, but that would take a little while, and unfortunately >>> I don't have the time right now. >>> >>> Instead, what I have done is introduced a subtle hack. If you specify >>> the command-line option "--maxrounds 0", then this will bypass the >>> branch-length refinement code (specifically, what it does is tell xrate >>> that you want the EM algorithm to run for a maximum of zero iterations, >>> or "rounds"). >>> >>> Thus, with "--maxrounds 0" the tree will be estimated using >>> neighbor-joining only. >>> >>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>> an alignment this big before (for our screens, we pre-specified the >>> tree, so we went directly to windowlicker). >>> >>> Hopefully, this will now work for you, but let me know if you have >>> further issues. >>> >>> Cheers, >>> Ian >>> >>> >>> >>> >>> marcin joachimiak wrote: >>> > hey, >>> > >>> > Thanks for the help! >>> > The short example worked just fine. >>> > >>> > log3 with my previous command so far produced this: >>> > Finding eigenvectors using MatrixExpEigenPrepare >>> > Largest eigenvalue was 0; set to 0 >>> > Updating cached M() & J() >>> > Finding eigenvectors using MatrixExpEigenPrepare >>> > Largest eigenvalue was 0; set to 0 >>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>> > Updating cached M() & J() >>> > Sequence_database: converting ASCII sequences to score profiles >>> > Processed 3 sequences >>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> > Estimating distances of sequences 0 and 1 >>> > >>> > You can download the FASTA and Stock files here: >>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>> > >>> > I'll have more updates on Monday -- enjoy the weekend, >>> > marcin >>> > - Show quoted text - >>> > >>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>> >> Thanks Marcin. >>> >> >>> >> Something has definitely gone wrong if no alignment is output. Can you >>> >> check that the following outputs an alignment+tree? >>> >> >>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>> >> ~/dart/src/ecfg/t/notree.stk >>> >> >>> >> >>> >> Looking at your log messages it seems like the program has crashed while >>> >> trying to do neighbor-joining to estimate the tree. For the > above command, I >>> >> get log messages after the "Estimating tree..." message. See > below (BTW many >>> >> of these are on stderr, not stdout) >>> >> >>> >> If you can reproduce this on the short example pairwise > alignment but not >>> >> with the 3M alignments, there's something else wrong that needs >>> >> investigation. In that case, can you first try running "-log 3" > instead of >>> >> "-log 5" (this will give more verbose log messages during the >>> >> neighbor-joining algorithm), then mail the stderr log messages > to me, and/or >>> >> please post a URL to your large file so I can download it and attempt to >>> >> reproduce the error. >>> >> >>> >> Sorry about this... >>> >> Ian >>> >> >>> >> >>> >> >>> >> >>> >> Warning -- used random number generator during initialisation >>> >> Read 1 alignments >>> >> Sequence_database: converting ASCII sequences to score profiles >>> >> Estimating tree for the following sequences: X Y >>> >> Optimizing branch lengths of all trees in alignment database, using EM >>> >> Optimizing tree branch lengths by EM. >>> >> Tree before optimization: >>> >> (X:5.0000000000,Y:5.0000000000)root; >>> >> EM iteration #1: log-likelihood = -56.0101 bits >>> >> Optimized tree after step #1: >>> >> (X:5.1160179765,Y:5.1160180140)root; >>> >> EM iteration #2: log-likelihood = -56.0118 bits >>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>> >> Failed EM improvement threshold for the 1th time; stopping >>> >> Checking post-iteration log-likelihood >>> >> Post-iteration log-likelihood = -56.0118 bits >>> >> Restoring previous best branch lengths >>> >> Tree after optimization: >>> >> (X:5.0000000000,Y:5.0000000000)root; >>> >> Alignment database log-likelihood: -56.0101 bits >>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>> >> # STOCKHOLM 1.0 >>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>> >> X AAAAAAaaUUUUUU >>> >> Y GGGGGGaaCCCCCC >>> >> // >>> >> >>> >> >>> >> >>> >> >>> >> marcin joachimiak wrote: >>> >> >>> >>> Hey there, >>> >>> >>> >>> The xrate tree job took a while but it seemed to finish without >>> >>> errors. Alas there isn't any tree output ... >>> >>> >>> >>> Here's what I get from stdout: >>> >>> >>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>> >>> Warning -- used random number generator during initialisation >>> >>> Read 1 alignments >>> >>> Sequence_database: converting ASCII sequences to score profiles >>> >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> >>> >>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>> >>> problem right? And this was run on a hefty machine ... >>> >>> >>> >>> >>> >>> marcin >>> >>> >>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>> >>> <mar...@gm...> wrote: >>> >>> >>> >>>> much better, thanks! >>> >>>> I did notice the trees for each alignment in the output ... >>> >>>> >>> >>>> The tree estimation is running, I'll keep reporting ... >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>> >>>> > marcin joachimiak wrote: >>> >>>> > > Excellent, its working now! >>> >>>> > >>> >>>> > Cool! >>> >>>> > >>> >>>> > >>> >>>> > > I'll have the tree data soon and may try a run with as well, but >>> >> best >>> >>>> > > if I could know that the other tree is different from the xrate >>> >> one >>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>> >>>> > > tree/distances from xrate? >>> >>>> > >>> >>>> > Yes, it will insert the trees into the alignment files it outputs, >>> >> using >>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>> >> email. >>> >>>> > >>> >>>> > One caveat. The way you're running it now, it is calculating a >>> >> separate >>> >>>> > tree *for every window* in the alignment. This may not be what you >>> >> want, >>> >>>> > and it's certainly going to slow things down. >>> >>>> > >>> >>>> > If you want to calculate the tree just once for the whole > alignment, >>> >> type >>> >>>> > >>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>> >>>> > >>> >>>> > >>> >>>> > The "--noannotate" just means "don't waste time trying to annotate >>> >> this >>> >>>> > alignment, I just want the tree". The "-log 5" will just print some >>> >> log >>> >>>> > messages on stderr, so you can watch the progress of the >>> >> tree-building >>> >>>> > algorithms. >>> >>>> > >>> >>>> > hth, >>> >>>> > >>> >>>> > I. >>> >>>> > >>> >>>> >>> >>>> >>> |
From: marcin j. <mar...@gm...> - 2008-04-29 17:48:58
|
Hey hey, Indeed, the alignment size limitation makes sense and was one of the things I was suspecting. The job was run on a machine w.o queue management but I will investigate whether it could have been killed by other means. The 0 iteration option sounds like it will do the trick -- so I just need to checkout the DART code from cvs again, right? Thanks for adding this in and I definitely understand the time constraints etc., this should suffice for my purposes. I could of course try to obtain a tree in other ways, but actually I'm trying to turn this into a pipeline that can run on multi-genome alignments, its almost there. cheers, marcin On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: > Hi Marcin... > > So, I never ran xrate on an alignment this big before (6 million > columns!) and it turned out there was a drastic inefficiency in the > distance matrix code, that was causing it to scale as the square of the > number of columns. Unnoticeable for most alignments (since the code > inside the loop was pretty quick), but your example caused it to hang... > > I think it was quite possible that your xrate job was getting killed > because it was running too long. Are you running queue management > software of some kind (e.g. Sun Grid Engine)? That can often kill > long-running jobs. I couldn't find a specific bug that was crashing on > this data, although I didn't finish running it because it was taking > sooooo long. > > Anyway, I've now fixed the inefficiency bug. The tree estimation by > neighbor-joining now finishes pretty fast. However, it now takes a long > time to complete the second step of the tree-estimation procedure, which > is to refine the branch lengths of the tree using the EM algorithm. > > There is less that I can do to optimize this. The neighbor-joining code > is not heavily dependent on the alignment length, but the branch-length > refinement is. I could optimize the branch-length code in the same way > that the neighbor-joining is optimized, so it was less dependent on > alignment length, but that would take a little while, and unfortunately > I don't have the time right now. > > Instead, what I have done is introduced a subtle hack. If you specify > the command-line option "--maxrounds 0", then this will bypass the > branch-length refinement code (specifically, what it does is tell xrate > that you want the EM algorithm to run for a maximum of zero iterations, > or "rounds"). > > Thus, with "--maxrounds 0" the tree will be estimated using > neighbor-joining only. > > Sorry this has taken so long to fix! Like I said, I never used xrate on > an alignment this big before (for our screens, we pre-specified the > tree, so we went directly to windowlicker). > > Hopefully, this will now work for you, but let me know if you have > further issues. > > Cheers, > Ian > > > > > marcin joachimiak wrote: > > hey, > > > > Thanks for the help! > > The short example worked just fine. > > > > log3 with my previous command so far produced this: > > Finding eigenvectors using MatrixExpEigenPrepare > > Largest eigenvalue was 0; set to 0 > > Updating cached M() & J() > > Finding eigenvectors using MatrixExpEigenPrepare > > Largest eigenvalue was 0; set to 0 > > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) > > Updating cached M() & J() > > Sequence_database: converting ASCII sequences to score profiles > > Processed 3 sequences > > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > > Estimating distances of sequences 0 and 1 > > > > You can download the FASTA and Stock files here: > > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz > > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz > > > > I'll have more updates on Monday -- enjoy the weekend, > > marcin > > - Show quoted text - > > > > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: > >> Thanks Marcin. > >> > >> Something has definitely gone wrong if no alignment is output. Can you > >> check that the following outputs an alignment+tree? > >> > >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg > >> ~/dart/src/ecfg/t/notree.stk > >> > >> > >> Looking at your log messages it seems like the program has crashed while > >> trying to do neighbor-joining to estimate the tree. For the above command, I > >> get log messages after the "Estimating tree..." message. See below (BTW many > >> of these are on stderr, not stdout) > >> > >> If you can reproduce this on the short example pairwise alignment but not > >> with the 3M alignments, there's something else wrong that needs > >> investigation. In that case, can you first try running "-log 3" instead of > >> "-log 5" (this will give more verbose log messages during the > >> neighbor-joining algorithm), then mail the stderr log messages to me, and/or > >> please post a URL to your large file so I can download it and attempt to > >> reproduce the error. > >> > >> Sorry about this... > >> Ian > >> > >> > >> > >> > >> Warning -- used random number generator during initialisation > >> Read 1 alignments > >> Sequence_database: converting ASCII sequences to score profiles > >> Estimating tree for the following sequences: X Y > >> Optimizing branch lengths of all trees in alignment database, using EM > >> Optimizing tree branch lengths by EM. > >> Tree before optimization: > >> (X:5.0000000000,Y:5.0000000000)root; > >> EM iteration #1: log-likelihood = -56.0101 bits > >> Optimized tree after step #1: > >> (X:5.1160179765,Y:5.1160180140)root; > >> EM iteration #2: log-likelihood = -56.0118 bits > >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM > >> Failed EM improvement threshold for the 1th time; stopping > >> Checking post-iteration log-likelihood > >> Post-iteration log-likelihood = -56.0118 bits > >> Restoring previous best branch lengths > >> Tree after optimization: > >> (X:5.0000000000,Y:5.0000000000)root; > >> Alignment database log-likelihood: -56.0101 bits > >> Processing alignment Alignment1 (1 of 1): 14 columns > >> # STOCKHOLM 1.0 > >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; > >> X AAAAAAaaUUUUUU > >> Y GGGGGGaaCCCCCC > >> // > >> > >> > >> > >> > >> marcin joachimiak wrote: > >> > >>> Hey there, > >>> > >>> The xrate tree job took a while but it seemed to finish without > >>> errors. Alas there isn't any tree output ... > >>> > >>> Here's what I get from stdout: > >>> > >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e > >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 > >>> Warning -- used random number generator during initialisation > >>> Read 1 alignments > >>> Sequence_database: converting ASCII sequences to score profiles > >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 > >>> > >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a > >>> problem right? And this was run on a hefty machine ... > >>> > >>> > >>> marcin > >>> > >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak > >>> <mar...@gm...> wrote: > >>> > >>>> much better, thanks! > >>>> I did notice the trees for each alignment in the output ... > >>>> > >>>> The tree estimation is running, I'll keep reporting ... > >>>> > >>>> > >>>> > >>>> > >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > >>>> > marcin joachimiak wrote: > >>>> > > Excellent, its working now! > >>>> > > >>>> > Cool! > >>>> > > >>>> > > >>>> > > I'll have the tree data soon and may try a run with as well, but > >> best > >>>> > > if I could know that the other tree is different from the xrate > >> one > >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > >>>> > > tree/distances from xrate? > >>>> > > >>>> > Yes, it will insert the trees into the alignment files it outputs, > >> using > >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous > >> email. > >>>> > > >>>> > One caveat. The way you're running it now, it is calculating a > >> separate > >>>> > tree *for every window* in the alignment. This may not be what you > >> want, > >>>> > and it's certainly going to slow things down. > >>>> > > >>>> > If you want to calculate the tree just once for the whole alignment, > >> type > >>>> > > >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > >>>> > > >>>> > > >>>> > The "--noannotate" just means "don't waste time trying to annotate > >> this > >>>> > alignment, I just want the tree". The "-log 5" will just print some > >> log > >>>> > messages on stderr, so you can watch the progress of the > >> tree-building > >>>> > algorithms. > >>>> > > >>>> > hth, > >>>> > > >>>> > I. > >>>> > > >>>> > >>>> > |
From: Ian H. <ih...@be...> - 2008-04-28 19:30:48
|
Hi Marcin... So, I never ran xrate on an alignment this big before (6 million columns!) and it turned out there was a drastic inefficiency in the distance matrix code, that was causing it to scale as the square of the number of columns. Unnoticeable for most alignments (since the code inside the loop was pretty quick), but your example caused it to hang... I think it was quite possible that your xrate job was getting killed because it was running too long. Are you running queue management software of some kind (e.g. Sun Grid Engine)? That can often kill long-running jobs. I couldn't find a specific bug that was crashing on this data, although I didn't finish running it because it was taking sooooo long. Anyway, I've now fixed the inefficiency bug. The tree estimation by neighbor-joining now finishes pretty fast. However, it now takes a long time to complete the second step of the tree-estimation procedure, which is to refine the branch lengths of the tree using the EM algorithm. There is less that I can do to optimize this. The neighbor-joining code is not heavily dependent on the alignment length, but the branch-length refinement is. I could optimize the branch-length code in the same way that the neighbor-joining is optimized, so it was less dependent on alignment length, but that would take a little while, and unfortunately I don't have the time right now. Instead, what I have done is introduced a subtle hack. If you specify the command-line option "--maxrounds 0", then this will bypass the branch-length refinement code (specifically, what it does is tell xrate that you want the EM algorithm to run for a maximum of zero iterations, or "rounds"). Thus, with "--maxrounds 0" the tree will be estimated using neighbor-joining only. Sorry this has taken so long to fix! Like I said, I never used xrate on an alignment this big before (for our screens, we pre-specified the tree, so we went directly to windowlicker). Hopefully, this will now work for you, but let me know if you have further issues. Cheers, Ian marcin joachimiak wrote: > hey, > > Thanks for the help! > The short example worked just fine. > > log3 with my previous command so far produced this: > Finding eigenvectors using MatrixExpEigenPrepare > Largest eigenvalue was 0; set to 0 > Updating cached M() & J() > Finding eigenvectors using MatrixExpEigenPrepare > Largest eigenvalue was 0; set to 0 > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) > Updating cached M() & J() > Sequence_database: converting ASCII sequences to score profiles > Processed 3 sequences > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > Estimating distances of sequences 0 and 1 > > You can download the FASTA and Stock files here: > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz > > I'll have more updates on Monday -- enjoy the weekend, > marcin > - Show quoted text - > > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >> Thanks Marcin. >> >> Something has definitely gone wrong if no alignment is output. Can you >> check that the following outputs an alignment+tree? >> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >> ~/dart/src/ecfg/t/notree.stk >> >> >> Looking at your log messages it seems like the program has crashed while >> trying to do neighbor-joining to estimate the tree. For the above command, I >> get log messages after the "Estimating tree..." message. See below (BTW many >> of these are on stderr, not stdout) >> >> If you can reproduce this on the short example pairwise alignment but not >> with the 3M alignments, there's something else wrong that needs >> investigation. In that case, can you first try running "-log 3" instead of >> "-log 5" (this will give more verbose log messages during the >> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >> please post a URL to your large file so I can download it and attempt to >> reproduce the error. >> >> Sorry about this... >> Ian >> >> >> >> >> Warning -- used random number generator during initialisation >> Read 1 alignments >> Sequence_database: converting ASCII sequences to score profiles >> Estimating tree for the following sequences: X Y >> Optimizing branch lengths of all trees in alignment database, using EM >> Optimizing tree branch lengths by EM. >> Tree before optimization: >> (X:5.0000000000,Y:5.0000000000)root; >> EM iteration #1: log-likelihood = -56.0101 bits >> Optimized tree after step #1: >> (X:5.1160179765,Y:5.1160180140)root; >> EM iteration #2: log-likelihood = -56.0118 bits >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >> Failed EM improvement threshold for the 1th time; stopping >> Checking post-iteration log-likelihood >> Post-iteration log-likelihood = -56.0118 bits >> Restoring previous best branch lengths >> Tree after optimization: >> (X:5.0000000000,Y:5.0000000000)root; >> Alignment database log-likelihood: -56.0101 bits >> Processing alignment Alignment1 (1 of 1): 14 columns >> # STOCKHOLM 1.0 >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >> X AAAAAAaaUUUUUU >> Y GGGGGGaaCCCCCC >> // >> >> >> >> >> marcin joachimiak wrote: >> >>> Hey there, >>> >>> The xrate tree job took a while but it seemed to finish without >>> errors. Alas there isn't any tree output ... >>> >>> Here's what I get from stdout: >>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>> Warning -- used random number generator during initialisation >>> Read 1 alignments >>> Sequence_database: converting ASCII sequences to score profiles >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>> problem right? And this was run on a hefty machine ... >>> >>> >>> marcin >>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>> <mar...@gm...> wrote: >>> >>>> much better, thanks! >>>> I did notice the trees for each alignment in the output ... >>>> >>>> The tree estimation is running, I'll keep reporting ... >>>> >>>> >>>> >>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>> > marcin joachimiak wrote: >>>> > > Excellent, its working now! >>>> > >>>> > Cool! >>>> > >>>> > >>>> > > I'll have the tree data soon and may try a run with as well, but >> best >>>> > > if I could know that the other tree is different from the xrate >> one >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>> > > tree/distances from xrate? >>>> > >>>> > Yes, it will insert the trees into the alignment files it outputs, >> using >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >> email. >>>> > >>>> > One caveat. The way you're running it now, it is calculating a >> separate >>>> > tree *for every window* in the alignment. This may not be what you >> want, >>>> > and it's certainly going to slow things down. >>>> > >>>> > If you want to calculate the tree just once for the whole alignment, >> type >>>> > >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>> > >>>> > >>>> > The "--noannotate" just means "don't waste time trying to annotate >> this >>>> > alignment, I just want the tree". The "-log 5" will just print some >> log >>>> > messages on stderr, so you can watch the progress of the >> tree-building >>>> > algorithms. >>>> > >>>> > hth, >>>> > >>>> > I. >>>> > >>>> >>>> |
From: marcin j. <mar...@gm...> - 2008-04-27 01:36:45
|
hey, Thanks for the help! The short example worked just fine. log3 with my previous command so far produced this: Finding eigenvectors using MatrixExpEigenPrepare Largest eigenvalue was 0; set to 0 Updating cached M() & J() Finding eigenvectors using MatrixExpEigenPrepare Largest eigenvalue was 0; set to 0 Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) Updating cached M() & J() Sequence_database: converting ASCII sequences to score profiles Processed 3 sequences Estimating tree for the following sequences: CP000112 CP000527 AE017286 Estimating distances of sequences 0 and 1 You can download the FASTA and Stock files here: http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz I'll have more updates on Monday -- enjoy the weekend, marcin - Show quoted text - On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: > Thanks Marcin. > > Something has definitely gone wrong if no alignment is output. Can you > check that the following outputs an alignment+tree? > > xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg > ~/dart/src/ecfg/t/notree.stk > > > Looking at your log messages it seems like the program has crashed while > trying to do neighbor-joining to estimate the tree. For the above command, I > get log messages after the "Estimating tree..." message. See below (BTW many > of these are on stderr, not stdout) > > If you can reproduce this on the short example pairwise alignment but not > with the 3M alignments, there's something else wrong that needs > investigation. In that case, can you first try running "-log 3" instead of > "-log 5" (this will give more verbose log messages during the > neighbor-joining algorithm), then mail the stderr log messages to me, and/or > please post a URL to your large file so I can download it and attempt to > reproduce the error. > > Sorry about this... > Ian > > > > > Warning -- used random number generator during initialisation > Read 1 alignments > Sequence_database: converting ASCII sequences to score profiles > Estimating tree for the following sequences: X Y > Optimizing branch lengths of all trees in alignment database, using EM > Optimizing tree branch lengths by EM. > Tree before optimization: > (X:5.0000000000,Y:5.0000000000)root; > EM iteration #1: log-likelihood = -56.0101 bits > Optimized tree after step #1: > (X:5.1160179765,Y:5.1160180140)root; > EM iteration #2: log-likelihood = -56.0118 bits > Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM > Failed EM improvement threshold for the 1th time; stopping > Checking post-iteration log-likelihood > Post-iteration log-likelihood = -56.0118 bits > Restoring previous best branch lengths > Tree after optimization: > (X:5.0000000000,Y:5.0000000000)root; > Alignment database log-likelihood: -56.0101 bits > Processing alignment Alignment1 (1 of 1): 14 columns > # STOCKHOLM 1.0 > #=GF NH (X:5.0000000000,Y:5.0000000000)root; > X AAAAAAaaUUUUUU > Y GGGGGGaaCCCCCC > // > > > > > marcin joachimiak wrote: > > > Hey there, > > > > The xrate tree job took a while but it seemed to finish without > > errors. Alas there isn't any tree output ... > > > > Here's what I get from stdout: > > > > Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e > > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 > > Warning -- used random number generator during initialisation > > Read 1 alignments > > Sequence_database: converting ASCII sequences to score profiles > > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > > > > The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a > > problem right? And this was run on a hefty machine ... > > > > > > marcin > > > > On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak > > <mar...@gm...> wrote: > > > > > much better, thanks! > > > I did notice the trees for each alignment in the output ... > > > > > > The tree estimation is running, I'll keep reporting ... > > > > > > > > > > > > > > > On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > > > > marcin joachimiak wrote: > > > > > Excellent, its working now! > > > > > > > > Cool! > > > > > > > > > > > > > I'll have the tree data soon and may try a run with as well, but > best > > > > > if I could know that the other tree is different from the xrate > one > > > > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > > > > > tree/distances from xrate? > > > > > > > > Yes, it will insert the trees into the alignment files it outputs, > using > > > > the New-Hampshire-encoded-in-Stockholm format from my previous > email. > > > > > > > > One caveat. The way you're running it now, it is calculating a > separate > > > > tree *for every window* in the alignment. This may not be what you > want, > > > > and it's certainly going to slow things down. > > > > > > > > If you want to calculate the tree just once for the whole alignment, > type > > > > > > > > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > > > > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > > > > > > > > > > > > The "--noannotate" just means "don't waste time trying to annotate > this > > > > alignment, I just want the tree". The "-log 5" will just print some > log > > > > messages on stderr, so you can watch the progress of the > tree-building > > > > algorithms. > > > > > > > > hth, > > > > > > > > I. > > > > > > > > > > > > > |
From: Ian H. <ih...@be...> - 2008-04-26 20:25:27
|
Thanks Marcin. Something has definitely gone wrong if no alignment is output. Can you check that the following outputs an alignment+tree? xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg ~/dart/src/ecfg/t/notree.stk Looking at your log messages it seems like the program has crashed while trying to do neighbor-joining to estimate the tree. For the above command, I get log messages after the "Estimating tree..." message. See below (BTW many of these are on stderr, not stdout) If you can reproduce this on the short example pairwise alignment but not with the 3M alignments, there's something else wrong that needs investigation. In that case, can you first try running "-log 3" instead of "-log 5" (this will give more verbose log messages during the neighbor-joining algorithm), then mail the stderr log messages to me, and/or please post a URL to your large file so I can download it and attempt to reproduce the error. Sorry about this... Ian Warning -- used random number generator during initialisation Read 1 alignments Sequence_database: converting ASCII sequences to score profiles Estimating tree for the following sequences: X Y Optimizing branch lengths of all trees in alignment database, using EM Optimizing tree branch lengths by EM. Tree before optimization: (X:5.0000000000,Y:5.0000000000)root; EM iteration #1: log-likelihood = -56.0101 bits Optimized tree after step #1: (X:5.1160179765,Y:5.1160180140)root; EM iteration #2: log-likelihood = -56.0118 bits Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM Failed EM improvement threshold for the 1th time; stopping Checking post-iteration log-likelihood Post-iteration log-likelihood = -56.0118 bits Restoring previous best branch lengths Tree after optimization: (X:5.0000000000,Y:5.0000000000)root; Alignment database log-likelihood: -56.0101 bits Processing alignment Alignment1 (1 of 1): 14 columns # STOCKHOLM 1.0 #=GF NH (X:5.0000000000,Y:5.0000000000)root; X AAAAAAaaUUUUUU Y GGGGGGaaCCCCCC // marcin joachimiak wrote: > Hey there, > > The xrate tree job took a while but it seemed to finish without > errors. Alas there isn't any tree output ... > > Here's what I get from stdout: > > Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 > Warning -- used random number generator during initialisation > Read 1 alignments > Sequence_database: converting ASCII sequences to score profiles > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > > The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a > problem right? And this was run on a hefty machine ... > > > marcin > > On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak > <mar...@gm...> wrote: >> much better, thanks! >> I did notice the trees for each alignment in the output ... >> >> The tree estimation is running, I'll keep reporting ... >> >> >> >> >> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >> > marcin joachimiak wrote: >> > > Excellent, its working now! >> > >> > Cool! >> > >> > >> > > I'll have the tree data soon and may try a run with as well, but best >> > > if I could know that the other tree is different from the xrate one >> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >> > > tree/distances from xrate? >> > >> > Yes, it will insert the trees into the alignment files it outputs, using >> > the New-Hampshire-encoded-in-Stockholm format from my previous email. >> > >> > One caveat. The way you're running it now, it is calculating a separate >> > tree *for every window* in the alignment. This may not be what you want, >> > and it's certainly going to slow things down. >> > >> > If you want to calculate the tree just once for the whole alignment, type >> > >> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >> > >> > >> > The "--noannotate" just means "don't waste time trying to annotate this >> > alignment, I just want the tree". The "-log 5" will just print some log >> > messages on stderr, so you can watch the progress of the tree-building >> > algorithms. >> > >> > hth, >> > >> > I. >> > >> |
From: marcin j. <mar...@gm...> - 2008-04-26 18:11:07
|
Hey there, The xrate tree job took a while but it seemed to finish without errors. Alas there isn't any tree output ... Here's what I get from stdout: Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 Warning -- used random number generator during initialisation Read 1 alignments Sequence_database: converting ASCII sequences to score profiles Estimating tree for the following sequences: CP000112 CP000527 AE017286 The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a problem right? And this was run on a hefty machine ... marcin On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak <mar...@gm...> wrote: > much better, thanks! > I did notice the trees for each alignment in the output ... > > The tree estimation is running, I'll keep reporting ... > > > > > On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > > marcin joachimiak wrote: > > > Excellent, its working now! > > > > Cool! > > > > > > > I'll have the tree data soon and may try a run with as well, but best > > > if I could know that the other tree is different from the xrate one > > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > > > tree/distances from xrate? > > > > Yes, it will insert the trees into the alignment files it outputs, using > > the New-Hampshire-encoded-in-Stockholm format from my previous email. > > > > One caveat. The way you're running it now, it is calculating a separate > > tree *for every window* in the alignment. This may not be what you want, > > and it's certainly going to slow things down. > > > > If you want to calculate the tree just once for the whole alignment, type > > > > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > > > > > > The "--noannotate" just means "don't waste time trying to annotate this > > alignment, I just want the tree". The "-log 5" will just print some log > > messages on stderr, so you can watch the progress of the tree-building > > algorithms. > > > > hth, > > > > I. > > > |
From: marcin j. <mar...@gm...> - 2008-04-25 02:36:42
|
much better, thanks! I did notice the trees for each alignment in the output ... The tree estimation is running, I'll keep reporting ... On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > marcin joachimiak wrote: > > Excellent, its working now! > > Cool! > > > > I'll have the tree data soon and may try a run with as well, but best > > if I could know that the other tree is different from the xrate one > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > > tree/distances from xrate? > > Yes, it will insert the trees into the alignment files it outputs, using > the New-Hampshire-encoded-in-Stockholm format from my previous email. > > One caveat. The way you're running it now, it is calculating a separate > tree *for every window* in the alignment. This may not be what you want, > and it's certainly going to slow things down. > > If you want to calculate the tree just once for the whole alignment, type > > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > > > The "--noannotate" just means "don't waste time trying to annotate this > alignment, I just want the tree". The "-log 5" will just print some log > messages on stderr, so you can watch the progress of the tree-building > algorithms. > > hth, > > I. > |
From: Ian H. <ih...@be...> - 2008-04-25 02:23:31
|
marcin joachimiak wrote: > Excellent, its working now! Cool! > I'll have the tree data soon and may try a run with as well, but best > if I could know that the other tree is different from the xrate one > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > tree/distances from xrate? Yes, it will insert the trees into the alignment files it outputs, using the New-Hampshire-encoded-in-Stockholm format from my previous email. One caveat. The way you're running it now, it is calculating a separate tree *for every window* in the alignment. This may not be what you want, and it's certainly going to slow things down. If you want to calculate the tree just once for the whole alignment, type xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock The "--noannotate" just means "don't waste time trying to annotate this alignment, I just want the tree". The "-log 5" will just print some log messages on stderr, so you can watch the progress of the tree-building algorithms. hth, I. |
From: marcin j. <mar...@gm...> - 2008-04-25 02:02:19
|
Excellent, its working now! I'll have the tree data soon and may try a run with as well, but best if I could know that the other tree is different from the xrate one with the Jukes-Cantor model. Can I get the Jukes-Cantor model tree/distances from xrate? marcin On Thu, Apr 24, 2008 at 6:51 PM, Ian Holmes <ih...@be...> wrote: > marcin joachimiak wrote: > > Thanks! > > Initially the only environment changes I made was adding the perl > > module dir to PERLLIB. > > > > I've now created the DARTDIR variable and there is some progress but > > now I get a different error message: > > > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > > DvH_DP4_G20_mVISTA.stock > > [stderr] ERROR: Every input alignment needs to be annotated with a > > phylogenetic tree. > > [stderr] Given that this isn't the case, I need a rate matrix in order > > to do some ad-hoc phylogeny > > [stderr] (neighbor-joining followed by EM on the branch lengths, if > > you must know.) > > [stderr] I want to get that rate matrix from the tree estimation grammar file. > > [stderr] However, I haven't been told about any such file, so I'm bailing. > > xrate returned with nonzero exit status (invoked using: > > /usr2/people/gtl/bin/dart//bin/xrate ) > > Do you already have a phylogenetic tree for these sequences? If so, and > you can write the tree in New Hampshire format, then you can put it > directly into your Stockholm alignment file, like so: > > # STOCKHOLM 1.0 > #=GF NH ((seq1:0.01,seq2:0.03):0.04,seq3:0.05); > seq1 AAGGGCCC > seq2 AGGGGCCT > seq3 GAGGGCCT > // > > > If not, you need to specify a grammar file for tree estimation. This is > not the same as the grammar file for alignment annotation (confusing, I > know). Currently tree estimation can only be done with simpler grammars > than the RNA gene prediction model we are using. > > We typically use the Jukes-Cantor model for tree estimation. This is > encoded as a grammar file in "$DARTDIR/grammars/jukescantor.eg". To > specify the tree-estimation grammar, you use the "-e" option (the "-g" > option is for the alignment annotation grammar). > > So, what you want is probably something like this: > > > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock -- -e $DARTDIR/grammars/jukescantor.eg > > > hope this helps! if not, send along the alignment file and I'll take a look. > > Ian > > > > > > > > > > I've also tried specifying the grammer file for xrate but apparently > > unsuccessfully: > > > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > > DvH_DP4_G20_mVISTA.stock -- -g ncRna_v22.eg > > [stderr] ERROR: Every input alignment needs to be annotated with a > > phylogenetic tree. > > [stderr] Given that this isn't the case, I need a rate matrix in order > > to do some ad-hoc phylogeny > > [stderr] (neighbor-joining followed by EM on the branch lengths, if > > you must know.) > > [stderr] I want to get that rate matrix from the tree estimation grammar file. > > [stderr] However, I haven't been told about any such file, so I'm bailing. > > xrate returned with nonzero exit status (invoked using: > > /usr2/people/gtl/bin/dart//bin/xrate -g ncRna_v22.eg) > > > > > > marcin > > > > > > > > On Thu, Apr 24, 2008 at 5:07 PM, Ian Holmes <ih...@be...> wrote: > >> Hi Marcin, > >> > >> first, sorry I didn't reply to your earlier email to me, been a bit busy > >> here. There is actually a dart-help mailing list... > >> > >> http://biowiki.org/DartMailingLists > >> > >> ...but it'll pretty much end up in the same inboxes if you email > >> holmes-lab instead ;) > >> > >> > >> second, the error you're getting means that xrate couldn't be run. If > >> you just type "xrate -h" from your command prompt, do you get a help > >> message? Or do you have to type in the entire path to xrate, e.g. > >> "/nfs/src/dart/bin/xrate -h" or whatever? > >> > >> If the latter, then it's likely that windowlicker just can't find the > >> xrate binary. You can test this by typing "windowlicker.pl -h" and > >> looking at the "-x" option, whose default value is the path that > >> windowlicker is trying to use. > >> > >> Best fix for this is to set the DARTDIR environment variable to the root > >> of your DART installation (/nfs/src/dart in the above example). You can > >> also use windowlicker's -x option to tell it the path to the xrate > >> executable. > >> > >> > >> third, if the above doesn't fix your problem, can you email me the > >> actual file you're having problems with (if it's huge, then gzip or bzip > >> it first) so I can try to reproduce the issue. > >> > >> > >> Hope this helps! > >> Ian > >> > >> > >> > >> PS I am still trying to get around to reading your guide tree > >> manuscript. It will happen though! > >> > >> > >> > >> > >> marcin joachimiak wrote: > >> > hello, > >> > > >> > I have the DART package built and running at the syntax help level. > >> > Formatting the FASTA file to Stockholm was successful. > >> > I've tried running windowlicker on the stocholm data and got: > >> > > >> > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > >> > DvH_DP4_G20_mVISTA.stock > >> > xrate failed (check command-line options? they were: xrate ) > >> > > >> > So I assume there are more required arguments? And how would I supply > >> > the reference grammars? > >> > > >> > thanks, > >> > marcin > >> > |
From: Ian H. <ih...@be...> - 2008-04-25 01:51:48
|
marcin joachimiak wrote: > Thanks! > Initially the only environment changes I made was adding the perl > module dir to PERLLIB. > > I've now created the DARTDIR variable and there is some progress but > now I get a different error message: > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock > [stderr] ERROR: Every input alignment needs to be annotated with a > phylogenetic tree. > [stderr] Given that this isn't the case, I need a rate matrix in order > to do some ad-hoc phylogeny > [stderr] (neighbor-joining followed by EM on the branch lengths, if > you must know.) > [stderr] I want to get that rate matrix from the tree estimation grammar file. > [stderr] However, I haven't been told about any such file, so I'm bailing. > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate ) Do you already have a phylogenetic tree for these sequences? If so, and you can write the tree in New Hampshire format, then you can put it directly into your Stockholm alignment file, like so: # STOCKHOLM 1.0 #=GF NH ((seq1:0.01,seq2:0.03):0.04,seq3:0.05); seq1 AAGGGCCC seq2 AGGGGCCT seq3 GAGGGCCT // If not, you need to specify a grammar file for tree estimation. This is not the same as the grammar file for alignment annotation (confusing, I know). Currently tree estimation can only be done with simpler grammars than the RNA gene prediction model we are using. We typically use the Jukes-Cantor model for tree estimation. This is encoded as a grammar file in "$DARTDIR/grammars/jukescantor.eg". To specify the tree-estimation grammar, you use the "-e" option (the "-g" option is for the alignment annotation grammar). So, what you want is probably something like this: perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA.stock -- -e $DARTDIR/grammars/jukescantor.eg hope this helps! if not, send along the alignment file and I'll take a look. Ian > > I've also tried specifying the grammer file for xrate but apparently > unsuccessfully: > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock -- -g ncRna_v22.eg > [stderr] ERROR: Every input alignment needs to be annotated with a > phylogenetic tree. > [stderr] Given that this isn't the case, I need a rate matrix in order > to do some ad-hoc phylogeny > [stderr] (neighbor-joining followed by EM on the branch lengths, if > you must know.) > [stderr] I want to get that rate matrix from the tree estimation grammar file. > [stderr] However, I haven't been told about any such file, so I'm bailing. > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate -g ncRna_v22.eg) > > > marcin > > > > On Thu, Apr 24, 2008 at 5:07 PM, Ian Holmes <ih...@be...> wrote: >> Hi Marcin, >> >> first, sorry I didn't reply to your earlier email to me, been a bit busy >> here. There is actually a dart-help mailing list... >> >> http://biowiki.org/DartMailingLists >> >> ...but it'll pretty much end up in the same inboxes if you email >> holmes-lab instead ;) >> >> >> second, the error you're getting means that xrate couldn't be run. If >> you just type "xrate -h" from your command prompt, do you get a help >> message? Or do you have to type in the entire path to xrate, e.g. >> "/nfs/src/dart/bin/xrate -h" or whatever? >> >> If the latter, then it's likely that windowlicker just can't find the >> xrate binary. You can test this by typing "windowlicker.pl -h" and >> looking at the "-x" option, whose default value is the path that >> windowlicker is trying to use. >> >> Best fix for this is to set the DARTDIR environment variable to the root >> of your DART installation (/nfs/src/dart in the above example). You can >> also use windowlicker's -x option to tell it the path to the xrate >> executable. >> >> >> third, if the above doesn't fix your problem, can you email me the >> actual file you're having problems with (if it's huge, then gzip or bzip >> it first) so I can try to reproduce the issue. >> >> >> Hope this helps! >> Ian >> >> >> >> PS I am still trying to get around to reading your guide tree >> manuscript. It will happen though! >> >> >> >> >> marcin joachimiak wrote: >> > hello, >> > >> > I have the DART package built and running at the syntax help level. >> > Formatting the FASTA file to Stockholm was successful. >> > I've tried running windowlicker on the stocholm data and got: >> > >> > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >> > DvH_DP4_G20_mVISTA.stock >> > xrate failed (check command-line options? they were: xrate ) >> > >> > So I assume there are more required arguments? And how would I supply >> > the reference grammars? >> > >> > thanks, >> > marcin >> |
From: marcin j. <mar...@gm...> - 2008-04-25 00:43:59
|
Thanks! Initially the only environment changes I made was adding the perl module dir to PERLLIB. I've now created the DARTDIR variable and there is some progress but now I get a different error message: -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA.stock [stderr] ERROR: Every input alignment needs to be annotated with a phylogenetic tree. [stderr] Given that this isn't the case, I need a rate matrix in order to do some ad-hoc phylogeny [stderr] (neighbor-joining followed by EM on the branch lengths, if you must know.) [stderr] I want to get that rate matrix from the tree estimation grammar file. [stderr] However, I haven't been told about any such file, so I'm bailing. xrate returned with nonzero exit status (invoked using: /usr2/people/gtl/bin/dart//bin/xrate ) I've also tried specifying the grammer file for xrate but apparently unsuccessfully: -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA.stock -- -g ncRna_v22.eg [stderr] ERROR: Every input alignment needs to be annotated with a phylogenetic tree. [stderr] Given that this isn't the case, I need a rate matrix in order to do some ad-hoc phylogeny [stderr] (neighbor-joining followed by EM on the branch lengths, if you must know.) [stderr] I want to get that rate matrix from the tree estimation grammar file. [stderr] However, I haven't been told about any such file, so I'm bailing. xrate returned with nonzero exit status (invoked using: /usr2/people/gtl/bin/dart//bin/xrate -g ncRna_v22.eg) marcin On Thu, Apr 24, 2008 at 5:07 PM, Ian Holmes <ih...@be...> wrote: > Hi Marcin, > > first, sorry I didn't reply to your earlier email to me, been a bit busy > here. There is actually a dart-help mailing list... > > http://biowiki.org/DartMailingLists > > ...but it'll pretty much end up in the same inboxes if you email > holmes-lab instead ;) > > > second, the error you're getting means that xrate couldn't be run. If > you just type "xrate -h" from your command prompt, do you get a help > message? Or do you have to type in the entire path to xrate, e.g. > "/nfs/src/dart/bin/xrate -h" or whatever? > > If the latter, then it's likely that windowlicker just can't find the > xrate binary. You can test this by typing "windowlicker.pl -h" and > looking at the "-x" option, whose default value is the path that > windowlicker is trying to use. > > Best fix for this is to set the DARTDIR environment variable to the root > of your DART installation (/nfs/src/dart in the above example). You can > also use windowlicker's -x option to tell it the path to the xrate > executable. > > > third, if the above doesn't fix your problem, can you email me the > actual file you're having problems with (if it's huge, then gzip or bzip > it first) so I can try to reproduce the issue. > > > Hope this helps! > Ian > > > > PS I am still trying to get around to reading your guide tree > manuscript. It will happen though! > > > > > marcin joachimiak wrote: > > hello, > > > > I have the DART package built and running at the syntax help level. > > Formatting the FASTA file to Stockholm was successful. > > I've tried running windowlicker on the stocholm data and got: > > > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > > DvH_DP4_G20_mVISTA.stock > > xrate failed (check command-line options? they were: xrate ) > > > > So I assume there are more required arguments? And how would I supply > > the reference grammars? > > > > thanks, > > marcin > |
From: Ian H. <ih...@be...> - 2008-04-25 00:07:55
|
Hi Marcin, first, sorry I didn't reply to your earlier email to me, been a bit busy here. There is actually a dart-help mailing list... http://biowiki.org/DartMailingLists ...but it'll pretty much end up in the same inboxes if you email holmes-lab instead ;) second, the error you're getting means that xrate couldn't be run. If you just type "xrate -h" from your command prompt, do you get a help message? Or do you have to type in the entire path to xrate, e.g. "/nfs/src/dart/bin/xrate -h" or whatever? If the latter, then it's likely that windowlicker just can't find the xrate binary. You can test this by typing "windowlicker.pl -h" and looking at the "-x" option, whose default value is the path that windowlicker is trying to use. Best fix for this is to set the DARTDIR environment variable to the root of your DART installation (/nfs/src/dart in the above example). You can also use windowlicker's -x option to tell it the path to the xrate executable. third, if the above doesn't fix your problem, can you email me the actual file you're having problems with (if it's huge, then gzip or bzip it first) so I can try to reproduce the issue. Hope this helps! Ian PS I am still trying to get around to reading your guide tree manuscript. It will happen though! marcin joachimiak wrote: > hello, > > I have the DART package built and running at the syntax help level. > Formatting the FASTA file to Stockholm was successful. > I've tried running windowlicker on the stocholm data and got: > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock > xrate failed (check command-line options? they were: xrate ) > > So I assume there are more required arguments? And how would I supply > the reference grammars? > > thanks, > marcin |
From: argriffi <arg...@nc...> - 2008-04-16 16:50:46
|
Hi, I would like to use XRate to estimate a reversible model with a constraint similar to the RIND constraint. The RIND constraint is R_{ij} \propto pi_j, and I would like to estimate a codon model with a constraint that R_{ij} \propto sqrt(pi_aa_j / pi_aa_i). The XRate model specification language doesn't seem to allow square roots. Is this an extension that I could add, or would it be fundamentally incompatible with what XRate is doing? Alex |
From: Robert B. <rbr...@be...> - 2008-02-01 16:36:56
|
Hi Jakob, I've cc'ed the dart-help mailing list. If you like, you can subscribe to the low-volume lists dart-help (https://lists.sourceforge.net/lists/listinfo/dart-help ) and dart-announce (https://lists.sourceforge.net/lists/listinfo/dart-announce ). The fault was mine with HKY85--a bug with the (simplest!) case where there is only one Markov chain in the grammar. You can get the bugfix with a cvs up of your copy of dart. Please let me know if you're still having problems. Perhaps you've already found these pages, but http://biowiki.org/XrateSoftware gives some information about Xrate and http://biowiki.org/ XrateFormat information about the Xrate grammar format and its rapidly- expanding capabilities. I hope that you enjoy using this software! take care, Rob On Feb 1, 2008, at 6:37 AM, Jakob Skou Pedersen wrote: > Dear Robert, > > Many thanks for your detailed and prompt reply. Esteban was just > helping for a brief spell and I will probably be the one bothering > you with any questions in the future. > > I successfully ran visualizeRates.pl on several of the examples > provided. However, I could not get it to run on the single- > nucleotide matrices (e.g., HKY). However, I only tried it briefly > and will return to you if the problems persist after having tested > it more thoroughly. > > I love the program and think it is extremely neat and helpful! > > Cheers, > Jakob > > > > On Jan 31, 2008, at 6:56 PM, Robert Bradley wrote: > >> Hello, >> I can probably help you out with this. I'm a grad student in the >> Holmes lab and I wrote a lot of the visualizeRates code. You can >> see a short description at http://biowiki.org/BubblePlots. >> >> There are lots of sample grammars in dart/grammars/. You can test >> out the bubbleplots as e.g. >> /usr/local/dart/perl/visualizeRates.pl /usr/local/dart/grammars/ >> codon.eg >> (replacing '/usr/local/dart' with your dart install directory). >> >> The most common error which people have is due to not having the >> required dart perl libraries accessible. You can add a line to >> your .bashrc file to fix this: >> export PERL5LIB=/usr/local/dart/perl:$PERL5LIB >> and similarly for tcsh, etc. >> >> Does this help? If you're still having problems, please don't >> hesitate to ask. >> >> Rob >> >>> From: Esteban Czwan <cz...@bi...> >>> Date: January 31, 2008 2:20:30 AM PST >>> To: ybe...@be..., bi...@ar... >>> Cc: ih...@be..., Jakob Skou Pedersen <js...@bi...> >>> Subject: xrate and visualizeRates.pl >>> >>> >>> Dear Yuri and Mitch, >>> I'm assisting Jakob Pedersen in teaching a course in >>> "phylogenetics and comparative genomics" this quarter and he would >>> love to use xrate. The plan is to let the students use it to >>> estimate rate matrices from different genomic regions, visualize >>> them as bubble plots, and do some interpretation. Later in the >>> course he would also like to have them construct some simple phylo- >>> HMMs / phylo-grammars. >>> >>> I believe I have managed to use xrate to estimate a rate matrix >>> but I can't get the bubble plot script to work on the grammar file >>> trained. I always get the same error: >>> "Can't call method "find_all" on an undefined value at ChainDat.pm >>> line 141." >>> For that matter, I get the same error on any grammar file that I >>> use as input. Most likely, I am doing something totally wrong. Do >>> you happen to have any student exercises / examples / tutorials on >>> xrate that you can share with us? >>> I have looked at http://biowiki.org/XrateSoftware but I didn't see >>> any of those. >>> >>> Many thanks! >>> >>> Best, >>> >>> -Esteban >>> >>> cc: >>> Jakob P. >>> Ian H. >>> >>> >> > |
From: Ian H. <ih...@be...> - 2008-01-31 22:27:14
|
This is a test email, ignore. Welcome to the brave new world of dart-help. |