Thread: Re: [Dart-help] help with windowlicker
Brought to you by:
ihh
From: Ian H. <ih...@be...> - 2008-04-25 00:07:55
|
Hi Marcin, first, sorry I didn't reply to your earlier email to me, been a bit busy here. There is actually a dart-help mailing list... http://biowiki.org/DartMailingLists ...but it'll pretty much end up in the same inboxes if you email holmes-lab instead ;) second, the error you're getting means that xrate couldn't be run. If you just type "xrate -h" from your command prompt, do you get a help message? Or do you have to type in the entire path to xrate, e.g. "/nfs/src/dart/bin/xrate -h" or whatever? If the latter, then it's likely that windowlicker just can't find the xrate binary. You can test this by typing "windowlicker.pl -h" and looking at the "-x" option, whose default value is the path that windowlicker is trying to use. Best fix for this is to set the DARTDIR environment variable to the root of your DART installation (/nfs/src/dart in the above example). You can also use windowlicker's -x option to tell it the path to the xrate executable. third, if the above doesn't fix your problem, can you email me the actual file you're having problems with (if it's huge, then gzip or bzip it first) so I can try to reproduce the issue. Hope this helps! Ian PS I am still trying to get around to reading your guide tree manuscript. It will happen though! marcin joachimiak wrote: > hello, > > I have the DART package built and running at the syntax help level. > Formatting the FASTA file to Stockholm was successful. > I've tried running windowlicker on the stocholm data and got: > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock > xrate failed (check command-line options? they were: xrate ) > > So I assume there are more required arguments? And how would I supply > the reference grammars? > > thanks, > marcin |
From: marcin j. <mar...@gm...> - 2008-04-25 00:43:59
|
Thanks! Initially the only environment changes I made was adding the perl module dir to PERLLIB. I've now created the DARTDIR variable and there is some progress but now I get a different error message: -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA.stock [stderr] ERROR: Every input alignment needs to be annotated with a phylogenetic tree. [stderr] Given that this isn't the case, I need a rate matrix in order to do some ad-hoc phylogeny [stderr] (neighbor-joining followed by EM on the branch lengths, if you must know.) [stderr] I want to get that rate matrix from the tree estimation grammar file. [stderr] However, I haven't been told about any such file, so I'm bailing. xrate returned with nonzero exit status (invoked using: /usr2/people/gtl/bin/dart//bin/xrate ) I've also tried specifying the grammer file for xrate but apparently unsuccessfully: -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA.stock -- -g ncRna_v22.eg [stderr] ERROR: Every input alignment needs to be annotated with a phylogenetic tree. [stderr] Given that this isn't the case, I need a rate matrix in order to do some ad-hoc phylogeny [stderr] (neighbor-joining followed by EM on the branch lengths, if you must know.) [stderr] I want to get that rate matrix from the tree estimation grammar file. [stderr] However, I haven't been told about any such file, so I'm bailing. xrate returned with nonzero exit status (invoked using: /usr2/people/gtl/bin/dart//bin/xrate -g ncRna_v22.eg) marcin On Thu, Apr 24, 2008 at 5:07 PM, Ian Holmes <ih...@be...> wrote: > Hi Marcin, > > first, sorry I didn't reply to your earlier email to me, been a bit busy > here. There is actually a dart-help mailing list... > > http://biowiki.org/DartMailingLists > > ...but it'll pretty much end up in the same inboxes if you email > holmes-lab instead ;) > > > second, the error you're getting means that xrate couldn't be run. If > you just type "xrate -h" from your command prompt, do you get a help > message? Or do you have to type in the entire path to xrate, e.g. > "/nfs/src/dart/bin/xrate -h" or whatever? > > If the latter, then it's likely that windowlicker just can't find the > xrate binary. You can test this by typing "windowlicker.pl -h" and > looking at the "-x" option, whose default value is the path that > windowlicker is trying to use. > > Best fix for this is to set the DARTDIR environment variable to the root > of your DART installation (/nfs/src/dart in the above example). You can > also use windowlicker's -x option to tell it the path to the xrate > executable. > > > third, if the above doesn't fix your problem, can you email me the > actual file you're having problems with (if it's huge, then gzip or bzip > it first) so I can try to reproduce the issue. > > > Hope this helps! > Ian > > > > PS I am still trying to get around to reading your guide tree > manuscript. It will happen though! > > > > > marcin joachimiak wrote: > > hello, > > > > I have the DART package built and running at the syntax help level. > > Formatting the FASTA file to Stockholm was successful. > > I've tried running windowlicker on the stocholm data and got: > > > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > > DvH_DP4_G20_mVISTA.stock > > xrate failed (check command-line options? they were: xrate ) > > > > So I assume there are more required arguments? And how would I supply > > the reference grammars? > > > > thanks, > > marcin > |
From: Ian H. <ih...@be...> - 2008-04-25 01:51:48
|
marcin joachimiak wrote: > Thanks! > Initially the only environment changes I made was adding the perl > module dir to PERLLIB. > > I've now created the DARTDIR variable and there is some progress but > now I get a different error message: > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock > [stderr] ERROR: Every input alignment needs to be annotated with a > phylogenetic tree. > [stderr] Given that this isn't the case, I need a rate matrix in order > to do some ad-hoc phylogeny > [stderr] (neighbor-joining followed by EM on the branch lengths, if > you must know.) > [stderr] I want to get that rate matrix from the tree estimation grammar file. > [stderr] However, I haven't been told about any such file, so I'm bailing. > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate ) Do you already have a phylogenetic tree for these sequences? If so, and you can write the tree in New Hampshire format, then you can put it directly into your Stockholm alignment file, like so: # STOCKHOLM 1.0 #=GF NH ((seq1:0.01,seq2:0.03):0.04,seq3:0.05); seq1 AAGGGCCC seq2 AGGGGCCT seq3 GAGGGCCT // If not, you need to specify a grammar file for tree estimation. This is not the same as the grammar file for alignment annotation (confusing, I know). Currently tree estimation can only be done with simpler grammars than the RNA gene prediction model we are using. We typically use the Jukes-Cantor model for tree estimation. This is encoded as a grammar file in "$DARTDIR/grammars/jukescantor.eg". To specify the tree-estimation grammar, you use the "-e" option (the "-g" option is for the alignment annotation grammar). So, what you want is probably something like this: perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA.stock -- -e $DARTDIR/grammars/jukescantor.eg hope this helps! if not, send along the alignment file and I'll take a look. Ian > > I've also tried specifying the grammer file for xrate but apparently > unsuccessfully: > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock -- -g ncRna_v22.eg > [stderr] ERROR: Every input alignment needs to be annotated with a > phylogenetic tree. > [stderr] Given that this isn't the case, I need a rate matrix in order > to do some ad-hoc phylogeny > [stderr] (neighbor-joining followed by EM on the branch lengths, if > you must know.) > [stderr] I want to get that rate matrix from the tree estimation grammar file. > [stderr] However, I haven't been told about any such file, so I'm bailing. > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate -g ncRna_v22.eg) > > > marcin > > > > On Thu, Apr 24, 2008 at 5:07 PM, Ian Holmes <ih...@be...> wrote: >> Hi Marcin, >> >> first, sorry I didn't reply to your earlier email to me, been a bit busy >> here. There is actually a dart-help mailing list... >> >> http://biowiki.org/DartMailingLists >> >> ...but it'll pretty much end up in the same inboxes if you email >> holmes-lab instead ;) >> >> >> second, the error you're getting means that xrate couldn't be run. If >> you just type "xrate -h" from your command prompt, do you get a help >> message? Or do you have to type in the entire path to xrate, e.g. >> "/nfs/src/dart/bin/xrate -h" or whatever? >> >> If the latter, then it's likely that windowlicker just can't find the >> xrate binary. You can test this by typing "windowlicker.pl -h" and >> looking at the "-x" option, whose default value is the path that >> windowlicker is trying to use. >> >> Best fix for this is to set the DARTDIR environment variable to the root >> of your DART installation (/nfs/src/dart in the above example). You can >> also use windowlicker's -x option to tell it the path to the xrate >> executable. >> >> >> third, if the above doesn't fix your problem, can you email me the >> actual file you're having problems with (if it's huge, then gzip or bzip >> it first) so I can try to reproduce the issue. >> >> >> Hope this helps! >> Ian >> >> >> >> PS I am still trying to get around to reading your guide tree >> manuscript. It will happen though! >> >> >> >> >> marcin joachimiak wrote: >> > hello, >> > >> > I have the DART package built and running at the syntax help level. >> > Formatting the FASTA file to Stockholm was successful. >> > I've tried running windowlicker on the stocholm data and got: >> > >> > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >> > DvH_DP4_G20_mVISTA.stock >> > xrate failed (check command-line options? they were: xrate ) >> > >> > So I assume there are more required arguments? And how would I supply >> > the reference grammars? >> > >> > thanks, >> > marcin >> |
From: marcin j. <mar...@gm...> - 2008-04-25 02:02:19
|
Excellent, its working now! I'll have the tree data soon and may try a run with as well, but best if I could know that the other tree is different from the xrate one with the Jukes-Cantor model. Can I get the Jukes-Cantor model tree/distances from xrate? marcin On Thu, Apr 24, 2008 at 6:51 PM, Ian Holmes <ih...@be...> wrote: > marcin joachimiak wrote: > > Thanks! > > Initially the only environment changes I made was adding the perl > > module dir to PERLLIB. > > > > I've now created the DARTDIR variable and there is some progress but > > now I get a different error message: > > > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > > DvH_DP4_G20_mVISTA.stock > > [stderr] ERROR: Every input alignment needs to be annotated with a > > phylogenetic tree. > > [stderr] Given that this isn't the case, I need a rate matrix in order > > to do some ad-hoc phylogeny > > [stderr] (neighbor-joining followed by EM on the branch lengths, if > > you must know.) > > [stderr] I want to get that rate matrix from the tree estimation grammar file. > > [stderr] However, I haven't been told about any such file, so I'm bailing. > > xrate returned with nonzero exit status (invoked using: > > /usr2/people/gtl/bin/dart//bin/xrate ) > > Do you already have a phylogenetic tree for these sequences? If so, and > you can write the tree in New Hampshire format, then you can put it > directly into your Stockholm alignment file, like so: > > # STOCKHOLM 1.0 > #=GF NH ((seq1:0.01,seq2:0.03):0.04,seq3:0.05); > seq1 AAGGGCCC > seq2 AGGGGCCT > seq3 GAGGGCCT > // > > > If not, you need to specify a grammar file for tree estimation. This is > not the same as the grammar file for alignment annotation (confusing, I > know). Currently tree estimation can only be done with simpler grammars > than the RNA gene prediction model we are using. > > We typically use the Jukes-Cantor model for tree estimation. This is > encoded as a grammar file in "$DARTDIR/grammars/jukescantor.eg". To > specify the tree-estimation grammar, you use the "-e" option (the "-g" > option is for the alignment annotation grammar). > > So, what you want is probably something like this: > > > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA.stock -- -e $DARTDIR/grammars/jukescantor.eg > > > hope this helps! if not, send along the alignment file and I'll take a look. > > Ian > > > > > > > > > > I've also tried specifying the grammer file for xrate but apparently > > unsuccessfully: > > > > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > > DvH_DP4_G20_mVISTA.stock -- -g ncRna_v22.eg > > [stderr] ERROR: Every input alignment needs to be annotated with a > > phylogenetic tree. > > [stderr] Given that this isn't the case, I need a rate matrix in order > > to do some ad-hoc phylogeny > > [stderr] (neighbor-joining followed by EM on the branch lengths, if > > you must know.) > > [stderr] I want to get that rate matrix from the tree estimation grammar file. > > [stderr] However, I haven't been told about any such file, so I'm bailing. > > xrate returned with nonzero exit status (invoked using: > > /usr2/people/gtl/bin/dart//bin/xrate -g ncRna_v22.eg) > > > > > > marcin > > > > > > > > On Thu, Apr 24, 2008 at 5:07 PM, Ian Holmes <ih...@be...> wrote: > >> Hi Marcin, > >> > >> first, sorry I didn't reply to your earlier email to me, been a bit busy > >> here. There is actually a dart-help mailing list... > >> > >> http://biowiki.org/DartMailingLists > >> > >> ...but it'll pretty much end up in the same inboxes if you email > >> holmes-lab instead ;) > >> > >> > >> second, the error you're getting means that xrate couldn't be run. If > >> you just type "xrate -h" from your command prompt, do you get a help > >> message? Or do you have to type in the entire path to xrate, e.g. > >> "/nfs/src/dart/bin/xrate -h" or whatever? > >> > >> If the latter, then it's likely that windowlicker just can't find the > >> xrate binary. You can test this by typing "windowlicker.pl -h" and > >> looking at the "-x" option, whose default value is the path that > >> windowlicker is trying to use. > >> > >> Best fix for this is to set the DARTDIR environment variable to the root > >> of your DART installation (/nfs/src/dart in the above example). You can > >> also use windowlicker's -x option to tell it the path to the xrate > >> executable. > >> > >> > >> third, if the above doesn't fix your problem, can you email me the > >> actual file you're having problems with (if it's huge, then gzip or bzip > >> it first) so I can try to reproduce the issue. > >> > >> > >> Hope this helps! > >> Ian > >> > >> > >> > >> PS I am still trying to get around to reading your guide tree > >> manuscript. It will happen though! > >> > >> > >> > >> > >> marcin joachimiak wrote: > >> > hello, > >> > > >> > I have the DART package built and running at the syntax help level. > >> > Formatting the FASTA file to Stockholm was successful. > >> > I've tried running windowlicker on the stocholm data and got: > >> > > >> > -bash-3.00$ perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > >> > DvH_DP4_G20_mVISTA.stock > >> > xrate failed (check command-line options? they were: xrate ) > >> > > >> > So I assume there are more required arguments? And how would I supply > >> > the reference grammars? > >> > > >> > thanks, > >> > marcin > >> > |
From: Ian H. <ih...@be...> - 2008-04-25 02:23:31
|
marcin joachimiak wrote: > Excellent, its working now! Cool! > I'll have the tree data soon and may try a run with as well, but best > if I could know that the other tree is different from the xrate one > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > tree/distances from xrate? Yes, it will insert the trees into the alignment files it outputs, using the New-Hampshire-encoded-in-Stockholm format from my previous email. One caveat. The way you're running it now, it is calculating a separate tree *for every window* in the alignment. This may not be what you want, and it's certainly going to slow things down. If you want to calculate the tree just once for the whole alignment, type xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock The "--noannotate" just means "don't waste time trying to annotate this alignment, I just want the tree". The "-log 5" will just print some log messages on stderr, so you can watch the progress of the tree-building algorithms. hth, I. |
From: marcin j. <mar...@gm...> - 2008-04-25 02:36:42
|
much better, thanks! I did notice the trees for each alignment in the output ... The tree estimation is running, I'll keep reporting ... On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > marcin joachimiak wrote: > > Excellent, its working now! > > Cool! > > > > I'll have the tree data soon and may try a run with as well, but best > > if I could know that the other tree is different from the xrate one > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > > tree/distances from xrate? > > Yes, it will insert the trees into the alignment files it outputs, using > the New-Hampshire-encoded-in-Stockholm format from my previous email. > > One caveat. The way you're running it now, it is calculating a separate > tree *for every window* in the alignment. This may not be what you want, > and it's certainly going to slow things down. > > If you want to calculate the tree just once for the whole alignment, type > > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > > > The "--noannotate" just means "don't waste time trying to annotate this > alignment, I just want the tree". The "-log 5" will just print some log > messages on stderr, so you can watch the progress of the tree-building > algorithms. > > hth, > > I. > |
From: marcin j. <mar...@gm...> - 2008-04-26 18:11:07
|
Hey there, The xrate tree job took a while but it seemed to finish without errors. Alas there isn't any tree output ... Here's what I get from stdout: Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 Warning -- used random number generator during initialisation Read 1 alignments Sequence_database: converting ASCII sequences to score profiles Estimating tree for the following sequences: CP000112 CP000527 AE017286 The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a problem right? And this was run on a hefty machine ... marcin On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak <mar...@gm...> wrote: > much better, thanks! > I did notice the trees for each alignment in the output ... > > The tree estimation is running, I'll keep reporting ... > > > > > On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > > marcin joachimiak wrote: > > > Excellent, its working now! > > > > Cool! > > > > > > > I'll have the tree data soon and may try a run with as well, but best > > > if I could know that the other tree is different from the xrate one > > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > > > tree/distances from xrate? > > > > Yes, it will insert the trees into the alignment files it outputs, using > > the New-Hampshire-encoded-in-Stockholm format from my previous email. > > > > One caveat. The way you're running it now, it is calculating a separate > > tree *for every window* in the alignment. This may not be what you want, > > and it's certainly going to slow things down. > > > > If you want to calculate the tree just once for the whole alignment, type > > > > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > > > > > > The "--noannotate" just means "don't waste time trying to annotate this > > alignment, I just want the tree". The "-log 5" will just print some log > > messages on stderr, so you can watch the progress of the tree-building > > algorithms. > > > > hth, > > > > I. > > > |
From: Ian H. <ih...@be...> - 2008-04-26 20:25:27
|
Thanks Marcin. Something has definitely gone wrong if no alignment is output. Can you check that the following outputs an alignment+tree? xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg ~/dart/src/ecfg/t/notree.stk Looking at your log messages it seems like the program has crashed while trying to do neighbor-joining to estimate the tree. For the above command, I get log messages after the "Estimating tree..." message. See below (BTW many of these are on stderr, not stdout) If you can reproduce this on the short example pairwise alignment but not with the 3M alignments, there's something else wrong that needs investigation. In that case, can you first try running "-log 3" instead of "-log 5" (this will give more verbose log messages during the neighbor-joining algorithm), then mail the stderr log messages to me, and/or please post a URL to your large file so I can download it and attempt to reproduce the error. Sorry about this... Ian Warning -- used random number generator during initialisation Read 1 alignments Sequence_database: converting ASCII sequences to score profiles Estimating tree for the following sequences: X Y Optimizing branch lengths of all trees in alignment database, using EM Optimizing tree branch lengths by EM. Tree before optimization: (X:5.0000000000,Y:5.0000000000)root; EM iteration #1: log-likelihood = -56.0101 bits Optimized tree after step #1: (X:5.1160179765,Y:5.1160180140)root; EM iteration #2: log-likelihood = -56.0118 bits Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM Failed EM improvement threshold for the 1th time; stopping Checking post-iteration log-likelihood Post-iteration log-likelihood = -56.0118 bits Restoring previous best branch lengths Tree after optimization: (X:5.0000000000,Y:5.0000000000)root; Alignment database log-likelihood: -56.0101 bits Processing alignment Alignment1 (1 of 1): 14 columns # STOCKHOLM 1.0 #=GF NH (X:5.0000000000,Y:5.0000000000)root; X AAAAAAaaUUUUUU Y GGGGGGaaCCCCCC // marcin joachimiak wrote: > Hey there, > > The xrate tree job took a while but it seemed to finish without > errors. Alas there isn't any tree output ... > > Here's what I get from stdout: > > Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 > Warning -- used random number generator during initialisation > Read 1 alignments > Sequence_database: converting ASCII sequences to score profiles > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > > The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a > problem right? And this was run on a hefty machine ... > > > marcin > > On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak > <mar...@gm...> wrote: >> much better, thanks! >> I did notice the trees for each alignment in the output ... >> >> The tree estimation is running, I'll keep reporting ... >> >> >> >> >> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >> > marcin joachimiak wrote: >> > > Excellent, its working now! >> > >> > Cool! >> > >> > >> > > I'll have the tree data soon and may try a run with as well, but best >> > > if I could know that the other tree is different from the xrate one >> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >> > > tree/distances from xrate? >> > >> > Yes, it will insert the trees into the alignment files it outputs, using >> > the New-Hampshire-encoded-in-Stockholm format from my previous email. >> > >> > One caveat. The way you're running it now, it is calculating a separate >> > tree *for every window* in the alignment. This may not be what you want, >> > and it's certainly going to slow things down. >> > >> > If you want to calculate the tree just once for the whole alignment, type >> > >> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >> > >> > >> > The "--noannotate" just means "don't waste time trying to annotate this >> > alignment, I just want the tree". The "-log 5" will just print some log >> > messages on stderr, so you can watch the progress of the tree-building >> > algorithms. >> > >> > hth, >> > >> > I. >> > >> |
From: marcin j. <mar...@gm...> - 2008-04-27 01:36:45
|
hey, Thanks for the help! The short example worked just fine. log3 with my previous command so far produced this: Finding eigenvectors using MatrixExpEigenPrepare Largest eigenvalue was 0; set to 0 Updating cached M() & J() Finding eigenvectors using MatrixExpEigenPrepare Largest eigenvalue was 0; set to 0 Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) Updating cached M() & J() Sequence_database: converting ASCII sequences to score profiles Processed 3 sequences Estimating tree for the following sequences: CP000112 CP000527 AE017286 Estimating distances of sequences 0 and 1 You can download the FASTA and Stock files here: http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz I'll have more updates on Monday -- enjoy the weekend, marcin - Show quoted text - On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: > Thanks Marcin. > > Something has definitely gone wrong if no alignment is output. Can you > check that the following outputs an alignment+tree? > > xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg > ~/dart/src/ecfg/t/notree.stk > > > Looking at your log messages it seems like the program has crashed while > trying to do neighbor-joining to estimate the tree. For the above command, I > get log messages after the "Estimating tree..." message. See below (BTW many > of these are on stderr, not stdout) > > If you can reproduce this on the short example pairwise alignment but not > with the 3M alignments, there's something else wrong that needs > investigation. In that case, can you first try running "-log 3" instead of > "-log 5" (this will give more verbose log messages during the > neighbor-joining algorithm), then mail the stderr log messages to me, and/or > please post a URL to your large file so I can download it and attempt to > reproduce the error. > > Sorry about this... > Ian > > > > > Warning -- used random number generator during initialisation > Read 1 alignments > Sequence_database: converting ASCII sequences to score profiles > Estimating tree for the following sequences: X Y > Optimizing branch lengths of all trees in alignment database, using EM > Optimizing tree branch lengths by EM. > Tree before optimization: > (X:5.0000000000,Y:5.0000000000)root; > EM iteration #1: log-likelihood = -56.0101 bits > Optimized tree after step #1: > (X:5.1160179765,Y:5.1160180140)root; > EM iteration #2: log-likelihood = -56.0118 bits > Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM > Failed EM improvement threshold for the 1th time; stopping > Checking post-iteration log-likelihood > Post-iteration log-likelihood = -56.0118 bits > Restoring previous best branch lengths > Tree after optimization: > (X:5.0000000000,Y:5.0000000000)root; > Alignment database log-likelihood: -56.0101 bits > Processing alignment Alignment1 (1 of 1): 14 columns > # STOCKHOLM 1.0 > #=GF NH (X:5.0000000000,Y:5.0000000000)root; > X AAAAAAaaUUUUUU > Y GGGGGGaaCCCCCC > // > > > > > marcin joachimiak wrote: > > > Hey there, > > > > The xrate tree job took a while but it seemed to finish without > > errors. Alas there isn't any tree output ... > > > > Here's what I get from stdout: > > > > Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e > > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 > > Warning -- used random number generator during initialisation > > Read 1 alignments > > Sequence_database: converting ASCII sequences to score profiles > > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > > > > The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a > > problem right? And this was run on a hefty machine ... > > > > > > marcin > > > > On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak > > <mar...@gm...> wrote: > > > > > much better, thanks! > > > I did notice the trees for each alignment in the output ... > > > > > > The tree estimation is running, I'll keep reporting ... > > > > > > > > > > > > > > > On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > > > > marcin joachimiak wrote: > > > > > Excellent, its working now! > > > > > > > > Cool! > > > > > > > > > > > > > I'll have the tree data soon and may try a run with as well, but > best > > > > > if I could know that the other tree is different from the xrate > one > > > > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > > > > > tree/distances from xrate? > > > > > > > > Yes, it will insert the trees into the alignment files it outputs, > using > > > > the New-Hampshire-encoded-in-Stockholm format from my previous > email. > > > > > > > > One caveat. The way you're running it now, it is calculating a > separate > > > > tree *for every window* in the alignment. This may not be what you > want, > > > > and it's certainly going to slow things down. > > > > > > > > If you want to calculate the tree just once for the whole alignment, > type > > > > > > > > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > > > > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > > > > > > > > > > > > The "--noannotate" just means "don't waste time trying to annotate > this > > > > alignment, I just want the tree". The "-log 5" will just print some > log > > > > messages on stderr, so you can watch the progress of the > tree-building > > > > algorithms. > > > > > > > > hth, > > > > > > > > I. > > > > > > > > > > > > > |
From: Ian H. <ih...@be...> - 2008-04-28 19:30:48
|
Hi Marcin... So, I never ran xrate on an alignment this big before (6 million columns!) and it turned out there was a drastic inefficiency in the distance matrix code, that was causing it to scale as the square of the number of columns. Unnoticeable for most alignments (since the code inside the loop was pretty quick), but your example caused it to hang... I think it was quite possible that your xrate job was getting killed because it was running too long. Are you running queue management software of some kind (e.g. Sun Grid Engine)? That can often kill long-running jobs. I couldn't find a specific bug that was crashing on this data, although I didn't finish running it because it was taking sooooo long. Anyway, I've now fixed the inefficiency bug. The tree estimation by neighbor-joining now finishes pretty fast. However, it now takes a long time to complete the second step of the tree-estimation procedure, which is to refine the branch lengths of the tree using the EM algorithm. There is less that I can do to optimize this. The neighbor-joining code is not heavily dependent on the alignment length, but the branch-length refinement is. I could optimize the branch-length code in the same way that the neighbor-joining is optimized, so it was less dependent on alignment length, but that would take a little while, and unfortunately I don't have the time right now. Instead, what I have done is introduced a subtle hack. If you specify the command-line option "--maxrounds 0", then this will bypass the branch-length refinement code (specifically, what it does is tell xrate that you want the EM algorithm to run for a maximum of zero iterations, or "rounds"). Thus, with "--maxrounds 0" the tree will be estimated using neighbor-joining only. Sorry this has taken so long to fix! Like I said, I never used xrate on an alignment this big before (for our screens, we pre-specified the tree, so we went directly to windowlicker). Hopefully, this will now work for you, but let me know if you have further issues. Cheers, Ian marcin joachimiak wrote: > hey, > > Thanks for the help! > The short example worked just fine. > > log3 with my previous command so far produced this: > Finding eigenvectors using MatrixExpEigenPrepare > Largest eigenvalue was 0; set to 0 > Updating cached M() & J() > Finding eigenvectors using MatrixExpEigenPrepare > Largest eigenvalue was 0; set to 0 > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) > Updating cached M() & J() > Sequence_database: converting ASCII sequences to score profiles > Processed 3 sequences > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > Estimating distances of sequences 0 and 1 > > You can download the FASTA and Stock files here: > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz > > I'll have more updates on Monday -- enjoy the weekend, > marcin > - Show quoted text - > > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >> Thanks Marcin. >> >> Something has definitely gone wrong if no alignment is output. Can you >> check that the following outputs an alignment+tree? >> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >> ~/dart/src/ecfg/t/notree.stk >> >> >> Looking at your log messages it seems like the program has crashed while >> trying to do neighbor-joining to estimate the tree. For the above command, I >> get log messages after the "Estimating tree..." message. See below (BTW many >> of these are on stderr, not stdout) >> >> If you can reproduce this on the short example pairwise alignment but not >> with the 3M alignments, there's something else wrong that needs >> investigation. In that case, can you first try running "-log 3" instead of >> "-log 5" (this will give more verbose log messages during the >> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >> please post a URL to your large file so I can download it and attempt to >> reproduce the error. >> >> Sorry about this... >> Ian >> >> >> >> >> Warning -- used random number generator during initialisation >> Read 1 alignments >> Sequence_database: converting ASCII sequences to score profiles >> Estimating tree for the following sequences: X Y >> Optimizing branch lengths of all trees in alignment database, using EM >> Optimizing tree branch lengths by EM. >> Tree before optimization: >> (X:5.0000000000,Y:5.0000000000)root; >> EM iteration #1: log-likelihood = -56.0101 bits >> Optimized tree after step #1: >> (X:5.1160179765,Y:5.1160180140)root; >> EM iteration #2: log-likelihood = -56.0118 bits >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >> Failed EM improvement threshold for the 1th time; stopping >> Checking post-iteration log-likelihood >> Post-iteration log-likelihood = -56.0118 bits >> Restoring previous best branch lengths >> Tree after optimization: >> (X:5.0000000000,Y:5.0000000000)root; >> Alignment database log-likelihood: -56.0101 bits >> Processing alignment Alignment1 (1 of 1): 14 columns >> # STOCKHOLM 1.0 >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >> X AAAAAAaaUUUUUU >> Y GGGGGGaaCCCCCC >> // >> >> >> >> >> marcin joachimiak wrote: >> >>> Hey there, >>> >>> The xrate tree job took a while but it seemed to finish without >>> errors. Alas there isn't any tree output ... >>> >>> Here's what I get from stdout: >>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>> Warning -- used random number generator during initialisation >>> Read 1 alignments >>> Sequence_database: converting ASCII sequences to score profiles >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>> problem right? And this was run on a hefty machine ... >>> >>> >>> marcin >>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>> <mar...@gm...> wrote: >>> >>>> much better, thanks! >>>> I did notice the trees for each alignment in the output ... >>>> >>>> The tree estimation is running, I'll keep reporting ... >>>> >>>> >>>> >>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>> > marcin joachimiak wrote: >>>> > > Excellent, its working now! >>>> > >>>> > Cool! >>>> > >>>> > >>>> > > I'll have the tree data soon and may try a run with as well, but >> best >>>> > > if I could know that the other tree is different from the xrate >> one >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>> > > tree/distances from xrate? >>>> > >>>> > Yes, it will insert the trees into the alignment files it outputs, >> using >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >> email. >>>> > >>>> > One caveat. The way you're running it now, it is calculating a >> separate >>>> > tree *for every window* in the alignment. This may not be what you >> want, >>>> > and it's certainly going to slow things down. >>>> > >>>> > If you want to calculate the tree just once for the whole alignment, >> type >>>> > >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>> > >>>> > >>>> > The "--noannotate" just means "don't waste time trying to annotate >> this >>>> > alignment, I just want the tree". The "-log 5" will just print some >> log >>>> > messages on stderr, so you can watch the progress of the >> tree-building >>>> > algorithms. >>>> > >>>> > hth, >>>> > >>>> > I. >>>> > >>>> >>>> |
From: marcin j. <mar...@gm...> - 2008-04-29 17:48:58
|
Hey hey, Indeed, the alignment size limitation makes sense and was one of the things I was suspecting. The job was run on a machine w.o queue management but I will investigate whether it could have been killed by other means. The 0 iteration option sounds like it will do the trick -- so I just need to checkout the DART code from cvs again, right? Thanks for adding this in and I definitely understand the time constraints etc., this should suffice for my purposes. I could of course try to obtain a tree in other ways, but actually I'm trying to turn this into a pipeline that can run on multi-genome alignments, its almost there. cheers, marcin On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: > Hi Marcin... > > So, I never ran xrate on an alignment this big before (6 million > columns!) and it turned out there was a drastic inefficiency in the > distance matrix code, that was causing it to scale as the square of the > number of columns. Unnoticeable for most alignments (since the code > inside the loop was pretty quick), but your example caused it to hang... > > I think it was quite possible that your xrate job was getting killed > because it was running too long. Are you running queue management > software of some kind (e.g. Sun Grid Engine)? That can often kill > long-running jobs. I couldn't find a specific bug that was crashing on > this data, although I didn't finish running it because it was taking > sooooo long. > > Anyway, I've now fixed the inefficiency bug. The tree estimation by > neighbor-joining now finishes pretty fast. However, it now takes a long > time to complete the second step of the tree-estimation procedure, which > is to refine the branch lengths of the tree using the EM algorithm. > > There is less that I can do to optimize this. The neighbor-joining code > is not heavily dependent on the alignment length, but the branch-length > refinement is. I could optimize the branch-length code in the same way > that the neighbor-joining is optimized, so it was less dependent on > alignment length, but that would take a little while, and unfortunately > I don't have the time right now. > > Instead, what I have done is introduced a subtle hack. If you specify > the command-line option "--maxrounds 0", then this will bypass the > branch-length refinement code (specifically, what it does is tell xrate > that you want the EM algorithm to run for a maximum of zero iterations, > or "rounds"). > > Thus, with "--maxrounds 0" the tree will be estimated using > neighbor-joining only. > > Sorry this has taken so long to fix! Like I said, I never used xrate on > an alignment this big before (for our screens, we pre-specified the > tree, so we went directly to windowlicker). > > Hopefully, this will now work for you, but let me know if you have > further issues. > > Cheers, > Ian > > > > > marcin joachimiak wrote: > > hey, > > > > Thanks for the help! > > The short example worked just fine. > > > > log3 with my previous command so far produced this: > > Finding eigenvectors using MatrixExpEigenPrepare > > Largest eigenvalue was 0; set to 0 > > Updating cached M() & J() > > Finding eigenvectors using MatrixExpEigenPrepare > > Largest eigenvalue was 0; set to 0 > > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) > > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) > > Updating cached M() & J() > > Sequence_database: converting ASCII sequences to score profiles > > Processed 3 sequences > > Estimating tree for the following sequences: CP000112 CP000527 AE017286 > > Estimating distances of sequences 0 and 1 > > > > You can download the FASTA and Stock files here: > > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz > > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz > > > > I'll have more updates on Monday -- enjoy the weekend, > > marcin > > - Show quoted text - > > > > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: > >> Thanks Marcin. > >> > >> Something has definitely gone wrong if no alignment is output. Can you > >> check that the following outputs an alignment+tree? > >> > >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg > >> ~/dart/src/ecfg/t/notree.stk > >> > >> > >> Looking at your log messages it seems like the program has crashed while > >> trying to do neighbor-joining to estimate the tree. For the above command, I > >> get log messages after the "Estimating tree..." message. See below (BTW many > >> of these are on stderr, not stdout) > >> > >> If you can reproduce this on the short example pairwise alignment but not > >> with the 3M alignments, there's something else wrong that needs > >> investigation. In that case, can you first try running "-log 3" instead of > >> "-log 5" (this will give more verbose log messages during the > >> neighbor-joining algorithm), then mail the stderr log messages to me, and/or > >> please post a URL to your large file so I can download it and attempt to > >> reproduce the error. > >> > >> Sorry about this... > >> Ian > >> > >> > >> > >> > >> Warning -- used random number generator during initialisation > >> Read 1 alignments > >> Sequence_database: converting ASCII sequences to score profiles > >> Estimating tree for the following sequences: X Y > >> Optimizing branch lengths of all trees in alignment database, using EM > >> Optimizing tree branch lengths by EM. > >> Tree before optimization: > >> (X:5.0000000000,Y:5.0000000000)root; > >> EM iteration #1: log-likelihood = -56.0101 bits > >> Optimized tree after step #1: > >> (X:5.1160179765,Y:5.1160180140)root; > >> EM iteration #2: log-likelihood = -56.0118 bits > >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM > >> Failed EM improvement threshold for the 1th time; stopping > >> Checking post-iteration log-likelihood > >> Post-iteration log-likelihood = -56.0118 bits > >> Restoring previous best branch lengths > >> Tree after optimization: > >> (X:5.0000000000,Y:5.0000000000)root; > >> Alignment database log-likelihood: -56.0101 bits > >> Processing alignment Alignment1 (1 of 1): 14 columns > >> # STOCKHOLM 1.0 > >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; > >> X AAAAAAaaUUUUUU > >> Y GGGGGGaaCCCCCC > >> // > >> > >> > >> > >> > >> marcin joachimiak wrote: > >> > >>> Hey there, > >>> > >>> The xrate tree job took a while but it seemed to finish without > >>> errors. Alas there isn't any tree output ... > >>> > >>> Here's what I get from stdout: > >>> > >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e > >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 > >>> Warning -- used random number generator during initialisation > >>> Read 1 alignments > >>> Sequence_database: converting ASCII sequences to score profiles > >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 > >>> > >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a > >>> problem right? And this was run on a hefty machine ... > >>> > >>> > >>> marcin > >>> > >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak > >>> <mar...@gm...> wrote: > >>> > >>>> much better, thanks! > >>>> I did notice the trees for each alignment in the output ... > >>>> > >>>> The tree estimation is running, I'll keep reporting ... > >>>> > >>>> > >>>> > >>>> > >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: > >>>> > marcin joachimiak wrote: > >>>> > > Excellent, its working now! > >>>> > > >>>> > Cool! > >>>> > > >>>> > > >>>> > > I'll have the tree data soon and may try a run with as well, but > >> best > >>>> > > if I could know that the other tree is different from the xrate > >> one > >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model > >>>> > > tree/distances from xrate? > >>>> > > >>>> > Yes, it will insert the trees into the alignment files it outputs, > >> using > >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous > >> email. > >>>> > > >>>> > One caveat. The way you're running it now, it is calculating a > >> separate > >>>> > tree *for every window* in the alignment. This may not be what you > >> want, > >>>> > and it's certainly going to slow things down. > >>>> > > >>>> > If you want to calculate the tree just once for the whole alignment, > >> type > >>>> > > >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg > >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock > >>>> > > >>>> > > >>>> > The "--noannotate" just means "don't waste time trying to annotate > >> this > >>>> > alignment, I just want the tree". The "-log 5" will just print some > >> log > >>>> > messages on stderr, so you can watch the progress of the > >> tree-building > >>>> > algorithms. > >>>> > > >>>> > hth, > >>>> > > >>>> > I. > >>>> > > >>>> > >>>> > |
From: Ian H. <ih...@be...> - 2008-05-13 21:03:45
|
Yes -- your "&>" shell redirection is redirecting all xrate's error & logging output to your "out" file. That is what the second xrate run (called from windowlicker) is complaining about. If you use ">" instead of "&>" then you will not mix stderr and stdout, and what you're doing should work (fingers crossed) Ian marcin joachimiak wrote: > hey there, > > So, I got the tree output after the cvs update, see attached. > > xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg > --noannotate -log 5 &> out > > I was only about to capture the xrate tree output using &> so there > are likely extraneous lines there breaking things? > > I ran windowlicker as follows: > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e > $DARTDIR/grammars/jukescantor.eg &> windowlicker.out > > And got this, seemingly complaining that xrate failed: > > Ignoring line: Checking post-iteration log-likelihood > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Tree after optimization: > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > xrate failed (check command-line options? they were: > /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) > > I also tried w/o the -maxrounds argument and got: > > Ignoring line: Checking post-iteration log-likelihood > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Tree after optimization: > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > [stderr] ERROR: Bad node specifier: > '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) > > cheers, > marcin > > > On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak > <mar...@gm...> wrote: >> Hey hey, >> >> Indeed, the alignment size limitation makes sense and was one of the >> things I was suspecting. >> >> The job was run on a machine w.o queue management but I will >> investigate whether it could have been killed by other means. >> >> The 0 iteration option sounds like it will do the trick -- so I just >> need to checkout the DART code from cvs again, right? Thanks for >> adding this in and I definitely understand the time constraints etc., >> this should suffice for my purposes. I could of course try to obtain a >> tree in other ways, but actually I'm trying to turn this into a >> pipeline that can run on multi-genome alignments, its almost there. >> >> cheers, >> marcin >> >> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>> Hi Marcin... >>> >>> So, I never ran xrate on an alignment this big before (6 million >>> columns!) and it turned out there was a drastic inefficiency in the >>> distance matrix code, that was causing it to scale as the square of the >>> number of columns. Unnoticeable for most alignments (since the code >>> inside the loop was pretty quick), but your example caused it to hang... >>> >>> I think it was quite possible that your xrate job was getting killed >>> because it was running too long. Are you running queue management >>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>> long-running jobs. I couldn't find a specific bug that was crashing on >>> this data, although I didn't finish running it because it was taking >>> sooooo long. >>> >>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>> neighbor-joining now finishes pretty fast. However, it now takes a long >>> time to complete the second step of the tree-estimation procedure, which >>> is to refine the branch lengths of the tree using the EM algorithm. >>> >>> There is less that I can do to optimize this. The neighbor-joining code >>> is not heavily dependent on the alignment length, but the branch-length >>> refinement is. I could optimize the branch-length code in the same way >>> that the neighbor-joining is optimized, so it was less dependent on >>> alignment length, but that would take a little while, and unfortunately >>> I don't have the time right now. >>> >>> Instead, what I have done is introduced a subtle hack. If you specify >>> the command-line option "--maxrounds 0", then this will bypass the >>> branch-length refinement code (specifically, what it does is tell xrate >>> that you want the EM algorithm to run for a maximum of zero iterations, >>> or "rounds"). >>> >>> Thus, with "--maxrounds 0" the tree will be estimated using >>> neighbor-joining only. >>> >>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>> an alignment this big before (for our screens, we pre-specified the >>> tree, so we went directly to windowlicker). >>> >>> Hopefully, this will now work for you, but let me know if you have >>> further issues. >>> >>> Cheers, >>> Ian >>> >>> >>> >>> >>> marcin joachimiak wrote: >>> > hey, >>> > >>> > Thanks for the help! >>> > The short example worked just fine. >>> > >>> > log3 with my previous command so far produced this: >>> > Finding eigenvectors using MatrixExpEigenPrepare >>> > Largest eigenvalue was 0; set to 0 >>> > Updating cached M() & J() >>> > Finding eigenvectors using MatrixExpEigenPrepare >>> > Largest eigenvalue was 0; set to 0 >>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>> > Updating cached M() & J() >>> > Sequence_database: converting ASCII sequences to score profiles >>> > Processed 3 sequences >>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> > Estimating distances of sequences 0 and 1 >>> > >>> > You can download the FASTA and Stock files here: >>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>> > >>> > I'll have more updates on Monday -- enjoy the weekend, >>> > marcin >>> > - Show quoted text - >>> > >>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>> >> Thanks Marcin. >>> >> >>> >> Something has definitely gone wrong if no alignment is output. Can you >>> >> check that the following outputs an alignment+tree? >>> >> >>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>> >> ~/dart/src/ecfg/t/notree.stk >>> >> >>> >> >>> >> Looking at your log messages it seems like the program has crashed while >>> >> trying to do neighbor-joining to estimate the tree. For the > above command, I >>> >> get log messages after the "Estimating tree..." message. See > below (BTW many >>> >> of these are on stderr, not stdout) >>> >> >>> >> If you can reproduce this on the short example pairwise > alignment but not >>> >> with the 3M alignments, there's something else wrong that needs >>> >> investigation. In that case, can you first try running "-log 3" > instead of >>> >> "-log 5" (this will give more verbose log messages during the >>> >> neighbor-joining algorithm), then mail the stderr log messages > to me, and/or >>> >> please post a URL to your large file so I can download it and attempt to >>> >> reproduce the error. >>> >> >>> >> Sorry about this... >>> >> Ian >>> >> >>> >> >>> >> >>> >> >>> >> Warning -- used random number generator during initialisation >>> >> Read 1 alignments >>> >> Sequence_database: converting ASCII sequences to score profiles >>> >> Estimating tree for the following sequences: X Y >>> >> Optimizing branch lengths of all trees in alignment database, using EM >>> >> Optimizing tree branch lengths by EM. >>> >> Tree before optimization: >>> >> (X:5.0000000000,Y:5.0000000000)root; >>> >> EM iteration #1: log-likelihood = -56.0101 bits >>> >> Optimized tree after step #1: >>> >> (X:5.1160179765,Y:5.1160180140)root; >>> >> EM iteration #2: log-likelihood = -56.0118 bits >>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>> >> Failed EM improvement threshold for the 1th time; stopping >>> >> Checking post-iteration log-likelihood >>> >> Post-iteration log-likelihood = -56.0118 bits >>> >> Restoring previous best branch lengths >>> >> Tree after optimization: >>> >> (X:5.0000000000,Y:5.0000000000)root; >>> >> Alignment database log-likelihood: -56.0101 bits >>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>> >> # STOCKHOLM 1.0 >>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>> >> X AAAAAAaaUUUUUU >>> >> Y GGGGGGaaCCCCCC >>> >> // >>> >> >>> >> >>> >> >>> >> >>> >> marcin joachimiak wrote: >>> >> >>> >>> Hey there, >>> >>> >>> >>> The xrate tree job took a while but it seemed to finish without >>> >>> errors. Alas there isn't any tree output ... >>> >>> >>> >>> Here's what I get from stdout: >>> >>> >>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>> >>> Warning -- used random number generator during initialisation >>> >>> Read 1 alignments >>> >>> Sequence_database: converting ASCII sequences to score profiles >>> >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> >>> >>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>> >>> problem right? And this was run on a hefty machine ... >>> >>> >>> >>> >>> >>> marcin >>> >>> >>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>> >>> <mar...@gm...> wrote: >>> >>> >>> >>>> much better, thanks! >>> >>>> I did notice the trees for each alignment in the output ... >>> >>>> >>> >>>> The tree estimation is running, I'll keep reporting ... >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>> >>>> > marcin joachimiak wrote: >>> >>>> > > Excellent, its working now! >>> >>>> > >>> >>>> > Cool! >>> >>>> > >>> >>>> > >>> >>>> > > I'll have the tree data soon and may try a run with as well, but >>> >> best >>> >>>> > > if I could know that the other tree is different from the xrate >>> >> one >>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>> >>>> > > tree/distances from xrate? >>> >>>> > >>> >>>> > Yes, it will insert the trees into the alignment files it outputs, >>> >> using >>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>> >> email. >>> >>>> > >>> >>>> > One caveat. The way you're running it now, it is calculating a >>> >> separate >>> >>>> > tree *for every window* in the alignment. This may not be what you >>> >> want, >>> >>>> > and it's certainly going to slow things down. >>> >>>> > >>> >>>> > If you want to calculate the tree just once for the whole > alignment, >>> >> type >>> >>>> > >>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>> >>>> > >>> >>>> > >>> >>>> > The "--noannotate" just means "don't waste time trying to annotate >>> >> this >>> >>>> > alignment, I just want the tree". The "-log 5" will just print some >>> >> log >>> >>>> > messages on stderr, so you can watch the progress of the >>> >> tree-building >>> >>>> > algorithms. >>> >>>> > >>> >>>> > hth, >>> >>>> > >>> >>>> > I. >>> >>>> > >>> >>>> >>> >>>> >>> |
From: marcin j. <mar...@gm...> - 2008-05-13 21:10:33
|
OK, its running now! Thanks -- originally I couldn't get output from standard redirect '>' but it worked the second time. Oh yes, where does the "--maxrounds 0" argument go? Is that for windowlicker or just the initial tree? m On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote: > Yes -- your "&>" shell redirection is redirecting all xrate's error & > logging output to your "out" file. That is what the second xrate run > (called from windowlicker) is complaining about. If you use ">" instead > of "&>" then you will not mix stderr and stdout, and what you're doing > should work (fingers crossed) > > Ian > > marcin joachimiak wrote: >> hey there, >> >> So, I got the tree output after the cvs update, see attached. >> >> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg >> --noannotate -log 5 &> out >> >> I was only about to capture the xrate tree output using &> so there >> are likely extraneous lines there breaking things? >> >> I ran windowlicker as follows: >> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >> DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e >> $DARTDIR/grammars/jukescantor.eg &> windowlicker.out >> >> And got this, seemingly complaining that xrate failed: >> >> Ignoring line: Checking post-iteration log-likelihood >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Tree after optimization: >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> xrate failed (check command-line options? they were: >> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e >> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >> >> I also tried w/o the -maxrounds argument and got: >> >> Ignoring line: Checking post-iteration log-likelihood >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Tree after optimization: >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >> [stderr] ERROR: Bad node specifier: >> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' >> xrate returned with nonzero exit status (invoked using: >> /usr2/people/gtl/bin/dart//bin/xrate -e >> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >> >> cheers, >> marcin >> >> >> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak >> <mar...@gm...> wrote: >>> Hey hey, >>> >>> Indeed, the alignment size limitation makes sense and was one of the >>> things I was suspecting. >>> >>> The job was run on a machine w.o queue management but I will >>> investigate whether it could have been killed by other means. >>> >>> The 0 iteration option sounds like it will do the trick -- so I just >>> need to checkout the DART code from cvs again, right? Thanks for >>> adding this in and I definitely understand the time constraints etc., >>> this should suffice for my purposes. I could of course try to obtain a >>> tree in other ways, but actually I'm trying to turn this into a >>> pipeline that can run on multi-genome alignments, its almost there. >>> >>> cheers, >>> marcin >>> >>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>>> Hi Marcin... >>>> >>>> So, I never ran xrate on an alignment this big before (6 million >>>> columns!) and it turned out there was a drastic inefficiency in the >>>> distance matrix code, that was causing it to scale as the square of the >>>> number of columns. Unnoticeable for most alignments (since the code >>>> inside the loop was pretty quick), but your example caused it to hang... >>>> >>>> I think it was quite possible that your xrate job was getting killed >>>> because it was running too long. Are you running queue management >>>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>>> long-running jobs. I couldn't find a specific bug that was crashing on >>>> this data, although I didn't finish running it because it was taking >>>> sooooo long. >>>> >>>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>>> neighbor-joining now finishes pretty fast. However, it now takes a long >>>> time to complete the second step of the tree-estimation procedure, which >>>> is to refine the branch lengths of the tree using the EM algorithm. >>>> >>>> There is less that I can do to optimize this. The neighbor-joining code >>>> is not heavily dependent on the alignment length, but the branch-length >>>> refinement is. I could optimize the branch-length code in the same way >>>> that the neighbor-joining is optimized, so it was less dependent on >>>> alignment length, but that would take a little while, and unfortunately >>>> I don't have the time right now. >>>> >>>> Instead, what I have done is introduced a subtle hack. If you specify >>>> the command-line option "--maxrounds 0", then this will bypass the >>>> branch-length refinement code (specifically, what it does is tell xrate >>>> that you want the EM algorithm to run for a maximum of zero iterations, >>>> or "rounds"). >>>> >>>> Thus, with "--maxrounds 0" the tree will be estimated using >>>> neighbor-joining only. >>>> >>>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>>> an alignment this big before (for our screens, we pre-specified the >>>> tree, so we went directly to windowlicker). >>>> >>>> Hopefully, this will now work for you, but let me know if you have >>>> further issues. >>>> >>>> Cheers, >>>> Ian >>>> >>>> >>>> >>>> >>>> marcin joachimiak wrote: >>>> > hey, >>>> > >>>> > Thanks for the help! >>>> > The short example worked just fine. >>>> > >>>> > log3 with my previous command so far produced this: >>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>> > Largest eigenvalue was 0; set to 0 >>>> > Updating cached M() & J() >>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>> > Largest eigenvalue was 0; set to 0 >>>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>> > Updating cached M() & J() >>>> > Sequence_database: converting ASCII sequences to score profiles >>>> > Processed 3 sequences >>>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> > Estimating distances of sequences 0 and 1 >>>> > >>>> > You can download the FASTA and Stock files here: >>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>> > >>>> > I'll have more updates on Monday -- enjoy the weekend, >>>> > marcin >>>> > - Show quoted text - >>>> > >>>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>> >> Thanks Marcin. >>>> >> >>>> >> Something has definitely gone wrong if no alignment is output. Can you >>>> >> check that the following outputs an alignment+tree? >>>> >> >>>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>> >> ~/dart/src/ecfg/t/notree.stk >>>> >> >>>> >> >>>> >> Looking at your log messages it seems like the program has crashed while >>>> >> trying to do neighbor-joining to estimate the tree. For the >> above command, I >>>> >> get log messages after the "Estimating tree..." message. See >> below (BTW many >>>> >> of these are on stderr, not stdout) >>>> >> >>>> >> If you can reproduce this on the short example pairwise >> alignment but not >>>> >> with the 3M alignments, there's something else wrong that needs >>>> >> investigation. In that case, can you first try running "-log 3" >> instead of >>>> >> "-log 5" (this will give more verbose log messages during the >>>> >> neighbor-joining algorithm), then mail the stderr log messages >> to me, and/or >>>> >> please post a URL to your large file so I can download it and attempt to >>>> >> reproduce the error. >>>> >> >>>> >> Sorry about this... >>>> >> Ian >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> Warning -- used random number generator during initialisation >>>> >> Read 1 alignments >>>> >> Sequence_database: converting ASCII sequences to score profiles >>>> >> Estimating tree for the following sequences: X Y >>>> >> Optimizing branch lengths of all trees in alignment database, using EM >>>> >> Optimizing tree branch lengths by EM. >>>> >> Tree before optimization: >>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>> >> EM iteration #1: log-likelihood = -56.0101 bits >>>> >> Optimized tree after step #1: >>>> >> (X:5.1160179765,Y:5.1160180140)root; >>>> >> EM iteration #2: log-likelihood = -56.0118 bits >>>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>>> >> Failed EM improvement threshold for the 1th time; stopping >>>> >> Checking post-iteration log-likelihood >>>> >> Post-iteration log-likelihood = -56.0118 bits >>>> >> Restoring previous best branch lengths >>>> >> Tree after optimization: >>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>> >> Alignment database log-likelihood: -56.0101 bits >>>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>>> >> # STOCKHOLM 1.0 >>>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>> >> X AAAAAAaaUUUUUU >>>> >> Y GGGGGGaaCCCCCC >>>> >> // >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> marcin joachimiak wrote: >>>> >> >>>> >>> Hey there, >>>> >>> >>>> >>> The xrate tree job took a while but it seemed to finish without >>>> >>> errors. Alas there isn't any tree output ... >>>> >>> >>>> >>> Here's what I get from stdout: >>>> >>> >>>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>> >>> Warning -- used random number generator during initialisation >>>> >>> Read 1 alignments >>>> >>> Sequence_database: converting ASCII sequences to score profiles >>>> >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> >>> >>>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>> >>> problem right? And this was run on a hefty machine ... >>>> >>> >>>> >>> >>>> >>> marcin >>>> >>> >>>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>> >>> <mar...@gm...> wrote: >>>> >>> >>>> >>>> much better, thanks! >>>> >>>> I did notice the trees for each alignment in the output ... >>>> >>>> >>>> >>>> The tree estimation is running, I'll keep reporting ... >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>> >>>> > marcin joachimiak wrote: >>>> >>>> > > Excellent, its working now! >>>> >>>> > >>>> >>>> > Cool! >>>> >>>> > >>>> >>>> > >>>> >>>> > > I'll have the tree data soon and may try a run with as well, but >>>> >> best >>>> >>>> > > if I could know that the other tree is different from the xrate >>>> >> one >>>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>> >>>> > > tree/distances from xrate? >>>> >>>> > >>>> >>>> > Yes, it will insert the trees into the alignment files it outputs, >>>> >> using >>>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>> >> email. >>>> >>>> > >>>> >>>> > One caveat. The way you're running it now, it is calculating a >>>> >> separate >>>> >>>> > tree *for every window* in the alignment. This may not be what you >>>> >> want, >>>> >>>> > and it's certainly going to slow things down. >>>> >>>> > >>>> >>>> > If you want to calculate the tree just once for the whole >> alignment, >>>> >> type >>>> >>>> > >>>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>> >>>> > >>>> >>>> > >>>> >>>> > The "--noannotate" just means "don't waste time trying to annotate >>>> >> this >>>> >>>> > alignment, I just want the tree". The "-log 5" will just print some >>>> >> log >>>> >>>> > messages on stderr, so you can watch the progress of the >>>> >> tree-building >>>> >>>> > algorithms. >>>> >>>> > >>>> >>>> > hth, >>>> >>>> > >>>> >>>> > I. >>>> >>>> > >>>> >>>> >>>> >>>> >>>> > |
From: Ian H. <ih...@be...> - 2008-05-13 21:13:59
|
You only really need that (and the "-e $DARTDIR/grammars/jukescantor.eg" option) for the initial tree. You can pass the "-e $DARTDIR/grammars/jukescantor.eg" to windowlicker too if you like, but with the "--maxrounds 0" there it won't actually do anything, and you probably don't need it anyway. marcin joachimiak wrote: > OK, its running now! > Thanks -- originally I couldn't get output from standard redirect '>' > but it worked the second time. > > Oh yes, where does the "--maxrounds 0" argument go? Is that for > windowlicker or just the initial tree? > > m > > On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote: >> Yes -- your "&>" shell redirection is redirecting all xrate's error & >> logging output to your "out" file. That is what the second xrate run >> (called from windowlicker) is complaining about. If you use ">" instead >> of "&>" then you will not mix stderr and stdout, and what you're doing >> should work (fingers crossed) >> >> Ian >> >> marcin joachimiak wrote: >>> hey there, >>> >>> So, I got the tree output after the cvs update, see attached. >>> >>> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg >>> --noannotate -log 5 &> out >>> >>> I was only about to capture the xrate tree output using &> so there >>> are likely extraneous lines there breaking things? >>> >>> I ran windowlicker as follows: >>> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >>> DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e >>> $DARTDIR/grammars/jukescantor.eg &> windowlicker.out >>> >>> And got this, seemingly complaining that xrate failed: >>> >>> Ignoring line: Checking post-iteration log-likelihood >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Tree after optimization: >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> xrate failed (check command-line options? they were: >>> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >>> >>> I also tried w/o the -maxrounds argument and got: >>> >>> Ignoring line: Checking post-iteration log-likelihood >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Tree after optimization: >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> [stderr] ERROR: Bad node specifier: >>> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' >>> xrate returned with nonzero exit status (invoked using: >>> /usr2/people/gtl/bin/dart//bin/xrate -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >>> >>> cheers, >>> marcin >>> >>> >>> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak >>> <mar...@gm...> wrote: >>>> Hey hey, >>>> >>>> Indeed, the alignment size limitation makes sense and was one of the >>>> things I was suspecting. >>>> >>>> The job was run on a machine w.o queue management but I will >>>> investigate whether it could have been killed by other means. >>>> >>>> The 0 iteration option sounds like it will do the trick -- so I just >>>> need to checkout the DART code from cvs again, right? Thanks for >>>> adding this in and I definitely understand the time constraints etc., >>>> this should suffice for my purposes. I could of course try to obtain a >>>> tree in other ways, but actually I'm trying to turn this into a >>>> pipeline that can run on multi-genome alignments, its almost there. >>>> >>>> cheers, >>>> marcin >>>> >>>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>>>> Hi Marcin... >>>>> >>>>> So, I never ran xrate on an alignment this big before (6 million >>>>> columns!) and it turned out there was a drastic inefficiency in the >>>>> distance matrix code, that was causing it to scale as the square of the >>>>> number of columns. Unnoticeable for most alignments (since the code >>>>> inside the loop was pretty quick), but your example caused it to hang... >>>>> >>>>> I think it was quite possible that your xrate job was getting killed >>>>> because it was running too long. Are you running queue management >>>>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>>>> long-running jobs. I couldn't find a specific bug that was crashing on >>>>> this data, although I didn't finish running it because it was taking >>>>> sooooo long. >>>>> >>>>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>>>> neighbor-joining now finishes pretty fast. However, it now takes a long >>>>> time to complete the second step of the tree-estimation procedure, which >>>>> is to refine the branch lengths of the tree using the EM algorithm. >>>>> >>>>> There is less that I can do to optimize this. The neighbor-joining code >>>>> is not heavily dependent on the alignment length, but the branch-length >>>>> refinement is. I could optimize the branch-length code in the same way >>>>> that the neighbor-joining is optimized, so it was less dependent on >>>>> alignment length, but that would take a little while, and unfortunately >>>>> I don't have the time right now. >>>>> >>>>> Instead, what I have done is introduced a subtle hack. If you specify >>>>> the command-line option "--maxrounds 0", then this will bypass the >>>>> branch-length refinement code (specifically, what it does is tell xrate >>>>> that you want the EM algorithm to run for a maximum of zero iterations, >>>>> or "rounds"). >>>>> >>>>> Thus, with "--maxrounds 0" the tree will be estimated using >>>>> neighbor-joining only. >>>>> >>>>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>>>> an alignment this big before (for our screens, we pre-specified the >>>>> tree, so we went directly to windowlicker). >>>>> >>>>> Hopefully, this will now work for you, but let me know if you have >>>>> further issues. >>>>> >>>>> Cheers, >>>>> Ian >>>>> >>>>> >>>>> >>>>> >>>>> marcin joachimiak wrote: >>>>> > hey, >>>>> > >>>>> > Thanks for the help! >>>>> > The short example worked just fine. >>>>> > >>>>> > log3 with my previous command so far produced this: >>>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>>> > Largest eigenvalue was 0; set to 0 >>>>> > Updating cached M() & J() >>>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>>> > Largest eigenvalue was 0; set to 0 >>>>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>>> > Updating cached M() & J() >>>>> > Sequence_database: converting ASCII sequences to score profiles >>>>> > Processed 3 sequences >>>>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>> > Estimating distances of sequences 0 and 1 >>>>> > >>>>> > You can download the FASTA and Stock files here: >>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>>> > >>>>> > I'll have more updates on Monday -- enjoy the weekend, >>>>> > marcin >>>>> > - Show quoted text - >>>>> > >>>>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>>> >> Thanks Marcin. >>>>> >> >>>>> >> Something has definitely gone wrong if no alignment is output. Can you >>>>> >> check that the following outputs an alignment+tree? >>>>> >> >>>>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>>> >> ~/dart/src/ecfg/t/notree.stk >>>>> >> >>>>> >> >>>>> >> Looking at your log messages it seems like the program has > crashed while >>>>> >> trying to do neighbor-joining to estimate the tree. For the >>> above command, I >>>>> >> get log messages after the "Estimating tree..." message. See >>> below (BTW many >>>>> >> of these are on stderr, not stdout) >>>>> >> >>>>> >> If you can reproduce this on the short example pairwise >>> alignment but not >>>>> >> with the 3M alignments, there's something else wrong that needs >>>>> >> investigation. In that case, can you first try running "-log 3" >>> instead of >>>>> >> "-log 5" (this will give more verbose log messages during the >>>>> >> neighbor-joining algorithm), then mail the stderr log messages >>> to me, and/or >>>>> >> please post a URL to your large file so I can download it and > attempt to >>>>> >> reproduce the error. >>>>> >> >>>>> >> Sorry about this... >>>>> >> Ian >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> Warning -- used random number generator during initialisation >>>>> >> Read 1 alignments >>>>> >> Sequence_database: converting ASCII sequences to score profiles >>>>> >> Estimating tree for the following sequences: X Y >>>>> >> Optimizing branch lengths of all trees in alignment database, using EM >>>>> >> Optimizing tree branch lengths by EM. >>>>> >> Tree before optimization: >>>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>>> >> EM iteration #1: log-likelihood = -56.0101 bits >>>>> >> Optimized tree after step #1: >>>>> >> (X:5.1160179765,Y:5.1160180140)root; >>>>> >> EM iteration #2: log-likelihood = -56.0118 bits >>>>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 > bits during EM >>>>> >> Failed EM improvement threshold for the 1th time; stopping >>>>> >> Checking post-iteration log-likelihood >>>>> >> Post-iteration log-likelihood = -56.0118 bits >>>>> >> Restoring previous best branch lengths >>>>> >> Tree after optimization: >>>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>>> >> Alignment database log-likelihood: -56.0101 bits >>>>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>>>> >> # STOCKHOLM 1.0 >>>>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>>> >> X AAAAAAaaUUUUUU >>>>> >> Y GGGGGGaaCCCCCC >>>>> >> // >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> marcin joachimiak wrote: >>>>> >> >>>>> >>> Hey there, >>>>> >>> >>>>> >>> The xrate tree job took a while but it seemed to finish without >>>>> >>> errors. Alas there isn't any tree output ... >>>>> >>> >>>>> >>> Here's what I get from stdout: >>>>> >>> >>>>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>> >>> Warning -- used random number generator during initialisation >>>>> >>> Read 1 alignments >>>>> >>> Sequence_database: converting ASCII sequences to score profiles >>>>> >>> Estimating tree for the following sequences: CP000112 > CP000527 AE017286 >>>>> >>> >>>>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>> >>> problem right? And this was run on a hefty machine ... >>>>> >>> >>>>> >>> >>>>> >>> marcin >>>>> >>> >>>>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>> >>> <mar...@gm...> wrote: >>>>> >>> >>>>> >>>> much better, thanks! >>>>> >>>> I did notice the trees for each alignment in the output ... >>>>> >>>> >>>>> >>>> The tree estimation is running, I'll keep reporting ... >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes > <ih...@be...> wrote: >>>>> >>>> > marcin joachimiak wrote: >>>>> >>>> > > Excellent, its working now! >>>>> >>>> > >>>>> >>>> > Cool! >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > > I'll have the tree data soon and may try a run with > as well, but >>>>> >> best >>>>> >>>> > > if I could know that the other tree is different from the xrate >>>>> >> one >>>>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>> >>>> > > tree/distances from xrate? >>>>> >>>> > >>>>> >>>> > Yes, it will insert the trees into the alignment files > it outputs, >>>>> >> using >>>>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>>> >> email. >>>>> >>>> > >>>>> >>>> > One caveat. The way you're running it now, it is calculating a >>>>> >> separate >>>>> >>>> > tree *for every window* in the alignment. This may not > be what you >>>>> >> want, >>>>> >>>> > and it's certainly going to slow things down. >>>>> >>>> > >>>>> >>>> > If you want to calculate the tree just once for the whole >>> alignment, >>>>> >> type >>>>> >>>> > >>>>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > The "--noannotate" just means "don't waste time trying > to annotate >>>>> >> this >>>>> >>>> > alignment, I just want the tree". The "-log 5" will > just print some >>>>> >> log >>>>> >>>> > messages on stderr, so you can watch the progress of the >>>>> >> tree-building >>>>> >>>> > algorithms. >>>>> >>>> > >>>>> >>>> > hth, >>>>> >>>> > >>>>> >>>> > I. >>>>> >>>> > >>>>> >>>> >>>>> >>>> >>>>> |
From: marcin j. <mar...@gm...> - 2008-05-19 21:09:53
|
hey Looks like I now have windowlicker output with 49500 alignment and trees like: // # STOCKHOLM 1.0 #=GF NH (CP000527:0.1287975598,(CP000112:0.0000000010,AE017286:0.0000000 010)node_3:0.1287875646)root; #=GF SC_max_rev -643.47 CP000112 ACAAAGTTTGCTATCCCTTCTTTGAGCGGAGTTCCGGGCTTGAAGCCGGTGTCGGCAATCAGGT CP000527 ACGAAGGCGGCTACGCCCTCTTCGACCGTGGTCGCGGGGCGGAAGCCCGTGTCAGCGATGAGGT AE017286 ---------------------------------------------------------------- CP000112 CGTCCACATTGGCATATGTTGCTTCCACATCGCCGGGCTGCATGGGCATCATGTTTTTAACGGC CP000527 CGTCGACGTCGGCGTAGGTCGCTGCCACGTCGCCGGGCTGCATGGGCAGCATGTTCTTGACCGC AE017286 ---------------------------------------------------------------- CP000112 CTTTTTGCCCAGGCATTCTTCCAGAGTCGCGATGAACTCGCCAAGCTCTACCGTGTTGTTGTTG CP000527 CTTCTTGCCGAGGCACTCTTCGAGTACTTCGATGAACCTGCCGAGTTCGACGGTGTTGTTGTTC AE017286 ---------------------------------------------------------------- CP000112 CCAATGTT CP000527 CCGATGTT AE017286 -------- But I'll need a little guidance as to what these are are and how to interpret them. Arre these ~50,000 ncRNA predictions? For completeness, here's the command I used to generate the output: perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg &> windowlicker.out thanks, marcin On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: > Hi Marcin... > > So, I never ran xrate on an alignment this big before (6 million > columns!) and it turned out there was a drastic inefficiency in the > distance matrix code, that was causing it to scale as the square of the > number of columns. Unnoticeable for most alignments (since the code > inside the loop was pretty quick), but your example caused it to hang... > > I think it was quite possible that your xrate job was getting killed > because it was running too long. Are you running queue management > software of some kind (e.g. Sun Grid Engine)? That can often kill > long-running jobs. I couldn't find a specific bug that was crashing on > this data, although I didn't finish running it because it was taking > sooooo long. > > Anyway, I've now fixed the inefficiency bug. The tree estimation by > neighbor-joining now finishes pretty fast. However, it now takes a long > time to complete the second step of the tree-estimation procedure, which > is to refine the branch lengths of the tree using the EM algorithm. > > There is less that I can do to optimize this. The neighbor-joining code > is not heavily dependent on the alignment length, but the branch-length > refinement is. I could optimize the branch-length code in the same way > that the neighbor-joining is optimized, so it was less dependent on > alignment length, but that would take a little while, and unfortunately > I don't have the time right now. > > Instead, what I have done is introduced a subtle hack. If you specify > the command-line option "--maxrounds 0", then this will bypass the > branch-length refinement code (specifically, what it does is tell xrate > that you want the EM algorithm to run for a maximum of zero iterations, > or "rounds"). > > Thus, with "--maxrounds 0" the tree will be estimated using > neighbor-joining only. > > Sorry this has taken so long to fix! Like I said, I never used xrate on > an alignment this big before (for our screens, we pre-specified the > tree, so we went directly to windowlicker). > > Hopefully, this will now work for you, but let me know if you have > further issues. > > Cheers, > Ian > > > marcin joachimiak wrote: >> hey, >> >> Thanks for the help! >> The short example worked just fine. >> >> log3 with my previous command so far produced this: >> Finding eigenvectors using MatrixExpEigenPrepare >> Largest eigenvalue was 0; set to 0 >> Updating cached M() & J() >> Finding eigenvectors using MatrixExpEigenPrepare >> Largest eigenvalue was 0; set to 0 >> Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >> Updating cached M() & J() >> Sequence_database: converting ASCII sequences to score profiles >> Processed 3 sequences >> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >> Estimating distances of sequences 0 and 1 >> >> You can download the FASTA and Stock files here: >> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >> >> I'll have more updates on Monday -- enjoy the weekend, >> marcin >> - Show quoted text - >> >> On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>> Thanks Marcin. >>> >>> Something has definitely gone wrong if no alignment is output. Can you >>> check that the following outputs an alignment+tree? >>> >>> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>> ~/dart/src/ecfg/t/notree.stk >>> >>> >>> Looking at your log messages it seems like the program has crashed while >>> trying to do neighbor-joining to estimate the tree. For the above command, I >>> get log messages after the "Estimating tree..." message. See below (BTW many >>> of these are on stderr, not stdout) >>> >>> If you can reproduce this on the short example pairwise alignment but not >>> with the 3M alignments, there's something else wrong that needs >>> investigation. In that case, can you first try running "-log 3" instead of >>> "-log 5" (this will give more verbose log messages during the >>> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >>> please post a URL to your large file so I can download it and attempt to >>> reproduce the error. >>> >>> Sorry about this... >>> Ian >>> >>> >>> >>> >>> Warning -- used random number generator during initialisation >>> Read 1 alignments >>> Sequence_database: converting ASCII sequences to score profiles >>> Estimating tree for the following sequences: X Y >>> Optimizing branch lengths of all trees in alignment database, using EM >>> Optimizing tree branch lengths by EM. >>> Tree before optimization: >>> (X:5.0000000000,Y:5.0000000000)root; >>> EM iteration #1: log-likelihood = -56.0101 bits >>> Optimized tree after step #1: >>> (X:5.1160179765,Y:5.1160180140)root; >>> EM iteration #2: log-likelihood = -56.0118 bits >>> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>> Failed EM improvement threshold for the 1th time; stopping >>> Checking post-iteration log-likelihood >>> Post-iteration log-likelihood = -56.0118 bits >>> Restoring previous best branch lengths >>> Tree after optimization: >>> (X:5.0000000000,Y:5.0000000000)root; >>> Alignment database log-likelihood: -56.0101 bits >>> Processing alignment Alignment1 (1 of 1): 14 columns >>> # STOCKHOLM 1.0 >>> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>> X AAAAAAaaUUUUUU >>> Y GGGGGGaaCCCCCC >>> // >>> >>> >>> >>> >>> marcin joachimiak wrote: >>> >>>> Hey there, >>>> >>>> The xrate tree job took a while but it seemed to finish without >>>> errors. Alas there isn't any tree output ... >>>> >>>> Here's what I get from stdout: >>>> >>>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>> Warning -- used random number generator during initialisation >>>> Read 1 alignments >>>> Sequence_database: converting ASCII sequences to score profiles >>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> >>>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>> problem right? And this was run on a hefty machine ... >>>> >>>> >>>> marcin >>>> >>>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>> <mar...@gm...> wrote: >>>> >>>>> much better, thanks! >>>>> I did notice the trees for each alignment in the output ... >>>>> >>>>> The tree estimation is running, I'll keep reporting ... >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>>> > marcin joachimiak wrote: >>>>> > > Excellent, its working now! >>>>> > >>>>> > Cool! >>>>> > >>>>> > >>>>> > > I'll have the tree data soon and may try a run with as well, but >>> best >>>>> > > if I could know that the other tree is different from the xrate >>> one >>>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>> > > tree/distances from xrate? >>>>> > >>>>> > Yes, it will insert the trees into the alignment files it outputs, >>> using >>>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>> email. >>>>> > >>>>> > One caveat. The way you're running it now, it is calculating a >>> separate >>>>> > tree *for every window* in the alignment. This may not be what you >>> want, >>>>> > and it's certainly going to slow things down. >>>>> > >>>>> > If you want to calculate the tree just once for the whole alignment, >>> type >>>>> > >>>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>> > >>>>> > >>>>> > The "--noannotate" just means "don't waste time trying to annotate >>> this >>>>> > alignment, I just want the tree". The "-log 5" will just print some >>> log >>>>> > messages on stderr, so you can watch the progress of the >>> tree-building >>>>> > algorithms. >>>>> > >>>>> > hth, >>>>> > >>>>> > I. >>>>> > >>>>> >>>>> > |
From: Ian H. <ih...@be...> - 2008-05-19 22:45:43
|
marcin joachimiak wrote: > hey > > Looks like I now have windowlicker output with 49500 alignment and trees like: > // > # STOCKHOLM 1.0 > #=GF NH (CP000527:0.1287975598,(CP000112:0.0000000010,AE017286:0.0000000 > 010)node_3:0.1287875646)root; > #=GF SC_max_rev -643.47 > CP000112 ACAAAGTTTGCTATCCCTTCTTTGAGCGGAGTTCCGGGCTTGAAGCCGGTGTCGGCAATCAGGT > CP000527 ACGAAGGCGGCTACGCCCTCTTCGACCGTGGTCGCGGGGCGGAAGCCCGTGTCAGCGATGAGGT > AE017286 ---------------------------------------------------------------- > > CP000112 CGTCCACATTGGCATATGTTGCTTCCACATCGCCGGGCTGCATGGGCATCATGTTTTTAACGGC > CP000527 CGTCGACGTCGGCGTAGGTCGCTGCCACGTCGCCGGGCTGCATGGGCAGCATGTTCTTGACCGC > AE017286 ---------------------------------------------------------------- > > CP000112 CTTTTTGCCCAGGCATTCTTCCAGAGTCGCGATGAACTCGCCAAGCTCTACCGTGTTGTTGTTG > CP000527 CTTCTTGCCGAGGCACTCTTCGAGTACTTCGATGAACCTGCCGAGTTCGACGGTGTTGTTGTTC > AE017286 ---------------------------------------------------------------- > > CP000112 CCAATGTT > CP000527 CCGATGTT > AE017286 -------- > > > But I'll need a little guidance as to what these are are and how to > interpret them. Arre these ~50,000 ncRNA predictions? > > > For completeness, here's the command I used to generate the output: > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg > &> windowlicker.out Oops. It does not look like you have actually specified a genefinding grammar. The score "SC_max_rev" that's reported indicates that windowlicker was using xrate's default "rev" grammar which is not a genefinder... this is consistent with the command line above. You need to download a genefinding grammar e.g. from here http://biowiki.org/results/ncRNA_predictions/12fly/ncRna_v22.eg ...then change your command line to include this grammar... perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg -g ncRna_v22.eg &> windowlicker.out I am sorry that with all the discussion of how to get the trees right, I/we somehow lost sight of this fairly important point... I think we did mention it initially but while trying to figure out options for the tree estimation, it must have got dropped at some point... I. > > thanks, > marcin > > On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >> Hi Marcin... >> >> So, I never ran xrate on an alignment this big before (6 million >> columns!) and it turned out there was a drastic inefficiency in the >> distance matrix code, that was causing it to scale as the square of the >> number of columns. Unnoticeable for most alignments (since the code >> inside the loop was pretty quick), but your example caused it to hang... >> >> I think it was quite possible that your xrate job was getting killed >> because it was running too long. Are you running queue management >> software of some kind (e.g. Sun Grid Engine)? That can often kill >> long-running jobs. I couldn't find a specific bug that was crashing on >> this data, although I didn't finish running it because it was taking >> sooooo long. >> >> Anyway, I've now fixed the inefficiency bug. The tree estimation by >> neighbor-joining now finishes pretty fast. However, it now takes a long >> time to complete the second step of the tree-estimation procedure, which >> is to refine the branch lengths of the tree using the EM algorithm. >> >> There is less that I can do to optimize this. The neighbor-joining code >> is not heavily dependent on the alignment length, but the branch-length >> refinement is. I could optimize the branch-length code in the same way >> that the neighbor-joining is optimized, so it was less dependent on >> alignment length, but that would take a little while, and unfortunately >> I don't have the time right now. >> >> Instead, what I have done is introduced a subtle hack. If you specify >> the command-line option "--maxrounds 0", then this will bypass the >> branch-length refinement code (specifically, what it does is tell xrate >> that you want the EM algorithm to run for a maximum of zero iterations, >> or "rounds"). >> >> Thus, with "--maxrounds 0" the tree will be estimated using >> neighbor-joining only. >> >> Sorry this has taken so long to fix! Like I said, I never used xrate on >> an alignment this big before (for our screens, we pre-specified the >> tree, so we went directly to windowlicker). >> >> Hopefully, this will now work for you, but let me know if you have >> further issues. >> >> Cheers, >> Ian >> >> >> marcin joachimiak wrote: >>> hey, >>> >>> Thanks for the help! >>> The short example worked just fine. >>> >>> log3 with my previous command so far produced this: >>> Finding eigenvectors using MatrixExpEigenPrepare >>> Largest eigenvalue was 0; set to 0 >>> Updating cached M() & J() >>> Finding eigenvectors using MatrixExpEigenPrepare >>> Largest eigenvalue was 0; set to 0 >>> Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>> Updating cached M() & J() >>> Sequence_database: converting ASCII sequences to score profiles >>> Processed 3 sequences >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> Estimating distances of sequences 0 and 1 >>> >>> You can download the FASTA and Stock files here: >>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>> >>> I'll have more updates on Monday -- enjoy the weekend, >>> marcin >>> - Show quoted text - >>> >>> On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>> Thanks Marcin. >>>> >>>> Something has definitely gone wrong if no alignment is output. Can you >>>> check that the following outputs an alignment+tree? >>>> >>>> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>> ~/dart/src/ecfg/t/notree.stk >>>> >>>> >>>> Looking at your log messages it seems like the program has crashed while >>>> trying to do neighbor-joining to estimate the tree. For the above command, I >>>> get log messages after the "Estimating tree..." message. See below (BTW many >>>> of these are on stderr, not stdout) >>>> >>>> If you can reproduce this on the short example pairwise alignment but not >>>> with the 3M alignments, there's something else wrong that needs >>>> investigation. In that case, can you first try running "-log 3" instead of >>>> "-log 5" (this will give more verbose log messages during the >>>> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >>>> please post a URL to your large file so I can download it and attempt to >>>> reproduce the error. >>>> >>>> Sorry about this... >>>> Ian >>>> >>>> >>>> >>>> >>>> Warning -- used random number generator during initialisation >>>> Read 1 alignments >>>> Sequence_database: converting ASCII sequences to score profiles >>>> Estimating tree for the following sequences: X Y >>>> Optimizing branch lengths of all trees in alignment database, using EM >>>> Optimizing tree branch lengths by EM. >>>> Tree before optimization: >>>> (X:5.0000000000,Y:5.0000000000)root; >>>> EM iteration #1: log-likelihood = -56.0101 bits >>>> Optimized tree after step #1: >>>> (X:5.1160179765,Y:5.1160180140)root; >>>> EM iteration #2: log-likelihood = -56.0118 bits >>>> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>>> Failed EM improvement threshold for the 1th time; stopping >>>> Checking post-iteration log-likelihood >>>> Post-iteration log-likelihood = -56.0118 bits >>>> Restoring previous best branch lengths >>>> Tree after optimization: >>>> (X:5.0000000000,Y:5.0000000000)root; >>>> Alignment database log-likelihood: -56.0101 bits >>>> Processing alignment Alignment1 (1 of 1): 14 columns >>>> # STOCKHOLM 1.0 >>>> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>> X AAAAAAaaUUUUUU >>>> Y GGGGGGaaCCCCCC >>>> // >>>> >>>> >>>> >>>> >>>> marcin joachimiak wrote: >>>> >>>>> Hey there, >>>>> >>>>> The xrate tree job took a while but it seemed to finish without >>>>> errors. Alas there isn't any tree output ... >>>>> >>>>> Here's what I get from stdout: >>>>> >>>>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>> Warning -- used random number generator during initialisation >>>>> Read 1 alignments >>>>> Sequence_database: converting ASCII sequences to score profiles >>>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>> >>>>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>> problem right? And this was run on a hefty machine ... >>>>> >>>>> >>>>> marcin >>>>> >>>>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>> <mar...@gm...> wrote: >>>>> >>>>>> much better, thanks! >>>>>> I did notice the trees for each alignment in the output ... >>>>>> >>>>>> The tree estimation is running, I'll keep reporting ... >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>>>> > marcin joachimiak wrote: >>>>>> > > Excellent, its working now! >>>>>> > >>>>>> > Cool! >>>>>> > >>>>>> > >>>>>> > > I'll have the tree data soon and may try a run with as well, but >>>> best >>>>>> > > if I could know that the other tree is different from the xrate >>>> one >>>>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>>> > > tree/distances from xrate? >>>>>> > >>>>>> > Yes, it will insert the trees into the alignment files it outputs, >>>> using >>>>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>> email. >>>>>> > >>>>>> > One caveat. The way you're running it now, it is calculating a >>>> separate >>>>>> > tree *for every window* in the alignment. This may not be what you >>>> want, >>>>>> > and it's certainly going to slow things down. >>>>>> > >>>>>> > If you want to calculate the tree just once for the whole alignment, >>>> type >>>>>> > >>>>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>>> > >>>>>> > >>>>>> > The "--noannotate" just means "don't waste time trying to annotate >>>> this >>>>>> > alignment, I just want the tree". The "-log 5" will just print some >>>> log >>>>>> > messages on stderr, so you can watch the progress of the >>>> tree-building >>>>>> > algorithms. >>>>>> > >>>>>> > hth, >>>>>> > >>>>>> > I. >>>>>> > >>>>>> >>>>>> |
From: marcin j. <mar...@gm...> - 2008-05-19 22:55:53
|
No problem, thanks again for all the hand-holding. I've started up a new job with the gene-finding grammar which will take at least a few hours based on the last (default grammar) run. So what is the default 'rev' grammar useful for? Just so I have a better context, what can I expect to get out of this given that the genomes are from fairly uncharacterized and potentially unusual sulfate-reducing anaerobic bacteria? Also, the three genomes have one close pair and two distant pairs. thanks. marcin On Mon, May 19, 2008 at 3:44 PM, Ian Holmes <ih...@be...> wrote: > marcin joachimiak wrote: >> hey >> >> Looks like I now have windowlicker output with 49500 alignment and trees like: >> // >> # STOCKHOLM 1.0 >> #=GF NH (CP000527:0.1287975598,(CP000112:0.0000000010,AE017286:0.0000000 >> 010)node_3:0.1287875646)root; >> #=GF SC_max_rev -643.47 >> CP000112 ACAAAGTTTGCTATCCCTTCTTTGAGCGGAGTTCCGGGCTTGAAGCCGGTGTCGGCAATCAGGT >> CP000527 ACGAAGGCGGCTACGCCCTCTTCGACCGTGGTCGCGGGGCGGAAGCCCGTGTCAGCGATGAGGT >> AE017286 ---------------------------------------------------------------- >> >> CP000112 CGTCCACATTGGCATATGTTGCTTCCACATCGCCGGGCTGCATGGGCATCATGTTTTTAACGGC >> CP000527 CGTCGACGTCGGCGTAGGTCGCTGCCACGTCGCCGGGCTGCATGGGCAGCATGTTCTTGACCGC >> AE017286 ---------------------------------------------------------------- >> >> CP000112 CTTTTTGCCCAGGCATTCTTCCAGAGTCGCGATGAACTCGCCAAGCTCTACCGTGTTGTTGTTG >> CP000527 CTTCTTGCCGAGGCACTCTTCGAGTACTTCGATGAACCTGCCGAGTTCGACGGTGTTGTTGTTC >> AE017286 ---------------------------------------------------------------- >> >> CP000112 CCAATGTT >> CP000527 CCGATGTT >> AE017286 -------- >> >> >> But I'll need a little guidance as to what these are are and how to >> interpret them. Arre these ~50,000 ncRNA predictions? >> >> >> For completeness, here's the command I used to generate the output: >> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >> DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg >> &> windowlicker.out > > > Oops. It does not look like you have actually specified a genefinding > grammar. The score "SC_max_rev" that's reported indicates that > windowlicker was using xrate's default "rev" grammar which is not a > genefinder... this is consistent with the command line above. > > You need to download a genefinding grammar e.g. from here > > http://biowiki.org/results/ncRNA_predictions/12fly/ncRna_v22.eg > > ...then change your command line to include this grammar... > > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -e $DARTDIR/grammars/jukescantor.eg > -g ncRna_v22.eg &> windowlicker.out > > > I am sorry that with all the discussion of how to get the trees right, > I/we somehow lost sight of this fairly important point... I think we did > mention it initially but while trying to figure out options for the tree > estimation, it must have got dropped at some point... > > I. > > > > > >> >> thanks, >> marcin >> >> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>> Hi Marcin... >>> >>> So, I never ran xrate on an alignment this big before (6 million >>> columns!) and it turned out there was a drastic inefficiency in the >>> distance matrix code, that was causing it to scale as the square of the >>> number of columns. Unnoticeable for most alignments (since the code >>> inside the loop was pretty quick), but your example caused it to hang... >>> >>> I think it was quite possible that your xrate job was getting killed >>> because it was running too long. Are you running queue management >>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>> long-running jobs. I couldn't find a specific bug that was crashing on >>> this data, although I didn't finish running it because it was taking >>> sooooo long. >>> >>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>> neighbor-joining now finishes pretty fast. However, it now takes a long >>> time to complete the second step of the tree-estimation procedure, which >>> is to refine the branch lengths of the tree using the EM algorithm. >>> >>> There is less that I can do to optimize this. The neighbor-joining code >>> is not heavily dependent on the alignment length, but the branch-length >>> refinement is. I could optimize the branch-length code in the same way >>> that the neighbor-joining is optimized, so it was less dependent on >>> alignment length, but that would take a little while, and unfortunately >>> I don't have the time right now. >>> >>> Instead, what I have done is introduced a subtle hack. If you specify >>> the command-line option "--maxrounds 0", then this will bypass the >>> branch-length refinement code (specifically, what it does is tell xrate >>> that you want the EM algorithm to run for a maximum of zero iterations, >>> or "rounds"). >>> >>> Thus, with "--maxrounds 0" the tree will be estimated using >>> neighbor-joining only. >>> >>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>> an alignment this big before (for our screens, we pre-specified the >>> tree, so we went directly to windowlicker). >>> >>> Hopefully, this will now work for you, but let me know if you have >>> further issues. >>> >>> Cheers, >>> Ian >>> >>> >>> marcin joachimiak wrote: >>>> hey, >>>> >>>> Thanks for the help! >>>> The short example worked just fine. >>>> >>>> log3 with my previous command so far produced this: >>>> Finding eigenvectors using MatrixExpEigenPrepare >>>> Largest eigenvalue was 0; set to 0 >>>> Updating cached M() & J() >>>> Finding eigenvectors using MatrixExpEigenPrepare >>>> Largest eigenvalue was 0; set to 0 >>>> Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>> Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>> Updating cached M() & J() >>>> Sequence_database: converting ASCII sequences to score profiles >>>> Processed 3 sequences >>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>> Estimating distances of sequences 0 and 1 >>>> >>>> You can download the FASTA and Stock files here: >>>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>> http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>> >>>> I'll have more updates on Monday -- enjoy the weekend, >>>> marcin >>>> - Show quoted text - >>>> >>>> On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>>> Thanks Marcin. >>>>> >>>>> Something has definitely gone wrong if no alignment is output. Can you >>>>> check that the following outputs an alignment+tree? >>>>> >>>>> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>>> ~/dart/src/ecfg/t/notree.stk >>>>> >>>>> >>>>> Looking at your log messages it seems like the program has crashed while >>>>> trying to do neighbor-joining to estimate the tree. For the above command, I >>>>> get log messages after the "Estimating tree..." message. See below (BTW many >>>>> of these are on stderr, not stdout) >>>>> >>>>> If you can reproduce this on the short example pairwise alignment but not >>>>> with the 3M alignments, there's something else wrong that needs >>>>> investigation. In that case, can you first try running "-log 3" instead of >>>>> "-log 5" (this will give more verbose log messages during the >>>>> neighbor-joining algorithm), then mail the stderr log messages to me, and/or >>>>> please post a URL to your large file so I can download it and attempt to >>>>> reproduce the error. >>>>> >>>>> Sorry about this... >>>>> Ian >>>>> >>>>> >>>>> >>>>> >>>>> Warning -- used random number generator during initialisation >>>>> Read 1 alignments >>>>> Sequence_database: converting ASCII sequences to score profiles >>>>> Estimating tree for the following sequences: X Y >>>>> Optimizing branch lengths of all trees in alignment database, using EM >>>>> Optimizing tree branch lengths by EM. >>>>> Tree before optimization: >>>>> (X:5.0000000000,Y:5.0000000000)root; >>>>> EM iteration #1: log-likelihood = -56.0101 bits >>>>> Optimized tree after step #1: >>>>> (X:5.1160179765,Y:5.1160180140)root; >>>>> EM iteration #2: log-likelihood = -56.0118 bits >>>>> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>>>> Failed EM improvement threshold for the 1th time; stopping >>>>> Checking post-iteration log-likelihood >>>>> Post-iteration log-likelihood = -56.0118 bits >>>>> Restoring previous best branch lengths >>>>> Tree after optimization: >>>>> (X:5.0000000000,Y:5.0000000000)root; >>>>> Alignment database log-likelihood: -56.0101 bits >>>>> Processing alignment Alignment1 (1 of 1): 14 columns >>>>> # STOCKHOLM 1.0 >>>>> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>>> X AAAAAAaaUUUUUU >>>>> Y GGGGGGaaCCCCCC >>>>> // >>>>> >>>>> >>>>> >>>>> >>>>> marcin joachimiak wrote: >>>>> >>>>>> Hey there, >>>>>> >>>>>> The xrate tree job took a while but it seemed to finish without >>>>>> errors. Alas there isn't any tree output ... >>>>>> >>>>>> Here's what I get from stdout: >>>>>> >>>>>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>>> Warning -- used random number generator during initialisation >>>>>> Read 1 alignments >>>>>> Sequence_database: converting ASCII sequences to score profiles >>>>>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>>> >>>>>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>>> problem right? And this was run on a hefty machine ... >>>>>> >>>>>> >>>>>> marcin >>>>>> >>>>>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>>> <mar...@gm...> wrote: >>>>>> >>>>>>> much better, thanks! >>>>>>> I did notice the trees for each alignment in the output ... >>>>>>> >>>>>>> The tree estimation is running, I'll keep reporting ... >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>>>>>> > marcin joachimiak wrote: >>>>>>> > > Excellent, its working now! >>>>>>> > >>>>>>> > Cool! >>>>>>> > >>>>>>> > >>>>>>> > > I'll have the tree data soon and may try a run with as well, but >>>>> best >>>>>>> > > if I could know that the other tree is different from the xrate >>>>> one >>>>>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>>>> > > tree/distances from xrate? >>>>>>> > >>>>>>> > Yes, it will insert the trees into the alignment files it outputs, >>>>> using >>>>>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>>> email. >>>>>>> > >>>>>>> > One caveat. The way you're running it now, it is calculating a >>>>> separate >>>>>>> > tree *for every window* in the alignment. This may not be what you >>>>> want, >>>>>>> > and it's certainly going to slow things down. >>>>>>> > >>>>>>> > If you want to calculate the tree just once for the whole alignment, >>>>> type >>>>>>> > >>>>>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>>>> > >>>>>>> > >>>>>>> > The "--noannotate" just means "don't waste time trying to annotate >>>>> this >>>>>>> > alignment, I just want the tree". The "-log 5" will just print some >>>>> log >>>>>>> > messages on stderr, so you can watch the progress of the >>>>> tree-building >>>>>>> > algorithms. >>>>>>> > >>>>>>> > hth, >>>>>>> > >>>>>>> > I. >>>>>>> > >>>>>>> >>>>>>> > |