Re: [Dart-help] help with windowlicker
From: marcin j. <mar...@gm...> - 2008-05-13 21:10:33
OK, it's running now! Thanks -- originally I couldn't get output from the standard redirect '>', but it worked the second time.

Oh yes, where does the "--maxrounds 0" argument go? Is that for windowlicker or just the initial tree?

m

On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote:
> Yes -- your "&>" shell redirection is redirecting all xrate's error &
> logging output to your "out" file. That is what the second xrate run
> (called from windowlicker) is complaining about. If you use ">" instead
> of "&>" then you will not mix stderr and stdout, and what you're doing
> should work (fingers crossed)
>
> Ian
>
> marcin joachimiak wrote:
>> hey there,
>>
>> So, I got the tree output after the cvs update, see attached.
>>
>> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg
>> --noannotate -log 5 &> out
>>
>> I was only about to capture the xrate tree output using &> so there
>> are likely extraneous lines there breaking things?
>>
>> I ran windowlicker as follows:
>> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl
>> DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e
>> $DARTDIR/grammars/jukescantor.eg &> windowlicker.out
>>
>> And got this, seemingly complaining that xrate failed:
>>
>> Ignoring line: Checking post-iteration log-likelihood
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Tree after optimization:
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> xrate failed (check command-line options?
>> they were:
>> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e
>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg)
>>
>> I also tried w/o the -maxrounds argument and got:
>>
>> Ignoring line: Checking post-iteration log-likelihood
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Tree after optimization:
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns
>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>> [stderr] ERROR: Bad node specifier:
>> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;'
>> xrate returned with nonzero exit status (invoked using:
>> /usr2/people/gtl/bin/dart//bin/xrate -e
>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg)
>>
>> cheers,
>> marcin
>>
>>
>> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak
>> <mar...@gm...> wrote:
>>> Hey hey,
>>>
>>> Indeed, the alignment size limitation makes sense and was one of the
>>> things I was suspecting.
>>>
>>> The job was run on a machine w.o queue management but I will
>>> investigate whether it could have been killed by other means.
>>>
>>> The 0 iteration option sounds like it will do the trick -- so I just
>>> need to checkout the DART code from cvs again, right? Thanks for
>>> adding this in and I definitely understand the time constraints etc.,
>>> this should suffice for my purposes. I could of course try to obtain a
>>> tree in other ways, but actually I'm trying to turn this into a
>>> pipeline that can run on multi-genome alignments, its almost there.
>>>
>>> cheers,
>>> marcin
>>>
>>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote:
>>>> Hi Marcin...
>>>>
>>>> So, I never ran xrate on an alignment this big before (6 million
>>>> columns!) and it turned out there was a drastic inefficiency in the
>>>> distance matrix code, that was causing it to scale as the square of the
>>>> number of columns. Unnoticeable for most alignments (since the code
>>>> inside the loop was pretty quick), but your example caused it to hang...
>>>>
>>>> I think it was quite possible that your xrate job was getting killed
>>>> because it was running too long. Are you running queue management
>>>> software of some kind (e.g. Sun Grid Engine)? That can often kill
>>>> long-running jobs. I couldn't find a specific bug that was crashing on
>>>> this data, although I didn't finish running it because it was taking
>>>> sooooo long.
>>>>
>>>> Anyway, I've now fixed the inefficiency bug. The tree estimation by
>>>> neighbor-joining now finishes pretty fast. However, it now takes a long
>>>> time to complete the second step of the tree-estimation procedure, which
>>>> is to refine the branch lengths of the tree using the EM algorithm.
>>>>
>>>> There is less that I can do to optimize this. The neighbor-joining code
>>>> is not heavily dependent on the alignment length, but the branch-length
>>>> refinement is. I could optimize the branch-length code in the same way
>>>> that the neighbor-joining is optimized, so it was less dependent on
>>>> alignment length, but that would take a little while, and unfortunately
>>>> I don't have the time right now.
>>>>
>>>> Instead, what I have done is introduced a subtle hack. If you specify
>>>> the command-line option "--maxrounds 0", then this will bypass the
>>>> branch-length refinement code (specifically, what it does is tell xrate
>>>> that you want the EM algorithm to run for a maximum of zero iterations,
>>>> or "rounds").
>>>>
>>>> Thus, with "--maxrounds 0" the tree will be estimated using
>>>> neighbor-joining only.
>>>>
>>>> Sorry this has taken so long to fix! Like I said, I never used xrate on
>>>> an alignment this big before (for our screens, we pre-specified the
>>>> tree, so we went directly to windowlicker).
>>>>
>>>> Hopefully, this will now work for you, but let me know if you have
>>>> further issues.
>>>>
>>>> Cheers,
>>>> Ian
>>>>
>>>>
>>>> marcin joachimiak wrote:
>>>> > hey,
>>>> >
>>>> > Thanks for the help!
>>>> > The short example worked just fine.
>>>> >
>>>> > log3 with my previous command so far produced this:
>>>> > Finding eigenvectors using MatrixExpEigenPrepare
>>>> > Largest eigenvalue was 0; set to 0
>>>> > Updating cached M() & J()
>>>> > Finding eigenvectors using MatrixExpEigenPrepare
>>>> > Largest eigenvalue was 0; set to 0
>>>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0))
>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0))
>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0))
>>>> > Updating cached M() & J()
>>>> > Sequence_database: converting ASCII sequences to score profiles
>>>> > Processed 3 sequences
>>>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286
>>>> > Estimating distances of sequences 0 and 1
>>>> >
>>>> > You can download the FASTA and Stock files here:
>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz
>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz
>>>> >
>>>> > I'll have more updates on Monday -- enjoy the weekend,
>>>> > marcin
>>>> > - Show quoted text -
>>>> >
>>>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote:
>>>> >> Thanks Marcin.
>>>> >>
>>>> >> Something has definitely gone wrong if no alignment is output. Can you
>>>> >> check that the following outputs an alignment+tree?
>>>> >>
>>>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg
>>>> >> ~/dart/src/ecfg/t/notree.stk
>>>> >>
>>>> >>
>>>> >> Looking at your log messages it seems like the program has crashed while
>>>> >> trying to do neighbor-joining to estimate the tree. For the above command, I
>>>> >> get log messages after the "Estimating tree..." message. See below (BTW many
>>>> >> of these are on stderr, not stdout)
>>>> >>
>>>> >> If you can reproduce this on the short example pairwise alignment but not
>>>> >> with the 3M alignments, there's something else wrong that needs
>>>> >> investigation. In that case, can you first try running "-log 3" instead of
>>>> >> "-log 5" (this will give more verbose log messages during the
>>>> >> neighbor-joining algorithm), then mail the stderr log messages to me, and/or
>>>> >> please post a URL to your large file so I can download it and attempt to
>>>> >> reproduce the error.
>>>> >>
>>>> >> Sorry about this...
>>>> >> Ian
>>>> >>
>>>> >>
>>>> >> Warning -- used random number generator during initialisation
>>>> >> Read 1 alignments
>>>> >> Sequence_database: converting ASCII sequences to score profiles
>>>> >> Estimating tree for the following sequences: X Y
>>>> >> Optimizing branch lengths of all trees in alignment database, using EM
>>>> >> Optimizing tree branch lengths by EM.
>>>> >> Tree before optimization:
>>>> >> (X:5.0000000000,Y:5.0000000000)root;
>>>> >> EM iteration #1: log-likelihood = -56.0101 bits
>>>> >> Optimized tree after step #1:
>>>> >> (X:5.1160179765,Y:5.1160180140)root;
>>>> >> EM iteration #2: log-likelihood = -56.0118 bits
>>>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM
>>>> >> Failed EM improvement threshold for the 1th time; stopping
>>>> >> Checking post-iteration log-likelihood
>>>> >> Post-iteration log-likelihood = -56.0118 bits
>>>> >> Restoring previous best branch lengths
>>>> >> Tree after optimization:
>>>> >> (X:5.0000000000,Y:5.0000000000)root;
>>>> >> Alignment database log-likelihood: -56.0101 bits
>>>> >> Processing alignment Alignment1 (1 of 1): 14 columns
>>>> >> # STOCKHOLM 1.0
>>>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root;
>>>> >> X AAAAAAaaUUUUUU
>>>> >> Y GGGGGGaaCCCCCC
>>>> >> //
>>>> >>
>>>> >>
>>>> >> marcin joachimiak wrote:
>>>> >>
>>>> >>> Hey there,
>>>> >>>
>>>> >>> The xrate tree job took a while but it seemed to finish without
>>>> >>> errors. Alas there isn't any tree output ...
>>>> >>>
>>>> >>> Here's what I get from stdout:
>>>> >>>
>>>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e
>>>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5
>>>> >>> Warning -- used random number generator during initialisation
>>>> >>> Read 1 alignments
>>>> >>> Sequence_database: converting ASCII sequences to score profiles
>>>> >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286
>>>> >>>
>>>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a
>>>> >>> problem right? And this was run on a hefty machine ...
>>>> >>>
>>>> >>>
>>>> >>> marcin
>>>> >>>
>>>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak
>>>> >>> <mar...@gm...> wrote:
>>>> >>>
>>>> >>>> much better, thanks!
>>>> >>>> I did notice the trees for each alignment in the output ...
>>>> >>>>
>>>> >>>> The tree estimation is running, I'll keep reporting ...
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote:
>>>> >>>> > marcin joachimiak wrote:
>>>> >>>> > > Excellent, its working now!
>>>> >>>> >
>>>> >>>> > Cool!
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > > I'll have the tree data soon and may try a run with as well, but best
>>>> >>>> > > if I could know that the other tree is different from the xrate one
>>>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model
>>>> >>>> > > tree/distances from xrate?
>>>> >>>> >
>>>> >>>> > Yes, it will insert the trees into the alignment files it outputs, using
>>>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous email.
>>>> >>>> >
>>>> >>>> > One caveat. The way you're running it now, it is calculating a separate
>>>> >>>> > tree *for every window* in the alignment. This may not be what you want,
>>>> >>>> > and it's certainly going to slow things down.
>>>> >>>> >
>>>> >>>> > If you want to calculate the tree just once for the whole alignment, type
>>>> >>>> >
>>>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg
>>>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > The "--noannotate" just means "don't waste time trying to annotate this
>>>> >>>> > alignment, I just want the tree". The "-log 5" will just print some log
>>>> >>>> > messages on stderr, so you can watch the progress of the tree-building
>>>> >>>> > algorithms.
>>>> >>>> >
>>>> >>>> > hth,
>>>> >>>> >
>>>> >>>> > I.
>>>> >>>> >
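
Editorial illustration of the redirection issue discussed in this thread (">" versus "&>"). This is only a sketch under stated assumptions: `fake_xrate` is a hypothetical stand-in that writes a small Stockholm alignment to stdout and log messages to stderr, and the file names are made up for the example. It does not invoke the real xrate or windowlicker. The portable form `> file 2>&1` is used in place of bash's `&>`, which has the same effect.

```shell
#!/bin/sh
# Hypothetical stand-in for xrate: the Stockholm alignment goes to stdout,
# log messages go to stderr. The real program is NOT invoked here.
fake_xrate() {
    echo "Estimating tree for the following sequences: X Y" >&2
    echo "# STOCKHOLM 1.0"
    echo "#=GF NH (X:5.0000000000,Y:5.0000000000)root;"
    echo "X AAAAAAaaUUUUUU"
    echo "Y GGGGGGaaCCCCCC"
    echo "//"
    echo "Tree after optimization:" >&2
}

workdir=$(mktemp -d)

# ">" captures stdout only; "2>" sends the log messages to a separate file.
fake_xrate > "$workdir/with_tree.stock" 2> "$workdir/xrate.log"

# "> file 2>&1" (what bash's "&>" does) mixes the log lines into the alignment.
fake_xrate > "$workdir/mixed.out" 2>&1

# Count lines that are neither Stockholm markup nor one of the two sequence rows.
clean_extra=$(grep -c -v -e '^#' -e '^//' -e '^[XY] ' "$workdir/with_tree.stock")
mixed_extra=$(grep -c -v -e '^#' -e '^//' -e '^[XY] ' "$workdir/mixed.out")
echo "stray lines with '>': $clean_extra; with '&>': $mixed_extra"
```

The stray lines counted in the second file are the kind of content that a downstream Stockholm parser would have to skip, which matches the "Ignoring line: ..." messages quoted above.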