Re: [Dart-help] help with windowlicker
Brought to you by:
ihh
From: Ian H. <ih...@be...> - 2008-05-13 21:03:45
|
Yes -- your "&>" shell redirection is redirecting all xrate's error & logging output to your "out" file. That is what the second xrate run (called from windowlicker) is complaining about. If you use ">" instead of "&>" then you will not mix stderr and stdout, and what you're doing should work (fingers crossed) Ian marcin joachimiak wrote: > hey there, > > So, I got the tree output after the cvs update, see attached. > > xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg > --noannotate -log 5 &> out > > I was only about to capture the xrate tree output using &> so there > are likely extraneous lines there breaking things? > > I ran windowlicker as follows: > perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl > DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e > $DARTDIR/grammars/jukescantor.eg &> windowlicker.out > > And got this, seemingly complaining that xrate failed: > > Ignoring line: Checking post-iteration log-likelihood > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Tree after optimization: > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > xrate failed (check command-line options? they were: > /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) > > I also tried w/o the -maxrounds argument and got: > > Ignoring line: Checking post-iteration log-likelihood > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Tree after optimization: > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns > at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 > [stderr] ERROR: Bad node specifier: > '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' > xrate returned with nonzero exit status (invoked using: > /usr2/people/gtl/bin/dart//bin/xrate -e > /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) > > cheers, > marcin > > > On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak > <mar...@gm...> wrote: >> Hey hey, >> >> Indeed, the alignment size limitation makes sense and was one of the >> things I was suspecting. >> >> The job was run on a machine w.o queue management but I will >> investigate whether it could have been killed by other means. >> >> The 0 iteration option sounds like it will do the trick -- so I just >> need to checkout the DART code from cvs again, right? Thanks for >> adding this in and I definitely understand the time constraints etc., >> this should suffice for my purposes. I could of course try to obtain a >> tree in other ways, but actually I'm trying to turn this into a >> pipeline that can run on multi-genome alignments, its almost there. >> >> cheers, >> marcin >> >> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>> Hi Marcin... >>> >>> So, I never ran xrate on an alignment this big before (6 million >>> columns!) and it turned out there was a drastic inefficiency in the >>> distance matrix code, that was causing it to scale as the square of the >>> number of columns. Unnoticeable for most alignments (since the code >>> inside the loop was pretty quick), but your example caused it to hang... >>> >>> I think it was quite possible that your xrate job was getting killed >>> because it was running too long. Are you running queue management >>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>> long-running jobs. I couldn't find a specific bug that was crashing on >>> this data, although I didn't finish running it because it was taking >>> sooooo long. >>> >>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>> neighbor-joining now finishes pretty fast. However, it now takes a long >>> time to complete the second step of the tree-estimation procedure, which >>> is to refine the branch lengths of the tree using the EM algorithm. >>> >>> There is less that I can do to optimize this. The neighbor-joining code >>> is not heavily dependent on the alignment length, but the branch-length >>> refinement is. I could optimize the branch-length code in the same way >>> that the neighbor-joining is optimized, so it was less dependent on >>> alignment length, but that would take a little while, and unfortunately >>> I don't have the time right now. >>> >>> Instead, what I have done is introduced a subtle hack. If you specify >>> the command-line option "--maxrounds 0", then this will bypass the >>> branch-length refinement code (specifically, what it does is tell xrate >>> that you want the EM algorithm to run for a maximum of zero iterations, >>> or "rounds"). >>> >>> Thus, with "--maxrounds 0" the tree will be estimated using >>> neighbor-joining only. >>> >>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>> an alignment this big before (for our screens, we pre-specified the >>> tree, so we went directly to windowlicker). >>> >>> Hopefully, this will now work for you, but let me know if you have >>> further issues. >>> >>> Cheers, >>> Ian >>> >>> >>> >>> >>> marcin joachimiak wrote: >>> > hey, >>> > >>> > Thanks for the help! >>> > The short example worked just fine. >>> > >>> > log3 with my previous command so far produced this: >>> > Finding eigenvectors using MatrixExpEigenPrepare >>> > Largest eigenvalue was 0; set to 0 >>> > Updating cached M() & J() >>> > Finding eigenvectors using MatrixExpEigenPrepare >>> > Largest eigenvalue was 0; set to 0 >>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>> > Updating cached M() & J() >>> > Sequence_database: converting ASCII sequences to score profiles >>> > Processed 3 sequences >>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> > Estimating distances of sequences 0 and 1 >>> > >>> > You can download the FASTA and Stock files here: >>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>> > >>> > I'll have more updates on Monday -- enjoy the weekend, >>> > marcin >>> > - Show quoted text - >>> > >>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>> >> Thanks Marcin. >>> >> >>> >> Something has definitely gone wrong if no alignment is output. Can you >>> >> check that the following outputs an alignment+tree? >>> >> >>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>> >> ~/dart/src/ecfg/t/notree.stk >>> >> >>> >> >>> >> Looking at your log messages it seems like the program has crashed while >>> >> trying to do neighbor-joining to estimate the tree. For the > above command, I >>> >> get log messages after the "Estimating tree..." message. See > below (BTW many >>> >> of these are on stderr, not stdout) >>> >> >>> >> If you can reproduce this on the short example pairwise > alignment but not >>> >> with the 3M alignments, there's something else wrong that needs >>> >> investigation. In that case, can you first try running "-log 3" > instead of >>> >> "-log 5" (this will give more verbose log messages during the >>> >> neighbor-joining algorithm), then mail the stderr log messages > to me, and/or >>> >> please post a URL to your large file so I can download it and attempt to >>> >> reproduce the error. >>> >> >>> >> Sorry about this... >>> >> Ian >>> >> >>> >> >>> >> >>> >> >>> >> Warning -- used random number generator during initialisation >>> >> Read 1 alignments >>> >> Sequence_database: converting ASCII sequences to score profiles >>> >> Estimating tree for the following sequences: X Y >>> >> Optimizing branch lengths of all trees in alignment database, using EM >>> >> Optimizing tree branch lengths by EM. >>> >> Tree before optimization: >>> >> (X:5.0000000000,Y:5.0000000000)root; >>> >> EM iteration #1: log-likelihood = -56.0101 bits >>> >> Optimized tree after step #1: >>> >> (X:5.1160179765,Y:5.1160180140)root; >>> >> EM iteration #2: log-likelihood = -56.0118 bits >>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM >>> >> Failed EM improvement threshold for the 1th time; stopping >>> >> Checking post-iteration log-likelihood >>> >> Post-iteration log-likelihood = -56.0118 bits >>> >> Restoring previous best branch lengths >>> >> Tree after optimization: >>> >> (X:5.0000000000,Y:5.0000000000)root; >>> >> Alignment database log-likelihood: -56.0101 bits >>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>> >> # STOCKHOLM 1.0 >>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>> >> X AAAAAAaaUUUUUU >>> >> Y GGGGGGaaCCCCCC >>> >> // >>> >> >>> >> >>> >> >>> >> >>> >> marcin joachimiak wrote: >>> >> >>> >>> Hey there, >>> >>> >>> >>> The xrate tree job took a while but it seemed to finish without >>> >>> errors. Alas there isn't any tree output ... >>> >>> >>> >>> Here's what I get from stdout: >>> >>> >>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>> >>> Warning -- used random number generator during initialisation >>> >>> Read 1 alignments >>> >>> Sequence_database: converting ASCII sequences to score profiles >>> >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>> >>> >>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>> >>> problem right? And this was run on a hefty machine ... >>> >>> >>> >>> >>> >>> marcin >>> >>> >>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>> >>> <mar...@gm...> wrote: >>> >>> >>> >>>> much better, thanks! >>> >>>> I did notice the trees for each alignment in the output ... >>> >>>> >>> >>>> The tree estimation is running, I'll keep reporting ... >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote: >>> >>>> > marcin joachimiak wrote: >>> >>>> > > Excellent, its working now! >>> >>>> > >>> >>>> > Cool! >>> >>>> > >>> >>>> > >>> >>>> > > I'll have the tree data soon and may try a run with as well, but >>> >> best >>> >>>> > > if I could know that the other tree is different from the xrate >>> >> one >>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>> >>>> > > tree/distances from xrate? >>> >>>> > >>> >>>> > Yes, it will insert the trees into the alignment files it outputs, >>> >> using >>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>> >> email. >>> >>>> > >>> >>>> > One caveat. The way you're running it now, it is calculating a >>> >> separate >>> >>>> > tree *for every window* in the alignment. This may not be what you >>> >> want, >>> >>>> > and it's certainly going to slow things down. >>> >>>> > >>> >>>> > If you want to calculate the tree just once for the whole > alignment, >>> >> type >>> >>>> > >>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>> >>>> > >>> >>>> > >>> >>>> > The "--noannotate" just means "don't waste time trying to annotate >>> >> this >>> >>>> > alignment, I just want the tree". The "-log 5" will just print some >>> >> log >>> >>>> > messages on stderr, so you can watch the progress of the >>> >> tree-building >>> >>>> > algorithms. >>> >>>> > >>> >>>> > hth, >>> >>>> > >>> >>>> > I. >>> >>>> > >>> >>>> >>> >>>> >>> |