Re: [Dart-help] help with windowlicker
Brought to you by:
ihh
From: Ian H. <ih...@be...> - 2008-05-13 21:13:59
|
You only really need that (and the "-e $DARTDIR/grammars/jukescantor.eg" option) for the initial tree. You can pass the "-e $DARTDIR/grammars/jukescantor.eg" to windowlicker too if you like, but with the "--maxrounds 0" there it won't actually do anything, and you probably don't need it anyway. marcin joachimiak wrote: > OK, its running now! > Thanks -- originally I couldn't get output from standard redirect '>' > but it worked the second time. > > Oh yes, where does the "--maxrounds 0" argument go? Is that for > windowlicker or just the initial tree? > > m > > On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote: >> Yes -- your "&>" shell redirection is redirecting all xrate's error & >> logging output to your "out" file. That is what the second xrate run >> (called from windowlicker) is complaining about. If you use ">" instead >> of "&>" then you will not mix stderr and stdout, and what you're doing >> should work (fingers crossed) >> >> Ian >> >> marcin joachimiak wrote: >>> hey there, >>> >>> So, I got the tree output after the cvs update, see attached. >>> >>> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg >>> --noannotate -log 5 &> out >>> >>> I was only about to capture the xrate tree output using &> so there >>> are likely extraneous lines there breaking things? >>> >>> I ran windowlicker as follows: >>> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl >>> DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e >>> $DARTDIR/grammars/jukescantor.eg &> windowlicker.out >>> >>> And got this, seemingly complaining that xrate failed: >>> >>> Ignoring line: Checking post-iteration log-likelihood >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Tree after optimization: >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> xrate failed (check command-line options? they were: >>> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >>> >>> I also tried w/o the -maxrounds argument and got: >>> >>> Ignoring line: Checking post-iteration log-likelihood >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Tree after optimization: >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns >>> at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141 >>> [stderr] ERROR: Bad node specifier: >>> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;' >>> xrate returned with nonzero exit status (invoked using: >>> /usr2/people/gtl/bin/dart//bin/xrate -e >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg) >>> >>> cheers, >>> marcin >>> >>> >>> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak >>> <mar...@gm...> wrote: >>>> Hey hey, >>>> >>>> Indeed, the alignment size limitation makes sense and was one of the >>>> things I was suspecting. >>>> >>>> The job was run on a machine w.o queue management but I will >>>> investigate whether it could have been killed by other means. >>>> >>>> The 0 iteration option sounds like it will do the trick -- so I just >>>> need to checkout the DART code from cvs again, right? Thanks for >>>> adding this in and I definitely understand the time constraints etc., >>>> this should suffice for my purposes. I could of course try to obtain a >>>> tree in other ways, but actually I'm trying to turn this into a >>>> pipeline that can run on multi-genome alignments, its almost there. >>>> >>>> cheers, >>>> marcin >>>> >>>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote: >>>>> Hi Marcin... >>>>> >>>>> So, I never ran xrate on an alignment this big before (6 million >>>>> columns!) and it turned out there was a drastic inefficiency in the >>>>> distance matrix code, that was causing it to scale as the square of the >>>>> number of columns. Unnoticeable for most alignments (since the code >>>>> inside the loop was pretty quick), but your example caused it to hang... >>>>> >>>>> I think it was quite possible that your xrate job was getting killed >>>>> because it was running too long. Are you running queue management >>>>> software of some kind (e.g. Sun Grid Engine)? That can often kill >>>>> long-running jobs. I couldn't find a specific bug that was crashing on >>>>> this data, although I didn't finish running it because it was taking >>>>> sooooo long. >>>>> >>>>> Anyway, I've now fixed the inefficiency bug. The tree estimation by >>>>> neighbor-joining now finishes pretty fast. However, it now takes a long >>>>> time to complete the second step of the tree-estimation procedure, which >>>>> is to refine the branch lengths of the tree using the EM algorithm. >>>>> >>>>> There is less that I can do to optimize this. The neighbor-joining code >>>>> is not heavily dependent on the alignment length, but the branch-length >>>>> refinement is. I could optimize the branch-length code in the same way >>>>> that the neighbor-joining is optimized, so it was less dependent on >>>>> alignment length, but that would take a little while, and unfortunately >>>>> I don't have the time right now. >>>>> >>>>> Instead, what I have done is introduced a subtle hack. If you specify >>>>> the command-line option "--maxrounds 0", then this will bypass the >>>>> branch-length refinement code (specifically, what it does is tell xrate >>>>> that you want the EM algorithm to run for a maximum of zero iterations, >>>>> or "rounds"). >>>>> >>>>> Thus, with "--maxrounds 0" the tree will be estimated using >>>>> neighbor-joining only. >>>>> >>>>> Sorry this has taken so long to fix! Like I said, I never used xrate on >>>>> an alignment this big before (for our screens, we pre-specified the >>>>> tree, so we went directly to windowlicker). >>>>> >>>>> Hopefully, this will now work for you, but let me know if you have >>>>> further issues. >>>>> >>>>> Cheers, >>>>> Ian >>>>> >>>>> >>>>> >>>>> >>>>> marcin joachimiak wrote: >>>>> > hey, >>>>> > >>>>> > Thanks for the help! >>>>> > The short example worked just fine. >>>>> > >>>>> > log3 with my previous command so far produced this: >>>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>>> > Largest eigenvalue was 0; set to 0 >>>>> > Updating cached M() & J() >>>>> > Finding eigenvectors using MatrixExpEigenPrepare >>>>> > Largest eigenvalue was 0; set to 0 >>>>> > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0)) >>>>> > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0)) >>>>> > Updating cached M() & J() >>>>> > Sequence_database: converting ASCII sequences to score profiles >>>>> > Processed 3 sequences >>>>> > Estimating tree for the following sequences: CP000112 CP000527 AE017286 >>>>> > Estimating distances of sequences 0 and 1 >>>>> > >>>>> > You can download the FASTA and Stock files here: >>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz >>>>> > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz >>>>> > >>>>> > I'll have more updates on Monday -- enjoy the weekend, >>>>> > marcin >>>>> > - Show quoted text - >>>>> > >>>>> > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote: >>>>> >> Thanks Marcin. >>>>> >> >>>>> >> Something has definitely gone wrong if no alignment is output. Can you >>>>> >> check that the following outputs an alignment+tree? >>>>> >> >>>>> >> xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg >>>>> >> ~/dart/src/ecfg/t/notree.stk >>>>> >> >>>>> >> >>>>> >> Looking at your log messages it seems like the program has > crashed while >>>>> >> trying to do neighbor-joining to estimate the tree. For the >>> above command, I >>>>> >> get log messages after the "Estimating tree..." message. See >>> below (BTW many >>>>> >> of these are on stderr, not stdout) >>>>> >> >>>>> >> If you can reproduce this on the short example pairwise >>> alignment but not >>>>> >> with the 3M alignments, there's something else wrong that needs >>>>> >> investigation. In that case, can you first try running "-log 3" >>> instead of >>>>> >> "-log 5" (this will give more verbose log messages during the >>>>> >> neighbor-joining algorithm), then mail the stderr log messages >>> to me, and/or >>>>> >> please post a URL to your large file so I can download it and > attempt to >>>>> >> reproduce the error. >>>>> >> >>>>> >> Sorry about this... >>>>> >> Ian >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> Warning -- used random number generator during initialisation >>>>> >> Read 1 alignments >>>>> >> Sequence_database: converting ASCII sequences to score profiles >>>>> >> Estimating tree for the following sequences: X Y >>>>> >> Optimizing branch lengths of all trees in alignment database, using EM >>>>> >> Optimizing tree branch lengths by EM. >>>>> >> Tree before optimization: >>>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>>> >> EM iteration #1: log-likelihood = -56.0101 bits >>>>> >> Optimized tree after step #1: >>>>> >> (X:5.1160179765,Y:5.1160180140)root; >>>>> >> EM iteration #2: log-likelihood = -56.0118 bits >>>>> >> Warning: log-likelihood dropped from -56.0101 to -56.0118 > bits during EM >>>>> >> Failed EM improvement threshold for the 1th time; stopping >>>>> >> Checking post-iteration log-likelihood >>>>> >> Post-iteration log-likelihood = -56.0118 bits >>>>> >> Restoring previous best branch lengths >>>>> >> Tree after optimization: >>>>> >> (X:5.0000000000,Y:5.0000000000)root; >>>>> >> Alignment database log-likelihood: -56.0101 bits >>>>> >> Processing alignment Alignment1 (1 of 1): 14 columns >>>>> >> # STOCKHOLM 1.0 >>>>> >> #=GF NH (X:5.0000000000,Y:5.0000000000)root; >>>>> >> X AAAAAAaaUUUUUU >>>>> >> Y GGGGGGaaCCCCCC >>>>> >> // >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> marcin joachimiak wrote: >>>>> >> >>>>> >>> Hey there, >>>>> >>> >>>>> >>> The xrate tree job took a while but it seemed to finish without >>>>> >>> errors. Alas there isn't any tree output ... >>>>> >>> >>>>> >>> Here's what I get from stdout: >>>>> >>> >>>>> >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e >>>>> >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5 >>>>> >>> Warning -- used random number generator during initialisation >>>>> >>> Read 1 alignments >>>>> >>> Sequence_database: converting ASCII sequences to score profiles >>>>> >>> Estimating tree for the following sequences: CP000112 > CP000527 AE017286 >>>>> >>> >>>>> >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a >>>>> >>> problem right? And this was run on a hefty machine ... >>>>> >>> >>>>> >>> >>>>> >>> marcin >>>>> >>> >>>>> >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak >>>>> >>> <mar...@gm...> wrote: >>>>> >>> >>>>> >>>> much better, thanks! >>>>> >>>> I did notice the trees for each alignment in the output ... >>>>> >>>> >>>>> >>>> The tree estimation is running, I'll keep reporting ... >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes > <ih...@be...> wrote: >>>>> >>>> > marcin joachimiak wrote: >>>>> >>>> > > Excellent, its working now! >>>>> >>>> > >>>>> >>>> > Cool! >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > > I'll have the tree data soon and may try a run with > as well, but >>>>> >> best >>>>> >>>> > > if I could know that the other tree is different from the xrate >>>>> >> one >>>>> >>>> > > with the Jukes-Cantor model. Can I get the Jukes-Cantor model >>>>> >>>> > > tree/distances from xrate? >>>>> >>>> > >>>>> >>>> > Yes, it will insert the trees into the alignment files > it outputs, >>>>> >> using >>>>> >>>> > the New-Hampshire-encoded-in-Stockholm format from my previous >>>>> >> email. >>>>> >>>> > >>>>> >>>> > One caveat. The way you're running it now, it is calculating a >>>>> >> separate >>>>> >>>> > tree *for every window* in the alignment. This may not > be what you >>>>> >> want, >>>>> >>>> > and it's certainly going to slow things down. >>>>> >>>> > >>>>> >>>> > If you want to calculate the tree just once for the whole >>> alignment, >>>>> >> type >>>>> >>>> > >>>>> >>>> > xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg >>>>> >>>> > --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > The "--noannotate" just means "don't waste time trying > to annotate >>>>> >> this >>>>> >>>> > alignment, I just want the tree". The "-log 5" will > just print some >>>>> >> log >>>>> >>>> > messages on stderr, so you can watch the progress of the >>>>> >> tree-building >>>>> >>>> > algorithms. >>>>> >>>> > >>>>> >>>> > hth, >>>>> >>>> > >>>>> >>>> > I. >>>>> >>>> > >>>>> >>>> >>>>> >>>> >>>>> |