Re: [Dart-help] help with windowlicker


You only really need "--maxrounds 0" (and the "-e $DARTDIR/grammars/jukescantor.eg"
option) for the initial tree.

You can pass the "-e $DARTDIR/grammars/jukescantor.eg" to windowlicker
too if you like, but with the "--maxrounds 0" there it won't actually do
anything, and you probably don't need it anyway.
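Putting this advice together, the two steps might look like the POSIX shell sketch below. It assumes `xrate` is on your `PATH`, `DARTDIR` points at your DART checkout, and windowlicker lives at `$DARTDIR/perl/windowlicker.pl` (as in Marcin's command). The helper names `build_tree` and `scan_windows` are hypothetical.

```sh
# Sketch: estimate ONE tree for the whole alignment, then scan it with windowlicker.
# Assumes: xrate on PATH, DARTDIR set, windowlicker at $DARTDIR/perl/windowlicker.pl.
# build_tree and scan_windows are hypothetical helper names.

# Step 1: neighbor-joining tree only; "--maxrounds 0" skips EM branch-length refinement.
# Plain ">" captures stdout only, so xrate's log messages stay on stderr.
build_tree() {
  xrate "$1" -e "$DARTDIR/grammars/jukescantor.eg" \
    --noannotate --maxrounds 0 -log 5 > "$2"
}

# Step 2: run windowlicker on the tree-annotated alignment; no extra xrate options.
scan_windows() {
  perl "$DARTDIR/perl/windowlicker.pl" "$1" > "$2"
}
```

For example: `build_tree DvH_DP4_G20_mVISTA.stock DvH_DP4_G20_mVISTA_wtree.stock`, then `scan_windows DvH_DP4_G20_mVISTA_wtree.stock windowlicker.out`.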

marcin joachimiak wrote:
> OK, it's running now!
> Thanks -- originally I couldn't get output from standard redirect '>'
> but it worked the second time.
> 
> Oh yes, where does the "--maxrounds 0" argument go? Is that for
> windowlicker or just the initial tree?
> 
> m
> 
> On Tue, May 13, 2008 at 2:02 PM, Ian Holmes <ih...@be...> wrote:
>> Yes -- your "&>" shell redirection is redirecting all xrate's error &
>> logging output to your "out" file. That is what the second xrate run
>> (called from windowlicker) is complaining about. If you use ">" instead
>> of "&>" then you will not mix stderr and stdout, and what you're doing
>> should work (fingers crossed).
>>
>> Ian
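A quick way to see the difference described above: `>` redirects only stdout (file descriptor 1), while bash's `&>` sends stdout and stderr (fd 2) to the same file; it is shorthand for `> file 2>&1`. In the sketch below, `fake_xrate` is a hypothetical stand-in that writes an alignment line to stdout and a log line to stderr, the way xrate does.

```sh
# Hypothetical stand-in for xrate: alignment on stdout, log messages on stderr.
fake_xrate() {
  echo "# STOCKHOLM 1.0"
  echo "Estimating tree..." >&2
}

dir=$(mktemp -d)

# Equivalent to bash's "&>": BOTH streams land in the file, so log lines
# get mixed into the alignment.
fake_xrate > "$dir/mixed.stock" 2>&1

# ">" sends only stdout to the alignment file; "2>" keeps the log separately.
fake_xrate > "$dir/clean.stock" 2> "$dir/xrate.log"
```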
>>
>> marcin joachimiak wrote:
>>> hey there,
>>>
>>> So, I got the tree output after the CVS update (see attached).
>>>
>>> xrate DvH_DP4_G20_mVISTA.stock -e $DARTDIR/grammars/jukescantor.eg --noannotate -log 5 &> out
>>>
>>> I was only able to capture the xrate tree output using &>, so there
>>> are likely extraneous lines there breaking things?
>>>
>>> I ran windowlicker as follows:
>>> perl /usr2/people/gtl/bin/dart/perl/windowlicker.pl DvH_DP4_G20_mVISTA_wtree.stock -- -maxrounds 0 -e $DARTDIR/grammars/jukescantor.eg &> windowlicker.out
>>>
>>> And got this, seemingly complaining that xrate failed:
>>>
>>> Ignoring line: Checking post-iteration log-likelihood
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Tree after optimization:
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> xrate failed (check command-line options? they were:
>>> /usr2/people/gtl/bin/dart//bin/xrate -maxrounds 0 -e
>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg)
>>>
>>> I also tried w/o the -maxrounds argument and got:
>>>
>>> Ignoring line: Checking post-iteration log-likelihood
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Post-iteration log-likelihood = -1.40137e+07 bits
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Tree after optimization:
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Alignment database log-likelihood: -1.40137e+07 bits
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> Ignoring line: Processing alignment Alignment1 (1 of 1): 5973837 columns
>>>  at /usr2/people/gtl/bin/dart/perl//Stockholm/Database.pm line 141
>>> [stderr] ERROR: Bad node specifier:
>>> '(CP000527:0.1650439446,(CP000112:0.2607392479,AE017286:0.3620623361)node_3:0.1650439446)root;'
>>> xrate returned with nonzero exit status (invoked using:
>>> /usr2/people/gtl/bin/dart//bin/xrate -e
>>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg)
>>>
>>> cheers,
>>> marcin
>>>
>>>
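If you already have a file in which log messages and the alignment are mixed (as happened with `&>` here), one workaround is to keep only the Stockholm block, i.e. everything from the `# STOCKHOLM 1.0` header through the `//` terminator. This sed sketch assumes a single alignment per file and no log line interleaved inside the block; rerunning xrate with `>` is the cleaner fix. The name `extract_stockholm` is hypothetical.

```sh
# Keep only lines from "# STOCKHOLM 1.0" through the closing "//".
# Assumes one alignment per file and no log output inside the alignment block.
extract_stockholm() {
  sed -n '/^# STOCKHOLM 1\.0/,/^\/\/$/p' "$1"
}
```

For example: `extract_stockholm out > DvH_DP4_G20_mVISTA_wtree.stock`.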
>>> On Tue, Apr 29, 2008 at 10:48 AM, marcin joachimiak
>>> <mar...@gm...> wrote:
>>>> Hey hey,
>>>>
>>>> Indeed, the alignment size limitation makes sense and was one of the
>>>> things I was suspecting.
>>>>
>>>> The job was run on a machine without queue management, but I will
>>>> investigate whether it could have been killed by other means.
>>>>
>>>> The 0 iteration option sounds like it will do the trick -- so I just
>>>> need to check out the DART code from CVS again, right? Thanks for
>>>> adding this in and I definitely understand the time constraints etc.,
>>>> this should suffice for my purposes. I could of course try to obtain a
>>>> tree in other ways, but actually I'm trying to turn this into a
>>>> pipeline that can run on multi-genome alignments; it's almost there.
>>>>
>>>> cheers,
>>>> marcin
>>>>
>>>> On Mon, Apr 28, 2008 at 12:28 PM, Ian Holmes <ih...@be...> wrote:
>>>>> Hi Marcin...
>>>>>
>>>>>  So, I never ran xrate on an alignment this big before (6 million
>>>>>  columns!) and it turned out there was a drastic inefficiency in the
>>>>>  distance matrix code, that was causing it to scale as the square of the
>>>>>  number of columns. Unnoticeable for most alignments (since the code
>>>>>  inside the loop was pretty quick), but your example caused it to hang...
>>>>>
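For reference, a pairwise Jukes-Cantor distance needs only one pass over the columns: count mismatched sites, take the mismatch fraction p, and apply d = -(3/4) ln(1 - 4p/3). The awk sketch below does that for two aligned rows. It is a generic illustration, not xrate's distance-matrix code (which the thread does not show); skipping columns where either row has a gap (`-` or `.`) and returning `inf` at saturation (p >= 3/4) are assumptions. The name `jc_distance` is hypothetical.

```sh
# Pairwise Jukes-Cantor distance in a single O(L) pass over alignment columns.
# Assumes equal-length aligned rows; gapped columns ('-' or '.') are skipped.
jc_distance() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = 0; diff = 0
    for (i = 1; i <= length(a); i++) {
      x = toupper(substr(a, i, 1)); y = toupper(substr(b, i, 1))
      if (x ~ /[-.]/ || y ~ /[-.]/) continue   # skip gapped columns
      n++
      if (x != y) diff++
    }
    if (n == 0) { print "nan"; exit }
    p = diff / n
    if (p >= 0.75) { print "inf"; exit }      # JC correction undefined (saturation)
    # 0.75*log(3/(3-4p)) equals -0.75*log(1-4p/3) and avoids printing -0.000000
    printf "%.6f\n", 0.75 * log(3 / (3 - 4 * p))
  }'
}
```

For example, `jc_distance AAAA AAAC` (p = 0.25) prints `0.304099`.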
>>>>>  I think it was quite possible that your xrate job was getting killed
>>>>>  because it was running too long. Are you running queue management
>>>>>  software of some kind (e.g. Sun Grid Engine)? That can often kill
>>>>>  long-running jobs. I couldn't find a specific bug that was crashing on
>>>>>  this data, although I didn't finish running it because it was taking
>>>>>  sooooo long.
>>>>>
>>>>>  Anyway, I've now fixed the inefficiency bug. The tree estimation by
>>>>>  neighbor-joining now finishes pretty fast. However, it now takes a long
>>>>>  time to complete the second step of the tree-estimation procedure, which
>>>>>  is to refine the branch lengths of the tree using the EM algorithm.
>>>>>
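With only three genomes, as in this alignment, neighbor-joining has nothing left to choose: the unrooted tree is a star, and the final NJ step gives each branch by the three-point formula, e.g. b_A = (d_AB + d_AC - d_BC)/2. The awk sketch below applies those formulas and prints an unrooted New Hampshire string. It is illustrative only (xrate's own output is rooted and labels internal nodes), and the helper name `three_taxon_tree` is hypothetical.

```sh
# Branch lengths of the unrooted three-taxon (star) tree from pairwise distances,
# via the three-point formulas that neighbor-joining reduces to when n = 3.
# Arguments: nameA nameB nameC dAB dAC dBC
three_taxon_tree() {
  awk -v A="$1" -v B="$2" -v C="$3" -v ab="$4" -v ac="$5" -v bc="$6" 'BEGIN {
    la = (ab + ac - bc) / 2   # branch from centre to A
    lb = (ab + bc - ac) / 2   # branch from centre to B
    lc = (ac + bc - ab) / 2   # branch from centre to C
    printf "(%s:%.4f,%s:%.4f,%s:%.4f);\n", A, la, B, lb, C, lc
  }'
}
```

For example, `three_taxon_tree A B C 0.3 0.5 0.6` prints `(A:0.1000,B:0.2000,C:0.4000);`.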
>>>>>  There is less that I can do to optimize this. The neighbor-joining code
>>>>>  is not heavily dependent on the alignment length, but the branch-length
>>>>>  refinement is. I could optimize the branch-length code in the same way
>>>>>  that the neighbor-joining is optimized, so it would be less dependent on
>>>>>  alignment length, but that would take a little while, and unfortunately
>>>>>  I don't have the time right now.
>>>>>
>>>>>  Instead, what I have done is introduce a subtle hack. If you specify
>>>>>  the command-line option "--maxrounds 0", then this will bypass the
>>>>>  branch-length refinement code (specifically, what it does is tell xrate
>>>>>  that you want the EM algorithm to run for a maximum of zero iterations,
>>>>>  or "rounds").
>>>>>
>>>>>  Thus, with "--maxrounds 0" the tree will be estimated using
>>>>>  neighbor-joining only.
>>>>>
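The effect of a zero round cap is easiest to see in a generic EM driver loop. The sketch below is not xrate's code: per-round log-likelihoods are supplied as input (standing in for real EM updates), and the stopping rule (stop at the first round that fails to improve and keep the best so far) is modelled loosely on the sample log in Ian's April 26 message ("Failed EM improvement threshold ... Restoring previous best branch lengths"). With a cap of 0 the loop body never runs, so the starting (neighbor-joining) tree is returned unchanged. The name `em_driver` is hypothetical.

```sh
# Generic EM driver: run at most MAXROUNDS rounds, stop at the first round that
# does not improve the log-likelihood, and report the best round seen.
# Input lines: starting log-likelihood, then one hypothetical value per EM round.
em_driver() {
  awk -v maxrounds="$1" '
    NR == 1 { best = $1 + 0; best_round = 0; next }  # starting tree (e.g. from NJ)
    {
      round = NR - 1
      if (round > maxrounds + 0) exit                # round cap reached
      if ($1 + 0 <= best) exit                       # no improvement: stop, keep best
      best = $1 + 0; best_round = round
    }
    END { printf "best round %d, log-likelihood %g\n", best_round, best }'
}
```

For example, `printf '%s\n' -100 -90 -85 | em_driver 0` prints `best round 0, log-likelihood -100`.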
>>>>>  Sorry this has taken so long to fix! Like I said, I never used xrate on
>>>>>  an alignment this big before (for our screens, we pre-specified the
>>>>>  tree, so we went directly to windowlicker).
>>>>>
>>>>>  Hopefully, this will now work for you, but let me know if you have
>>>>>  further issues.
>>>>>
>>>>>  Cheers,
>>>>>  Ian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  marcin joachimiak wrote:
>>>>>  > hey,
>>>>>  >
>>>>>  > Thanks for the help!
>>>>>  > The short example worked just fine.
>>>>>  >
>>>>>  > With -log 3, my previous command has so far produced this:
>>>>>  > Finding eigenvectors using MatrixExpEigenPrepare
>>>>>  > Largest eigenvalue was 0; set to 0
>>>>>  > Updating cached M() & J()
>>>>>  > Finding eigenvectors using MatrixExpEigenPrepare
>>>>>  > Largest eigenvalue was 0; set to 0
>>>>>  > Setting eigenvalue #2 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0))
>>>>>  > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #0 ((-1.3333,0))
>>>>>  > Setting eigenvalue #3 ((-1.3333,0)) equal to eigenvalue #2 ((-1.3333,0))
>>>>>  > Updating cached M() & J()
>>>>>  > Sequence_database: converting ASCII sequences to score profiles
>>>>>  > Processed 3 sequences
>>>>>  > Estimating tree for the following sequences: CP000112 CP000527 AE017286
>>>>>  > Estimating distances of sequences 0 and 1
>>>>>  >
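The repeated eigenvalue in this log is expected for a Jukes-Cantor model. Assuming the grammar uses the usual normalisation with every off-diagonal rate equal to 1/3 (consistent with the logged value -1.3333), the 4×4 rate matrix can be written in terms of the all-ones matrix, whose eigenvalues are 4, 0, 0, 0:

```latex
Q = \tfrac{1}{3}\,\mathbf{1}\mathbf{1}^{\top} - \tfrac{4}{3}\,I_4
\quad (\text{diagonal } -1,\ \text{off-diagonal } \tfrac{1}{3}),
\qquad
\lambda(Q) \in \Bigl\{\, \tfrac{4}{3} - \tfrac{4}{3} = 0,\ -\tfrac{4}{3},\ -\tfrac{4}{3},\ -\tfrac{4}{3} \,\Bigr\}.
```

The zero eigenvalue corresponds to the uniform equilibrium ("Largest eigenvalue was 0"), and the threefold -4/3 ≈ -1.3333 is presumably why xrate reports setting eigenvalues #2 and #3 equal to #0.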
>>>>>  > You can download the FASTA and Stock files here:
>>>>>  > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.fasta.gz
>>>>>  > http://www.microbesonline.org/tmp/marcin/DvH_DP4_G20_mVISTA.stock.gz
>>>>>  >
>>>>>  > I'll have more updates on Monday -- enjoy the weekend,
>>>>>  > marcin
>>>>>  >
>>>>>  > On Sat, Apr 26, 2008 at 1:25 PM, Ian Holmes <ih...@be...> wrote:
>>>>>  >> Thanks Marcin.
>>>>>  >>
>>>>>  >>  Something has definitely gone wrong if no alignment is output. Can you
>>>>>  >> check that the following outputs an alignment+tree?
>>>>>  >>
>>>>>  >>  xrate -log 5 --noannotate -e ~/dart/grammars/jukescantor.eg ~/dart/src/ecfg/t/notree.stk
>>>>>  >>
>>>>>  >>
>>>>>  >>  Looking at your log messages it seems like the program has crashed while
>>>>>  >> trying to do neighbor-joining to estimate the tree. For the above command, I
>>>>>  >> get log messages after the "Estimating tree..." message. See below (BTW many
>>>>>  >> of these are on stderr, not stdout)
>>>>>  >>
>>>>>  >>  If you can reproduce this on the short example pairwise alignment but not
>>>>>  >> with the 3M alignments, there's something else wrong that needs
>>>>>  >> investigation. In that case, can you first try running "-log 3" instead of
>>>>>  >> "-log 5" (this will give more verbose log messages during the
>>>>>  >> neighbor-joining algorithm), then mail the stderr log messages to me, and/or
>>>>>  >> please post a URL to your large file so I can download it and attempt to
>>>>>  >> reproduce the error.
>>>>>  >>
>>>>>  >>  Sorry about this...
>>>>>  >>  Ian
>>>>>  >>
>>>>>  >>
>>>>>  >>
>>>>>  >>
>>>>>  >>  Warning -- used random number generator during initialisation
>>>>>  >>  Read 1 alignments
>>>>>  >>  Sequence_database: converting ASCII sequences to score profiles
>>>>>  >>  Estimating tree for the following sequences: X Y
>>>>>  >>  Optimizing branch lengths of all trees in alignment database, using EM
>>>>>  >>  Optimizing tree branch lengths by EM.
>>>>>  >>  Tree before optimization:
>>>>>  >>  (X:5.0000000000,Y:5.0000000000)root;
>>>>>  >>  EM iteration #1: log-likelihood = -56.0101 bits
>>>>>  >>  Optimized tree after step #1:
>>>>>  >>  (X:5.1160179765,Y:5.1160180140)root;
>>>>>  >>  EM iteration #2: log-likelihood = -56.0118 bits
>>>>>  >>  Warning: log-likelihood dropped from -56.0101 to -56.0118 bits during EM
>>>>>  >>  Failed EM improvement threshold for the 1th time; stopping
>>>>>  >>  Checking post-iteration log-likelihood
>>>>>  >>  Post-iteration log-likelihood = -56.0118 bits
>>>>>  >>  Restoring previous best branch lengths
>>>>>  >>  Tree after optimization:
>>>>>  >>  (X:5.0000000000,Y:5.0000000000)root;
>>>>>  >>  Alignment database log-likelihood: -56.0101 bits
>>>>>  >>  Processing alignment Alignment1 (1 of 1): 14 columns
>>>>>  >>  # STOCKHOLM 1.0
>>>>>  >>  #=GF NH (X:5.0000000000,Y:5.0000000000)root;
>>>>>  >>  X       AAAAAAaaUUUUUU
>>>>>  >>  Y       GGGGGGaaCCCCCC
>>>>>  >>  //
>>>>>  >>
>>>>>  >>
>>>>>  >>
>>>>>  >>
>>>>>  >>  marcin joachimiak wrote:
>>>>>  >>
>>>>>  >>> Hey there,
>>>>>  >>>
>>>>>  >>> The xrate tree job took a while but it seemed to finish without
>>>>>  >>> errors. Alas there isn't any tree output ...
>>>>>  >>>
>>>>>  >>> Here's what I get from stdout:
>>>>>  >>>
>>>>>  >>> Parsed command line: xrate DvH_DP4_G20_mVISTA.stock -e
>>>>>  >>> /usr2/people/gtl/bin/dart//grammars/jukescantor.eg --noannotate -log 5
>>>>>  >>> Warning -- used random number generator during initialisation
>>>>>  >>> Read 1 alignments
>>>>>  >>> Sequence_database: converting ASCII sequences to score profiles
>>>>>  >>> Estimating tree for the following sequences: CP000112 CP000527 AE017286
>>>>>  >>>
>>>>>  >>> The alignment is 3 bacterial genomes, ~3M each -- that wouldn't be a
>>>>>  >>> problem right? And this was run on a hefty machine ...
>>>>>  >>>
>>>>>  >>>
>>>>>  >>> marcin
>>>>>  >>>
>>>>>  >>> On Thu, Apr 24, 2008 at 7:36 PM, marcin joachimiak
>>>>>  >>> <mar...@gm...> wrote:
>>>>>  >>>
>>>>>  >>>> much better, thanks!
>>>>>  >>>>  I did notice the trees for each alignment in the output ...
>>>>>  >>>>
>>>>>  >>>>  The tree estimation is running, I'll keep reporting ...
>>>>>  >>>>
>>>>>  >>>>
>>>>>  >>>>
>>>>>  >>>>
>>>>>  >>>>  On Thu, Apr 24, 2008 at 7:22 PM, Ian Holmes <ih...@be...> wrote:
>>>>>  >>>>  > marcin joachimiak wrote:
>>>>>  >>>>  >  > Excellent, it's working now!
>>>>>  >>>>  >
>>>>>  >>>>  >  Cool!
>>>>>  >>>>  >
>>>>>  >>>>  >
>>>>>  >>>>  >  > I'll have the tree data soon and may try a run with as well, but it
>>>>>  >>>>  >  > would be best if I could know that the other tree is different from
>>>>>  >>>>  >  > the xrate one with the Jukes-Cantor model. Can I get the Jukes-Cantor
>>>>>  >>>>  >  > model tree/distances from xrate?
>>>>>  >>>>  >
>>>>>  >>>>  >  Yes, it will insert the trees into the alignment files it outputs,
>>>>>  >>>>  >  using the New-Hampshire-encoded-in-Stockholm format from my previous
>>>>>  >>>>  >  email.
>>>>>  >>>>  >
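Since the tree ends up on a `#=GF NH` line (as in the sample output in Ian's April 26 message above), it can be pulled back out of the alignment file with standard tools. A minimal sed sketch, assuming each tree sits on a single `#=GF NH` line; the name `nh_tree` is hypothetical.

```sh
# Print the New Hampshire tree string(s) stored on "#=GF NH" lines of a Stockholm file.
# Assumes each tree fits on one line (a long tree could be split over several NH lines).
nh_tree() {
  sed -n 's/^#=GF NH[[:space:]]*//p' "$1"
}
```

For example: `nh_tree MY_ALIGNMENT_WITH_TREE.stock`.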
>>>>>  >>>>  >  One caveat. The way you're running it now, it is calculating a
>>>>>  >>>>  >  separate tree *for every window* in the alignment. This may not be
>>>>>  >>>>  >  what you want, and it's certainly going to slow things down.
>>>>>  >>>>  >
>>>>>  >>>>  >  If you want to calculate the tree just once for the whole alignment,
>>>>>  >>>>  >  type
>>>>>  >>>>  >
>>>>>  >>>>  >  xrate MY_ALIGNMENT.stock -e $DARTDIR/grammars/jukescantor.eg
>>>>>  >>>>  >  --noannotate -log 5 > MY_ALIGNMENT_WITH_TREE.stock
>>>>>  >>>>  >
>>>>>  >>>>  >
>>>>>  >>>>  >  The "--noannotate" just means "don't waste time trying to annotate
>>>>>  >>>>  >  this alignment, I just want the tree". The "-log 5" will just print
>>>>>  >>>>  >  some log messages on stderr, so you can watch the progress of the
>>>>>  >>>>  >  tree-building algorithms.
>>>>>  >>>>  >
>>>>>  >>>>  >  hth,
>>>>>  >>>>  >
>>>>>  >>>>  >  I.
>>>>>  >>>>  >
>>>>>  >>>>
>>>>>  >>>>
>>>>>