Re: [Treebase-devel] How do you align >100,000 sequences?

I started a wiki page for use-cases (relevant to MIAPA, TreeBASE, etc.)
that focus on replication, re-use, aggregation, re-purposing,
meta-analysis, and integration:

   http://www.evoio.org/wiki/UseCases

You should be able to sign up for access if you do not have it already.

Right now the list is empty.  However, there have been several good  
discussions of use-cases over the past 2 months.  I encourage those  
who are interested to transfer some of that material to the wiki,  
where it can become a shared resource.

Arlin

On Feb 14, 2011, at 10:22 PM, Rutger Vos wrote:

> Hi guys,
>
> Brian has an interesting use case:
>
> ---------- Forwarded message ----------
> From: Brian T. Foley <bt...@la...>
> Date: Thu, Feb 10, 2011 at 11:44 PM
> Subject: Re: How do you align >100,000 sequences?
> To: Rutger Vos <rut...@gm...>
> Cc: bt...@la...
>
>
> OK.  I guess I can see some uses of such a thing, even if it is not true
> phylogenetics, or the best method.
>
> Tree building is done in other fields besides biology, so there may be
> tools in use by computer scientists or librarians or something, that could
> work better or faster than the tools I am used to in biology.
>
> I was quite interested after poking around in TreeBase a bit yesterday.  I
> still don't find it easy to find my way to the data sets I'd like, but the
> more I try the easier it is getting.
>
> Keep me, and the HIV Databases, in mind if you have questions about large
> sets.  Viruses leave no fossils and they all look alike for the most part,
> so all we have is phylogeny, and we do a lot of it.
>
> It looks like TreeBase is more about storing data produced by others,
> rather than building new trees or helping researchers put together a new
> data set.  HIV Database is rather the opposite, but I've long thought it
> would be very nice if we provided trees and NEXUS files.  Maybe it would
> make sense to store them in TreeBase with a link to the TreeBase entry
> from our database.
>
> Brian
>
> > Dear Brian,
> >
> > I actually sent my query on behalf of someone else, so I can't vouch for
> > how or why he did things the way he did them. I know that he has
> > Smith-Waterman distances between all pairs of proteins in the set, but
> > that he doesn't actually have one multiple sequence alignment for the
> > whole set. My understanding is that the proteins are very, very divergent
> > in some cases, so I doubt trying to align them would make any sense at all
> > (and, perhaps, neither would using the SW distances as a metric on which
> > to base a tree, but that's his business).
> >
> > Best wishes,
> >
> > Rutger
> >
>
>
>
> -- 
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading
> RG6 6BX
> United Kingdom
> Tel: +44 (0) 118 378 7535
> http://www.nexml.org
> http://rutgervos.blogspot.com
>
> -- 
> You received this message because you are subscribed to the Google
> Groups "MIAPA" group.
> For more options, visit this group at
> http://groups.google.com/group/miapa-discuss?hl=en

-------
Arlin Stoltzfus (ar...@um...)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org