Re: [Denovoassembler-devel] Ray Cloud Browser - Feedbacks
Ray -- Parallel genome assemblies for parallel DNA sequencing
From: Sébastien B. <seb...@ul...> - 2012-11-09 17:07:57
On 11/09/2012 11:50 AM, Charles Joly wrote:
> You said:
> "
> TATCG -> ATCGA -> TCGAC
> AATCG ->       -> TCGAC
> I don't see how you can represent this with just one letter."
>
> I'm not sure I get your example. You seem to have 2 identical k-mers

It's a typo. Should be:

TATCG -> ATCGA -> TCGAC
AATCG ->       -> TCGAG

> (TCGAC) that are linked with the same k-mer (ATCGA):
>
> TATCG---
>         |---ATCGA---TCGAC
> AATCG---
>
> Here we have two sequences, right?
>
> TATCGAC and AATCGAC(?)
>
> If 2 k-mers are linked isn't it by definition because they only have a
> single base pair that differ?
>
> Seems to me that every k-mer except those at the right end of the
> graph can be represented by the leftmost nucleotide (since the only
> information about a k-mer that is not present in the next k-mer is the
> first nucleotide):
>
> T---
>     |---A---TCGAC
> A---
>

Yes, you need to bootstrap from one full-sequence k-mer, then you can just
add 1 letter per k-mer.

> For the rightmost k-mers, it's a bit more complicated because there
> are multiple cases possible.
>

Hence one letter is not enough for any k-mer with more than 1 parent or
more than 1 child. Otherwise, people would be using 1 letter per k-mer to
store the de Bruijn graph, right?

If I take my (corrected) example:

TATCG -> ATCGA -> TCGAC
AATCG ->       -> TCGAG

Taking only the last letter is useless (the two parents look identical):

G -> A -> C
G ->   -> G

Taking only the first letter is useless (the two children look identical):

T -> A -> T
A ->   -> T

Why not: I will display all k-mers with only their last letter. The full
sequence will be shown on mouse-over. As you said, more stuff will be
packed on the screen with smaller objects.

Mathematically, you can spell DNA from a de Bruijn graph by taking the ith
letter of each k-mer, in order, along a path. You then have to deal with
the ends. If you take the last letter, the head k-mer must be taken in full
and only the last letter is required for the rest of the path, up to the
tail. If you take the first letter, just swap the roles of head and tail.
Anything in between is a linear variation.

Thanks for the feedback.
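The path-spelling rule described above can be sketched in a few lines of Python. This is only an illustration, not code from Ray or Ray Cloud Browser; the function names `spell_path_last` and `spell_path_first` are hypothetical, and the sketch assumes the path is given as an ordered list of k-mers in which each k-mer overlaps the next by k-1 letters.

```python
def check_path(kmers):
    """Raise ValueError unless consecutive k-mers overlap by k-1 letters."""
    for prev, nxt in zip(kmers, kmers[1:]):
        if prev[1:] != nxt[:-1]:
            raise ValueError(f"{prev} -> {nxt} is not a de Bruijn edge")


def spell_path_last(kmers):
    """Take the head k-mer in full, then the last letter of every other k-mer."""
    if not kmers:
        return ""
    check_path(kmers)
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


def spell_path_first(kmers):
    """Mirror rule: first letter of every k-mer, then the tail k-mer in full."""
    if not kmers:
        return ""
    check_path(kmers)
    return "".join(kmer[0] for kmer in kmers[:-1]) + kmers[-1]


if __name__ == "__main__":
    # The two paths through the corrected example from the message.
    print(spell_path_last(["TATCG", "ATCGA", "TCGAC"]))
    print(spell_path_last(["AATCG", "ATCGA", "TCGAG"]))
```

Both functions give the same sequence for a valid path; they differ only in which end of the path has to be kept as a full k-mer.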
> 2012/11/9 Sébastien Boisvert <seb...@ul...>:
>> On 11/09/2012 10:02 AM, Charles Joly wrote:
>>>
>>> Hi Seb,
>>>
>>> I looked at your demo of the browser with Fred this morning. It's very cool!
>>>
>>> Would it be very complicated to display only the nucleotide that is
>>> unique to each k-mer rather than its full length? That would make it
>>> possible to show more k-mers in a window.
>>
>>
>> Does not make sense.
>>
>> Example of a knot:
>>
>> TATCG -> ATCGA -> TCGAC
>> AATCG ->       -> TCGAC
>>
>> I don't see how you can represent this with just one letter.
>>
>> However, for chains x1 -> x2 -> x3 -> x4, I can do that:
>>
>> TATCG -> ATCGA -> TCGAC -> CGACA -> GACAC
>>
>> Can become:
>>
>> TATCG -> A -> C -> A -> GACAC
>>
>> In fact, we can display a k-mer with a single nucleotide without
>> loss of information only for those with 1 child and 1 parent (1-1 k-mers).
>>
>> Luckily, most k-mers are like this!
>>
>> So what I'll do is:
>>
>> - render 1-1 k-mers with the last nucleotide
>> - render other k-mers with full sequence
>> - when the user moves the mouse over a k-mer, it will be rendered in
>>   full-sequence mode regardless of whether it is a 1-1 k-mer.
>>
>>
>>>
>>> Also, you could have one color per nucleotide, which could make the
>>> graph easier to visualize.
>>
>>
>> Sure.
>>
>>
>>>
>>> Fred even suggested drawing neither the circles nor the edges, and
>>> showing only one letter for each nucleotide, which could be moved
>>> around in the same way.
>>>
>>
>> No. It's a graph, we need relationships.
>>
>>>
>>> Charles.
>>>
>>
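The rendering rule quoted above (last nucleotide for 1-1 k-mers, full sequence for everything else) could look roughly like the Python sketch below. It is an assumption-laden illustration, not the browser's actual code: the names `build_edges` and `render_labels` are hypothetical, and edges are inferred purely from k-1 overlaps between the k-mers supplied, whereas a real assembler graph would provide its own edges.

```python
def build_edges(kmers):
    """Return (parents, children) dicts, linking k-mers that overlap by k-1 letters."""
    kmer_set = set(kmers)
    parents = {kmer: [] for kmer in kmer_set}
    children = {kmer: [] for kmer in kmer_set}
    for kmer in kmer_set:
        for base in "ACGT":
            child = kmer[1:] + base
            if child in kmer_set:
                children[kmer].append(child)
                parents[child].append(kmer)
    return parents, children


def render_labels(kmers):
    """Label 1-1 k-mers with their last nucleotide and all others in full."""
    parents, children = build_edges(kmers)
    labels = {}
    for kmer in parents:
        is_one_one = len(parents[kmer]) == 1 and len(children[kmer]) == 1
        labels[kmer] = kmer[-1] if is_one_one else kmer
    return labels


if __name__ == "__main__":
    chain = ["TATCG", "ATCGA", "TCGAC", "CGACA", "GACAC"]
    labels = render_labels(chain)
    print(" -> ".join(labels[kmer] for kmer in chain))
```

On the chain from the message, the head and tail keep their full sequence and the three inner k-mers collapse to one letter each. On the knot, no k-mer is 1-1, so every label stays in full.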