Thread: [Treesoft-treefam] How to extract 2 or more leaves with the same GID?
From: Sebastien M. <Seb...@un...> - 2008-08-25 12:22:44
Hi,

We use your API, which is very powerful. But there is a problem when we try to get the alignment for families where one GID (gene ID) is shared by 2 or more leaves, e.g.:

1 family: TF101024
1 GID: Smp_150040
2 leaves: Smp_150040.1 and Smp_150040.2

The nhx method returns the proper tree, with distinct leaves Smp_150040.1 and Smp_150040.2. But the get_alignment method returns Smp_150040.1 twice, with identical headers and sequences: Smp_150040.1 is duplicated.

How can this be resolved?

Thanks

--
Sébastien Moretti
Department of Ecology and Evolution, Biophore,
University of Lausanne, CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056
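The thread does not show the Tree.pm code, so the cause is not confirmed here. One plausible explanation for this symptom is a sequence lookup keyed on the gene ID instead of the leaf (transcript) ID, so that both leaves resolve to the same record. The Python sketch below illustrates that hypothesis only; the record layout, the placeholder sequences and every function name in it are invented for illustration and are not part of the TreeFam API.

```python
# Hypothetical illustration: the record layout, sequences and names are
# invented and are NOT the TreeFam API.
from typing import Dict, List, Tuple

# Placeholder records for the two leaves from the report (fake sequences).
RECORDS = [
    {"tid": "Smp_150040.1", "gid": "Smp_150040", "seq": "seq-1"},
    {"tid": "Smp_150040.2", "gid": "Smp_150040", "seq": "seq-2"},
]

# One entry per transcript (leaf) ID.
BY_TID: Dict[str, dict] = {rec["tid"]: rec for rec in RECORDS}

# One entry per gene ID; setdefault keeps only the first transcript seen.
BY_GID: Dict[str, dict] = {}
for rec in RECORDS:
    BY_GID.setdefault(rec["gid"], rec)


def gene_id(leaf: str) -> str:
    """Strip a trailing '.N' transcript suffix (assumed naming scheme)."""
    return leaf.rsplit(".", 1)[0]


def alignment_rows(leaves: List[str], by_gene: bool) -> List[Tuple[str, str]]:
    """Return (header, sequence) rows, looking records up by gene or by leaf ID."""
    rows = []
    for leaf in leaves:
        rec = BY_GID[gene_id(leaf)] if by_gene else BY_TID[leaf]
        rows.append((rec["tid"], rec["seq"]))
    return rows


leaves = ["Smp_150040.1", "Smp_150040.2"]
print(alignment_rows(leaves, by_gene=True))   # Smp_150040.1 twice
print(alignment_rows(leaves, by_gene=False))  # both transcripts
```

Keying the lookup on the leaf name, which the nhx output already distinguishes, keeps both transcripts apart.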
From: Sebastien M. <Seb...@un...> - 2008-08-25 12:49:31
> Hi Sebastien,
>
> The example you give looks strange because a gene should only be
> represented by one transcript in a given tree.

I think I can find several other examples like this one.

> Anyone got an idea why a gene can be represented by 2 transcripts in the
> same tree?

It seems these transcripts are too distant to be merged into a single gene.

> I'll see how to fix the API for cases like this.
>
> Cheers
>
> J-K

Thanks
Sébastien
From: Sebastien M. <Seb...@un...> - 2008-08-25 12:57:14
Maybe the best way would be to remove the redundancy from the gene list (built from the leaves), since I get 2 Smp_150040 entries, and then extract the alignment(s) for every gene ID.

What do you think?

Cheers
Sébastien
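The approach suggested above, deduplicating the gene IDs taken from the leaves and then fetching alignments once per gene, could look like the Python sketch below. It assumes leaf names follow the `<gene>.<n>` pattern seen in the Smp_ IDs, and `fetch_alignment` is a hypothetical callback standing in for whatever API call retrieves an alignment; it is not a real TreeFam function.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def gene_id(leaf: str) -> str:
    """Strip a trailing '.N' transcript suffix (assumed naming scheme)."""
    return leaf.rsplit(".", 1)[0]


def leaves_by_gene(leaves: List[str]) -> Dict[str, List[str]]:
    """Group leaf names by gene ID; dict order follows first appearance."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for leaf in leaves:
        groups[gene_id(leaf)].append(leaf)
    return dict(groups)


def alignments_per_gene(
    leaves: List[str], fetch_alignment: Callable[[str], str]
) -> Dict[str, str]:
    """Call the hypothetical fetch_alignment once for each distinct gene ID."""
    return {gid: fetch_alignment(gid) for gid in leaves_by_gene(leaves)}


leaves = ["Smp_150040.1", "Smp_150040.2", "Smp_062300"]
print(leaves_by_gene(leaves))
```

Keeping the grouped leaf names, rather than a bare set of gene IDs, preserves which transcripts belong to each gene, which matters when a gene has two leaves that were not merged.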
From: Sebastien M. <Seb...@un...> - 2008-08-25 13:05:57
Here is a list of the duplicated entries I found. All of them seem to come from the species Schistosoma mansoni.

Smp_150040.1
Smp_062300
Smp_028100.1
Smp_059750
Smp_136850.1
Smp_165350.1
Smp_138920.1
Smp_176260.1
Smp_159900.1
Smp_074070.1
Smp_061920.1
Smp_093750.1
Smp_103460.1
Smp_060940
Smp_176230.1
Smp_041430.1
Smp_034840.1
Smp_034840.1
Smp_153760.1
Smp_055760
Smp_094050.1
Smp_009650.1
Smp_069130.1
Smp_106130.1
Smp_049300.1
Smp_049300.1
Smp_049300.1
Smp_144010.1
Smp_162800.1
Smp_059290.1
Smp_176200.1
Smp_163420.1
Smp_020920.1
Smp_035200.1
Smp_035200.1
Smp_035200.1
Smp_049600.1
Smp_049600.1
Smp_049600.1
Smp_005070.1
Smp_138680.1
Smp_158110.1
Smp_004470.1
Smp_004470.1
Smp_035720
Smp_147250.1
Smp_128200.1
Smp_149200.1
Smp_149200.1
Smp_076030.1
Smp_033050.1
Smp_033050.1
Smp_004780.1
Smp_158320.1
Smp_064040
Smp_161970.1
Smp_104680.1
Smp_034980.1
Smp_035430.1
Smp_025570.1
Smp_025570.1
Smp_042590.1
Smp_063120.1
Smp_063120.1
Smp_160770.1
Smp_160770.1
Smp_045430.1
Smp_054790.1
Smp_054340.1
Smp_054340.1
Smp_042270.1
Smp_042270.1
Smp_038080.1
Smp_005080.1
Smp_005080.1
Smp_066630.1
Smp_046880.1
Smp_053510.1
Smp_173180.1
Smp_141630.1
Smp_141630.1
Smp_141630.1
Smp_124820.1
Smp_046090.1
Smp_096450
Smp_120320
Smp_075220.1
Smp_075220.1
Smp_000030.1
Smp_000030.1
Smp_076630.1
Smp_030930.1
Smp_142010.1
Smp_038870.1
Smp_050360
Smp_033670.1
Smp_033670.1
Smp_033670.1
Smp_058150.1
Smp_031040
Smp_024060.1
Smp_001410
Smp_031950.1
Smp_009420
Smp_103930.1
Smp_103930.1
Smp_103930.1
Smp_103930.1
Smp_035270.1
Smp_048260.1
Smp_130790.1
Smp_042670.1
Smp_031000.1
Smp_138970.1
Smp_042430.1
Smp_064020.2
Smp_064020.2
Smp_064020.2
Smp_079050.1
Smp_069220
Smp_055420.1
Smp_007630.1
Smp_063580.1
Smp_063580.1
Smp_156440.2
Smp_096780.1
Smp_106080.1
Smp_066960.1
Smp_082560.1
Smp_175120.1
Smp_085310.1
Smp_045750
Smp_140450.1
Smp_010230
Smp_061310.1
Smp_061310.1
Smp_084140.1
Smp_084140.1
Smp_095910.1
Smp_017290.3
Smp_165810.2
Smp_024360
Smp_030920.1
Smp_032580.1
Smp_032580.1
Smp_121430.1
Smp_074500
Smp_000740.1
Smp_066340
Smp_048280.1
Smp_098890.1
Smp_098890.1
Smp_045950.1
Smp_079430.1
Smp_052280.1
Smp_065190
Smp_165140.1
Smp_150850.1
Smp_121610.1
Smp_121610.1
Smp_128890.1
Smp_014400.1
Smp_014400.1
Smp_014400.1
Smp_049730
Smp_168850.1
Smp_101370
Smp_006250.1
Smp_053220.1

Cheers
Sébastien
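A list like this one can be built by reading the headers of each alignment and reporting IDs that occur more than once. A minimal Python sketch, assuming the alignment is available as FASTA-style text (the thread does not say which format get_alignment returns); the toy alignment uses made-up sequences:

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def fasta_ids(fasta_text: str) -> Iterator[str]:
    """Yield the first whitespace-delimited token of each '>' header line."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def duplicated_ids(ids: Iterable[str]) -> Dict[str, int]:
    """Return every ID seen more than once, mapped to its count."""
    counts = Counter(ids)
    return {i: n for i, n in counts.items() if n > 1}


# Toy alignment with made-up sequences.
aln = ">Smp_034840.1\nMKV\n>Smp_034840.1\nMKV\n>Smp_153760.1\nMRL\n"
print(duplicated_ids(fasta_ids(aln)))  # {'Smp_034840.1': 2}
```

Running this over every family's alignment and collecting the keys would yield a per-ID list of the kind shown above.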
From: Sebastien M. <Seb...@un...> - 2008-08-25 13:49:47
It works properly for the families I tested. Thanks.

Best regards
Sébastien

> Hi,
>
> I have fixed the API. You'll need to check out the new Tree.pm module
> from the Subversion repository. As I am on holidays I have only just
> quickly tested the changes. Let me know if it doesn't work.
>
> Cheers
>
> J-K
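To re-check a family after updating Tree.pm, one option is to compare the leaf names in the nhx tree with the IDs in the alignment. With the behaviour reported at the start of the thread, Smp_150040.2 would show up as missing and Smp_150040.1 as extra. A minimal Python sketch, assuming both ID lists have already been extracted (the parsing is not shown):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def compare_ids(
    tree_leaves: Iterable[str], aln_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra): leaves absent from the alignment, and surplus IDs."""
    expected = Counter(tree_leaves)
    found = Counter(aln_ids)
    # Counter subtraction keeps only positive counts.
    missing = sorted((expected - found).elements())
    extra = sorted((found - expected).elements())
    return missing, extra


tree = ["Smp_150040.1", "Smp_150040.2"]
print(compare_ids(tree, ["Smp_150040.1", "Smp_150040.1"]))  # behaviour before the fix
print(compare_ids(tree, ["Smp_150040.1", "Smp_150040.2"]))  # both lists empty
```

Using Counter rather than sets means a leaf that appears twice in the alignment is reported even when every leaf name is present at least once.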