Re: [Treesoft-treefam] How to extract 2 or more leaves with the same GID ?
From: Sebastien M. <Seb...@un...> - 2008-08-25 13:05:57
Here is a list of the duplicated entries I found. All of them seem to come from the species Schistosoma mansoni.

Smp_150040.1 Smp_062300 Smp_028100.1 Smp_059750 Smp_136850.1 Smp_165350.1 Smp_138920.1 Smp_176260.1 Smp_159900.1 Smp_074070.1 Smp_061920.1 Smp_093750.1 Smp_103460.1 Smp_060940 Smp_176230.1 Smp_041430.1 Smp_034840.1 Smp_034840.1 Smp_153760.1 Smp_055760 Smp_094050.1 Smp_009650.1 Smp_069130.1 Smp_106130.1 Smp_049300.1 Smp_049300.1 Smp_049300.1 Smp_144010.1 Smp_162800.1 Smp_059290.1 Smp_176200.1 Smp_163420.1 Smp_020920.1 Smp_035200.1 Smp_035200.1 Smp_035200.1 Smp_049600.1 Smp_049600.1 Smp_049600.1 Smp_005070.1 Smp_138680.1 Smp_158110.1 Smp_004470.1 Smp_004470.1 Smp_035720 Smp_147250.1 Smp_128200.1 Smp_149200.1 Smp_149200.1 Smp_076030.1 Smp_033050.1 Smp_033050.1 Smp_004780.1 Smp_158320.1 Smp_064040 Smp_161970.1 Smp_104680.1 Smp_034980.1 Smp_035430.1 Smp_025570.1 Smp_025570.1 Smp_042590.1 Smp_063120.1 Smp_063120.1 Smp_160770.1 Smp_160770.1 Smp_045430.1 Smp_054790.1 Smp_054340.1 Smp_054340.1 Smp_042270.1 Smp_042270.1 Smp_038080.1 Smp_005080.1 Smp_005080.1 Smp_066630.1 Smp_046880.1 Smp_053510.1 Smp_173180.1 Smp_141630.1 Smp_141630.1 Smp_141630.1 Smp_124820.1 Smp_046090.1 Smp_096450 Smp_120320 Smp_075220.1 Smp_075220.1 Smp_000030.1 Smp_000030.1 Smp_076630.1 Smp_030930.1 Smp_142010.1 Smp_038870.1 Smp_050360 Smp_033670.1 Smp_033670.1 Smp_033670.1 Smp_058150.1 Smp_031040 Smp_024060.1 Smp_001410 Smp_031950.1 Smp_009420 Smp_103930.1 Smp_103930.1 Smp_103930.1 Smp_103930.1 Smp_035270.1 Smp_048260.1 Smp_130790.1 Smp_042670.1 Smp_031000.1 Smp_138970.1 Smp_042430.1 Smp_064020.2 Smp_064020.2 Smp_064020.2 Smp_079050.1 Smp_069220 Smp_055420.1 Smp_007630.1 Smp_063580.1 Smp_063580.1 Smp_156440.2 Smp_096780.1 Smp_106080.1 Smp_066960.1 Smp_082560.1 Smp_175120.1 Smp_085310.1 Smp_045750 Smp_140450.1 Smp_010230 Smp_061310.1 Smp_061310.1 Smp_084140.1 Smp_084140.1 Smp_095910.1 Smp_017290.3 Smp_165810.2 Smp_024360 Smp_030920.1 Smp_032580.1 Smp_032580.1 Smp_121430.1 Smp_074500 Smp_000740.1
Smp_066340 Smp_048280.1 Smp_098890.1 Smp_098890.1 Smp_045950.1 Smp_079430.1 Smp_052280.1 Smp_065190 Smp_165140.1 Smp_150850.1 Smp_121610.1 Smp_121610.1 Smp_128890.1 Smp_014400.1 Smp_014400.1 Smp_014400.1 Smp_049730 Smp_168850.1 Smp_101370 Smp_006250.1 Smp_053220.1

Cheers
Sébastien

> Maybe the best way will be to remove redundancy into the gene list (from
> leaves) because I get 2 Smp_150040 entries.
>
> Then extract alignment(s) for every geneID.
>
>
> What do you think about ?
>
> Cheers
> Sébastien
>
>>> Hi Sebastien,
>>>
>>> The example you give looks strange because a gene should only be
>>> represented by one transcript in a given tree.
>> I think I can find several other examples like this one.
>>
>>> Anyone got an idea why a gene can be represented by 2 transcripts in the
>>> same tree ?
>> It seems these transcripts are too distant and that they cannot be
>> merged as one single gene.
>>
>>> I'll see how to fix the API for cases like this.
>>
>>> Cheers
>>>
>>> J-K
>> Thanks
>> Sébastien
>>
>>> Quoting Sebastien MORETTI <Seb...@un...>:
>>>
>>>> Hi,
>>>>
>>>> we use your API which is very powerful.
>>>> But there is a problem when we try to get the alignment for families
>>>> where one GID (GeneID) is the same for 2 or more leaves:
>>>> e.g. 1 Family  TF101024
>>>>      1 GID     Smp_150040
>>>>      2 leaves  Smp_150040.1 & Smp_150040.2
>>>>
>>>> the nhx method returns the proper tree with distinct Smp_150040.1 &
>>>> Smp_150040.2
>>>> But the get_alignment method returns twice Smp_150040.1, with the same
>>>> headers and sequences. Smp_150040.1 is duplicated.
>>>>
>>>>
>>>> How to resolve this ?
>>>>
>>>> Thanks

-- 
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056
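
For anyone who wants to reproduce the check, here is a minimal Python sketch of the workaround discussed in this thread: spot leaf IDs that occur more than once, then collapse the leaves to one group per gene ID. It does not use the TreeFam API; it only works on plain lists of leaf names. The function names are hypothetical, and the rule that a gene ID is the leaf ID without its trailing ".N" transcript suffix is an assumption based on the Smp_150040 / Smp_150040.1 example.

```python
from collections import Counter, defaultdict


def gene_id(leaf_id):
    """Return the gene ID for a leaf, assuming the leaf ID is the gene ID
    plus an optional numeric '.N' transcript suffix (an assumption taken
    from the Smp_150040 / Smp_150040.1 example)."""
    base, sep, suffix = leaf_id.rpartition(".")
    return base if sep and suffix.isdigit() else leaf_id


def duplicated_entries(leaf_ids):
    """Return the leaf IDs that occur more than once, sorted."""
    counts = Counter(leaf_ids)
    return sorted(leaf for leaf, n in counts.items() if n > 1)


def leaves_by_gene(leaf_ids):
    """Group leaves by gene ID, dropping repeated leaf IDs and keeping
    the order in which each distinct leaf first appears."""
    groups = defaultdict(list)
    for leaf in leaf_ids:
        gid = gene_id(leaf)
        if leaf not in groups[gid]:
            groups[gid].append(leaf)
    return dict(groups)
```

Calling `duplicated_entries` on the sequence headers returned for one family would flag a repeated Smp_150040.1, while `leaves_by_gene` on the leaf names from the tree shows which gene IDs carry more than one transcript.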