Re: [Treesoft-treefam] How to extract 2 or more leaves with the same GID ?
From: Sebastien M. <Seb...@un...> - 2008-08-25 13:49:47
It works properly for families I tested.
Thanks

Best regards
Sébastien

> Hi,
>
> I have fixed the API. You'll need to check out the new Tree.pm module
> from the Subversion repository. As I am on holidays I have only just
> quickly tested the changes. Let me know if it doesn't work.
>
> Cheers
>
> J-K
>
> Quoting Sebastien MORETTI <Seb...@un...>:
>
>> Here is a list of duplicated entries I found.
>> All of them seem to come from Schistosoma mansoni species.
>>
>> Smp_150040.1
>> Smp_062300
>> Smp_028100.1
>> Smp_059750
>> Smp_136850.1
>> Smp_165350.1
>> Smp_138920.1
>> Smp_176260.1
>> Smp_159900.1
>> Smp_074070.1
>> Smp_061920.1
>> Smp_093750.1
>> Smp_103460.1
>> Smp_060940
>> Smp_176230.1
>> Smp_041430.1
>> Smp_034840.1
>> Smp_034840.1
>> Smp_153760.1
>> Smp_055760
>> Smp_094050.1
>> Smp_009650.1
>> Smp_069130.1
>> Smp_106130.1
>> Smp_049300.1
>> Smp_049300.1
>> Smp_049300.1
>> Smp_144010.1
>> Smp_162800.1
>> Smp_059290.1
>> Smp_176200.1
>> Smp_163420.1
>> Smp_020920.1
>> Smp_035200.1
>> Smp_035200.1
>> Smp_035200.1
>> Smp_049600.1
>> Smp_049600.1
>> Smp_049600.1
>> Smp_005070.1
>> Smp_138680.1
>> Smp_158110.1
>> Smp_004470.1
>> Smp_004470.1
>> Smp_035720
>> Smp_147250.1
>> Smp_128200.1
>> Smp_149200.1
>> Smp_149200.1
>> Smp_076030.1
>> Smp_033050.1
>> Smp_033050.1
>> Smp_004780.1
>> Smp_158320.1
>> Smp_064040
>> Smp_161970.1
>> Smp_104680.1
>> Smp_034980.1
>> Smp_035430.1
>> Smp_025570.1
>> Smp_025570.1
>> Smp_042590.1
>> Smp_063120.1
>> Smp_063120.1
>> Smp_160770.1
>> Smp_160770.1
>> Smp_045430.1
>> Smp_054790.1
>> Smp_054340.1
>> Smp_054340.1
>> Smp_042270.1
>> Smp_042270.1
>> Smp_038080.1
>> Smp_005080.1
>> Smp_005080.1
>> Smp_066630.1
>> Smp_046880.1
>> Smp_053510.1
>> Smp_173180.1
>> Smp_141630.1
>> Smp_141630.1
>> Smp_141630.1
>> Smp_124820.1
>> Smp_046090.1
>> Smp_096450
>> Smp_120320
>> Smp_075220.1
>> Smp_075220.1
>> Smp_000030.1
>> Smp_000030.1
>> Smp_076630.1
>> Smp_030930.1
>> Smp_142010.1
>> Smp_038870.1
>> Smp_050360
>> Smp_033670.1
>> Smp_033670.1
>> Smp_033670.1
>> Smp_058150.1
>> Smp_031040
>> Smp_024060.1
>> Smp_001410
>> Smp_031950.1
>> Smp_009420
>> Smp_103930.1
>> Smp_103930.1
>> Smp_103930.1
>> Smp_103930.1
>> Smp_035270.1
>> Smp_048260.1
>> Smp_130790.1
>> Smp_042670.1
>> Smp_031000.1
>> Smp_138970.1
>> Smp_042430.1
>> Smp_064020.2
>> Smp_064020.2
>> Smp_064020.2
>> Smp_079050.1
>> Smp_069220
>> Smp_055420.1
>> Smp_007630.1
>> Smp_063580.1
>> Smp_063580.1
>> Smp_156440.2
>> Smp_096780.1
>> Smp_106080.1
>> Smp_066960.1
>> Smp_082560.1
>> Smp_175120.1
>> Smp_085310.1
>> Smp_045750
>> Smp_140450.1
>> Smp_010230
>> Smp_061310.1
>> Smp_061310.1
>> Smp_084140.1
>> Smp_084140.1
>> Smp_095910.1
>> Smp_017290.3
>> Smp_165810.2
>> Smp_024360
>> Smp_030920.1
>> Smp_032580.1
>> Smp_032580.1
>> Smp_121430.1
>> Smp_074500
>> Smp_000740.1
>> Smp_066340
>> Smp_048280.1
>> Smp_098890.1
>> Smp_098890.1
>> Smp_045950.1
>> Smp_079430.1
>> Smp_052280.1
>> Smp_065190
>> Smp_165140.1
>> Smp_150850.1
>> Smp_121610.1
>> Smp_121610.1
>> Smp_128890.1
>> Smp_014400.1
>> Smp_014400.1
>> Smp_014400.1
>> Smp_049730
>> Smp_168850.1
>> Smp_101370
>> Smp_006250.1
>> Smp_053220.1
>>
>> Cheers
>> Sébastien
>>
>>> Maybe the best way will be to remove redundancy into the gene list
>>> (from leaves) because I get 2 Smp_150040 entries.
>>> Then extract alignment(s) for every geneID.
>>>
>>> What do you think about ?
>>>
>>> Cheers
>>> Sébastien
>>>
>>>>> Hi Sebastien,
>>>>>
>>>>> The example you give looks strange because a gene should only be
>>>>> represented by one transcript in a given tree.
>>>>
>>>> I think I can find several other examples like this one.
>>>>
>>>>> Anyone got an idea why a gene can be represented by 2 transcripts
>>>>> in the same tree ?
>>>>
>>>> It seems these transcripts are too distant and that they cannot be
>>>> merged as one single gene.
>>>>
>>>>> I'll see how to fix the API for cases like this.
>>>>>
>>>>> Cheers
>>>>>
>>>>> J-K
>>>>
>>>> Thanks
>>>> Sébastien
>>>>
>>>>> Quoting Sebastien MORETTI <Seb...@un...>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> we use your API which is very powerful.
>>>>>> But there is a problem when we try to get the alignment for
>>>>>> families where one GID (GeneID) is the same for 2 or more leaves:
>>>>>> e.g. 1 Family TF101024
>>>>>>      1 GID    Smp_150040
>>>>>>      2 leaves Smp_150040.1 & Smp_150040.2
>>>>>>
>>>>>> the nhx method returns the proper tree with distinct Smp_150040.1 &
>>>>>> Smp_150040.2
>>>>>> But the get_alignment method returns twice Smp_150040.1, with the
>>>>>> same headers and sequences. Smp_150040.1 is duplicated.
>>>>>>
>>>>>>
>>>>>> How to resolve this ?
>>>>>>
>>>>>> Thanks

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056
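For readers who cannot update Tree.pm right away, the workaround proposed earlier in the thread (remove redundancy from the gene list taken from the leaves, then extract alignments per gene ID) might look roughly like the sketch below. It is written in Python purely for illustration and does not use the TreeFam API. All helper names are hypothetical, and it assumes that a leaf ID is the gene ID plus an optional ".N" suffix, as in the thread's example (GID Smp_150040, leaves Smp_150040.1 and Smp_150040.2).

```python
import re
from collections import Counter, defaultdict

# Assumption (not from the TreeFam API): a leaf/transcript ID is the gene ID
# followed by an optional ".N" suffix, e.g. Smp_150040.1 -> Smp_150040.
_SUFFIX = re.compile(r"\.\d+$")


def gene_id(leaf_id):
    """Strip a trailing '.N' suffix, if any, to get the gene ID."""
    return _SUFFIX.sub("", leaf_id)


def group_leaves_by_gene(leaf_ids):
    """Map each gene ID to its distinct leaf IDs, in first-seen order."""
    groups = defaultdict(list)
    for leaf in leaf_ids:
        gid = gene_id(leaf)
        if leaf not in groups[gid]:
            groups[gid].append(leaf)
    return dict(groups)


def duplicated_entries(leaf_ids):
    """Return leaf IDs that occur more than once, with their counts."""
    counts = Counter(leaf_ids)
    return {leaf: n for leaf, n in counts.items() if n > 1}


def dedupe_preserving_order(ids):
    """Drop repeated IDs while keeping the original order."""
    return list(dict.fromkeys(ids))
```

Gene IDs whose group holds more than one leaf are the cases described in the original report, while `duplicated_entries` flags exact repeats such as those in the list above.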