treesoft-treefam Mailing List for softwares for phylogenetic trees
Brought to you by: lh3
From: Sebastien M. <seb...@un...> - 2009-06-24 06:41:52

Hi,

We noticed that some species are often missing from CLEAN trees (and perhaps from other kinds of trees). This is the case for Ornithorhynchus anatinus (ORNAN) and Oryzias latipes (ORYLA). Takifugu rubripes (FUGRU) is always displayed as a loss in trees and is never present.

Is there an explanation?

Thanks

-- 
Sébastien Moretti
SIB Vital-IT EMBnet, Quartier Sorge - Genopode
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4056/4221
http://ch.embnet.org/
http://myhits.vital-it.ch/

From: Sebastien M. <seb...@un...> - 2009-04-14 11:34:42

Hi,

The same thing happens for this family: TF101053.
It seems that some leaf sequences are not returned, nor are some CIGAR sequences, although they are in the tree!

No CIGAR or aa sequences are returned for the ID cr01.sctg9.wum.65.1.
cr01.sctg9.wum.65.1 is missing from the aa_full_align table.

Is something missing in the TreeFam-7 database? Or is there a problem with an upstream stop codon (*) in the sequences?

Here is the error message I got (for Treefam-7/Treefam/Tree.pm):

Use of uninitialized value in length at Treefam/Tree.pm line 1000 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.

    To help you figure out what was undefined, perl tells you what operation
    you used the undefined value in. Note, however, that perl optimizes your
    program and the operation displayed in the warning may not necessarily
    appear literally in your program. For example, "that $foo" is usually
    optimized into "that " . $foo, and the warning will refer to the
    concatenation (.) operator, even though there is no . in your program.

Use of uninitialized value in substitution (s///) at Treefam/Tree.pm line 1006 (#1)
Use of uninitialized value in substitution (s///) at Treefam/Tree.pm line 1007 (#1)
Use of uninitialized value in concatenation (.) or string at Treefam/Tree.pm line 960 (#1)
Use of uninitialized value in length at Treefam/Tree.pm line 1027 (#1)
Use of uninitialized value in concatenation (.) or string at Treefam/Tree.pm line 961 (#1)
Use of uninitialized value in substitution (s///) at Treefam/Tree.pm line 1002 (#1)
Use of uninitialized value in substitution (s///) at Treefam/Tree.pm line 1003 (#1)
Use of uninitialized value in concatenation (.) or string at Treefam/Tree.pm line 969 (#1)
Use of uninitialized value in concatenation (.) or string at Treefam/Tree.pm line 970 (#1)

Thanks

> Hi,
>
> Have you been able to solve this problem ?
> I have other families with, it seems, this kind of problem.
>
> Regards
> Sébastien
>
>> Hi Sebastien,
>>
>> I am away this week but I'll have a look when I am back, unless
>> someone comes up with an answer before.
>>
>> Cheers
>>
>> J-K
>>
>> Quoting Sebastien MORETTI <seb...@un...>:
>>
>>> Hi,
>>>
>>> It seems that TreeFam API fails to return some genes for this family:
>>> TF105900.
>>> [...]

-- 
Sébastien Moretti
Department of Ecology and Evolution, Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056
http://bioinfo.unil.ch/

From: Sebastien M. <seb...@un...> - 2009-03-11 14:51:15

Hi,

It seems that the TreeFam API fails to return some genes for this family: TF105900.

Both from the web site and on the command line, the API returns
  58 protein sequences for the 'clean' protein alignment,
  58 external branches for the 'clean' tree,
  BUT 56 nucleotide sequences for the 'clean' nucleotide alignment.

BGIOSIFCE019517.1_ORYSA & BGIBMGA012703_BOMMO are not in the 'clean' nucleotide alignment.

Do you have any idea why this happens?

Thanks

-- 
Sébastien Moretti

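A minimal sketch of how such a mismatch can be pinned down once both alignments have been exported as FASTA text. The function names and the toy sequences are assumptions made for illustration; only the two ORYSA/BOMMO identifiers come from the report above, and nothing here calls the TreeFam API.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first word of each '>' header line)."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def missing_from(reference_ids, other_ids):
    """IDs present in the reference set but absent from the other set, sorted."""
    return sorted(set(reference_ids) - set(other_ids))


# Toy alignments: GENE1_HUMAN and all sequences are made up for the example.
protein_alignment = """>GENE1_HUMAN
MKV-
>BGIOSIFCE019517.1_ORYSA
MKVL
>BGIBMGA012703_BOMMO
M-VL
"""
nucleotide_alignment = """>GENE1_HUMAN
ATGAAAGTT---
"""

print(missing_from(fasta_ids(protein_alignment), fasta_ids(nucleotide_alignment)))
```

Running the same comparison against the leaf names of the tree would show whether the tree agrees with the protein alignment, the nucleotide alignment, or neither.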
From: Sebastien M. <seb...@un...> - 2009-03-03 12:48:00

Hi,

I wonder how TreeFam 7 deals with new species (e.g. cat).
I have found some examples (e.g. TF105900, TF105200) where new species are present in the clean and full trees, but the internal nodes have no taxonomy information or bootstrap values.

Is this normal?

Thanks

-- 
Sébastien Moretti

From: Sebastien M. <seb...@un...> - 2009-02-16 13:48:25

Hello,

When do you plan to release MySQL dumps, and other data, from TreeFam 7?
The FTP data and the links from www.treefam.org are still TreeFam 6 related.

Thanks

-- 
Sébastien Moretti

From: Sebastien M. <seb...@un...> - 2009-02-09 08:17:59

Hello,

When do you plan to release TreeFam 7?
A TreeFam_7 database is now available on the TreeFam MySQL server.

Thanks

-- 
Sébastien Moretti

From: Sebastien M. <Seb...@un...> - 2008-10-22 08:06:57

> Hi Sebastien,
>
> The current behaviour is to die on bad trees. I think this is the safest
> procedure as not doing so would allow further processing of bad trees.
> This would generate errors difficult to trace both in other parts of the
> API and in any code using the API.
> However, the error message could be more explicit and report on the
> family AC and type of tree causing the error. Would you also have the
> message say 'bad tree' instead of 'unrecognized format' ?
>
> J-K

Yes, it is safer to die on bad trees.
The 'bad tree' message would be more consistent with the web site message.

Thanks
Sébastien

-- 
Sébastien Moretti

From: Jean-Karim H. <jk...@sa...> - 2008-10-22 07:22:34

Hi Sebastien,

The current behaviour is to die on bad trees. I think this is the safest procedure as not doing so would allow further processing of bad trees. This would generate errors difficult to trace both in other parts of the API and in any code using the API.

However, the error message could be more explicit and report on the family AC and type of tree causing the error. Would you also have the message say 'bad tree' instead of 'unrecognized format' ?

J-K

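The behaviour discussed here (stop on a bad tree, but say which family and which tree type caused it) could look roughly like the sketch below. It is written in Python purely for illustration and is not the Treefam Perl API; `BadTreeError`, `check_tree` and the simple well-formedness test are all hypothetical, and the malformed string is a toy value, not the tree actually stored for TF352160.

```python
class BadTreeError(ValueError):
    """Raised when a stored tree string cannot be used (hypothetical name)."""


def check_tree(tree_text, family_ac, tree_type):
    """Return the stripped tree text, or fail loudly with family and tree type.

    The validity test is deliberately crude (assumption): a terminating ';'
    and balanced parentheses.
    """
    text = tree_text.strip()
    balanced = text.count("(") == text.count(")")
    if not text.endswith(";") or not balanced:
        raise BadTreeError(
            f"bad tree: family {family_ac}, tree type '{tree_type}'"
        )
    return text


try:
    check_tree("((A:0.1,B:0.2):0.3;", "TF352160", "clean")  # toy malformed tree
except BadTreeError as err:
    print(err)
```

Raising instead of returning an empty result keeps the "die on bad trees" safety property while giving calling code something specific to catch and report.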
From: Sebastien M. <seb...@un...> - 2008-10-21 10:15:37

Hi,

The API should return the same thing as the web site for a 'Bad Tree' such as TF352160.
That would be clearer for users.

What do you think?

-- 
Sébastien Moretti

From: Sebastien M. <seb...@un...> - 2008-09-30 15:03:28

Hi,

We explore a lot of TreeFam trees to get, for example, Euteleostomi sub-trees:

    $tree->get_nodes_by_tag_value(-S=>'Euteleostomi');

It works perfectly with the API except when there is no Euteleostomi node in the whole tree. The API then returns nothing (which I think is the right thing to do), but sometimes a Tetrapoda node is available.

So, my question is: how can we explore trees through the child taxa of Euteleostomi when Euteleostomi itself is absent? With Tetrapoda, else with Theria, else with Eutheria, ... Either via the API or via MySQL.

We could hard-code the child taxa names, but that would not be clean.

Any ideas?

Thanks

-- 
Sébastien Moretti

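One way to avoid scattering hard-coded fallbacks through the code is to keep the taxon hierarchy in a single table and walk it breadth-first. The sketch below is an assumption-laden illustration in Python, not a TreeFam API feature: `CHILD_TAXA` is a hand-copied excerpt of the internal nodes of the treebest species tree posted elsewhere in this archive, the function names are hypothetical, and `S=` tags are pulled out of the NHX string with a simple regular expression.

```python
import re
from collections import deque

# Excerpt of the species-tree hierarchy (assumption: copied by hand, incomplete).
CHILD_TAXA = {
    "Euteleostomi": ["Tetrapoda", "Clupeocephala"],
    "Tetrapoda": ["Amniota"],
    "Amniota": ["Mammalia"],
    "Mammalia": ["Theria"],
    "Theria": ["Eutheria"],
}


def taxa_breadth_first(root):
    """Return `root` and all its known descendant taxa, shallowest first."""
    order, queue = [], deque([root])
    while queue:
        taxon = queue.popleft()
        order.append(taxon)
        queue.extend(CHILD_TAXA.get(taxon, []))
    return order


def species_tags(nhx):
    """Collect every value of an S= tag found in an NHX string."""
    return set(re.findall(r":S=([^:\]]+)", nhx))


def available_taxa(nhx, root):
    """Taxa at or below `root` that label at least one node, shallowest first."""
    present = species_tags(nhx)
    return [taxon for taxon in taxa_breadth_first(root) if taxon in present]


# Toy gene tree without any Euteleostomi node.
toy = ("((h[&&NHX:S=HUMAN],o[&&NHX:S=MONDO])[&&NHX:S=Theria:D=N],"
       "c[&&NHX:S=CHICK])[&&NHX:S=Amniota:D=N];")
print(available_taxa(toy, "Euteleostomi"))
```

Each name returned could then be passed in turn to the `get_nodes_by_tag_value` call shown in the message above. Building `CHILD_TAXA` from the species tree itself rather than by hand would remove the remaining hard-coding.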
From: Sebastien M. <seb...@un...> - 2008-09-30 14:50:48

> Hi Sebastien,
>
> Historically, the seed trees were derived from the PhIGs clusters and
> used to create the families.
> Now, seed trees should only appear for curated families and are used as
> a constraint on building the clean trees for these families. The
> families are now built using the previous version of Treefam as seeds.
> Clean trees use only fully sequenced species while full trees use all
> the other available sequences.
> The difference also lies in the building process since DNA alignments
> are available and used for clean trees but not for full trees.
> So the choice depends on the requirements of your project.

We will need TreeFam B soon, so using clean trees will be more consistent between TreeFam A & B.
We also need DNA alignments.

> In the case of TF106228, something has clearly gone wrong as the
> genes of the seed tree appear in TF352211. I believe that TF352211
> should have been mapped to TF106228 and a new family created for the 2
> orphan worm genes but it seems that the reverse happened.
> Jue, do you have any idea about this ?
>
> Cheers
>
> J-K

-- 
Sébastien Moretti

From: Jean-Karim H. <jk...@sa...> - 2008-09-25 16:05:54

Hi Sebastien,

Historically, the seed trees were derived from the PhIGs clusters and used to create the families.
Now, seed trees should only appear for curated families and are used as a constraint on building the clean trees for these families. The families are now built using the previous version of Treefam as seeds.
Clean trees use only fully sequenced species while full trees use all the other available sequences.
The difference also lies in the building process since DNA alignments are available and used for clean trees but not for full trees.
So the choice depends on the requirements of your project.

In the case of TF106228, something has clearly gone wrong as the genes of the seed tree appear in TF352211. I believe that TF352211 should have been mapped to TF106228 and a new family created for the 2 orphan worm genes but it seems that the reverse happened.
Jue, do you have any idea about this ?

Cheers

J-K

From: Sebastien M. <Seb...@un...> - 2008-09-25 15:10:47

Hi,

Following the 'Instructions' link on the family pages, I thought that Clean trees are "clean" because they use only sequences from sequenced species, and that, unlike seed trees, they are not manually curated.
Most of the time clean trees are larger than seed trees (e.g. TF101001).

In some cases, e.g. TF106228, clean trees are smaller than seed trees, although only sequenced species are present in both the seed tree and the clean tree.

So, what are the real differences between Clean, Seed and Full trees in TreeFam?

Are there only Clean trees in TreeFam B?

Which kind of tree is best for large-scale phylogenetic studies?

Thanks

-- 
Sébastien Moretti

From: Sebastien M. <Seb...@un...> - 2008-09-05 14:39:03

The same kind of thing happens for family TF105088, with a 1.3 distance.

-- 
Sébastien Moretti

From: Sebastien M. <Seb...@un...> - 2008-09-05 13:49:55

> Hi Sebastien,
>
> Is the API not working because of this ?

The API works for this family.
But it may cause problems with some tree viewers.

> I believe that the API produces a correct tree and that the database is
> possibly wrong: the database tree should have an empty [&&NHX] (this was
> the case in previous versions of Treefam). Having [&&NHX] makes the
> nodes consistent and I think this doesn't violate the specifications of
> the format.
>
> This is down to the interpretation of the format: Should a node with no
> information get an empty [&&NHX] or nothing ?
> What do other treefamers think ?
>
> J-K

-- 
Sébastien Moretti

From: Jean-Karim H. <jk...@sa...> - 2008-09-05 12:16:03

Hi Sebastien,

Is the API not working because of this ?

I believe that the API produces a correct tree and that the database is possibly wrong: the database tree should have an empty [&&NHX] (this was the case in previous versions of Treefam). Having [&&NHX] makes the nodes consistent and I think this doesn't violate the specifications of the format.

This is down to the interpretation of the format: Should a node with no information get an empty [&&NHX] or nothing ?
What do other treefamers think ?

J-K

From: Sebastien M. <Seb...@un...> - 2008-09-05 09:18:58

Hi,

It seems that the API produces an invalid NHX tree for the TF102048 family.

In the database (TreeFam6), the tree looks proper.
Here is an excerpt:
...
T=8090:G=ENSORLG00000004203]):1.7):0.070715[&&NHX:S=Percomorpha:D=N:B=23]):0.104119[&&NHX
...

There is no NHX information for the branch of length 1.7.
Nevertheless, the API returns this:
...
T=8090:G=ENSORLG00000004203]):1.7[&&NHX]):0.070715[&&NHX:S=Percomorpha:D=N:B=23]):0.104119[&&NHX
...

That is, 1.7[&&NHX] instead of 1.7.

How can this be fixed?

Thanks

-- 
Sébastien Moretti

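Whether an empty [&&NHX] tag is valid is exactly what this thread debates, so the following is only a client-side workaround sketch for viewers that reject it: remove empty tags from the string returned by the API before handing it on. The function name is hypothetical; the two excerpts are the ones quoted in the message above.

```python
def strip_empty_nhx(tree_text):
    """Remove NHX tags that carry no key=value pairs, i.e. the literal '[&&NHX]'."""
    return tree_text.replace("[&&NHX]", "")


api_excerpt = ("T=8090:G=ENSORLG00000004203]):1.7[&&NHX]):0.070715"
               "[&&NHX:S=Percomorpha:D=N:B=23]):0.104119[&&NHX")
db_excerpt = ("T=8090:G=ENSORLG00000004203]):1.7):0.070715"
              "[&&NHX:S=Percomorpha:D=N:B=23]):0.104119[&&NHX")

# After stripping, the API excerpt matches the database excerpt.
print(strip_empty_nhx(api_excerpt) == db_excerpt)
```

Tags that do carry information, such as [&&NHX:S=Percomorpha:D=N:B=23], are left untouched because only the exact empty form is replaced.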
From: Sebastien M. <Seb...@un...> - 2008-08-25 13:49:47

It works properly for the families I tested.

Thanks
Best regards
Sébastien

> Hi,
>
> I have fixed the API. You'll need to check out the new Tree.pm module
> from the Subversion repository. As I am on holidays I have only just
> quickly tested the changes. Let me know if it doesn't work.
>
> Cheers
>
> J-K

-- 
Sébastien Moretti

From: Sebastien M. <Seb...@un...> - 2008-08-25 13:05:57
|
Here is a list of duplicated entries I found. All of them seem to come from Schistosoma mansoni species. Smp_150040.1 Smp_062300 Smp_028100.1 Smp_059750 Smp_136850.1 Smp_165350.1 Smp_138920.1 Smp_176260.1 Smp_159900.1 Smp_074070.1 Smp_061920.1 Smp_093750.1 Smp_103460.1 Smp_060940 Smp_176230.1 Smp_041430.1 Smp_034840.1 Smp_034840.1 Smp_153760.1 Smp_055760 Smp_094050.1 Smp_009650.1 Smp_069130.1 Smp_106130.1 Smp_049300.1 Smp_049300.1 Smp_049300.1 Smp_144010.1 Smp_162800.1 Smp_059290.1 Smp_176200.1 Smp_163420.1 Smp_020920.1 Smp_035200.1 Smp_035200.1 Smp_035200.1 Smp_049600.1 Smp_049600.1 Smp_049600.1 Smp_005070.1 Smp_138680.1 Smp_158110.1 Smp_004470.1 Smp_004470.1 Smp_035720 Smp_147250.1 Smp_128200.1 Smp_149200.1 Smp_149200.1 Smp_076030.1 Smp_033050.1 Smp_033050.1 Smp_004780.1 Smp_158320.1 Smp_064040 Smp_161970.1 Smp_104680.1 Smp_034980.1 Smp_035430.1 Smp_025570.1 Smp_025570.1 Smp_042590.1 Smp_063120.1 Smp_063120.1 Smp_160770.1 Smp_160770.1 Smp_045430.1 Smp_054790.1 Smp_054340.1 Smp_054340.1 Smp_042270.1 Smp_042270.1 Smp_038080.1 Smp_005080.1 Smp_005080.1 Smp_066630.1 Smp_046880.1 Smp_053510.1 Smp_173180.1 Smp_141630.1 Smp_141630.1 Smp_141630.1 Smp_124820.1 Smp_046090.1 Smp_096450 Smp_120320 Smp_075220.1 Smp_075220.1 Smp_000030.1 Smp_000030.1 Smp_076630.1 Smp_030930.1 Smp_142010.1 Smp_038870.1 Smp_050360 Smp_033670.1 Smp_033670.1 Smp_033670.1 Smp_058150.1 Smp_031040 Smp_024060.1 Smp_001410 Smp_031950.1 Smp_009420 Smp_103930.1 Smp_103930.1 Smp_103930.1 Smp_103930.1 Smp_035270.1 Smp_048260.1 Smp_130790.1 Smp_042670.1 Smp_031000.1 Smp_138970.1 Smp_042430.1 Smp_064020.2 Smp_064020.2 Smp_064020.2 Smp_079050.1 Smp_069220 Smp_055420.1 Smp_007630.1 Smp_063580.1 Smp_063580.1 Smp_156440.2 Smp_096780.1 Smp_106080.1 Smp_066960.1 Smp_082560.1 Smp_175120.1 Smp_085310.1 Smp_045750 Smp_140450.1 Smp_010230 Smp_061310.1 Smp_061310.1 Smp_084140.1 Smp_084140.1 Smp_095910.1 Smp_017290.3 Smp_165810.2 Smp_024360 Smp_030920.1 Smp_032580.1 Smp_032580.1 Smp_121430.1 Smp_074500 Smp_000740.1 
Smp_066340 Smp_048280.1 Smp_098890.1 Smp_098890.1 Smp_045950.1 Smp_079430.1 Smp_052280.1 Smp_065190 Smp_165140.1 Smp_150850.1 Smp_121610.1 Smp_121610.1 Smp_128890.1 Smp_014400.1 Smp_014400.1 Smp_014400.1 Smp_049730 Smp_168850.1 Smp_101370 Smp_006250.1 Smp_053220.1

Cheers
Sébastien

From: Sebastien M. <Seb...@un...> - 2008-08-25 12:57:14

Maybe the best way would be to remove the redundancy from the gene list (built from the leaves), because I get 2 Smp_150040 entries, and then to extract the alignment(s) for every gene ID.

What do you think?

Cheers
Sébastien

-- 
Sébastien Moretti

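The de-duplication proposed here can be sketched in a few lines. This is an illustration only: it assumes the leaves are available as (leaf name, gene ID) pairs, which is a hypothetical input shape rather than something the Treefam API is known to return, and the second leaf name is made up.

```python
def unique_gene_ids(leaves):
    """Return gene IDs in first-seen order, each one only once.

    `leaves` is an iterable of (leaf_name, gene_id) pairs (assumed shape).
    """
    return list(dict.fromkeys(gene_id for _leaf_name, gene_id in leaves))


leaves = [
    ("Smp_150040.1", "Smp_150040"),
    ("Smp_150040.2", "Smp_150040"),  # second transcript of the same gene
    ("example_leaf", "ENSORLG00000004203"),  # leaf name made up for the example
]

print(unique_gene_ids(leaves))
```

`dict.fromkeys` keeps insertion order, so the alignment requests are issued in the same order as the leaves appear in the tree. Note that requesting alignments per gene ID would still hide the second transcript unless the lookup itself is made transcript-aware.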
From: Sebastien M. <Seb...@un...> - 2008-08-25 12:49:31

> Hi Sebastien,
>
> The example you give looks strange because a gene should only be
> represented by one transcript in a given tree.

I think I can find several other examples like this one.

> Anyone got an idea why a gene can be represented by 2 transcripts in the
> same tree ?

It seems these transcripts are too distant and that they cannot be merged as one single gene.

> I'll see how to fix the API for cases like this.
>
> Cheers
>
> J-K

Thanks
Sébastien

-- 
Sébastien Moretti

From: Sebastien M. <Seb...@un...> - 2008-08-25 12:22:44

Hi,

We use your API, which is very powerful.
But there is a problem when we try to get the alignment for families where one GID (GeneID) is shared by 2 or more leaves, e.g.:
  1 family    TF101024
  1 GID       Smp_150040
  2 leaves    Smp_150040.1 & Smp_150040.2

The nhx method returns the proper tree, with distinct Smp_150040.1 & Smp_150040.2 leaves.
But the get_alignment method returns Smp_150040.1 twice, with the same headers and sequences. Smp_150040.1 is duplicated.

How can this be resolved?

Thanks

-- 
Sébastien Moretti

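A quick consistency check between the leaf names of a tree and the sequence IDs of its alignment makes this kind of problem easy to spot across many families. The sketch below assumes both have already been extracted into plain Python lists; the function name and the extra `OTHER_LEAF` entry are hypothetical.

```python
from collections import Counter


def compare_leaves_to_alignment(tree_leaves, alignment_ids):
    """Return (duplicated alignment IDs, tree leaves absent from the alignment)."""
    counts = Counter(alignment_ids)
    duplicated = sorted(seq_id for seq_id, n in counts.items() if n > 1)
    missing = sorted(set(tree_leaves) - set(alignment_ids))
    return duplicated, missing


# The situation described for TF101024, plus one made-up leaf.
tree_leaves = ["Smp_150040.1", "Smp_150040.2", "OTHER_LEAF"]
alignment_ids = ["Smp_150040.1", "Smp_150040.1", "OTHER_LEAF"]

duplicated, missing = compare_leaves_to_alignment(tree_leaves, alignment_ids)
print("duplicated:", duplicated)
print("missing:", missing)
```

A duplicated ID paired with a missing sibling transcript, as here, points at a lookup keyed on the gene ID rather than on the leaf name.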
From: Hardip P. <pat...@gm...> - 2008-07-21 04:22:28

Hi there,

This may be the wrong place to submit my query about the treebest program, but I will try anyway.

I want to use a modified version of the default species tree obtained with the "treebest spec" command. To do so, I have added the SBIRD species (songbird, zebra finch) to the Aves class, which also includes the chick species. Somehow, when inferring orthologs, the program does not seem to recognise the _SBIRD species and ignores it in the process. Any help on that is welcome, please.

Cheers

The original species tree is:

((ORYSA*-4530.rice,ARATH*-3702)Magnoliophyta-3398,
(SCHPO*-4896.S_pombe,YEAST*-4932)Ascomycota-4890,
((((((((((((HUMAN*-9606,PANTR*-9598.chimpanzee)Homo/Pan/Gorilla-207598,
MACMU*-9544.monkey)Catarrhini-9526,
OTOGA-*30611.galago)Primates-9443,
((MOUSE*-10090,RAT*-10116)Murinae-39107,RABIT-9986)Glires-314147)Euarchontoglires-314146,
((BOVIN*-9913.cow,PIG-*9823)Cetartiodactyla-91561,
(CANFA*-9615.dog,FELCA-*9685.cat)Carnivora-33554,
SORAR-*42254.shrew,
MYOLU-*59463.bat)Laurasiatheria-314145,
(ECHTE-9371.tenrec,LOXAF-9785.elephant)Afrotheria-311790,
DASNO-9361.armadillo)Eutheria-9347,MONDO*-13616.opossum)Theria-32525,
ORNAN-*9258.platypus)Mammalia-40674,
CHICK*-9031)Amniota-32524,
XENTR*-8364.frog)Tetrapoda-32523,
(BRARE*-7955.zebrafish,
((TETNG*-99883.pufferfish,FUGRU*-31033.pufferfish)Tetraodontidae-31031,
(GASAC*-69293.stickleback,ORYLA*-8090.ricefish)Smegmamorpha-129949)Percomorpha-32485)Clupeocephala-186625)Euteleostomi-117571,
(CIOIN*-7719,CIOSA*-51511)Ciona-7718)Chordata-7711,
(((DROME*-7227.fly,DROPS*-7237.fly)Sophophora-32341,
(AEDAE*-7159.mosquito,ANOGA*-7165.mosquito)Culicidae-7157)Diptera-7147,
APIME-*7460.honeybee)Endopterygota-33392,
SCHMA*-6183.fluke,
(CAEEL*-6239.worm,CAEBR*-6238.worm,CAERE*-31234.worm)Caenorhabditis-6237)Bilateria-33213)Eukaryota-2759;

----------------------------------------------------

The modified species tree including SBIRD is:

((ORYSA*-4530.rice,ARATH*-3702)Magnoliophyta-3398,
(SCHPO*-4896.S_pombe,YEAST*-4932)Ascomycota-4890,
((((((((((((HUMAN*-9606,PANTR*-9598.chimpanzee)Homo/Pan/Gorilla-207598,
MACMU*-9544.monkey)Catarrhini-9526,
OTOGA-*30611.galago)Primates-9443,
((MOUSE*-10090,RAT*-10116)Murinae-39107,RABIT-9986)Glires-314147)Euarchontoglires-314146,
((BOVIN*-9913.cow,PIG-*9823)Cetartiodactyla-91561,
(CANFA*-9615.dog,FELCA-*9685.cat)Carnivora-33554,
SORAR-*42254.shrew,
MYOLU-*59463.bat)Laurasiatheria-314145,
(ECHTE-9371.tenrec,LOXAF-9785.elephant)Afrotheria-311790,
DASNO-9361.armadillo)Eutheria-9347,MONDO*-13616.opossum)Theria-32525,
ORNAN-*9258.platypus)Mammalia-40674,
(CHICK*-9031,SBIRD)Aves)Amniota-32524,
XENTR*-8364.frog)Tetrapoda-32523,
(BRARE*-7955.zebrafish,
((TETNG*-99883.pufferfish,FUGRU*-31033.pufferfish)Tetraodontidae-31031,
(GASAC*-69293.stickleback,ORYLA*-8090.ricefish)Smegmamorpha-129949)Percomorpha-32485)Clupeocephala-186625)Euteleostomi-117571,
(CIOIN*-7719,CIOSA*-51511)Ciona-7718)Chordata-7711,
(((DROME*-7227.fly,DROPS*-7237.fly)Sophophora-32341,
(AEDAE*-7159.mosquito,ANOGA*-7165.mosquito)Culicidae-7157)Diptera-7147,
APIME-*7460.honeybee)Endopterygota-33392,
SCHMA*-6183.fluke,
(CAEEL*-6239.worm,CAEBR*-6238.worm,CAERE*-31234.worm)Caenorhabditis-6237)Bilateria-33213)Eukaryota-2759;

--------------------------------------------------------------

The difference is highlighted in both trees. I have also made sure that the header lines of the alignment meet the requirements for inferring species.

Cheers
Hardip Patel
CGG, Research School of Biological Sciences
The Australian National University
Acton-2601, Australia

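One thing that can be checked mechanically is whether every leaf of an edited species tree follows the same labelling pattern as the others. In the trees above every original leaf has the form NAME, a "-" with an optional "*" on either side, a numeric taxon ID, and an optional ".common_name", whereas SBIRD and Aves were added as bare names. Whether treebest actually requires the numeric ID is an assumption that has not been verified here; the sketch only flags leaves that deviate from the pattern. The function names and the regular expressions are illustrative.

```python
import re

# Leaf labels follow '(' or ','; internal node labels follow ')'.
LEAF_RE = re.compile(r"[(,]\s*([^(),;\s][^(),;]*)")
# Pattern shared by the original leaves, e.g. HUMAN*-9606, OTOGA-*30611.galago, RABIT-9986.
LABEL_RE = re.compile(r"^[A-Za-z0-9]+(?:\*-|-\*|-)\d+(?:\.\S+)?$")


def leaf_labels(newick):
    """Return the leaf labels of a Newick string, in order of appearance."""
    return [label.strip() for label in LEAF_RE.findall(newick)]


def irregular_leaves(newick):
    """Leaves whose label does not look like NAME[*]-[*]TAXID[.common_name]."""
    return [label for label in leaf_labels(newick) if not LABEL_RE.match(label)]


# Small excerpt of the modified tree from the message above.
excerpt = "((CHICK*-9031,SBIRD)Aves,XENTR*-8364.frog)Tetrapoda-32523;"
print(irregular_leaves(excerpt))
```

If the bare name turns out to be the cause, giving SBIRD a label in the same form as its neighbours would be the first thing to try; that remains a guess until confirmed against the treebest documentation or source.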
From: Sebastien M. <Seb...@un...> - 2008-06-24 07:40:46

Hi,

It seems that the MySQL dumps for TreeFam 6.0 are not the right ones.

When I connect to TreeFam.org

    mysql -u anonymous -h db.treefam.org -P3308 treefam_6

I get 50 species with this request:

    SELECT * FROM species WHERE FLAG=1;

With the dump from
    ftp://ftp.sanger.ac.uk/pub/treefam/release-6.0/MySQL/
or
    ftp://ftp.sanger.ac.uk/pub/treefam/release-current/MySQL/
loaded locally, I get only 29 species with the same request.
That is the same number as in TreeFam 4.0 or 5.0.

Am I doing something wrong? Or is only the species table wrong?

Thanks

-- 
Sébastien Moretti

From: Sebastien M. <Seb...@un...> - 2008-05-13 14:31:01

Hi,

We are trying to install and use TreeFam locally.
Everything is fine on the database side, but we encounter some problems with the APIs.

First, when we try to use e.g. tfscripts/cgi-bin/search2.pl, a Perl module (SangerWeb.pm) is missing and is not provided in the SVN repository.
Is this module mandatory? Where can we get it?

Second, the SVN repository does not seem to provide the configuration file required to run tfscripts/treefam.pl.

Third, when we try to use tfscripts/cgi-bin/treeview.pl, a Perl module (private.pm) is missing and is not provided in the SVN repository among the developer-oriented Perl APIs.

Thanks

-- 
Sébastien Moretti