From: Hilmar L. <hl...@du...> - 2009-06-05 02:25:49
Hi Arlin & Enrico - the OBO Foundry ontologies use the SourceForge-provided trackers to request and document new terms or relationships or changes to existing ones. Do you anticipate CDAO to become accepted under the OBO Foundry too, and if so, do you think the SourceForge term tracker approach will not be suitable for CDAO?

-hilmar

On Jun 4, 2009, at 11:20 AM, Arlin Stoltzfus wrote:

> We're listening. Brandon has developed a term-request server to process requests for concepts and relations, but it's still in an early stage. In a few weeks we will be in a better position to have this discussion.
>
> Arlin
>
> On May 29, 2009, at 10:31 AM, Rutger Vos wrote:
>
>> Hi,
>>
>> I've done enough experimentation to establish the correct syntax for attaching valid RDFa attachments to nexml so that standard RDFa extractors can turn <meta/> elements - the new dictionaries - into RDF triples. I've implemented this in the java and perl APIs (Jeet: I hope that the examples I've mailed out give you enough of a template to do this in python too, but please let me know if I can help - I know that the wiki needs updating, for starters).
>>
>> The key issue now is the definition of predicates, i.e. the value of the @property and @rel attributes. Over the course of many EvoInfo discussions it's been decided that CDAO will be the principal artifact for their mediation - so what's the community process for inclusion of new predicates?
>>
>> Val and I have sketched out a couple of TreeBASE services whose search keys should be part of a controlled vocabulary (things like tree.id, tree.label, etc.), and this is just one use case of a project having a potentially large number of predicates (other example: Mesquite).
>>
>> It would be great if team CDAO could tell us where to send our list of proposed terms and where we can download an amended version of CDAO that includes them :-)
>>
>> I note that there is a wiki page about this (https://www.nescent.org/wg_evoinfo/CDAO_term_request), but ideally there would be some sort of issue tracker with structured input fields (e.g. subject/predicate/object name="XXX", suggested superclass="YYY", suggested datatype(s)="ZZZ", description="..."). Behind this tracker would be a team of curators that will promptly pick up a posted issue and work towards a solution.
>>
>> I realize that this involves a support commitment from team CDAO, but I think that's what we agreed to over free-form key/value pairs, homegrown vocabularies or a BioMoby-like free-for-all.
>>
>> Any comments?
>>
>> Rutger
>>
>> --
>> Dr. Rutger A. Vos
>> Department of zoology
>> University of British Columbia
>> http://www.nexml.org
>> http://rutgervos.blogspot.com
>
> _______________________________________________
> Nexml-discuss mailing list
> Nex...@li...
> https://lists.sourceforge.net/lists/listinfo/nexml-discuss

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Mark D. <mj...@ge...> - 2009-06-04 20:13:20
On Thu, 2009-06-04 at 11:27 -0400, Mark Dominus wrote:

> Did we pick a date and time?

The phone call will be Friday, 5 June at 14:30 EDT, 11:30 PDT. Bill, Rutger, Val and I will attend. If anyone else wants to be involved, email Val.

--
Mark Jason Dominus
mj...@ge...
Penn Genome Frontiers Institute
+1 215 573 5387
From: Arlin S. <sto...@um...> - 2009-06-04 15:40:19
We're listening. Brandon has developed a term-request server to process requests for concepts and relations, but it's still in an early stage. In a few weeks we will be in a better position to have this discussion.

Arlin

On May 29, 2009, at 10:31 AM, Rutger Vos wrote:

> Hi,
>
> I've done enough experimentation to establish the correct syntax for attaching valid RDFa attachments to nexml so that standard RDFa extractors can turn <meta/> elements - the new dictionaries - into RDF triples. I've implemented this in the java and perl APIs (Jeet: I hope that the examples I've mailed out give you enough of a template to do this in python too, but please let me know if I can help - I know that the wiki needs updating, for starters).
>
> The key issue now is the definition of predicates, i.e. the value of the @property and @rel attributes. Over the course of many EvoInfo discussions it's been decided that CDAO will be the principal artifact for their mediation - so what's the community process for inclusion of new predicates?
>
> Val and I have sketched out a couple of TreeBASE services whose search keys should be part of a controlled vocabulary (things like tree.id, tree.label, etc.), and this is just one use case of a project having a potentially large number of predicates (other example: Mesquite).
>
> It would be great if team CDAO could tell us where to send our list of proposed terms and where we can download an amended version of CDAO that includes them :-)
>
> I note that there is a wiki page about this (https://www.nescent.org/wg_evoinfo/CDAO_term_request), but ideally there would be some sort of issue tracker with structured input fields (e.g. subject/predicate/object name="XXX", suggested superclass="YYY", suggested datatype(s)="ZZZ", description="..."). Behind this tracker would be a team of curators that will promptly pick up a posted issue and work towards a solution.
>
> I realize that this involves a support commitment from team CDAO, but I think that's what we agreed to over free-form key/value pairs, homegrown vocabularies or a BioMoby-like free-for-all.
>
> Any comments?
>
> Rutger
>
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com
From: Mark D. <mj...@ge...> - 2009-06-04 15:27:29
On Thu, 2009-05-28 at 17:20 -0400, Rutger Vos wrote:

> Val & I have concluded we should have a telecon (or other discussion format) about taxonomic queries.
>
> Here's a doodle poll to pick a date/time: http://doodle.com/7x5aykszup954ysa

Did we pick a date and time?

--
Mark Jason Dominus
mj...@ge...
Penn Genome Frontiers Institute
+1 215 573 5387
From: Jon A. <jon...@du...> - 2009-06-03 14:35:41
Yes to both questions. I've allowed for a week of testing and have cleared my schedule for that week. We can start the data import at any time, however. We do have capacity on our current dev PostgreSQL server. There is no real need to wait for the installation of the new hardware. That being said, if we want to start with the new hardware off the bat, it should be ready the week starting June 23rd.

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------

On Jun 3, 2009, at 6:08 AM, Hilmar Lapp wrote:

> Jon - don't we need to run our own test suites before we will deploy any TreeBASE instance to this? Also, for testing the import don't we have enough storage on the dev server right now?
>
> -hilmar
>
> On Jun 2, 2009, at 5:38 PM, Jon Auman wrote:
>
>> I just received confirmation from Dell that the server and storage we ordered is expected to arrive the end of next week. I expect we could be ready to start importing data by June 23rd.
>>
>> -Jon
>>
>> _______________________________________________
>> Treebase-devel mailing list
>> Tre...@li...
>> https://lists.sourceforge.net/lists/listinfo/treebase-devel
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> ===========================================================
From: Hilmar L. <hl...@du...> - 2009-06-03 10:08:59
Jon - don't we need to run our own test suites before we will deploy any TreeBASE instance to this? Also, for testing the import don't we have enough storage on the dev server right now?

-hilmar

On Jun 2, 2009, at 5:38 PM, Jon Auman wrote:

> I just received confirmation from Dell that the server and storage we ordered is expected to arrive the end of next week. I expect we could be ready to start importing data by June 23rd.
>
> -Jon
>
> -------------------------------------------------------
> Jon Auman
> Systems Administrator
> National Evolutionary Synthesis Center
> Duke University
> http://www.nescent.org
> jon...@ne...
> ------------------------------------------------------

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Jon A. <jon...@du...> - 2009-06-02 16:38:22
I just received confirmation from Dell that the server and storage we ordered is expected to arrive the end of next week. I expect we could be ready to start importing data by June 23rd.

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------
From: Mark D. <mj...@ge...> - 2009-05-29 16:26:48
On Fri, 2009-05-29 at 11:12 -0400, Jon Auman wrote:

> rsync will work fine for me if we need incremental updates, otherwise scp is faster.

I like rsync because it's easier to restart if the transmission is interrupted partway through. With scp you get to start over again, or keep track separately of how far you got. But I guess you already know that. We can probably do the initial transfer by scp'ing compressed files, and then use rsync for any later updates.

> Unless I hear otherwise from Hilmar, I'm assuming that person will be me.

Excellent, thanks.

> If the SQL inserts all contain UTF8 characters, then there should be no problem with the import into a UTF8 postgresql database. If there are non-UTF8 characters in the SQL file,

I'm not sure what that would mean. What is a "non-UTF8 character"?

--
Mark Jason Dominus
mj...@ge...
Penn Genome Frontiers Institute
+1 215 573 5387
From: William P. <wil...@ya...> - 2009-05-29 16:20:19
On May 29, 2009, at 11:12 AM, Jon Auman wrote:

> If the SQL inserts all contain UTF8 characters, then there should be no problem with the import into a UTF8 postgresql database. If there are non-UTF8 characters in the SQL file, they can be stripped out with iconv or converted with a shell program called "recode"

Given the history of the legacy TreeBASE data, I believe that the vast majority of diacriticals will be properly formed in utf8, but there will be some malformed ones (1) dating from when we were entering data through a Mac application (Apple8 characters) and (2) as a result of people submitting data via web browsers that don't comply with our meta tags regarding character encodings. I think it's fine to leave these malformed ones in (rather than auto-stripping them out) because we will want to fix them by hand later on, and they help alert us to where things need fixing.

bp
From: William P. <wil...@ya...> - 2009-05-29 16:10:55
On May 29, 2009, at 10:52 AM, Rutger Vos wrote:

>> It's my understanding that we have not yet imported ncbi's classification into TreeBASE2? Is this correct? If so, then "containing any Primates" is not an option at the moment -- although it should be eventually (I'm putting it in our poster!) -- and it should be an easy thing to add.
>
> Absolutely, not yet - but very important. A propos, what would be the right way to do this? I've thought about it a little bit and imagined importing the NCBI taxonomy as a very large tree against whose topology we run queries would be a way to do this that doesn't require schema changes. Reasonable? Silly?

That was my original proposal (before you joined the team) -- but it got nixed by others. My rationale was that we would develop a system so that users could specify any TreeBASE tree to use as a classification, with the idea that we would not be locked into any particular classification system using dedicated tables. On the other hand, seeing as we already have ncbi_taxids, it makes sense to be wedded to ncbi because any other classification tree (e.g. ToLWeb) would have weaker links among taxon labels (it would have to be done by string matching). Plus, I'm a bit skeptical that our tree parsing and importing system can really handle 500k-node trees (can headless Mesquite handle that?).

At any rate, importing and indexing the ncbi tables is easily done and explained here. Ideally we want a one-click process for downloading, refreshing and reindexing these tables using the latest version from ncbi. The two tables can have fields that exactly mirror the fields in the download, plus two more fields (left_index and right_index). Optionally, we can build a path table for transitive closure searches, but seeing as there is only one tree, it is probably sufficient to use the left/right id system.

>> PhyloWS seems to be missing a specification on how to search on a tree topology.
>
> The wiki floats the idea of using PhyloCode for that, but I'm not sure if it can satisfy all our requirements.

There is some verbiage here (see point 3 below), but it doesn't specify how to input a query tree.

bp

Find/search examples:

Task: Find trees by nodes
Input: a list of node specifiers, and a designation of what the specifiers should match (node label, sequence ID, taxon, gene name)

Task: Find trees by clade
Input: clade specification (phylocode)

Task: Find, or filter, trees matching a query topology. The query topology might have polytomies, of which matching trees may be a specialization.
Input: A database (or result set) of trees, a query tree, and a distance metric
Output: The matching trees (names, identifiers), or alternatively the subtrees of matching trees projected onto the query topology
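The left_index/right_index fields Bill mentions are the classic nested-set encoding of a tree: a node's descendants are exactly the rows whose indices fall strictly inside the ancestor's interval, so an "any kind of Primates" (h.taxon_name) search becomes a single range query with no recursion. A minimal sketch of the idea — the table layout, taxon IDs, and index values here are invented for illustration, not the actual TreeBASE schema:

```python
import sqlite3

# Hypothetical table standing in for the NCBI taxonomy tables plus the
# two extra nested-set columns (left_index, right_index) Bill describes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ncbi_nodes "
    "(taxid INTEGER, name TEXT, left_index INTEGER, right_index INTEGER)"
)
rows = [
    (1,    "root",     1, 10),
    (9443, "Primates", 2, 7),
    (9605, "Homo",     3, 4),
    (9596, "Pan",      5, 6),
    (9989, "Rodentia", 8, 9),
]
conn.executemany("INSERT INTO ncbi_nodes VALUES (?, ?, ?, ?)", rows)

# "Any kind of Primates": all descendants of the Primates node, found with
# one interval comparison instead of walking the tree.
descendants = conn.execute(
    """SELECT d.name FROM ncbi_nodes d, ncbi_nodes a
       WHERE a.name = 'Primates'
         AND d.left_index  > a.left_index
         AND d.right_index < a.right_index
       ORDER BY d.left_index"""
).fetchall()
print([name for (name,) in descendants])  # ['Homo', 'Pan']
```

The trade-off, relevant to the one-click refresh Bill wants, is that left/right indices must be recomputed whenever the taxonomy is reloaded — cheap for a read-mostly reference tree like NCBI's, which is why it beats a path table here.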
From: Rutger V. <rut...@gm...> - 2009-05-29 15:49:28
We're just starting to write the data dumper and we'll probably need some time to test/debug that (which we can do on a local pg instance, I'd think) so it seems to me this timeline is good enough.

On Fri, May 29, 2009 at 11:24 AM, Jon Auman <jon...@du...> wrote:

> As things stand now, I expect to be open and ready for business in about two weeks.
>
> The PO for the equipment was submitted to Duke purchasing two weeks ago. We followed up with purchasing this week and were told it is "almost" ready to be sent to Dell.
>
> If the PO gets sent out today or Monday, I expect the hardware in sometime the week starting June 9th. I'll then need about a week to set up things and do some performance testing to determine the best setup for Postgresql performance.
>
> Do we need to start data imports before then?
>
> -Jon
>
> -------------------------------------------------------
> Jon Auman
> Systems Administrator
> National Evolutionary Synthesis Center
> Duke University
> http://www.nescent.org
> jon...@ne...
> ------------------------------------------------------
>
> On May 27, 2009, at 1:37 PM, Mark Dominus wrote:
>
>> On Fri, 2009-05-01 at 09:40 -0400, Hilmar Lapp wrote:
>>> There'll be more on this next week. I'm tied up during the day (as I have been the whole week), more on this tonight and tomorrow.
>>
>> What is the status of this? Do we have an ETA? If not, do we have an ETA for the ETA?

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: Jon A. <jon...@du...> - 2009-05-29 15:24:40
As things stand now, I expect to be open and ready for business in about two weeks.

The PO for the equipment was submitted to Duke purchasing two weeks ago. We followed up with purchasing this week and were told it is "almost" ready to be sent to Dell.

If the PO gets sent out today or Monday, I expect the hardware in sometime the week starting June 9th. I'll then need about a week to set up things and do some performance testing to determine the best setup for PostgreSQL performance.

Do we need to start data imports before then?

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------

On May 27, 2009, at 1:37 PM, Mark Dominus wrote:

> On Fri, 2009-05-01 at 09:40 -0400, Hilmar Lapp wrote:
>> There'll be more on this next week. I'm tied up during the day (as I have been the whole week), more on this tonight and tomorrow.
>
> What is the status of this? Do we have an ETA? If not, do we have an ETA for the ETA?
From: Jon A. <jon...@du...> - 2009-05-29 15:12:32
On May 28, 2009, at 5:10 PM, Mark Dominus wrote:

> Files can be copied from SDSC to NESCent with any of several methods, including rsync.

rsync will work fine for me if we need incremental updates, otherwise scp is faster.

> Once at NESCent, someone (who?) will be responsible for executing these large SQL batch files to import the data into the Pg database. If done properly, the Unicode data will be transferred faithfully.

Unless I hear otherwise from Hilmar, I'm assuming that person will be me. I'll be setting up the rsync transfers and the SQL imports. If the SQL inserts all contain UTF8 characters, then there should be no problem with the import into a UTF8 PostgreSQL database. If there are non-UTF8 characters in the SQL file, they can be stripped out with iconv or converted with a shell program called "recode".

> If this sounds like a bad idea, or if there are unanswered questions, now is the time to speak up.

I also agree that this is the best option to try first.

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------
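The iconv-style cleanup Jon describes — dropping any byte sequence that isn't valid UTF-8 — can also be sketched in a few lines of Python (this is an illustration of the idea, not TreeBASE code; the stray byte below is made up):

```python
def clean_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping anything that is not
    valid UTF-8 -- the rough equivalent of `iconv -f utf-8 -t utf-8 -c`."""
    return raw.decode("utf-8", errors="ignore")

# A well-formed UTF-8 e-acute followed by a stray legacy byte (0x8E),
# the kind of malformed diacritical Bill expects in the legacy data:
dirty = "caf\u00e9".encode("utf-8") + b"\x8e"
print(clean_utf8(dirty))  # café
```

As Bill notes downthread, for the actual migration it may be better to use `errors="replace"` (which keeps a visible U+FFFD marker) rather than silently stripping, so the malformed spots stay findable for hand-fixing later.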
From: Rutger V. <rut...@gm...> - 2009-05-29 14:52:37
On Thu, May 28, 2009 at 7:56 PM, William Piel <wil...@ya...> wrote:

> On May 28, 2009, at 5:20 PM, Rutger Vos wrote:
>
>> Here's a doodle poll to pick a date/time: http://doodle.com/7x5aykszup954ysa
>
> I will be in London until the 5th. Conceivably I can Skype from London, but not knowing the conference schedule, I can't commit.

Obviously you're a key participant so let's just play it by ear; I suppose we can start the discussion by email and either call in next week or whenever you can make it.

> It's my understanding that we have not yet imported ncbi's classification into TreeBASE2? Is this correct? If so, then "containing any Primates" is not an option at the moment -- although it should be eventually (I'm putting it in our poster!) -- and it should be an easy thing to add.

Absolutely, not yet - but very important. A propos, what would be the right way to do this? I've thought about it a little bit and imagined importing the NCBI taxonomy as a very large tree against whose topology we run queries would be a way to do this that doesn't require schema changes. Reasonable? Silly?

> Hilmar objects to the use of multiple terms, and would rather that I just use "taxonIdentifier", but then have some special namespace for what I'm searching on. (e.g. "taxonIdentifier any ncbi_taxid:12345" vs "taxonIdentifier any ubio_namebankid:12345").

I can see the point of that: it would make inclusion into CDAO easier if all we need is a generic taxonIdentifier object. On the other hand, it would imply overloading the identifier string, with the namespacing smuggling some amount of extra semantics into something that really ought to be an opaque string.

> PhyloWS seems to be missing a specification on how to search on a tree topology.

The wiki floats the idea of using PhyloCode for that, but I'm not sure if it can satisfy all our requirements.

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: Rutger V. <rv...@in...> - 2009-05-29 14:32:03
Hi,

I've done enough experimentation to establish the correct syntax for attaching valid RDFa attachments to nexml so that standard RDFa extractors can turn <meta/> elements - the new dictionaries - into RDF triples. I've implemented this in the java and perl APIs (Jeet: I hope that the examples I've mailed out give you enough of a template to do this in python too, but please let me know if I can help - I know that the wiki needs updating, for starters).

The key issue now is the definition of predicates, i.e. the value of the @property and @rel attributes. Over the course of many EvoInfo discussions it's been decided that CDAO will be the principal artifact for their mediation - so what's the community process for inclusion of new predicates?

Val and I have sketched out a couple of TreeBASE services whose search keys should be part of a controlled vocabulary (things like tree.id, tree.label, etc.), and this is just one use case of a project having a potentially large number of predicates (other example: Mesquite).

It would be great if team CDAO could tell us where to send our list of proposed terms and where we can download an amended version of CDAO that includes them :-)

I note that there is a wiki page about this (https://www.nescent.org/wg_evoinfo/CDAO_term_request), but ideally there would be some sort of issue tracker with structured input fields (e.g. subject/predicate/object name="XXX", suggested superclass="YYY", suggested datatype(s)="ZZZ", description="..."). Behind this tracker would be a team of curators that will promptly pick up a posted issue and work towards a solution.

I realize that this involves a support commitment from team CDAO, but I think that's what we agreed to over free-form key/value pairs, homegrown vocabularies or a BioMoby-like free-for-all.

Any comments?

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
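The extraction Rutger describes — an RDFa-aware reader turning each <meta/> element into a (subject, predicate, object) triple, with @property naming the predicate — can be illustrated with a toy extractor. The element and attribute names in this fragment follow the RDFa pattern but are not taken from the nexml schema itself, so treat them as illustrative only:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: @property names the predicate and @content
# holds the literal, per the RDFa convention; the real nexml serialization
# produced by the java/perl APIs may differ in element and attribute names.
fragment = """
<otu xmlns:dc="http://purl.org/dc/elements/1.1/" id="t1">
  <meta property="dc:title" content="Homo sapiens" id="m1"/>
</otu>
"""

root = ET.fromstring(fragment)
# Each <meta/> yields one triple, with the enclosing element's id as subject.
triples = [
    (root.get("id"), meta.get("property"), meta.get("content"))
    for meta in root.iter("meta")
]
print(triples)  # [('t1', 'dc:title', 'Homo sapiens')]
```

A real RDFa extractor would additionally resolve the CURIE prefix (dc:) against the in-scope namespace declarations to produce a full predicate URI; that resolution step is exactly where the controlled vocabulary of CDAO terms would plug in.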
From: William P. <wil...@ya...> - 2009-05-28 23:56:51
On May 28, 2009, at 5:20 PM, Rutger Vos wrote:

> Here's a doodle poll to pick a date/time: http://doodle.com/7x5aykszup954ysa

I will be in London until the 5th. Conceivably I can Skype from London, but not knowing the conference schedule, I can't commit.

> Val & I have concluded we should have a telecon (or other discussion format) about taxonomic queries.
>
> The proliferation of possibilities is too big for us to solve. If a client says "[...] where taxon=Primates [...]", what does that mean? Match only against leaf nodes called Primates? Only match trees containing only Primates? Etc. What about lists of taxa, or (topologically) disjoint sets?

It's my understanding that we have not yet imported ncbi's classification into TreeBASE2? Is this correct? If so, then "containing any Primates" is not an option at the moment -- although it should be eventually (I'm putting it in our poster!) -- and it should be an easy thing to add.

Anyway, this is why my prototype PhyloWS API has a proliferation of terms to search on:

taxon_name (the fullnamestring in the taxon_variants table)
taxon_label (the label string on either a tree leaf or on a matrix row)
h.taxon_name (the higher taxon name in the ncbi tables -- i.e. get all trees that have any kind of descendant from this higher name)
ncbi_taxid (the ncbi taxid)
h.ncbi_taxid (the ncbi taxid but one that searches for all descendants of this node in the ncbi classification)
ubio_namebankid (the namebankid from ubio)
taxon_id (TreeBASE's own taxon_id from the taxon table)

Hilmar objects to the use of multiple terms, and would rather that I just use "taxonIdentifier", but then have some special namespace for what I'm searching on. (e.g. "taxonIdentifier any ncbi_taxid:12345" vs "taxonIdentifier any ubio_namebankid:12345"). He would probably also protest about having separate "taxon_name" and "taxon_label". In this instance, I don't mind using only "name" but then having the server know to treat this as "taxon_name OR taxon_label". In terms of h.taxon_name, I don't see any way around it: we really need a separate term to mean "any kind of Primates" instead of exactly matching "Primates".

> ...what does that mean? Match only against leaf nodes called Primates? Only match trees containing only Primates

"taxon_name any Primates" means "find any tree that has a node that maps to the name Primates". I think it is difficult to express "match trees containing only Primates", but I could approximate it like so:

h.taxon_name any Primates NOT (h.taxon_name any Scandentia OR h.taxon_name any Glires OR h.taxon_name any Dermoptera)

To do it exactly right, we need a special syntax so that the database knows to search the ncbi classification for the opposite of a subclade (i.e. everything except the specified subclade).

PhyloWS seems to be missing a specification on how to search on a tree topology. One possible solution is to take advantage of the query tree structure supported by CQL. For example:

/phylows/find/tree/?query=%28%28name+any+Homo+and+name+any+Pan%29+and+name+any+Gorilla%29

... returns any tree that has Homo and Pan and Gorilla in it. Whereas this query:

/phylows/find/topology/?query=%28%28name+any+Homo+and+name+any+Pan%29+and+name+any+Gorilla%29

... returns any tree that matches the topology: ((Homo, Pan),Gorilla)

bp
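The %-escaped query strings in Bill's examples are just URL-encoded CQL. A quick sketch of how a client would build one (the endpoint path is taken from his examples and treated as illustrative):

```python
from urllib.parse import quote_plus

# Bill's example CQL: trees containing Homo, Pan and Gorilla.
cql = "((name any Homo and name any Pan) and name any Gorilla)"

# quote_plus maps spaces to '+' and parentheses to %28/%29, which is
# exactly the encoding seen in the example PhyloWS URLs.
url = "/phylows/find/tree/?query=" + quote_plus(cql)
print(url)
```

Running this reproduces the first URL in the message, which is a useful sanity check that clients and the server agree on the encoding of the CQL grouping parentheses.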
From: Hilmar L. <hl...@du...> - 2009-05-28 21:32:19
This sounds great!

-hilmar

On May 28, 2009, at 5:10 PM, Mark Dominus wrote:

> Rutger, Val and I just had a meeting in which we decided on a plan for how to do this.
>
> Either Rutger or I will write a program that dumps out the database tables as a series of SQL 'insert' commands. We will be responsible for making sure that the data is properly escaped. The files will be UTF-8 encoded, so that the Unicode data that is currently in the database will be properly represented.
>
> The same program can also emit some SQL commands that set the sequence numbers, grants, or whatever.
>
> SDSC has assured us that they can provide enough disk space at SDSC to do this.
>
> Files can be copied from SDSC to NESCent with any of several methods, including rsync.
>
> Once at NESCent, someone (who?) will be responsible for executing these large SQL batch files to import the data into the Pg database. If done properly, the Unicode data will be transferred faithfully.
>
> If this sounds like a bad idea, or if there are unanswered questions, now is the time to speak up.

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: William P. <wil...@ya...> - 2009-05-28 21:22:38
On May 28, 2009, at 5:00 PM, Rutger Vos wrote:

> * tree.label => this is the token captured thusly from a nexus tree description: /U?TREE\s+(.\S+?)\s*=/, i.e. a token that is always present in all nexus versions, but it's short and needs to be nexus-safe (w.r.t. quotes, spaces, comments)

Correct. It is also user-editable during the submission process. It is usually short, and typically takes the form of "Fig. 2" or "Appendix A" or "PAUP 1".

> * tree.title => entered by users in the TreeBASE1 (and 2) interface, not parsed or serialized to/from nexus. (The TITLE token that occurs in mesquite-nexus is optional and in any case applies to the enclosing block.)

Correct. This is usually longer, and often reflects the title legend for the figure in the paper.

> * matrix.title => analogous to tree.title, i.e. entered by users. It's not the TITLE token from characters blocks (again, that's a mesquite-ism; we could use that as a default value during the initial upload, but in general it's user supplied)

Correct, although I think we made it so that when matrices are downloaded (= reconstructed), each character block is assigned a Mesquite-style TITLE with the contents of matrix.title.

> * matrix.description => a longer, user-supplied description that wasn't used in TreeBASE1, hence we've populated it with the legacy identifiers.

Correct.

The remaining confusing thing in matrix.x is that there are two different things similar to "Data Type". One is a user-entered data type (e.g. "Morphological", or "Nucleic Acid") and the other is a nexus-entered DATATYPE (e.g. "Standard" or "DNA"). While they look similar, they are not redundant -- e.g. there are many "Nucleic Acid" matrices that are still coded as "DATATYPE=STANDARD".

There's also a "Tree Type", "Tree Kind" and "Tree Quality" -- should be self-evident from browsing the data.

bp
From: Rutger V. <rut...@gm...> - 2009-05-28 21:20:22
|
Hi,

Val & I have concluded we should have a telecon (or other discussion
format) about taxonomic queries. The proliferation of possibilities is
too big for us to solve. If a client says "[...] where taxon=Primates
[...]", what does that mean? Match only against leaf nodes called
Primates? Only match trees containing only Primates? Etc. What about
lists of taxa, or (topologically) disjoint sets?

Also, what syntax do we use for this? Perhaps CQL isn't expressive
enough for this, so we may need another mini-syntax inside a CQL query
(ideas floated: PhyloCode, NHX conventions).

Here's a doodle poll to pick a date/time:

http://doodle.com/7x5aykszup954ysa

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
|
From: Mark D. <mj...@ge...> - 2009-05-28 21:10:39
|
Rutger, Val and I just had a meeting in which we decided on a plan for
how to do this.

Either Rutger or I will write a program that dumps out the database
tables as a series of SQL 'insert' commands. We will be responsible for
making sure that the data is properly escaped. The files will be UTF-8
encoded, so that the Unicode data that is currently in the database will
be properly represented.

The same program can also emit some SQL commands that set the sequence
numbers, grants, or whatever.

SDSC has assured us that they can provide enough disk space at SDSC to
do this.

Files can be copied from SDSC to NESCent with any of several methods,
including rsync.

Once at NESCent, someone (who?) will be responsible for executing these
large SQL batch files to import the data into the Pg database. If done
properly, the Unicode data will be transferred faithfully.

If this sounds like a bad idea, or if there are unanswered questions,
now is the time to speak up.
|
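[Editor's note: the dump program described above was never posted to the
list; the following is only a sketch of the escaping logic it would
need. The table, column names, and sample row are made up, and the
actual program may well use database-native dump tools instead.]

```python
# Sketch: render one table row as a properly escaped SQL INSERT
# statement, the kind of line the proposed dump program would emit.
# The resulting file would be written with encoding="utf-8" so that
# Unicode data in the database survives the transfer.

def sql_literal(value):
    """Render a Python value as a SQL literal, escaping quotes."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Standard SQL escapes a single quote by doubling it.
    return "'" + str(value).replace("'", "''") + "'"

def insert_statement(table, columns, row):
    """Build an INSERT command for one row of a table dump."""
    cols = ", ".join(columns)
    vals = ", ".join(sql_literal(v) for v in row)
    return "INSERT INTO %s (%s) VALUES (%s);" % (table, cols, vals)

if __name__ == "__main__":
    stmt = insert_statement(
        "tree", ["tree_id", "tree_label"], [42, "O'Brien's tree"]
    )
    print(stmt)
    # -> INSERT INTO tree (tree_id, tree_label) VALUES (42, 'O''Brien''s tree');
```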
From: Rutger V. <rut...@gm...> - 2009-05-28 21:00:51
|
Hi,

Val, MJD and I just had a meeting where we tried to reconstruct the
origins and meaning of the different metadata text strings attached to
treebase objects: tree.label, tree.title, matrix.title,
matrix.description. I theorized the following, and am now looking for
confirmation from Bill (this is all in the context of identifying fields
that clients may search on through the web interface):

* tree.label => this is the token captured thusly from a nexus tree
description: /U?TREE\s+(.\S+?)\s*=/, i.e. a token that is always present
in all nexus versions, but it's short and needs to be nexus-safe (w.r.t.
quotes, spaces, comments)

* tree.title => entered by users in the TreeBASE1 (and 2) interface, not
parsed or serialized to/from nexus. (The TITLE token that occurs in
mesquite-nexus is optional and in any case applies to the enclosing
block.)

* matrix.title => analogous to tree.title, i.e. entered by users. It's
not the TITLE token from characters blocks (again, that's a mesquite-ism;
we could use that as a default value during the initial upload, but in
general it's user supplied)

* matrix.description => a longer, user-supplied description that wasn't
used in TreeBASE1, hence we've populated it with the legacy identifiers.

The reason this question comes up is that we're trying to sketch out a
small controlled vocabulary of search keys (the actual key strings to be
matched to CDAO, DC and others, perhaps including an ontology of
TB2-specific subclasses of CDAO terms). We probably don't expect clients
to know the difference between label, title and description, so we might
lump them into a more generic "description" key that is matched against
all of them.

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
|
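[Editor's note: the tree.label capture above can be tried out directly;
this sketch uses the exact pattern quoted in the message. The IGNORECASE
flag and the sample NEXUS lines are my own assumptions, not something
stated in the thread.]

```python
import re

# The /U?TREE\s+(.\S+?)\s*=/ pattern from the message: match TREE or
# UTREE, skip whitespace, and lazily capture the label token up to
# the "=" that starts the newick description.
TREE_LABEL = re.compile(r"U?TREE\s+(.\S+?)\s*=", re.IGNORECASE)

for line in [
    "TREE Fig._2 = (A,(B,C));",   # rooted tree, underscored label
    "utree PAUP_1=(A,(B,C));",    # unrooted tree, no space before "="
]:
    m = TREE_LABEL.search(line)
    if m:
        print(m.group(1))
# -> Fig._2
# -> PAUP_1
```

Note that because `(.\S+?)` cannot cross whitespace, quoted labels
containing spaces (e.g. `'Fig. 2'`) would not be captured whole by this
pattern as written, which is presumably part of what "needs to be
nexus-safe" refers to.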
From: Mark D. <mj...@ge...> - 2009-05-28 16:42:32
|
On Thu, 2009-05-28 at 12:00 -0400, Mark Dominus wrote:
> It seems that some files may be missing.

I notice you got rid of the top-level /mjd directory, which I think was
a good move, and also the top-level /doc directory. Getting rid of the
stuff in /doc doesn't bother me at all, but I wanted to check to make
sure you had done it on purpose.
|
From: Mark D. <mj...@ge...> - 2009-05-28 16:24:26
|
On Thu, 2009-05-28 at 12:17 -0400, Rutger Vos wrote:
> What, do you think, is the exhaustive way to do this?

How about "svn ls -R" in the old repository, "svn ls -R -r 1" in the new
repository, and then diff the two outputs?

Or to be really thorough you could check out complete fresh copies of
both repositories and then "diff -r" the two directories; they should be
identical.
|
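[Editor's note: the listing comparison suggested above can be scripted;
this is only a sketch, and the repository URLs are placeholders, not the
actual SDSC or sf.net locations.]

```python
# Sketch: compare "svn ls -R" output from two repositories and report
# paths present in one but not the other.
import subprocess

def svn_listing(url, revision=None):
    """Return the set of paths from 'svn ls -R' on a repository URL."""
    cmd = ["svn", "ls", "-R", url]
    if revision is not None:
        cmd[2:2] = ["-r", str(revision)]  # e.g. -r 1 for the fresh import
    out = subprocess.check_output(cmd).decode("utf-8")
    return set(out.splitlines())

def compare_listings(old_paths, new_paths):
    """Return (missing from new, extra in new), both sorted."""
    return sorted(old_paths - new_paths), sorted(new_paths - old_paths)

if __name__ == "__main__":
    old = svn_listing("https://old.example.org/svn/treebase")
    new = svn_listing("https://new.example.org/svn/treebase", revision=1)
    missing, extra = compare_listings(old, new)
    print("missing from new repo:", missing)
    print("extra in new repo:", extra)
```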
From: Rutger V. <rut...@gm...> - 2009-05-28 16:17:53
|
Mmmm... OK, that's odd. What, do you think, is the exhaustive way to do
this? Merge my eclipse project based on the sdsc code base *sans .svn
folders* with the sf.net version, then check individually for each file
not under svn whether it should be?

On Thu, May 28, 2009 at 12:13 PM, Mark Dominus <mj...@ge...> wrote:
> On Thu, 2009-05-28 at 12:07 -0400, Rutger Vos wrote:
>> Is it possible that the pom.xml belongs in that category?
>
> No. In fact, you were the last person to commit it.
>
> --
> Mark Jason Dominus   mj...@ge...
> Penn Genome Frontiers Institute   +1 215 573 5387

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
|
From: Mark D. <mj...@ge...> - 2009-05-28 16:13:47
|
On Thu, 2009-05-28 at 12:07 -0400, Rutger Vos wrote:
> Is it possible that the pom.xml belongs in that category?

No. In fact, you were the last person to commit it.

--
Mark Jason Dominus   mj...@ge...
Penn Genome Frontiers Institute   +1 215 573 5387
|