From: Brian P. <bri...@in...> - 2007-10-06 06:53:06
There are a handful of other cases where it appears that the authors have gotten "is_a" and "part_of" confused. My proposed corrections (IN CAPS) inline:

MS:1000025 "magnetic field strength"
  part of MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000024 "final MS exponent"
  part of MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000022 "TOF Total Path Length"
  part of MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000014 "accuracy"
  part of MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000106 "on"
  is a MS:1000021 "reflectron state"
  part of MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000105 "off"
  is a MS:1000021 "reflectron state"
  part of MS:1000480 "analyzer attribute"
  is a (PART_OF) MS:1000451 "analyzer description"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

The following changes would make the Thermo and ABI stuff look like all the other vendors:

MS:1000495 "Applied Biosystems"
  part of (IS_A) MS:1000121 "ABI / SCIEX"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000176 "MAT95XP Trap"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000175 "MAT95XP"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000174 "MAT900XP Trap"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000173 "MAT900XP"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

MS:1000172 "MAT253"
  is a MS:1000493 "Finnigan MAT"
  part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
  is a MS:1000031 "model by vendor"
  part of MS:1000463 "instrument description"
  part of MS:0000000 "MZ controlled vocabularies"

I still think there's a schema in there, albeit jammed in slightly sideways at the moment.

- Brian
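[Editor's sketch] The term listings in the message above are walks from a term up to the CV root, one relationship per step. A minimal way to model that, assuming a hand-built dictionary rather than a real OBO parser, might look like the following. The `TERMS` table and the `path_to_root` helper are hypothetical names introduced here for illustration; the relationships encoded are the corrected ones proposed in the message, not necessarily what the released CV contains.

```python
# Hypothetical in-memory model of a few CV terms from the message above.
# Each accession maps to (name, relationship_to_parent, parent_accession).
# The relationships are the corrected ones proposed in the message.
TERMS = {
    "MS:1000025": ("magnetic field strength", "part_of", "MS:1000480"),
    "MS:1000480": ("analyzer attribute", "part_of", "MS:1000451"),
    "MS:1000451": ("analyzer description", "part_of", "MS:1000463"),
    "MS:1000463": ("instrument description", "part_of", "MS:0000000"),
    "MS:0000000": ("MZ controlled vocabularies", None, None),
}


def path_to_root(accession, terms=TERMS):
    """Return [(relationship, accession, name), ...] from a term up to the root.

    The first entry is the starting term itself, with relationship None.
    """
    name, rel, parent = terms[accession]
    path = [(None, accession, name)]
    while parent is not None:
        parent_name, parent_rel, grandparent = terms[parent]
        path.append((rel, parent, parent_name))
        rel, parent = parent_rel, grandparent
    return path


if __name__ == "__main__":
    # Print the walk in roughly the layout used in the message.
    for rel, acc, name in path_to_root("MS:1000025"):
        prefix = "" if rel is None else "  " + rel.replace("_", " ") + " "
        print(f'{prefix}{acc} "{name}"')
```

Keeping the relationship on the child side of each edge makes it easy to spot a step where "is a" was recorded but "part of" was meant, since every step in the walk carries exactly one label.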
From: Matt C. <mat...@va...> - 2007-10-06 18:35:23
Good catches in the CV. Who is in charge of maintaining it, and are they reading this list? :)

I agree with auto-generating an XML schema with full semantic relationships encoded in it, direct from the CV, but you haven't addressed the issue I mentioned earlier. Auto-generating into CV params (if we choose method A) will be very ugly, but it will allow for synonyms on the category names and value names. If you implement the cvParam categories as XML elements instead, you lose the ability to have synonyms for category names (unless you use the accession number of the category as the element name, which makes me shudder), but the final schema would look a lot nicer.

-Matt
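[Editor's sketch] The trade-off described above can be made concrete with two tiny encoders. This is only an illustration under assumptions: the attribute layout of `cvParam` (accession, name, value) is simplified, the camel-case element naming is invented here, and "reflectron status" is a made-up synonym used purely to show the effect. Neither function is taken from any mzML draft.

```python
import xml.etree.ElementTree as ET


def as_cv_param(accession, name, value):
    """Generic-element style: the accession carries the meaning, so the
    human-readable name can be any synonym without changing the schema."""
    return ET.Element(
        "cvParam", {"accession": accession, "name": name, "value": value}
    )


def as_named_element(name, value):
    """Element-per-category style: the category name becomes the tag, so a
    synonym would be a different element as far as a schema is concerned."""
    words = name.split()
    tag = words[0] + "".join(w.capitalize() for w in words[1:])
    elem = ET.Element(tag)
    elem.text = value
    return elem


if __name__ == "__main__":
    # Same accession, two names: both are the same term to a consumer that
    # keys on the accession. "reflectron status" is a hypothetical synonym.
    a = as_cv_param("MS:1000021", "reflectron state", "on")
    b = as_cv_param("MS:1000021", "reflectron status", "on")
    print(a.get("accession") == b.get("accession"))

    # Two names, two different tags: a schema keyed on element names
    # would need to list both.
    c = as_named_element("reflectron state", "on")
    d = as_named_element("reflectron status", "on")
    print(ET.tostring(c, encoding="unicode"), ET.tostring(d, encoding="unicode"))
```

The first form is uglier to read but tolerant of renaming; the second reads nicely but ties the schema to one spelling of each category name.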
From: Angel P. <an...@ma...> - 2007-10-07 00:16:54
I wouldn't spend too much time trying to parse OBO files into XML schema. The format grew out of a need for a quick and dirty CV with some ontology structure editing, and there is really only one library editor that works with it, namely the tools written by the authors of the OBO format itself.

As a side note, and completely my own opinion: if mzML were to use RDF Schema for the schema and RDF for the CV, validation and everything else would fall into place. I believe there is an OBO-to-RDF Perl tool someplace.

- angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004

_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
From: Brian P. <bri...@in...> - 2007-10-08 23:15:47
Hi Angel,

This may be a bit esoteric, but I wanted to ask what advantage RDF might have over the older W3C XML Schema (.xsd). I'm unfamiliar with RDF, and from my 20 minutes of googling it appears rather more complex than .xsd, certainly more complex than it would need to be to handle the kinds of things mzData and mzXML do today, but I'm sure I'm flaunting my ignorance.

I see that there are relationships between RDF, OWL, OBO, and CV, though I don't completely understand their nature. Presumably you see some means of exploiting these relationships? I have a lot to learn if we go this route, but it sounds interesting. At least we'd get to say "semantic web" a lot, which sounds cool.

>> I believe that there is an OBO to RDF perl tools someplace.

Maybe this (java, I think): http://www.cs.utexas.edu/~hamid/research/obo2owl.cgi

Thanks,
Brian
From: Mike C. <tu...@gm...> - 2007-10-07 01:20:54
On 10/6/07, Matt Chambers <mat...@va...> wrote:
> Good catches in the CV. Who is in charge of maintaining it and are they
> reading this list? :)

I'm not sure I understand the implications of this. If the CV gets rearranged after the spec is out and people are using mzML files, will this be a problem? Does it matter if the CV usage in the mzML in front of me does not match my CV database? Or do I need to have all versions of the CV database?

Mike
From: Matt C. <mat...@va...> - 2007-10-07 04:25:23
The backward compatibility problem is not unique to CVs; it is just as much a problem with XML schema. If there is ever a change to the schema or CV which moves or deletes an existing term, old files are likely to be invalidated. This can be avoided by never moving or deleting existing terms or, if the appearance of moving or deleting terms can't be avoided, by adding support for deprecating terms instead of deleting them.

-Matt
From: Chris T. <chr...@eb...> - 2007-10-07 21:37:29
Hiya.

I think deprecation is pretty standard and is implemented for PSI CVs, iirc. That way a term is not substantively changed, just flagged as deprecated with a pointer to the new preferred term, and therefore nothing breaks. Terms are obviously added elsewhere as part of the process, but they will be added all the time anyway, so a frozen CV is not that great an idea. Updates need not be nightly, though, as say OLS is (again, iirc).

Cheers, Chris.

--
~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~
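[Editor's sketch] Deprecation with a pointer to the preferred term, as described in the messages above, means a reader of an old file can still resolve every accession it meets. A minimal sketch, assuming terms are held as dictionaries with `is_obsolete` and `replaced_by` keys (named after the OBO flat-file tags of the same names), could look like this. The `EX:` accessions and the `resolve` helper are hypothetical and exist only for illustration.

```python
def resolve(accession, terms):
    """Follow replaced_by pointers from a possibly deprecated term to the
    current preferred term. Returns the accession it ends up on.

    A deprecated term with no replacement resolves to itself, so old files
    still find a definition instead of a missing entry.
    """
    seen = set()
    current = accession
    while terms[current].get("is_obsolete", False):
        if current in seen:
            raise ValueError(f"replaced_by cycle at {current}")
        seen.add(current)
        replacement = terms[current].get("replaced_by")
        if replacement is None:
            return current
        current = replacement
    return current


# Hypothetical CV snapshot: nothing is ever deleted, only flagged.
CV = {
    "EX:0000001": {"name": "old term", "is_obsolete": True, "replaced_by": "EX:0000002"},
    "EX:0000002": {"name": "newer term", "is_obsolete": True, "replaced_by": "EX:0000003"},
    "EX:0000003": {"name": "preferred term"},
    "EX:0000004": {"name": "retired term", "is_obsolete": True},
}

if __name__ == "__main__":
    print(resolve("EX:0000001", CV))
```

Because the old accessions stay in the CV, a file written against an earlier release only needs the latest CV to be interpreted, which speaks to the question of whether every CV version has to be kept around.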
From: Eric D. <ede...@sy...> - 2007-10-08 23:37:06
Regarding RDF, I would like to suggest that this is not an option at this time. RDF was in fact suggested at the DC meeting, and it was concluded that it is such a departure from current formats that we cannot support it at this time. We do not have the resources to pull it off.

Having said that, I would summarize RDF as the antithesis of everything you want out of mzML. RDF can be oversimplified (by me) as essentially a listing of facts of the form:

  subject verb object

(in RDF terms: subject, predicate, object), wherein each noun and verb is carefully defined in an ontology (not just a controlled vocabulary) such that true meaning can be inferred from unstructured data. So, in pseudo-RDF, our documents would go like this:

  Eric has_produced this_mzML_document
  Eric is_a contact
  Eric has_full_name Eric Deutsch
  Eric has_email_address ede...@fu...o
  this_mzML_document is_a mzML_document
  this_mzML_document contains_a_run run1
  Spectrum1 was_generated_in_run run1
  Spectrum1 has_type precursor_ion_scan

The structure is that there is no structure. You are free to list every fact that is relevant in any order. However, each noun and verb must be defined in the context of an ontology (or probably multiple ontologies).

The beauty is that no one ever needs to argue about xsd schemas or two different formats for the same thing any more. Wheee!

The, um, downside is that your software to deal (effectively) with it needs to be 10x more brilliant than the best piece of code you've written so far.

Cheers,
Eric
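[Editor's sketch] The "listing of facts in any order" idea from the message above can be illustrated with plain tuples and a wildcard lookup. This is not RDF tooling, only a toy under assumptions: the triples are taken from the pseudo-RDF listing (the elided email address is left out, and identifier casing is normalized), and `match` is a hypothetical helper introduced here.

```python
# Facts as (subject, predicate, object) tuples, in no particular order.
TRIPLES = [
    ("Spectrum1", "has_type", "precursor_ion_scan"),
    ("Eric", "has_produced", "this_mzML_document"),
    ("this_mzML_document", "contains_a_run", "run1"),
    ("Eric", "is_a", "contact"),
    ("Spectrum1", "was_generated_in_run", "run1"),
    ("Eric", "has_full_name", "Eric Deutsch"),
    ("this_mzML_document", "is_a", "mzML_document"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple agreeing with the given fields; None is a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


if __name__ == "__main__":
    # Which spectra belong to run1? The answer does not depend on where the
    # fact sits in the list, only on the vocabulary being agreed upon.
    for s, _, _ in match(TRIPLES, predicate="was_generated_in_run", obj="run1"):
        print(s)
```

Even in this toy, all of the work moves from the document layout into agreeing on what each predicate means, which is the burden on software that the message points at.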
From: Angel P. <an...@ma...> - 2007-10-09 01:12:01
All good points by Eric, RDF can indeed be simplified to just the triplet tuple he mentions. The part he glosses over though is RDF schema, an this is where the main structure of an ontology is defined and constraints are defined on what (and maybe when / where, but don't quote me) you can put terms and values in instance RDF documents. Notice I used "ontology", not standard or data format. While we can bastardize an mzML written as RDFS an= d name a version as a standard, this is not really done in the RDF world. RDFS are released on fairly short timelines as new terms or changes are approved by a committee, so RDFS/RDF is more likely to be of use to the CV development team, certainly not the schema development efforts. An overriding concern of mine, though, is time. The mzML schema specification should in no way be held up by any issues dealing with RDFS and RDF, even the CV. Everyone is to invested in getting the product out th= e door. Standards don't need to be perfect, they just have to work (credit to Norman) So I am sorry to have brought it up, but hopefully Eric has shut the door o= n that option for mzML v1 -angel On 10/8/07, Eric Deutsch <ede...@sy...> wrote: > > Regarding, RDF I would like to suggest that this is not an option at thi= s > time. RDF was in fact suggested at the DC meeting, and it was concluded t= hat > it is such a departure from current formats, that we cannot support it at > this time. We do not have the resources to pull it off. > > > > Having said that, I would summarize RDF as the antithesis of everything > you want out of mzML. RDF can be oversimplified (by me) as essentially a > listing of facts of: > > Subject verb predicate > > wherein each noun and verb is carefully defined in an ontology (not just = a > controlled vocabulary) such that true meaning can be inferred from > unstructured data. 
> So, in pseudoRDF, our documents would go like this:
>
> Eric has_produced this_mzML_document
> Eric is_a contact
> Eric has_full_name Eric Deutsch
> Eric has_email_address ede...@fu...o
> This_mzML_document is_a mzML_document
> This_mzML_document contains_a_run run1
> Spectrum1 was_generated_in_run run1
> Spectrum1 has_type precursor_ion_scan
>
> The structure is that there is no structure. You are free to list every fact that is relevant in any order. However, each noun and verb must be defined in the context of an ontology (or probably multiple ontologies).
>
> The beauty is that no one ever needs to argue about xsd schemas or two different formats for the same thing any more. Wheee!
>
> The um, downside, is that your software to deal (effectively) with it needs to be 10x more brilliant than the best piece of code you've written so far.
>
> Cheers,
> Eric
>
> ------------------------------
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Brian Pratt
> Sent: Monday, October 08, 2007 4:14 PM
> To: psi...@li...
> Subject: Re: [Psidev-ms-dev] more is_a vs. part_of errors?
>
> Hi Angel,
>
> This may be a bit esoteric, but I wanted to ask what advantage RDF might have over the older W3C XML schema (.xsd). I'm unfamiliar with RDF, and from my 20 minutes of googling it appears rather more complex than .xsd - certainly more complex than it would need to be to handle the kinds of things mzData and mzXML do today, but I'm sure I'm flaunting my ignorance.
>
> I see that there are relationships between RDF, OWL, OBO, and CV, although I don't completely understand their nature. Presumably you see some means of exploiting these relationships? I have a lot to learn if we go this route, but it sounds interesting. At least we'd get to say "semantic web" a lot, which sounds cool.
>
> >> I believe that there are OBO to RDF Perl tools someplace.
>
> Maybe this (java, I think):
> http://www.cs.utexas.edu/~hamid/research/obo2owl.cgi
>
> Thanks,
>
> Brian
>
> ------------------------------
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
> Sent: Saturday, October 06, 2007 5:17 PM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] more is_a vs. part_of errors?
>
> I wouldn't spend too much time trying to parse OBO files into XML schema. The format grew out of a need for quick-and-dirty CV editing with some ontology structure, and there is really only one editor library that works with it, namely the tools written by the authors of the OBO format itself.
>
> As a side note, and completely my own opinion: if mzML were to use RDF schema for the schema and RDF for the CV, validation and everything else would fall into place. I believe that there are OBO to RDF Perl tools someplace.
>
> - angel
>
> On 10/6/07, Matt Chambers <mat...@va...> wrote:
>
> Good catches in the CV. Who is in charge of maintaining it and are they reading this list? :) I agree with auto-generating an XML schema with full semantic relationships encoded in it, direct from the CV, but you haven't addressed the issue I mentioned earlier. Doing the auto-generation into CV params (if we choose method A) will be very ugly, but it will allow for synonyms on the category names and value names. If you implement the cvParam categories as XML elements, though, you lose the ability to have synonyms for category names (unless you use the accession number of the category as the element name, which makes me shudder), but the final schema would look a lot nicer.
>
> -Matt
>
> Brian Pratt wrote:
> >
> > There are a handful of other cases where it appears that the authors have gotten "is a" and "part_of" confused.
> > My proposed corrections (IN CAPS) inline:
> >
> > MS:1000025 "magnetic field strength"
> > part of MS:1000480 "analyzer attribute"
> > is a (PART_OF) MS:1000451 "analyzer description"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000024 "final MS exponent"
> > part of MS:1000480 "analyzer attribute"
> > is a (PART_OF) MS:1000451 "analyzer description"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000022 "TOF Total Path Length"
> > part of MS:1000480 "analyzer attribute"
> > is a (PART_OF) MS:1000451 "analyzer description"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000014 "accuracy"
> > part of MS:1000480 "analyzer attribute"
> > is a (PART_OF) MS:1000451 "analyzer description"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000106 "on"
> > is a MS:1000021 "reflectron state"
> > part of MS:1000480 "analyzer attribute"
> > is a (PART_OF) MS:1000451 "analyzer description"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000105 "off"
> > is a MS:1000021 "reflectron state"
> > part of MS:1000480 "analyzer attribute"
> > is a (PART_OF) MS:1000451 "analyzer description"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > The following changes would make the Thermo and ABI stuff look like all the other vendors:
> >
> > MS:1000495 "Applied Biosystems"
> > part of (IS_A) MS:1000121 "ABI / SCIEX"
> > is a MS:1000031 "model by vendor"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000176 "MAT95XP Trap"
> > is a (IS_A) MS:1000493 "Finnigan MAT"
> > part of MS:1000483 "Thermo Fisher Scientific"
> > is a MS:1000031 "model by vendor"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000175 "MAT95XP"
> > is a MS:1000493 "Finnigan MAT"
> > part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
> > is a MS:1000031 "model by vendor"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000174 "MAT900XP Trap"
> > is a MS:1000493 "Finnigan MAT"
> > part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
> > is a MS:1000031 "model by vendor"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000173 "MAT900XP"
> > is a MS:1000493 "Finnigan MAT"
> > part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
> > is a MS:1000031 "model by vendor"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > MS:1000172 "MAT253"
> > is a MS:1000493 "Finnigan MAT"
> > part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
> > is a MS:1000031 "model by vendor"
> > part of MS:1000463 "instrument description"
> > part of MS:0000000 "MZ controlled vocabularies"
> >
> > I still think there's a schema in there, albeit jammed in slightly sideways at the moment.
> >
> > - Brian

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
|
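For readers who have not met the idea before, the pseudo-RDF listing in Eric's quoted message can be sketched in a few lines of Python. This is only an illustration of "a bag of three-part facts in no particular order", not real RDF and not anything proposed for mzML. RDF itself names the three slots subject, predicate, and object; the names `FACTS` and `match` are hypothetical, the capitalization of `this_mzML_document` is normalized, and the elided e-mail address fact is left out rather than guessed.

```python
from typing import List, Optional, Set, Tuple

# One fact is a (subject, predicate, object) tuple of plain strings.
Triple = Tuple[str, str, str]

# Facts taken from the pseudo-RDF listing in the thread. A set is used on
# purpose: the facts carry no ordering and no nesting.
FACTS: Set[Triple] = {
    ("Eric", "has_produced", "this_mzML_document"),
    ("Eric", "is_a", "contact"),
    ("Eric", "has_full_name", "Eric Deutsch"),
    ("this_mzML_document", "is_a", "mzML_document"),
    ("this_mzML_document", "contains_a_run", "run1"),
    ("Spectrum1", "was_generated_in_run", "run1"),
    ("Spectrum1", "has_type", "precursor_ion_scan"),
}


def match(
    facts: Set[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every fact that agrees with the slots that were given.

    A slot left as None acts as a wildcard. The result is sorted only so
    that the output is reproducible; the facts themselves are unordered.
    """
    return sorted(
        t
        for t in facts
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


if __name__ == "__main__":
    # Everything known about Spectrum1, regardless of where it was listed.
    for fact in match(FACTS, subject="Spectrum1"):
        print(fact)
```

Note what the sketch does not do: nothing checks that `has_type` or `precursor_ion_scan` is a defined term. That checking is the job of the ontology (RDFS in Angel's reply), and it is where the "10x more brilliant" software effort would go.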