From: Brian P. <bri...@in...> - 2007-08-08 15:49:39
If ionSelection is just one of many things that are too complicated, varied, and dynamic to actually specify, then just off the top of my head I think it's going to be pretty hard to do a good job of parsing mzML. I take your point about mzXML being too specific, but there's such a thing as too general as well. My fear is that we'll see it balkanized, with most parsers only really able to deal with the mode of mzML usage that the author really cares about, which just leaves us with a bunch of ad hoc standards.

The instrument name example (wherein a parser cannot be made robust enough to read future versions) makes me think that not enough mental energy has gone into considering the practicalities of being a consumer of mzML. I've seen this in other standards efforts I've been involved with in other industries (internet security, circuit board manufacturing) - writers (mostly hardware vendors) love the flexibility because they can just do it their way, but readers (software vendors) bear the brunt of what amounts to one format per vendor, and finally just fall back onto the per-vendor solutions they have already invested in.

>> it is the same amount of work as if everything was in the schema.

There actually *is* an advantage of specifying via schema instead of ontology, which I've already pointed out - W3C schema is itself a standard with a host of tools built up around it that will generate readers and writers from properly formed schemas. If mzML just used elements for everything and each element had an attribute pointing at the ontology I think we'd be better off. The schema and the ontology would need to evolve together, of course.

But, as you say, this thing is more or less nailed down at this point, so I'm wasting the list's time with this schema talk, and I do apologise. I don't blame anyone for being annoyed at me dredging up these fundamental objections yet again so late in the process.

Anyway, off for vacation until the end of next week.
Sorry to start a flame then abandon it.

Cheers,
Brian

_____
From: del...@gm... [mailto:del...@gm...] On Behalf Of Angel Pizarro
Sent: Wednesday, August 08, 2007 6:01 AM
To: Brian Pratt
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] cvParams using name attribute as value

On 8/7/07, Brian Pratt <bri...@in...> wrote:

Hi Angel,

If I understand your question to be about identifying current mismatches between terminology in the schema and the ontology, I'm not sure there are any - but probably only because the schema has so little actual terminology in it.

My question was more of a pragmatic one, about where you would add specificity into the mzML schema. Your selectionWindow example below is a good one, in that the specification of selectionWindow is probably a range value, and we should have two typed sub-elements corresponding to the cvParam values that define the window (or just a well-defined range sub-element, skipping cvParam altogether).

I don't think your second example is a good one though, since there are so many permutations of an ionSelection protocol, with more certainly on the way, that it is better handled by an ontology specification. Yes, this does make parsers slightly harder, since now you must pay attention to the incoming ontology, but it is the same amount of work as if everything was in the schema.

mzXML could get away with tight specification of these complex and changing annotations, since its sole purpose was support of the ISB pipeline. Its open source status only served to increase the user base, but the schema changes were solely driven by the needs of that pipeline and solely by the community that used it.
Trying to build consensus across many different groups has led to the current version of mzML, and the major structure of mzML will not change at this point, so please let's just get to the specifics of going through the schema and identifying where you think an annotation should be promoted to the level of a schema element, and we'll discuss as a group.

-angel

Consider this example:

    <xs:element name="selectionWindow" maxOccurs="unbounded">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="cvParam" type="dx:CVParamType" minOccurs="2" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

which says absolutely nothing at all about what a selectionWindow element can be expected to contain when you encounter it. It just says it will contain at least two "parameters". Not much of an aid to software development.

The schema, if we can call it that, doesn't even specify what some of the most fundamental information about a scan looks like. For example, it specifies that a scan may have a list of precursors, each of which will contain an ionSelection, but stops short of telling you what an ionSelection looks like:

    <xs:element name="ionSelection" type="dx:ParamGroupType">
      <xs:annotation>
        <xs:documentation>This captures the type of ion selection being performed, and trigger m/z (or m/z's), neutral loss criteria etc. for tandem-MS or data dependent scans.</xs:documentation>
      </xs:annotation>
    </xs:element>

Nearly all the details of nearly all the elements are just unspecified blobs. Normally with an XML format you can expect to at least start your work by running it through something like XMLSpy that will autogenerate a reader and a writer that you can then polish up (to handle, for example, the necessary weirdness of base64+zlib in the peaklists). But with this, you get no kind of a head start at all, since the vast majority of the syntax is hidden behind blobs like dx:CVParamType and dx:ParamGroupType. It's just not a specification.
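
For contrast with the selectionWindow declaration quoted above, here is a minimal sketch of what a more constrained declaration might look like. It is an illustration only, not taken from any mzML draft: the child element names lowerMz and upperMz are hypothetical, and CVParamType is stubbed with assumed attributes so that the fragment stands on its own.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical: the window is spelled out as two typed bounds,
       so a generated reader knows exactly what to expect. -->
  <xs:element name="selectionWindow">
    <xs:complexType>
      <xs:sequence>
        <!-- Element names are assumptions made for this sketch. -->
        <xs:element name="lowerMz" type="xs:double"/>
        <xs:element name="upperMz" type="xs:double"/>
        <!-- Anything not promoted to an element can still travel as cvParam. -->
        <xs:element name="cvParam" type="CVParamType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Stub standing in for dx:CVParamType; the attribute set is assumed. -->
  <xs:complexType name="CVParamType">
    <xs:attribute name="cvLabel" type="xs:string" use="required"/>
    <xs:attribute name="accession" type="xs:string" use="required"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="value" type="xs:string"/>
  </xs:complexType>

</xs:schema>
```

With a declaration of this shape, a schema-driven code generator could emit a class with two numeric fields, whereas the cvParam-only form leaves the meaning of each parameter to be resolved against the ontology at read time.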
The statement that led to your question, I think, was just me saying that if we *did* create an actual schema, we'd want its terminology to agree with the ontology wherever possible. But it has to actually contain some terminology, unlike the current schema.

Brian

_____
From: del...@gm... [mailto:del...@gm...] On Behalf Of Angel Pizarro
Sent: Tuesday, August 07, 2007 1:10 PM
To: Brian Pratt
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] cvParams using name attribute as value

On 8/7/07, Brian Pratt <bri...@in...> wrote:

Hey, the horse just twitched: by placing cvParam information in attributes of the elements of a conventionally structured XML schema (à la mzXML) we can make use of the OBO work without adding a lot of unwanted complexity to software systems that aren't really interested in it. An mzML that integrates well with OBO-aware systems is an excellent idea, but an mzML that demands you BE an OBO-aware system seems less likely to achieve widespread adoption.

Can you name specific attributes that you want to have cv terms be the value for that are currently not in the schema?

-angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
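
The suggestion earlier in the thread, of placing CV information in attributes of conventionally structured elements, could be sketched roughly as follows. This is only one possible reading of that idea: the names triggerMz, OntologyLinkedDouble, and cvAccession are invented for the sketch and do not appear in mzML or mzXML.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical ionSelection with an explicitly typed child element. -->
  <xs:element name="ionSelection">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="triggerMz" type="OntologyLinkedDouble"
                    maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- A numeric value carrying an optional pointer into the ontology.
       Readers that are not OBO-aware can ignore the attribute entirely. -->
  <xs:complexType name="OntologyLinkedDouble">
    <xs:simpleContent>
      <xs:extension base="xs:double">
        <xs:attribute name="cvAccession" type="xs:string" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```

Under these assumptions the schema fixes the structure and data types, while the attribute carries the link into the ontology for tools that want it.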