From: Angel P. <an...@ma...> - 2007-10-08 19:14:52
|
On 10/8/07, Eric Deutsch <ede...@sy...> wrote:
> Hi everyone, since the flurry is starting to subside a little, I've
> started trying to summarize the conversation thus far to converge on
> some consensus.
>
> Regarding this count attribute issue, I tally:
>
> - Angel discourages them

"discourage" is a good word ;) From Matt and Brian's later replies, it seems that they would rather have them, and I really don't mind if the count att stays.

-angel

> - Matt is neutral
> - Mike does not want them
> - silence from everyone else
|
From: Brian P. <bri...@in...> - 2007-10-08 18:39:10
|
Eh, it's even more broken than I thought. I've amended my amendments inline below, new changes in double parentheses.

After a day or so of messing with this, it is now: MANIFESTO TIME!

RESOLVED: The mzML specification process should be schema-centric, and the CV should be generated from the schema (should be a fairly simple matter of XSLT, since XSD is itself XML).

REASON 1: THE CV-CENTRIC APPROACH IS ERROR PRONE. The kinds of inheritance errors shown below are, if not actually impossible, much harder to make in the context of a W3C schema when using readily available software tools to create and maintain the schema.

REASON 2: OBO/CV IS AN INSUFFICIENT TOOL FOR THE JOB OF PRODUCING A READILY AND THOROUGHLY VALIDATABLE DATA FORMAT. CV apparently provides no means for specifying range or formatting of instance values. An "isolation width" (MS:1000023) could happily have a value of "-2", "2", "two", or "extra sprinkles, please". You could (and should) certainly put some text in the description along the lines of "this is a non-negative floating point value" but that's no help to a validating parser. XSD on the other hand has standardized syntax for enforcing precisely these kinds of restrictions, meaning that validating parsers and code generators (for both read and write) don't need any special-purpose logic added.

There are a handful of places where value range restrictions have been attempted in the MS CV, but these are awkward because of the tools. The reflectron_state, for example, has two children "on" and "off", but this only confuses things, since these are not *values* of reflectron state but rather *are* reflectron states, a distinction which may be meaningless in English but significant when attempting to create a data structure. Picture how this looks in an instance doc:

<cvParam cvLabel="MS" accession="MS:1000105" name="off" value="" />

I can't think of anything nice to say about that.
Better it should read: <reflectronState accession="MS:1000021" off/> CONCLUSION: THE CV WORK TO DATE IS IMPORTANT AND USEFUL, BUT SHOULD BE RECAST AS SCHEMA WORK The CV should not attempt to be a replacement for the schema - it just hasn't got the requisite mechanisms to do the job. The information CV can convey is only a subset of the information that is needed to fully specify a data format. The information in the CV as it stands should be folded into the mzML schema, and maintained therein moving forward. An actual OBO/CV file can be generated as needed. - Brian _____ From: Brian Pratt [mailto:bri...@in...] Sent: Friday, October 05, 2007 11:52 PM To: 'Mass spectrometry standard development' Subject: more is_a vs. part_of errors? There are a handful of other cases where it appears that the authors have gotten "is a" and "part_of" confused. My proposed corrections (IN CAPS) inline: MS:1000025 "magnetic field strength" part of ((IS_A)) MS:1000480 "analyzer attribute" is a (PART_OF) MS:1000451 "analyzer description" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000024 "final MS exponent" part of ((IS_A)) MS:1000480 "analyzer attribute" is a (PART_OF) MS:1000451 "analyzer description" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000022 "TOF Total Path Length" part of ((IS_A)) MS:1000480 "analyzer attribute" is a (PART_OF) MS:1000451 "analyzer description" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000014 "accuracy" part of ((IS_A)) MS:1000480 "analyzer attribute" is a (PART_OF) MS:1000451 "analyzer description" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" ((note, these next two are just ugly, see notes at top of message)) MS:1000106 "on" is a MS:1000021 "reflectron state" part of ((IS_A)) MS:1000480 "analyzer attribute" is a (PART_OF) MS:1000451 "analyzer description" 
part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000105 "off" is a MS:1000021 "reflectron state" part of ((IS_A)) MS:1000480 "analyzer attribute" is a (PART_OF) MS:1000451 "analyzer description" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" The following changes would make the Thermo and ABI stuff look like all the other vendors: MS:1000495 "Applied Biosystems" part of (IS_A) MS:1000121 "ABI / SCIEX" is a MS:1000031 "model by vendor" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000176 "MAT95XP Trap" is a (IS_A) MS:1000493 "Finnigan MAT" part of MS:1000483 "Thermo Fisher Scientific" is a MS:1000031 "model by vendor" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000175 "MAT95XP" is a MS:1000493 "Finnigan MAT" part of (IS_A) MS:1000483 "Thermo Fisher Scientific" is a MS:1000031 "model by vendor" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000174 "MAT900XP Trap" is a MS:1000493 "Finnigan MAT" part of (IS_A) MS:1000483 "Thermo Fisher Scientific" is a MS:1000031 "model by vendor" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000173 "MAT900XP" is a MS:1000493 "Finnigan MAT" part of (IS_A) MS:1000483 "Thermo Fisher Scientific" is a MS:1000031 "model by vendor" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" MS:1000172 "MAT253" is a MS:1000493 "Finnigan MAT" part of (IS_A) MS:1000483 "Thermo Fisher Scientific" is a MS:1000031 "model by vendor" part of MS:1000463 "instrument description" part of MS:0000000 "MZ controlled vocabularies" I still think there's a schema in there, albeit jammed in slightly sideways at the moment. (( I don't think that anymore. I think there's a subset of a schema in there. )) - Brian |
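To make the validation gap in REASON 2 concrete, here is a minimal Python sketch of the kind of special-purpose check a parser has to bolt on when the CV says nothing about value ranges. The `RESTRICTIONS` table and both function names are hypothetical; only the accession MS:1000023 and the four sample values come from the message above.

```python
import math

def is_non_negative_float(text: str) -> bool:
    """Return True if text parses as a finite float that is >= 0."""
    try:
        value = float(text)
    except ValueError:
        return False
    return math.isfinite(value) and value >= 0.0

# Hypothetical side table of per-term restrictions. The CV file itself
# carries no such information, which is the point of the argument.
RESTRICTIONS = {"MS:1000023": is_non_negative_float}  # "isolation width"

def validate_cv_value(accession: str, value: str) -> bool:
    """Check a cvParam value against a restriction, if one is registered."""
    check = RESTRICTIONS.get(accession)
    # Terms without a registered restriction are accepted unchecked.
    return True if check is None else check(value)

samples = ["-2", "2", "two", "extra sprinkles, please"]
results = {s: validate_cv_value("MS:1000023", s) for s in samples}
```

With a schema-level restriction this table would not need to exist in every reader; here each tool has to maintain its own copy and keep it in step with the CV descriptions.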
From: Brian P. <bri...@in...> - 2007-10-08 18:26:23
|
I'm strongly "pro" counts, both for human readability and for performance reasons. They can reduce heap thrash since you can preallocate. Yeah, you can't really trust them, but they *tend* to be right so it *tends* to be the right allocation, but one must program defensively all the same. - Brian >> I'm interested to hear what Brian's found with his explorations >> in CV schema generation. :) Results on the way! Batten down the hatches. -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers Sent: Monday, October 08, 2007 11:11 AM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] mzML 0.99 remarks Cons: none (it can be harmlessly ignored regardless of whether it's accurate or not) Pros: * shows humans counts without a 'find all' program (especially useful for spectral counts) * allows for implementing a simple preallocating reader with poor error handling (who are we to say that users shouldn't be able to do it?) I'm interested to hear what Brian's found with his explorations in CV schema generation. :) -Matt Eric Deutsch wrote: > Hi everyone, since the flurry is starting to subside a little, I've > started trying to summarize the conversation thus far so converge on > some consensus. > > Regarding this count attribute issue, I tally: > > - Angel discourages them > - Matt is neutral > - Mike does not want them > - silence from everyone else > > We had included them at the recommendation from someone who was > programming a reader in a language that requires explicit memory > allocation. He felt it would be very helpful to have these so that such > software could preallocate memory. > > Obviously, such software would have to be careful about overruns and > either generate an error when counts are wrong or gracefully adapt to > the reality. If the code would have to gracefully adapt, then perhaps > that's no harder than not knowing in the first place. 
> > So let's hear it from anyone who will want to use count attributes when > reading mzML. If no one wants them, we may as well drop them! > > Thanks, > Eric > > ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Psidev-ms-dev mailing list Psi...@li... https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
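The "preallocate, but program defensively" idea can be sketched roughly as follows. This is only an illustration in Python, where preallocation matters far less than in a language with explicit memory management; the function name and return shape are assumptions, not part of any mzML reader.

```python
from typing import Iterable, List, Tuple

def read_with_count_hint(items: Iterable[float],
                         declared_count: int) -> Tuple[List[float], bool]:
    """Fill a buffer sized from the declared count.

    Returns (values, count_was_accurate). The declared count is only a
    hint: the buffer grows if it was too small and is trimmed if it was
    too large.
    """
    buffer: list = [None] * max(declared_count, 0)  # preallocate from the hint
    n = 0
    for item in items:
        if n < len(buffer):
            buffer[n] = item          # common case: the hint was right
        else:
            buffer.append(item)       # hint too small: fall back to growing
        n += 1
    del buffer[n:]                    # hint too large: drop the unused tail
    return buffer, n == declared_count
```

A caller could log a warning whenever the second element of the result is False, which covers the "generate an error or gracefully adapt" choice raised in the quoted message.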
From: Matthew C. <mat...@va...> - 2007-10-08 18:10:59
|
Cons: none (it can be harmlessly ignored regardless of whether it's accurate or not)

Pros:
* shows humans counts without a 'find all' program (especially useful for spectral counts)
* allows for implementing a simple preallocating reader with poor error handling (who are we to say that users shouldn't be able to do it?)

I'm interested to hear what Brian's found with his explorations in CV schema generation. :)

-Matt

Eric Deutsch wrote:
> Hi everyone, since the flurry is starting to subside a little, I've
> started trying to summarize the conversation thus far to converge on
> some consensus.
>
> Regarding this count attribute issue, I tally:
>
> - Angel discourages them
> - Matt is neutral
> - Mike does not want them
> - silence from everyone else
>
> We had included them at the recommendation from someone who was
> programming a reader in a language that requires explicit memory
> allocation. He felt it would be very helpful to have these so that such
> software could preallocate memory.
>
> Obviously, such software would have to be careful about overruns and
> either generate an error when counts are wrong or gracefully adapt to
> the reality. If the code would have to gracefully adapt, then perhaps
> that's no harder than not knowing in the first place.
>
> So let's hear it from anyone who will want to use count attributes when
> reading mzML. If no one wants them, we may as well drop them!
>
> Thanks,
> Eric
|
From: Eric D. <ede...@sy...> - 2007-10-08 18:00:24
|
Hi everyone, since the flurry is starting to subside a little, I've started trying to summarize the conversation thus far to converge on some consensus.

Regarding this count attribute issue, I tally:

- Angel discourages them
- Matt is neutral
- Mike does not want them
- silence from everyone else

We had included them at the recommendation from someone who was programming a reader in a language that requires explicit memory allocation. He felt it would be very helpful to have these so that such software could preallocate memory.

Obviously, such software would have to be careful about overruns and either generate an error when counts are wrong or gracefully adapt to the reality. If the code would have to gracefully adapt, then perhaps that's no harder than not knowing in the first place.

So let's hear it from anyone who will want to use count attributes when reading mzML. If no one wants them, we may as well drop them!

Thanks,
Eric

> From: psi...@li... [mailto:psidev-ms-dev-
>
> Mike Coleman wrote:
> > I see what you're saying, but I'm very sympathetic to Angel's point as
> > well. If nothing else, a well-written piece of software needs to emit
> > a warning upon seeing an inconsistent count field. This means that it
> > needs to calculate the correct value, which mostly eliminates the
> > value of having it in the file in the first place.
>
> Steps to implement a reader dealing with invalid count attributes:
> 1) Ignore the count
> 2) Keep counters of how many list elements (whatever is being counted)
> were actually read in
> 3) If you write out mzML, use your counters' values (subtracting the #
> of elements filtered out and adding the # of elements added)
> 4) Voila, valid mzML by ignoring count attributes! :)
>
> It's considerably more difficult to implement readers which work with
> references to previously parsed elements (i.e. ParamGroups and
> references to them). And that's NOT something that an implementation
> can ignore!
>
> -Matt
|
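The four quoted steps can be sketched with Python's standard `xml.etree.ElementTree`. The element names `spectrumList` and `spectrum` in the sample are illustrative assumptions, and the sketch assumes every direct child of an element carrying a `count` attribute is one of the counted items.

```python
import xml.etree.ElementTree as ET

def rewrite_with_actual_counts(xml_text: str) -> str:
    """Ignore declared counts and write back the number of children found."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if "count" in element.attrib:
            # Steps 1-3: disregard the declared value, count what was
            # actually read, and use that number on output.
            element.set("count", str(len(element)))
    return ET.tostring(root, encoding="unicode")

# Sample document whose declared count (5) disagrees with its content (2).
sample = (
    '<spectrumList count="5">'
    '<spectrum id="S1" />'
    '<spectrum id="S2" />'
    '</spectrumList>'
)
fixed = rewrite_with_actual_counts(sample)
fixed_count = ET.fromstring(fixed).get("count")
```

Filtering or adding elements before the write-out step needs no extra bookkeeping in this form, since the count is recomputed from the tree at the end.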
From: Chris T. <chr...@eb...> - 2007-10-07 21:37:29
|
Hiya.

I think deprecation is pretty standard and is implemented for PSI CVs iirc. That way a term is not substantively changed, just flagged as deprecated with a pointer to the new preferred term, and therefore nothing breaks. Terms are obviously added elsewhere as part of the process, but they are being added all the time anyway, so a frozen CV is not that great an idea; updates need not be nightly, though, as say OLS is (again, iirc).

Cheers, Chris.

Matt Chambers wrote:
> The backward compatibility problem is not unique to CVs; backwards
> compatibility is just as much a problem with XML schema. If there is
> ever a change to the schema or CV which moves or deletes an existing
> term, old files are likely to be invalidated. This can be avoided by
> never moving or deleting existing terms, or if the appearance of moving
> and deleting terms can't be avoided, then adding support for deprecating
> instead of deleting.
>
> -Matt
>
> Mike Coleman wrote:
>> On 10/6/07, Matt Chambers <mat...@va...> wrote:
>>
>>> Good catches in the CV. Who is in charge of maintaining it and are they
>>> reading this list? :)
>>
>> I'm not sure I understand the implications of this. If the CV gets
>> rearranged after the spec is out and people are using mzML file, will
>> this be a problem? Does it matter if the CV usage in the mzML in
>> front of me does not match my CV database? Or do I need to have all
>> versions of the CV database?
>>
>> Mike

--
~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~
|
From: Matt C. <mat...@va...> - 2007-10-07 04:25:23
|
The backward compatibility problem is not unique to CVs; backwards compatibility is just as much a problem with XML schema. If there is ever a change to the schema or CV which moves or deletes an existing term, old files are likely to be invalidated. This can be avoided by never moving or deleting existing terms, or if the appearance of moving and deleting terms can't be avoided, then adding support for deprecating instead of deleting. -Matt Mike Coleman wrote: > On 10/6/07, Matt Chambers <mat...@va...> wrote: > >> Good catches in the CV. Who is in charge of maintaining it and are they >> reading this list? :) >> > > I'm not sure I understand the implications of this. If the CV gets > rearranged after the spec is out and people are using mzML file, will > this be a problem? Does it matter if the CV usage in the mzML in > front of me does not match my CV database? Or do I need to have all > versions of the CV database? > > Mike > > |
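As a rough illustration of "deprecating instead of deleting", a reader could resolve an old accession through a chain of replacement pointers. The sketch below is hypothetical throughout: the `obsolete` and `replaced_by` field names, the `EX:` accessions, and the in-memory dictionary are assumptions for illustration, not the PSI-MS CV's actual layout.

```python
def resolve_term(accession: str, terms: dict) -> str:
    """Follow replacement pointers until a non-deprecated term is reached."""
    seen = set()
    current = accession
    while terms[current].get("obsolete"):
        if current in seen:
            raise ValueError(f"replacement cycle at {current}")
        seen.add(current)
        replacement = terms[current].get("replaced_by")
        if replacement is None:
            # Deprecated without a successor: the old term still exists,
            # so old files remain readable.
            return current
        current = replacement
    return current

# Hypothetical CV snapshot: EX:0000001 was deprecated in favour of EX:0000002.
terms = {
    "EX:0000001": {"name": "old term", "obsolete": True,
                   "replaced_by": "EX:0000002"},
    "EX:0000002": {"name": "new preferred term"},
    "EX:0000003": {"name": "retired term", "obsolete": True},
}
resolved = resolve_term("EX:0000001", terms)
```

Because the old accession is never removed, a file written against an earlier CV still resolves against the newest one, which is the property being asked for in the question about keeping every CV version around.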
From: Mike C. <tu...@gm...> - 2007-10-07 01:20:54
|
On 10/6/07, Matt Chambers <mat...@va...> wrote: > Good catches in the CV. Who is in charge of maintaining it and are they > reading this list? :) I'm not sure I understand the implications of this. If the CV gets rearranged after the spec is out and people are using mzML file, will this be a problem? Does it matter if the CV usage in the mzML in front of me does not match my CV database? Or do I need to have all versions of the CV database? Mike |
From: Angel P. <an...@ma...> - 2007-10-07 00:16:54
|
I wouldn't spend too much time trying to parse OBO files into XML schema. The format grew out of a need for quick and dirty CV with some ontology structure editing and there is really only one library editor that works with it, namely the author's tools of the OBO format itself. As a side note, and completely my own opinion, but if mzML were to use RDF schema for the schema and RDF for the CV, validation and everything else would fall into place. I believe that there is an OBO to RDF perl tools someplace. - angel On 10/6/07, Matt Chambers <mat...@va...> wrote: > > Good catches in the CV. Who is in charge of maintaining it and are they > reading this list? :) I agree with auto-generating a XML schema with > full semantic relationships encoded in it, direct from the CV, but you > haven't addressed the issue I mentioned earlier. To do the > auto-generation into CV params (if we choose method A) will be very ugly > but it will allow for synonyms on the category names and value names. To > implement the cvParam categories as XML elements though, you lose the > ability to have synonyms for category names (unless you use the > accession number of the category as the element name, which makes me > shudder), but the final schema would look a lot nicer. > > -Matt > > Brian Pratt wrote: > > > > There are a handful of other cases where it appears that the authors > > have gotten "is a" and "part_of" confused. 
My proposed corrections (IN > > CAPS) inline: > > > > MS:1000025 "magnetic field strength" > > > > part of MS:1000480 "analyzer attribute" > > > > is a (PART_OF) MS:1000451 "analyzer description" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000024 "final MS exponent" > > > > part of MS:1000480 "analyzer attribute" > > > > is a (PART_OF) MS:1000451 "analyzer description" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000022 "TOF Total Path Length" > > > > part of MS:1000480 "analyzer attribute" > > > > is a (PART_OF) MS:1000451 "analyzer description" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000014 "accuracy" > > > > part of MS:1000480 "analyzer attribute" > > > > is a (PART_OF) MS:1000451 "analyzer description" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000106 "on" > > > > is a MS:1000021 "reflectron state" > > > > part of MS:1000480 "analyzer attribute" > > > > is a (PART_OF) MS:1000451 "analyzer description" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000105 "off" > > > > is a MS:1000021 "reflectron state" > > > > part of MS:1000480 "analyzer attribute" > > > > is a (PART_OF) MS:1000451 "analyzer description" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > The following changes would make the Thermo and ABI stuff look like > > all the other vendors: > > > > MS:1000495 "Applied Biosystems" > > > > part of (IS_A) MS:1000121 "ABI / SCIEX" > > > > is a MS:1000031 "model by vendor" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000176 "MAT95XP Trap" > > > > 
is a (IS_A) MS:1000493 "Finnigan MAT" > > > > part of MS:1000483 "Thermo Fisher Scientific" > > > > is a MS:1000031 "model by vendor" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000175 "MAT95XP" > > > > is a MS:1000493 "Finnigan MAT" > > > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > > > is a MS:1000031 "model by vendor" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000174 "MAT900XP Trap" > > > > is a MS:1000493 "Finnigan MAT" > > > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > > > is a MS:1000031 "model by vendor" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000173 "MAT900XP" > > > > is a MS:1000493 "Finnigan MAT" > > > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > > > is a MS:1000031 "model by vendor" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > MS:1000172 "MAT253" > > > > is a MS:1000493 "Finnigan MAT" > > > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > > > is a MS:1000031 "model by vendor" > > > > part of MS:1000463 "instrument description" > > > > part of MS:0000000 "MZ controlled vocabularies" > > > > I still think there's a schema in there, albeit jammed in slightly > > sideways at the moment. > > > > - Brian > > > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Psidev-ms-dev mailing list > Psi...@li... 
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev > -- Angel Pizarro Director, Bioinformatics Facility Institute for Translational Medicine and Therapeutics University of Pennsylvania 806 BRB II/III 421 Curie Blvd. Philadelphia, PA 19104-6160 P: 215-573-3736 F: 215-573-9004 |
From: Matt C. <mat...@va...> - 2007-10-06 18:35:23
|
Good catches in the CV. Who is in charge of maintaining it and are they reading this list? :) I agree with auto-generating a XML schema with full semantic relationships encoded in it, direct from the CV, but you haven't addressed the issue I mentioned earlier. To do the auto-generation into CV params (if we choose method A) will be very ugly but it will allow for synonyms on the category names and value names. To implement the cvParam categories as XML elements though, you lose the ability to have synonyms for category names (unless you use the accession number of the category as the element name, which makes me shudder), but the final schema would look a lot nicer. -Matt Brian Pratt wrote: > > There are a handful of other cases where it appears that the authors > have gotten “is a” and “part_of” confused. My proposed corrections (IN > CAPS) inline: > > MS:1000025 "magnetic field strength" > > part of MS:1000480 "analyzer attribute" > > is a (PART_OF) MS:1000451 "analyzer description" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000024 "final MS exponent" > > part of MS:1000480 "analyzer attribute" > > is a (PART_OF) MS:1000451 "analyzer description" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000022 "TOF Total Path Length" > > part of MS:1000480 "analyzer attribute" > > is a (PART_OF) MS:1000451 "analyzer description" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000014 "accuracy" > > part of MS:1000480 "analyzer attribute" > > is a (PART_OF) MS:1000451 "analyzer description" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000106 "on" > > is a MS:1000021 "reflectron state" > > part of MS:1000480 "analyzer attribute" > > is a (PART_OF) MS:1000451 "analyzer description" > > part of MS:1000463 "instrument description" > > 
part of MS:0000000 "MZ controlled vocabularies" > > MS:1000105 "off" > > is a MS:1000021 "reflectron state" > > part of MS:1000480 "analyzer attribute" > > is a (PART_OF) MS:1000451 "analyzer description" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > The following changes would make the Thermo and ABI stuff look like > all the other vendors: > > MS:1000495 "Applied Biosystems" > > part of (IS_A) MS:1000121 "ABI / SCIEX" > > is a MS:1000031 "model by vendor" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000176 "MAT95XP Trap" > > is a (IS_A) MS:1000493 "Finnigan MAT" > > part of MS:1000483 "Thermo Fisher Scientific" > > is a MS:1000031 "model by vendor" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000175 "MAT95XP" > > is a MS:1000493 "Finnigan MAT" > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > is a MS:1000031 "model by vendor" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000174 "MAT900XP Trap" > > is a MS:1000493 "Finnigan MAT" > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > is a MS:1000031 "model by vendor" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000173 "MAT900XP" > > is a MS:1000493 "Finnigan MAT" > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > is a MS:1000031 "model by vendor" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > MS:1000172 "MAT253" > > is a MS:1000493 "Finnigan MAT" > > part of (IS_A) MS:1000483 "Thermo Fisher Scientific" > > is a MS:1000031 "model by vendor" > > part of MS:1000463 "instrument description" > > part of MS:0000000 "MZ controlled vocabularies" > > I still think there’s a schema in there, albeit jammed in slightly > sideways at the 
moment. > > - Brian > |
From: Brian P. <bri...@in...> - 2007-10-06 06:53:06
|
There are a handful of other cases where it appears that the authors have gotten "is a" and "part_of" confused. My proposed corrections (IN CAPS) inline:

MS:1000025 "magnetic field strength"
  part of MS:1000480 "analyzer attribute"
    is a (PART_OF) MS:1000451 "analyzer description"
      part of MS:1000463 "instrument description"
        part of MS:0000000 "MZ controlled vocabularies"

MS:1000024 "final MS exponent"
  part of MS:1000480 "analyzer attribute"
    is a (PART_OF) MS:1000451 "analyzer description"
      part of MS:1000463 "instrument description"
        part of MS:0000000 "MZ controlled vocabularies"

MS:1000022 "TOF Total Path Length"
  part of MS:1000480 "analyzer attribute"
    is a (PART_OF) MS:1000451 "analyzer description"
      part of MS:1000463 "instrument description"
        part of MS:0000000 "MZ controlled vocabularies"

MS:1000014 "accuracy"
  part of MS:1000480 "analyzer attribute"
    is a (PART_OF) MS:1000451 "analyzer description"
      part of MS:1000463 "instrument description"
        part of MS:0000000 "MZ controlled vocabularies"

MS:1000106 "on"
  is a MS:1000021 "reflectron state"
    part of MS:1000480 "analyzer attribute"
      is a (PART_OF) MS:1000451 "analyzer description"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

MS:1000105 "off"
  is a MS:1000021 "reflectron state"
    part of MS:1000480 "analyzer attribute"
      is a (PART_OF) MS:1000451 "analyzer description"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

The following changes would make the Thermo and ABI stuff look like all the other vendors:

MS:1000495 "Applied Biosystems"
  part of (IS_A) MS:1000121 "ABI / SCIEX"
    is a MS:1000031 "model by vendor"
      part of MS:1000463 "instrument description"
        part of MS:0000000 "MZ controlled vocabularies"

MS:1000176 "MAT95XP Trap"
  is a (IS_A) MS:1000493 "Finnigan MAT"
    part of MS:1000483 "Thermo Fisher Scientific"
      is a MS:1000031 "model by vendor"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

MS:1000175 "MAT95XP"
  is a MS:1000493 "Finnigan MAT"
    part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
      is a MS:1000031 "model by vendor"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

MS:1000174 "MAT900XP Trap"
  is a MS:1000493 "Finnigan MAT"
    part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
      is a MS:1000031 "model by vendor"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

MS:1000173 "MAT900XP"
  is a MS:1000493 "Finnigan MAT"
    part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
      is a MS:1000031 "model by vendor"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

MS:1000172 "MAT253"
  is a MS:1000493 "Finnigan MAT"
    part of (IS_A) MS:1000483 "Thermo Fisher Scientific"
      is a MS:1000031 "model by vendor"
        part of MS:1000463 "instrument description"
          part of MS:0000000 "MZ controlled vocabularies"

I still think there's a schema in there, albeit jammed in slightly sideways at the moment.

- Brian
|
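Chains like the ones listed above can be produced mechanically, which makes this kind of review easier to repeat. The following is a minimal sketch, assuming a simplified OBO-style stanza layout with only `id`, `is_a`, and `relationship` lines; the sample text is transcribed from the first chain in the message (before the proposed corrections), and `parse_obo` and `chain` are hypothetical helper names rather than any existing OBO library.

```python
SAMPLE_OBO = """
[Term]
id: MS:1000025
name: magnetic field strength
relationship: part_of MS:1000480 ! analyzer attribute

[Term]
id: MS:1000480
name: analyzer attribute
is_a: MS:1000451 ! analyzer description

[Term]
id: MS:1000451
name: analyzer description
relationship: part_of MS:1000463 ! instrument description

[Term]
id: MS:1000463
name: instrument description
relationship: part_of MS:0000000 ! MZ controlled vocabularies

[Term]
id: MS:0000000
name: MZ controlled vocabularies
"""

def parse_obo(text: str) -> dict:
    """Map each term id to a list of (relation, parent_id) edges."""
    terms: dict = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("!")[0].strip()      # drop trailing "! comment"
        if line == "[Term]":
            current = None
        elif line.startswith("id:"):
            current = line[3:].strip()
            terms[current] = []
        elif current and line.startswith("is_a:"):
            terms[current].append(("is_a", line[5:].strip()))
        elif current and line.startswith("relationship:"):
            relation, target = line[len("relationship:"):].split()[:2]
            terms[current].append((relation, target))
    return terms

def chain(term_id: str, terms: dict) -> list:
    """Follow the first parent edge of each term up to the root."""
    steps, seen = [], set()
    while terms.get(term_id) and term_id not in seen:
        seen.add(term_id)
        relation, parent = terms[term_id][0]
        steps.append((relation, parent))
        term_id = parent
    return steps

terms = parse_obo(SAMPLE_OBO)
steps = chain("MS:1000025", terms)
```

Printing `steps` for every attribute term and scanning for an `is_a` edge in the middle of a run of `part_of` edges would surface the same candidates flagged by hand above.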
From: Brian P. <bri...@in...> - 2007-10-06 06:23:12
|
That's how it looks to me too. I think this needs to be fixed, as I understand it the ontology is meant to be free of ambiguous terms. Thanks, Brian |
From: Chris T. <chr...@eb...> - 2007-10-06 00:11:18
|
Actually I think the problem here is overloading of a term -- the thing is used in two different ways -- there is a description of the physical reality of the ion source (it does DE) and there is a term in a description -- really the problem here is that what is implied is that either a datum is part of the ion optics of the physical instance of a mass spec, or that a description (an abstract that can be manifest in files or whatever) contains a physical entity (DE-source bits). I think that's it anyway. So really the issue is the combination of two related but different things in one concept. Am I right? -- ~~~~~~~~~~~~~~~~~~~~~~~~ chr...@eb... http://mibbi.sf.net/ ~~~~~~~~~~~~~~~~~~~~~~~~ |
From: Brian P. <bri...@in...> - 2007-10-05 19:43:07
|
I think we have some early fruit from my messing around with OBO->W3C schema conversion.

In the CV file
http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo
there is exactly one term that claims both an is_a and part_of relationship:

[Term]
id: MS:1000246
name: delayed extraction
def: "The application of the accelerating voltage pulse after a time delay in desorption ionization from a surface. The extraction delay can produce energy focusing in a time-of-flight mass spectrometer." [PSI:MS]
exact_synonym: "DE" []
is_a: MS:1000462 ! ion optics
relationship: part_of MS:1000456 ! precursor activation description

Let's follow the inheritance chains:

MS:1000246 "delayed extraction" is_a
MS:1000462 "ion optics" part_of
MS:1000463 "instrument description" part_of
MS:0000000 "MZ controlled vocabularies"

And also,

MS:1000246 "delayed extraction" part_of
MS:1000456 "precursor activation description" part_of
MS:1000442 "spectrum" part_of
MS:0000000 "MZ controlled vocabularies"

So:
A is a kind of B
A is a part of C
B is not a part of C

This would appear to violate the transitive property of the is_a and part_of relationships. Normally in discussing inheritance one views "is a" and "has a" (or in the topsy-turvy world of OBO, "part of") as being distinct and mutually exclusive ideas.

Actually the format itself is a bit of a surprise, I had anticipated "is_a" being an enumerated type of "relationship" as "part_of" is. If this MS:1000246 is simply a victim of a clerical error, as I suspect it is, then a tidier representation of inheritance would have helped catch the problem sooner.

- Brian |
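For anyone who wants to repeat the check, here is a minimal sketch of how terms carrying both an is_a and a part_of could be found. It is not the converter Brian used; the parser only understands the handful of tags shown in the stanza above, the function names are made up for illustration, and the embedded sample is a cut-down stand-in for psi-ms.obo.

```python
# Cut-down sample in the same stanza layout as psi-ms.obo (def lines omitted).
OBO_TEXT = """\
[Term]
id: MS:1000246
name: delayed extraction
is_a: MS:1000462 ! ion optics
relationship: part_of MS:1000456 ! precursor activation description

[Term]
id: MS:1000462
name: ion optics
relationship: part_of MS:1000463 ! instrument description

[Term]
id: MS:1000456
name: precursor activation description
relationship: part_of MS:1000442 ! spectrum
"""


def parse_terms(text):
    """Return {term_id: {"name": str, "is_a": [ids], "part_of": [ids]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"name": "", "is_a": [], "part_of": []}
            continue
        if line.startswith("["):
            current = None  # some other stanza type; ignored in this sketch
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! human readable name" comment.
            current["is_a"].append(value.split("!", 1)[0].strip())
        elif tag == "relationship":
            fields = value.split("!", 1)[0].split()
            if len(fields) == 2 and fields[0] == "part_of":
                current["part_of"].append(fields[1])
    return terms


def mixed_terms(terms):
    """IDs of terms that declare both an is_a and a part_of parent."""
    return sorted(tid for tid, t in terms.items() if t["is_a"] and t["part_of"])


if __name__ == "__main__":
    for term_id in mixed_terms(parse_terms(OBO_TEXT)):
        print(term_id)
```

Pointing parse_terms at the full file contents instead of the sample should be enough to repeat the search, assuming the file sticks to the same tag layout.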
From: Matthew C. <mat...@va...> - 2007-10-05 18:32:49
|
Mike Coleman wrote:
> I see what you're saying, but I'm very sympathetic to Angel's point as
> well. If nothing else, a well-written piece of software needs to emit
> a warning upon seeing an inconsistent count field. This means that it
> needs to calculate the correct value, which mostly eliminates the
> value of having it in the file in the first place.
>

Steps to implement a reader dealing with invalid count attributes:
1) Ignore the count
2) Keep counters of how many list elements (whatever is being counted) were actually read in
3) If you write out mzML, use your counters' values (subtracting the # of elements filtered out and adding the # of elements added)
4) Voila, valid mzML by ignoring count attributes! :)

It's considerably more difficult to implement readers which work with references to previously parsed elements (i.e. ParamGroups and references to them). And that's NOT something that an implementation can ignore!

-Matt |
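The four steps above can be sketched roughly as follows. This is only an illustration using Python's standard xml.etree.ElementTree, not code from any existing mzML reader; the spectrumList/spectrum names come from the draft under discussion, while the id attribute, the sample document and the function names are assumptions, and namespaces are left out.

```python
import xml.etree.ElementTree as ET

# Sample list whose count attribute is deliberately wrong (5 instead of 3).
SAMPLE = (
    '<spectrumList count="5">'
    '<spectrum id="S1"/><spectrum id="S2"/><spectrum id="S3"/>'
    "</spectrumList>"
)


def read_items(xml_text):
    """Steps 1 and 2: ignore @count and keep what was actually read."""
    root = ET.fromstring(xml_text)
    return root.tag, list(root)


def write_list(tag, items):
    """Step 3: write the list back out with a count derived from the items."""
    root = ET.Element(tag, {"count": str(len(items))})
    root.extend(items)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tag, spectra = read_items(SAMPLE)
    kept = [s for s in spectra if s.get("id") != "S2"]  # filter one element out
    print(write_list(tag, kept))
```

Because the count is recomputed from the list on every write, filtering or adding elements cannot leave it stale, which is step 4.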
From: Mike C. <tu...@gm...> - 2007-10-05 18:17:35
|
On 10/5/07, Matthew Chambers <mat...@va...> wrote:
> Angel Pizarro wrote:
> > (b) all read tools must check the count attribute's correctness since
> > (a) may not actually be true, thus completely defeating the point of
> > having this attribute in the first place.
> >
> I do not see how you come to this conclusion. Read tools do not need to
> check the correctness, they can choose to parse whatever is there.

I see what you're saying, but I'm very sympathetic to Angel's point as well. If nothing else, a well-written piece of software needs to emit a warning upon seeing an inconsistent count field. This means that it needs to calculate the correct value, which mostly eliminates the value of having it in the file in the first place.

Additionally, some pieces of software might want to process these files in a "dumb" mode. So, for example, if a program reads an mzML file with an invalid count and then just prints what it read, it will be generating invalid output. Similarly, it won't be possible to just drop some elements in a dumb way, as counts will need updating, which necessitates a slightly smarter tool.

An additional problem with counts is that each program has to decide upon a threshold of outlandishness. That is, if the count says "one trillion", do you trust it and allocate the memory, or balk? History suggests that counts that seem unreasonable now may become typical later, which means that you're buying a maintenance problem.

None of this necessarily outweighs the value of providing counts. I will never trust or use them in the programs that I write, though (and I wish I didn't have to generate them).

Mike |
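To make the two costs concrete, here is a small sketch of what a cautious reader ends up doing: recount the children in order to warn about a bad count, and cap any pre-allocation taken from the declared value. The threshold, the function names and the sample element are all hypothetical; nothing here comes from the mzML specification.

```python
import warnings
import xml.etree.ElementTree as ET

# Hypothetical "threshold of outlandishness"; any real value is a judgment call.
MAX_PREALLOCATE = 1_000_000


def check_count(list_element):
    """Warn if @count disagrees with the number of direct children; return the real count."""
    declared = list_element.get("count")
    actual = len(list_element)
    if declared is None or not declared.isdigit() or int(declared) != actual:
        warnings.warn(
            f"{list_element.tag}: count={declared!r} but {actual} children found"
        )
    return actual


def safe_reserve(declared):
    """How many slots to pre-allocate for a declared count, never more than the cap."""
    if not declared.isdigit():
        return 0
    return min(int(declared), MAX_PREALLOCATE)


if __name__ == "__main__":
    element = ET.fromstring(
        '<spectrumList count="5"><spectrum/><spectrum/><spectrum/></spectrumList>'
    )
    print(check_count(element))           # warns, then prints the real count
    print(safe_reserve("1000000000000"))  # a "one trillion" count is capped
```

Note that check_count has to compute the real count before it can warn, which is exactly the redundancy described above.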
From: Angel P. <an...@ma...> - 2007-10-05 17:14:29
|
On 10/5/07, Matthew Chambers <mat...@va...> wrote:
>
> Angel Pizarro wrote:
> > Just finished going through the specification, which is great BTW.
> > Just have a few notes/questions on the spec/schema as it stands. I'll
> > also post it these to the PSI site.
> >
> > (1) sourceFileRef in multiple places
> >
> > Why does this exist
> > run -> spectrumList -> spectrum [:sourceFileRef => anyURI ]
> > when there is this?
> > run -> sourceFileRefList -> sourceFileRef [:ref => anyURI ]
> I agree, I don't see a reason for the sourceFileRefList. Only a
> sourceFileList is needed.

Geez, I just realized with Matt's response what these attrs were for! OK, that means that the spec needs to be augmented a bit with the following information:

mzML -> [fileDescription -> sourceFileList -> sourceFile ]
all these tags in brackets need better documentation

mzML -> run -> sourceFileRefList -> sourceFileRef
should probably go away, since there can only be one run present in an mzML file, hence what is the point of referencing more files in mzML -> fileDescription -> sourceFileList than were output to the run?

mzML -> run -> spectrumList -> spectrum -> sourceFileRef
should be documented to say that this reference is an internal pointer to the particular sourceFile in the whole document's sourceFileList that gave rise to this particular spectrum. If a sourceFile contains more than one spectrum, then the scanNumber attribute serves to disambiguate it from its siblings.

-angel |
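As a rough illustration of that reading of sourceFileRef (an internal pointer from each spectrum into the document-level sourceFileList, with scanNumber telling apart spectra from the same file), a resolver might look like the sketch below. The element and attribute names follow the paths quoted above; the id and name attributes on sourceFile, the file names and the function name are assumptions made for the example, and the sample document omits namespaces and everything else a real file would carry.

```python
import xml.etree.ElementTree as ET

# Toy document; the id/name attributes and the file names are invented.
DOC = """
<mzML>
  <fileDescription>
    <sourceFileList count="2">
      <sourceFile id="SF1" name="run_a.raw"/>
      <sourceFile id="SF2" name="run_b.raw"/>
    </sourceFileList>
  </fileDescription>
  <run>
    <spectrumList count="2">
      <spectrum id="S1" scanNumber="1" sourceFileRef="SF1"/>
      <spectrum id="S2" scanNumber="7" sourceFileRef="SF2"/>
    </spectrumList>
  </run>
</mzML>
"""


def source_for_each_spectrum(xml_text):
    """Map spectrum id -> (source file name, scanNumber within that file)."""
    root = ET.fromstring(xml_text)
    files = {sf.get("id"): sf.get("name") for sf in root.iter("sourceFile")}
    resolved = {}
    for spectrum in root.iter("spectrum"):
        ref = spectrum.get("sourceFileRef")
        if ref not in files:
            raise KeyError(
                f"spectrum {spectrum.get('id')!r} references unknown sourceFile {ref!r}"
            )
        resolved[spectrum.get("id")] = (files[ref], spectrum.get("scanNumber"))
    return resolved


if __name__ == "__main__":
    for spectrum_id, origin in source_for_each_spectrum(DOC).items():
        print(spectrum_id, origin)
```

With this layout the per-run sourceFileRefList adds nothing the per-spectrum reference does not already give.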
From: Matthew C. <mat...@va...> - 2007-10-05 17:06:26
|
Angel Pizarro wrote: > > The count atts are required, so you can't just ignore them. Plus if > you do, then you won't be playing nice with other tools out there that > do use them. Meaning that: > I meant ignore them while reading, which is entirely possible. Ignoring them while writing would not meet the spec. > (a) all write tools must encode counts properly Yes, and this should be very easy to do. > (b) all read tools must check the count attribute's correctness since > (a) may not actually be true, thus completely defeating the point of > having this attribute in the first place. > I do not see how you come to this conclusion. Read tools do not need to check the correctness, they can choose to parse whatever is there. If a reader reads the count element for spectra and pre-allocates memory for <count> spectra objects, that's the reader's choice. Readers don't HAVE to do that, it's just there for convenience. > Simpler on everybody if we just get rid of it in these spots. I routinely ignore count attributes in my XML parsers. Like I said, it's mainly convenient for human readability. Otherwise, readers have to do a "find all" on the element type, which is also not very hard, but some might complain. -Matt |
From: Angel P. <an...@ma...> - 2007-10-05 17:00:22
|
On 10/5/07, Matthew Chambers <mat...@va...> wrote: > > Angel Pizarro wrote: > > (2) count attributes in list like element types > > > > I *really* don't like the count attribute in list types (e.g. > > instrumentList[@count]). I think they are not too informative and > > prone to error (just another condition to code and maintain) > If you don't want to maintain the count attributes, ignore them. :) > They are mainly useful for human consumption, or if you wanted to write > a very (but bulky) fast parser with low error checking. The count atts are required, so you can't just ignore them. Plus if you do, then you won't be playing nice with other tools out there that do use them. Meaning that: (a) all write tools must encode counts properly (b) all read tools must check the count attribute's correctness since (a) may not actually be true, thus completely defeating the point of having this attribute in the first place. Simpler on everybody if we just get rid of it in these spots. BTW this is a different issue than the arrayLength attribute on the binary arrays, which I think have enough of a pay-off to justify their existence. -angel -- Angel Pizarro Director, Bioinformatics Facility Institute for Translational Medicine and Therapeutics University of Pennsylvania 806 BRB II/III 421 Curie Blvd. Philadelphia, PA 19104-6160 P: 215-573-3736 F: 215-573-9004 |
From: Matthew C. <mat...@va...> - 2007-10-05 16:06:46
|
Angel Pizarro wrote: > Just finished going through the specification, which is great BTW. > Just have a few notes/questions on the spec/schema as it stands. I'll > also post it these to the PSI site. > > (1) sourceFileRef in multiple places > > Why does this exist > run -> spectrumList -> spectrum [:sourceFileRef => anyURI ] > when there is this? > run -> sourceFileRefList -> sourceFileRef [:ref => anyURI ] I agree, I don't see a reason for the sourceFileRefList. Only a sourceFileList is needed. > > When would you use one over the other or would you have to specify > both or what? The spec should cover this a bit better. > > (2) count attributes in list like element types > > I *really* don't like the count attribute in list types (e.g. > instrumentList[@count]). I think they are not too informative and > prone to error (just another condition to code and maintain) If you don't want to maintain the count attributes, ignore them. :) They are mainly useful for human consumption, or if you wanted to write a very fast (but bulky) parser with low error checking. -Matt |
From: Angel P. <an...@ma...> - 2007-10-05 15:01:38
|
Just finished going through the specification, which is great BTW. Just have a few notes/questions on the spec/schema as it stands. I'll also post these to the PSI site. (1) sourceFileRef in multiple places Why does this exist run -> spectrumList -> spectrum [:sourceFileRef => anyURI ] when there is this? run -> sourceFileRefList -> sourceFileRef [:ref => anyURI ] When would you use one over the other, or would you have to specify both? The spec should cover this a bit better. (2) count attributes in list-like element types I *really* don't like the count attribute in list types (e.g. instrumentList[@count]). I think they are not too informative and are prone to error (just another condition to code and maintain) -angel |
From: Angel P. <an...@ma...> - 2007-10-05 12:20:23
|
On 10/4/07, Jimmy Eng <jk...@gm...> wrote:
>
> Angel, counter to what you're suggesting, I do believe that mzML was
> developed to at least try and be an operational format also.
> Otherwise, there would not be a need for a scan index with file offset
> pointers in the wrapper schema, no?

Very true. And I hope that decent performance comes from the APIs written for the format. I am playing devil's advocate here. Call me a pessimist, but I don't think any instrument manufacturer is going to use mzML as their native format (or at least it will take on the order of 4 or more years for this to happen). If vendors do adopt it as the native format, great! I would be more than ecstatic, but I am not holding my breath. Vendors, please correct me if I am making a wrong assumption here. Silence == agreement ;)

> The primary reason why mzXML was developed was to replace native MS
> binary data files with something transparent and platform neutral (and
> be an operational format for tools that consume these files).
> Obviously everyone imagines mzML to address many, and it looks like
> sometimes different & non-inclusive, use cases. My short sighted
> personal interest is to see mzML address the operational raw file
> replacement use case succinctly w/o any adverse complexities to make
> its adoption for this use case difficult. Otherwise Angel's proposal
> of mzML->SRF, mzML->mgf, and I dare say mzML->mzXML is going to end up
> being reality for some subset of users. And in the world of these
> users, why bother going from native->mzML->XYZ if native files are
> around and you can do native->XYZ?

My hope is that by having mzML in the middle, we can reliably say SRF == MGF, where with the current situation of native -> XYZ, we just can't make that claim. Also, it is my hope to reduce the burden of 3rd party vendors by having mzML be the officially supported format for input.

-angel |
From: Jimmy E. <jk...@gm...> - 2007-10-05 02:30:48
|
Angel, counter to what you're suggesting, I do believe that mzML was developed to at least try and be an operational format also. Otherwise, there would not be a need for a scan index with file offset pointers in the wrapper schema, no? The primary reason why mzXML was developed was to replace native MS binary data files with something transparent and platform neutral (and be an operational format for tools that consume these files). Obviously everyone imagines mzML to address many, and it looks like sometimes different & non-inclusive, use cases. My short sighted personal interest is to see mzML address the operational raw file replacement use case succinctly w/o any adverse complexities to make its adoption for this use case difficult. Otherwise Angel's proposal of mzML->SRF, mzML->mgf, and I dare say mzML->mzXML is going to end up being reality for some subset of users. And in the world of these users, why bother going from native->mzML->XYZ if native files are around and you can do native->XYZ? Sorry I can't contribute to the cvParam talk here because I don't even know what that is! :) On 10/4/07, Angel Pizarro <an...@ma...> wrote: > On 10/4/07, Brian Pratt <bri...@in...> wrote: > > Hi Angel, > > > > I fear I may be misunderstanding your point, though? It might be read as > implying, for example, that converting from mzML back to mzXML for the > purposes of ASAPRatio and its elution profiling is a proper thing to do, but > I don't expect that's what you meant to say. Can you clarify? > > > Yep, that's exactly what I was proposing, but maybe ASAP ration is a bad > example since ASAP ratio is open source and controlled by the TPP folks ;) A > better example would be sequest and bioworks, which uses a binary file > format for storing processed peaks and the result in one file. The > conversion would be mzML -> RAW/SRF -> SRF -> whatever you want here. The > pay-off for bioworks to do something like this is fine-tuned random access > for spectral processing. 
Plus the code investment in supporting mzML is > relatively small and restricted to in/out of their format. > > Actually I take it back, ASAPR is a good example b/c using this model of > translating an archive format to/from operational formats allows the ISB to > put its development effort on newer algorithms, and prevent older projects > from being put out to pasture. > > -angel > > > > > > > > > > > Thanks, > > > > > > > > Brian > > > > > > > > > > > > ________________________________ > > > > > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Angel Pizarro > > Sent: Thursday, October 04, 2007 1:10 PM > > To: Mass spectrometry standard development > > Subject: Re: [Psidev-ms-dev] honey vs vinegar > > > > > > > > > > On 10/4/07, Brian Pratt <bri...@in...> wrote: > > > > > > > > These are interesting questions about how folks will use the format. I'm > > not comfortable with the idea that the format is intended for repositories > > instead of processing. I'd think you'd want a repository to contain > exactly > > the same artifacts that were processed lest anyone wonder later what > > differences may have existed in the various representations of the data. > > > > > > > > I think we agree here but are coming from different perspectives. In my > mind in order for a repository to have the most accurate representation of > the data, the standard has to be purposed for data archival and flexible > experimental annotation. Data processing routines would then take that > format and do whatever it will for peak detection, noise reduction, > base-line correction, etc. to give a final set of values (that typically go > into the search algorithms). All of the intermediate steps in the processing > should in theory be able to be represented by the same format. 
> > > > I think that mzML as it stands is able to do track the data and the > processes that where applied to it, but it will certainly not be the most > efficient way to represent the data *as the processing is being done*. A > special purpose format for the algorithm at hand will always win in terms of > engineering ease / speed / performance / interoperability (within a set of > tools). > > > > This I think is at the heart of the whole discussion, and why I think > cvParam is always getting hammered on the list. So while it seems that we > are talking cross purposes, I really don't think we are. > > > > -angel |
From: Brian P. <bri...@in...> - 2007-10-05 01:46:46
|
I'll take a shot at auto-generating a schema from the OBO tomorrow. I'm curious to know if I'm just blowing smoke or not.

- Brian

_____
From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: Thursday, October 04, 2007 5:17 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Option A, B, or C

On 10/4/07, Brian Pratt <bri...@in...> wrote:
> It still kind of amazes me that this is a problem we're solving from
> scratch in a world with W3C schema in it, but I'm trying to play nice
> since the cvParam thing seems to have unstoppable inertia. I'd much
> prefer this:
> <InstrumentType name="LCQ Deca" accession="MS:1000554" />
> - that's proper XML, to my mind, as opposed to merely valid XML, and it
> still leverages the power of the CV.

Actually I would prefer that structure as well and asked on the list for folks to specifically outline places in the schema where this could happen:

http://sourceforge.net/mailarchive/message.php?msg_name=e38f4b170708071310m76356fe5g3f81b5eff44ce2c6%40mail.gmail.com

See the threads from 8/7 - 8/9 for the full discussion, but let me just put it out there that it is not too late to have these types of changes! That's what the public review process is for! I don't think we did a good enough job of communicating to folks that this type of typed CV structure was an option for schema change proposals.

-angel |
From: Matt C. <mat...@va...> - 2007-10-05 01:39:12
|
Two potential problems with this structure: it drops either the value accession number or the category accession number, given that Brian suggested it I expect he intended the latter to be dropped and that the element name becomes the unique category name. It also eliminates the possibility of having synonyms for the category names, and we can't change the element/category name without breaking backward compatibility. I don't really mind about either of these problems, but I'm under the impression that others do mind. So what you're asking Angel is what places in the schema have a category cvParam that could be set in stone and not allowed to have synonym category names and thus converted into this structure instead? -Matt Angel Pizarro wrote: > On 10/4/07, *Brian Pratt* <bri...@in... > <mailto:bri...@in...>> wrote: > > It still kind of amazes me that this is a problem we're > solving from scratch in a world with W3C schema in it, but I'm > trying to > play nice since the cvParam thing seems to have unstoppable > inertia. I'd > much prefer this: > <InstrumentType name="LCQ Deca" accession="MS:1000554" /> > - that's proper XML, to my mind, as opposed to merely valid XML, > and it > still leverages the power of the CV. > > > Actually I would prefer that structure as well and asked on the list > for folks to specifically outline places in the schema where this > could happen: > > http://sourceforge.net/mailarchive/message.php?msg_name=e38f4b170708071310m76356fe5g3f81b5eff44ce2c6%40mail.gmail.com > > See the threads from 8/7 - 8/9 for the full discussion, but let me > just put it out there that it is not too late to have these types of > changes! That's what the public review process is for! I don't think > we did a good enough job of communicating to folks that this type of > typed CV structure was an option for schema change proposals. > > -angel |
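To illustrate the trade-off being discussed, here is a small sketch that reads the same fact from both shapes. The InstrumentType element is the one Brian proposed; the wrapping instrument element, the exact cvParam attributes and the function names are assumptions for the example, not the draft schema.

```python
import xml.etree.ElementTree as ET

# Generic form: the category is only knowable by looking the accession up in the CV.
GENERIC = '<instrument><cvParam accession="MS:1000554" name="LCQ Deca"/></instrument>'

# Typed form: the element name itself is the category.
TYPED = '<instrument><InstrumentType name="LCQ Deca" accession="MS:1000554"/></instrument>'


def instrument_type_generic(xml_text, model_accessions):
    """Return (accession, name) of the first cvParam whose accession is a known model.

    model_accessions stands in for a lookup against the CV; the caller must
    already know which accessions are instrument models.
    """
    for param in ET.fromstring(xml_text).iter("cvParam"):
        if param.get("accession") in model_accessions:
            return param.get("accession"), param.get("name")
    return None


def instrument_type_typed(xml_text):
    """Return (accession, name) straight from the dedicated element."""
    node = ET.fromstring(xml_text).find("InstrumentType")
    return None if node is None else (node.get("accession"), node.get("name"))


if __name__ == "__main__":
    print(instrument_type_generic(GENERIC, {"MS:1000554"}))
    print(instrument_type_typed(TYPED))
```

The typed reader needs no CV lookup, but the category name is now frozen into the schema, which is the backward-compatibility concern raised above.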