From: Fredrik L. <Fre...@im...> - 2008-10-13 12:42:57
|
Hi Eric,

Looks fine. One comment though: could some of the fields be optional? 'Controller' doesn't always make sense for the Thermo files, and 'process' is maybe not so relevant for all Waters data (OK, if it hasn't been processed in MassLynx, process could be set to zero, but I think I would prefer to leave it out).

Also, I would like to have a recommendation on what to write if some value is, for some reason, not known. The options could be to leave it out or to replace the number with a question mark. The worst option would be to put in a value which may not be correct, in order to produce a 'valid' file.

Thanks
Fredrik |
From: Eric D. <ede...@sy...> - 2008-10-12 07:40:12
|
Hi Matt, right you are. Sorry I neglected to follow up. I believe the proposal we agreed upon was to use this:

Thermo:
nativeID="controller=0 scan=1"
nativeID="controller=0 scan=1243"
nativeID="controller=1 scan=1"
(where controller 0 is probably always the mass spec?)

Waters:
nativeID="function=1 process=0 scan=1"
nativeID="function=1 process=0 scan=2"
nativeID="function=2 process=0 scan=1"

WIFF:
nativeID="Sample=0 period=1 cycle=1 experiment=2"
nativeID="Sample=0 period=1 cycle=1 experiment=3"
nativeID="Sample=0 period=1 cycle=2 experiment=2"
nativeID="Sample=0 period=1 cycle=2 experiment=3"

Can anyone provide any other vendor/format examples or further comments that should appear in the documentation of the above?

Thanks,
Eric |
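The key=value layout proposed above is simple to split mechanically. The following is only a minimal sketch of how a reader might do that in Python; the function name `parse_native_id` is hypothetical, and the assumptions that every value is a non-negative integer and that keys may not repeat are not stated in the thread.

```python
def parse_native_id(native_id):
    """Split a space-separated key=value nativeID into a dict of ints.

    Assumption (not from the thread): every value is a non-negative
    integer and each key appears at most once.
    """
    fields = {}
    for token in native_id.split():
        key, sep, value = token.partition("=")
        # Reject tokens without '=', with an empty key, or a non-numeric value.
        if not sep or not key or not (value.isascii() and value.isdigit()):
            raise ValueError("malformed nativeID token: %r" % token)
        if key in fields:
            raise ValueError("duplicate nativeID key: %r" % key)
        fields[key] = int(value)
    return fields


print(parse_native_id("controller=0 scan=1243"))
print(parse_native_id("function=2 process=0 scan=1"))
print(parse_native_id("Sample=0 period=1 cycle=2 experiment=3"))
```

Keys are returned exactly as written, so "Sample" and "sample" would be treated as different keys; whether key names are case-sensitive is not settled in the messages above.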
From: Matthew C. <mat...@va...> - 2008-10-10 19:57:31
|
Hi Eric, we were supposed to refine the nativeID format over the last two weeks. :)

You had this in the last meeting minutes:
+ Eric will send out a summary of what formats should look like for all vendors and we'll refine

Would you like me to put together the summary instead? I'd basically replace my format and examples with the key=value format and it'd be set unless you had some other ideas.

-Matt |
From: Eric D. <ede...@sy...> - 2008-10-10 19:41:03
|
Hi everyone, this is a reminder that the next PSI-MSS WG call is Monday October 13 at 9am PDT:

http://www.timeanddate.com/worldclock/fixedtime.html?day=13&month=10&year=2008&hour=17&min=0&sec=0&p1=136

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
Generic international: +44 2083222500 (UK number)
access code: 297427

The agenda will be to review some recent discussions and review all the aspects related to mzML to start making progress again on the highest priorities.

Topics:
- nativeID format
- mzML support information table
- MIAPE example document
- Other example documents
- CV
- CV template updates to vendors
- Documentation
- Web site
- Validator
- WIFF converter
- Manuscript
- Next call perhaps Tuesday 8am so that Pierre-Alain can join?

Thanks,
Eric |
From: Fredrik L. <Fre...@im...> - 2008-10-07 10:12:56
|
Hi All,

Just a comment/question on the third use case (data-independent acquisition) from Pierre-Alain. What is this supposed to look like in the mzML file? Should the window center be listed in both the isolationWindow and selectedIon, like this:

<precursor spectrumRef="S5">
  <isolationWindow>
    <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="800"/>
    <cvParam cvRef="MS" accession="MS:1000023" name="isolation width" value="50"/>
  </isolationWindow>
  <selectedIonList count="1">
    <selectedIon>
      <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="800"/>
    </selectedIon>
  </selectedIonList>
  <activation>
    <cvParam cvRef="MS" accession="MS:1000422" name="high-energy collision-induced dissociation" value=""/>
    <cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="35"/>
  </activation>
</precursor>

If we do some peak picking in the MS window, you would probably list all the ions in the selectedIonList, and then the central mass of the window should not be there, since it may not be a peak at all. It seems to me that the selectedIonList could be empty in this case, but this is not allowed, since the xsd specifies that at least one selectedIon needs to be listed. Or maybe the central window m/z value should only be in selectedIon and not in isolationWindow?

Fredrik |
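To make the question concrete, here is a rough sketch of how a reader might pull the isolation window and the selected ions out of a fragment like the one in the message above. It uses Python's standard `xml.etree.ElementTree`. Two things are assumptions rather than anything stated in the thread: that "isolation width" means the full width centred on the listed m/z, and that the fragment carries no XML namespace (a real mzML file declares one). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <precursor> fragment from the message above (activation omitted),
# without the namespace a full mzML document would declare.
PRECURSOR_XML = """
<precursor spectrumRef="S5">
  <isolationWindow>
    <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="800"/>
    <cvParam cvRef="MS" accession="MS:1000023" name="isolation width" value="50"/>
  </isolationWindow>
  <selectedIonList count="1">
    <selectedIon>
      <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="800"/>
    </selectedIon>
  </selectedIonList>
</precursor>
"""

MZ = "MS:1000040"
ISOLATION_WIDTH = "MS:1000023"


def isolation_window_bounds(precursor):
    """Return (low, high) m/z, assuming the width is centred on the m/z term."""
    params = {p.get("accession"): float(p.get("value"))
              for p in precursor.findall("isolationWindow/cvParam")}
    half_width = params[ISOLATION_WIDTH] / 2.0
    return params[MZ] - half_width, params[MZ] + half_width


def selected_ion_mzs(precursor):
    """Return the m/z value of every selectedIon listed under the precursor."""
    return [float(p.get("value"))
            for p in precursor.findall("selectedIonList/selectedIon/cvParam")
            if p.get("accession") == MZ]


precursor = ET.fromstring(PRECURSOR_XML)
print(isolation_window_bounds(precursor))
print(selected_ion_mzs(precursor))
```

With this layout a consumer cannot tell whether the single selected ion is a real peak or just the window centre repeated, which is the ambiguity the message raises.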
From: Yury R. <yu...@ma...> - 2008-10-06 11:19:10
|
Thanks to both Matthew Chambers and Pierre-Alain for the answers.

"You can make some of the people happy all the time. You can make all of the people happy some of the time. But you can't make all the people happy all the time." ;)

Yes, I can see that my remark could seem a bit rude. What I actually meant is that one producer of mzML files will use multiple precursors for multiple isolation windows, and another for a compound spectrum. As a result we will end up supporting several mzML sub-formats, which I think would definitely contradict the purpose of a standard. Wouldn't it be better to specify the reason for multiple precursors as some parameter? It is not only me who wants to know how to interpret it.

The "possible charge state" accession is something I definitely missed. It resolves one question perfectly.

Yury Rozhek |
From: Pierre-Alain B. <pie...@is...> - 2008-10-04 09:37:10
|
Hi Yuri,

Here are the possible use cases (to complete Matt's comments) that needed to be considered.

Multiple precursors might mean:

1) one detected precursor m/z with possibly more than one charge state (in case the charge state is not clearly attributed, but a possible range can be specified)
2) more than one precursor ion in a given m/z window. This typically arises when a signal is selected for fragmentation: all detected signals at m/z +/- a delta can be assumed to be fragmented at the same time (also called accidental CID)
2a) a special use case is using a high-resolution precursor scan and allowing more than one isotope to be included in the calculation. For instance, search 1000 and 1000.5 +/- 0.005 Da instead of 1000 +/- 0.8 Da for a 2+ peptide (this affects the search space and keeps the benefit of the instrument's high mass accuracy)
3) no discrete precursor m/z values, but a range of m/z. This is the specific case where no precursor scan is used, only a selection window. This is done in Gas Phase Fractionation experiments, such as those published by Waters or Dave Goodlett.

Pierre-Alain |
From: Matthew C. <mat...@va...> - 2008-10-03 18:38:52
|
Hi Yury,

Good questions. :) We should make this more clear in the specification.

1. I am fairly certain that multiple precursors indicate multiple isolation windows (where each has a list of selected ions); the multiple isolation windows may be due to a fancy instrument setup for a single acquisition or it may be due to multiple acquisitions being summed/averaged together.

2. Each precursor has one isolation window, so the list of selected ions should be one or more peaks occurring in that window. If multiple ions are listed, it may be due to uncertain isotopic distinction or it may be due to the isolation window including multiple species in a complex sample, or it could be a combination of both. :)

3. Since multiple precursors will only happen with multiple isolation windows or compound spectra (summed/averaged acquisitions), you won't use several precursors to indicate ambiguous peak m/zs within a single isolation window. This is underspecified though and it should be explicit.

4. Ambiguous charge states are handled with zero or more "possible charge state" terms (MS:1000633). Certain charge states are handled with a single "charge state" term (MS:1000041).

5. "You can make some of the people happy all the time. You can make all of the people happy some of the time. But you can't make all the people happy all the time." ;)

-Matt |
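Point 4 above could be read by an importer roughly as follows. This is a sketch only: the function name `candidate_charges` and the representation of a selectedIon as a list of (accession, value) pairs are hypothetical, and letting a certain "charge state" term take precedence over any "possible charge state" terms is an assumption, not something the message specifies.

```python
CHARGE_STATE = "MS:1000041"            # "charge state" (certain)
POSSIBLE_CHARGE_STATE = "MS:1000633"   # "possible charge state" (ambiguous)


def candidate_charges(cv_params):
    """Return the charges to try for one selectedIon.

    cv_params is a list of (accession, value) pairs taken from the
    cvParam children of a single selectedIon element.
    """
    certain = [int(v) for acc, v in cv_params if acc == CHARGE_STATE]
    possible = [int(v) for acc, v in cv_params if acc == POSSIBLE_CHARGE_STATE]
    if len(certain) > 1:
        raise ValueError("more than one 'charge state' term on a selected ion")
    # Assumption: a certain charge state wins over any possible ones.
    return certain if certain else possible


ambiguous_ion = [("MS:1000040", "800"),
                 ("MS:1000633", "2"),
                 ("MS:1000633", "3")]
print(candidate_charges(ambiguous_ion))
```

An empty result means the producer gave no charge information at all, which matches the "zero or more" wording for the possible charge state term.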
From: Yury R. <yu...@ma...> - 2008-10-03 17:49:36
|
Hello all!

I am working on an import module for the mzML data format for our Mascot and need some clarification.

I have searched through the archives but found only a one-year-old discussion about this, from which it is not quite clear how <precursorList> with multiple precursors and multiple <selectedIon> is going to be used in a spectrum with ms-level "2".

1. What does it mean to have several precursors?

2. What does it mean to have several selectedIon-s?

3. Is there a clear demarcation between cases where one spectrum has several precursors because the precursor m/z cannot be reliably detected and cases where several spectra are represented by one <spectrum> element?

It is also not clear how an ambiguous charge state would be specified for a precursor: as several "cvParam"s within one <selectedIon> element?

I am not sure if it is relevant, but the fact that mzML allows several precursors to be put into one spectrum for whatever reason each producer finds reasonable doesn't make me happy....

Thanks

Yury Rozhek |
From: Eric D. <ede...@sy...> - 2008-09-29 17:28:35
|
Hi everyone, here are the notes from today's call:

Present: Eric, Lennart, Darren, Matt, Randy, Jim

- nativeID format
  + Note that nativeID is required in the index
  + We seem in agreement that we should not try to scrimp, and so we should make the format like "sample=1 period=2 cycle=123 experiment=4"
  + Eric will send out a summary of what formats should look like for all vendors and we'll refine

- Imaging/separate data file issue
  + imzML contacted Randy and have an example file and are concerned about XML verbosity and secondary data file
  + Randy will send out the contact information and we will try to continue the discussion in the future

- Validator
  + Eric sent some big files to Lennart
  + Lennart wants to make some performance enhancements
  + Need to flesh out all the rules in the rules file
  + There is a problem with the inclusion of the unit CV
  + Matt says that the on-line validator is a blacklist-based validation
  + We had once talked about how the validator should be whitelist-based
  + Eric proposes that the validation goes like:
    = Any term that is controlled by the mapping file may ONLY appear in the specified location
    = Any term that is NOT controlled anywhere by the mapping file may appear anywhere with a mild warning
  + Need to also include the nativeID validation
  + Eric will try to find the old list of things the validator should check and email to Lennart who will publish

- WIFF converter
  + ABI is working on a WIFF -> mzML converter. Matt and Eric are testing an alpha version

- Next PSI meeting in Turku Apr 27-29
  + Describe proposed activities: CV, software promotion, validation

- Manuscript
  + Waiting for MS CV. Needs to be in OLS first. And need to resolve the unit ontology.
  + Aim to submit by end of the year

- Unit Ontology
  + We need to add some terms for mass spec and send to maintainer
  + We need to change some funny characters in there that are causing our software some problems
  + Maybe we can just fix the problem by converting unit.obo to proper Unicode? Lennart will try in next two weeks.

- Next call
  + In two weeks: Oct 13 9am. Eric will consult with PAB about possibly 8am so he can make it.

Not discussed:
- mzML support information table
- MIAPE example document
- Other example documents
- CV
- CV template updates to vendors
- Documentation
- Web site |
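The two-part validation rule proposed in the notes above (controlled terms only in their specified location, uncontrolled terms anywhere with a mild warning) can be illustrated with a small sketch. Everything concrete here is hypothetical: the function name `check_term`, the shape of the mapping (accession to a set of allowed element names), and the sample entries do not come from the real validator or its mapping file.

```python
def check_term(accession, location, mapping):
    """Classify one CV term occurrence as 'ok', 'error' or 'warning'.

    mapping: dict from accession to the set of element names where the
    mapping file allows that term (a hypothetical, simplified shape).
    """
    allowed = mapping.get(accession)
    if allowed is None:
        # Not controlled anywhere by the mapping file: allowed, mild warning.
        return "warning"
    # Controlled: may ONLY appear in the specified location(s).
    return "ok" if location in allowed else "error"


# Illustrative mapping only; the real mapping file is not shown in the notes.
EXAMPLE_MAPPING = {
    "MS:1000040": {"isolationWindow", "selectedIon"},   # m/z
    "MS:1000045": {"activation"},                       # collision energy
}

print(check_term("MS:1000045", "activation", EXAMPLE_MAPPING))
print(check_term("MS:1000045", "selectedIon", EXAMPLE_MAPPING))
print(check_term("MS:1000633", "selectedIon", EXAMPLE_MAPPING))
```

The third call returns a warning only because the example mapping happens not to list that accession; it says nothing about how the actual mapping file treats it.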
From: Eric D. <ede...@sy...> - 2008-09-25 06:25:38
|
Hi everyone, after a bit of a hiatus, the PSI Mass Spectrometry Standards Working Group will resume with calls this Monday *September* 29 at 9am PDT:

http://www.timeanddate.com/worldclock/fixedtime.html?day=29&month=9&year=2008&hour=17&min=0&sec=0&p1=136

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
Generic international: +44 2083222500 (UK number)
access code: 297427

The agenda will be to review some recent discussions and review all the aspects related to mzML to start making progress again on the highest priorities.

Topics:
- nativeID format
- mzML support information table
- MIAPE example document
- Other example documents
- CV
- CV template updates to vendors
- Documentation
- Web site
- Validator
- WIFF converter
- Manuscript
- Turku
- Next call

Thanks,
Eric |
From: Eric D. <ede...@sy...> - 2008-09-25 06:18:55
|
Hi everyone, after a bit of a hiatus, the PSI Mass Spectrometry Standards Working Group will resume with calls this Monday July 29 at 9am PDT: http://www.timeanddate.com/worldclock/fixedtime.html?day=29&month=9&year=2008&hour=17&min=0&sec=0&p1=136 + Germany: 08001012079 + Switzerland: 0800000860 + UK: 08081095644 + USA: 1-866-314-3683 Generic international: +44 2083222500 (UK number) access code: 297427 The agenda will be to review some recent discussions and review all the aspects related to mzML to start making progress again on the highest priorities. Topics: - nativeID format - mzML support information table - MIAPE example document - Other example documents - CV - CV template updates to vendors - Documentation - Web site - Validator - WIFF converter - Manuscript - Turku - Next call Thanks, Eric |
From: Matthew C. <mat...@va...> - 2008-09-24 14:31:30
|
With labeled IDs, I would argue in favor of using the full names because it's less ambiguous and not arbitrary (i.e. should it be 'per' or 'peri', 'cyc' or 'cycl', 'exp' or 'expe' or 'expr'). If we're going to sacrifice ease of machine parsing for human readability, we might as well make it pleasant to read, e.g.: "sample 1, period 2, cycle 123, experiment 4" Since the CV terms will specify the axis names like: [Term] id: MS:x name: WIFF spectrum identifier def: "sample=xsd:nonNegativeInteger,period=xsd:positiveInteger,cycle=xsd:positiveInteger,experiment=xsd:positiveInteger" [PSI:MS] is_a: MS:x ! native spectrum identifier Tools can do semantic validation of both axis names and axis values directly from the term definition. Since more than one nativeID type should never be in the same run, it's impossible to use a single regex to do semantic validation like you described. It is necessary to split it up between native ID specification terms. -Matt Fredrik Levander wrote: > I believe the main usage of the nativeID will be when you want to have > a look at the raw spectrum/scan in MS vendor software. If the nativeID is > then clear enough to allow this quickly it would be good. I therefore > prefer the A option (three- to four-letter words), even if mzML is already > verbose. > > It could also be validated using (a not so pretty) regex, which can also > be used to retrieve the values, for example: > ((func(\d+))?,?(scan(\d+)),?(proc(\d+))?,?(con(\d+))?)|((samp(\d+)),(per(\d+)),(cyc(\d+)),(exp(\d+))) > If matching (=valid nativeID), Thermo and Waters identifiers are in the > first capturing group - with scan as required, and function, process and > controller as optional. > The fourth capture group contains the scan+scan number, the fifth contains just > the scan number (the second and third hold the optional function). > WIFF identifier is in another capture group, in this example sample, > period, cycle and experiment are all required. 
> > Valid nativeIDs would include: "scan1", "scan1,con0", "func1,scan1" and > "samp1,per2,cyc1,exp1" > but not "func1", "scan1,func1" or "s1" > > New formats can be added as required with another | (or). > > One could of course split into one regex per native id format, although > validation is less straightforward then. > > Regards > > Fredrik > >> I think we're all pretty close on the current topic. If it will be an >> abbreviated labeling syntax, one of you proponents of the idea need to >> propose the criteria we would use to select the abbreviated labels (and >> probably examples for the major sources, e.g. Thermo RAW, Waters RAW, >> WIFF, MGF, mzData, mzXML, and possibly DTA and PKL). >> >> I again have to plead for leaving the axis names in the header in some >> form instead of in the attributes. It doesn't make the attributes MUCH >> harder to parse, but it does make it somewhat harder and I still don't >> see the need when the string id is right there for human reading pleasure. >> >> -Matt >> |
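A minimal sketch of what "semantic validation directly from the term definition" could look like, assuming the labeled full-name syntax and the WIFF def string shown in this message. The function names and the table of XSD minimums are hypothetical, not part of any PSI tool, and only the two XSD types used in the proposal are handled.

```python
import re

# Def string copied from the proposed WIFF term in the message above.
WIFF_DEF = ("sample=xsd:nonNegativeInteger,period=xsd:positiveInteger,"
            "cycle=xsd:positiveInteger,experiment=xsd:positiveInteger")

# Smallest legal value for each XSD integer type used in the proposal.
XSD_MINIMUM = {"xsd:nonNegativeInteger": 0, "xsd:positiveInteger": 1}


def parse_definition(definition):
    """Split 'axis=type,axis=type,...' into an ordered list of (axis, type)."""
    axes = []
    for pair in definition.split(","):
        name, xsd_type = pair.split("=")
        axes.append((name.strip(), xsd_type.strip()))
    return axes


def validate_native_id(native_id, definition):
    """Return {axis: value} for a valid labeled nativeID, else None.

    Expects the pleasant-to-read form, e.g.
    'sample 1, period 2, cycle 123, experiment 4'.
    """
    axes = parse_definition(definition)
    fields = [field.strip() for field in native_id.split(",")]
    if len(fields) != len(axes):
        return None
    values = {}
    for field, (name, xsd_type) in zip(fields, axes):
        # Axis names must appear in the order given by the definition.
        match = re.fullmatch(re.escape(name) + r"\s*([0-9]+)", field)
        if match is None:
            return None
        value = int(match.group(1))
        if value < XSD_MINIMUM[xsd_type]:
            return None
        values[name] = value
    return values
```

Because the axis names and types are read from the def string, a second format (Thermo, Waters, ...) would only need its own def string, which matches the point about splitting validation per native ID term.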
From: Fredrik L. <Fre...@im...> - 2008-09-24 13:31:07
|
I believe the main usage of the nativeID will be when you want to have a look at the raw spectrum/scan in MS vendor software. If the nativeID is then clear enough to allow this quickly it would be good. I therefore prefer the A option (three- to four-letter words), even if mzML is already verbose. It could also be validated using (a not so pretty) regex, which can also be used to retrieve the values, for example: ((func(\d+))?,?(scan(\d+)),?(proc(\d+))?,?(con(\d+))?)|((samp(\d+)),(per(\d+)),(cyc(\d+)),(exp(\d+))) If matching (=valid nativeID), Thermo and Waters identifiers are in the first capturing group - with scan as required, and function, process and controller as optional. The fourth capture group contains the scan+scan number, the fifth contains just the scan number (the second and third hold the optional function). WIFF identifier is in another capture group, in this example sample, period, cycle and experiment are all required. Valid nativeIDs would include: "scan1", "scan1,con0", "func1,scan1" and "samp1,per2,cyc1,exp1" but not "func1", "scan1,func1" or "s1" New formats can be added as required with another | (or). One could of course split into one regex per native id format, although validation is less straightforward then. Regards Fredrik > I think we're all pretty close on the current topic. If it will be an > abbreviated labeling syntax, one of you proponents of the idea need to > propose the criteria we would use to select the abbreviated labels (and > probably examples for the major sources, e.g. Thermo RAW, Waters RAW, > WIFF, MGF, mzData, mzXML, and possibly DTA and PKL). > > I again have to plead for leaving the axis names in the header in some > form instead of in the attributes. It doesn't make the attributes MUCH > harder to parse, but it does make it somewhat harder and I still don't > see the need when the string id is right there for human reading pleasure. > > -Matt > |
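The regex in this message can be tried out roughly as follows. This is a sketch, not a proposed implementation: the pattern is the one from the message, but rewritten with named groups (an assumption of this example) so values can be retrieved without counting positional groups. The function name is hypothetical.

```python
import re

# Same alternation as in the message: Thermo/Waters style first, WIFF second.
NATIVE_ID = re.compile(
    r"(?:(?:func(?P<function>\d+))?,?scan(?P<scan>\d+),?"
    r"(?:proc(?P<process>\d+))?,?(?:con(?P<controller>\d+))?)"
    r"|(?:samp(?P<sample>\d+),per(?P<period>\d+),"
    r"cyc(?P<cycle>\d+),exp(?P<experiment>\d+))"
)


def parse_native_id(text):
    """Return the axis values of a valid nativeID as a dict, or None."""
    match = NATIVE_ID.fullmatch(text)
    if match is None:
        return None
    # Optional axes that did not take part in the match are dropped.
    return {axis: int(value)
            for axis, value in match.groupdict().items()
            if value is not None}
```

One caveat of the pattern as written: because every comma is optional, strings such as ",scan1" are also accepted, so a stricter separator rule would be needed if that matters.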
From: Matthew C. <mat...@va...> - 2008-09-23 17:56:50
|
Profile data is totally impractical with text data point storage, and as Eric says, it would also be impractical to have more than one way to do it (at least within one schema), so the obvious choice is base64. I think we're all pretty close on the current topic. If it will be an abbreviated labeling syntax, one of you proponents of the idea need to propose the criteria we would use to select the abbreviated labels (and probably examples for the major sources, e.g. Thermo RAW, Waters RAW, WIFF, MGF, mzData, mzXML, and possibly DTA and PKL). I again have to plead for leaving the axis names in the header in some form instead of in the attributes. It doesn't make the attributes MUCH harder to parse, but it does make it somewhat harder and I still don't see the need when the string id is right there for human reading pleasure. -Matt Eric Deutsch wrote: > Hi Mike, yes, this topic has been raised before, and it is just an issue on > which we cannot all agree. There are good arguments on both sides and in the > end the decision was to stick with the base64 encoding. Above all else, we > didn't want to have more than one way to do it, so we had to pick one. > > The current topic may also be one on which we cannot all agree, but we will > pick one way based on the feedback we get and move on. > > Regards, > Eric > > > >> -----Original Message----- >> From: Mike Coleman [mailto:tu...@gm...] >> Sent: Monday, September 22, 2008 3:54 PM >> To: Mass spectrometry standard development >> Cc: Eric Deutsch >> Subject: Re: [Psidev-ms-dev] Nailing down NativeID >> >> On Mon, Sep 22, 2008 at 1:01 PM, Eric Deutsch >> <ede...@sy...> wrote: >> >>> ... I can envision the day when I want to scan >>> through an mzML file as a text file to find a particular scan as part of >>> >> an >> >>> attempt to figure out "what went wrong" somewhere. I don't feel strongly >>> about this, just seems like a good idea. >>> >>> What do others think? 
>>> >> I agree that the capability for users/developers to scan though the >> spectrum file as a text file is a good idea. I'd go further and say >> that it would make a significant difference in terms of reliability >> and ease of development of the software that reads and writes these >> files. >> >> I do think, though, that the non-textual encoding of peak and >> intensity information detracts significantly from this capability. I >> still wonder if an alternate, wholly textual encoding would be useful, >> for shops that wish to avoid binary (or effectively binary) formats. >> >> Mike >> |
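To make the base64-versus-text trade-off in this exchange concrete, here is a small sketch of encoding a peak array the way a base64 data block would carry it. It assumes little-endian 64-bit floats and no compression; the helper names are hypothetical.

```python
import base64
import struct


def encode_array(values):
    """Pack floats as little-endian 64-bit doubles and base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text):
    """Inverse of encode_array: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


mz = [100.0, 200.5, 300.25]
encoded = encode_array(mz)          # opaque to a human reader
as_text = " ".join(repr(v) for v in mz)  # what a wholly textual encoding might hold
```

The base64 form round-trips every bit of each double and has a fixed cost per value, which is the argument for it with profile data; the text form is what makes a file greppable, which is Mike's point.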
From: Eric D. <ede...@sy...> - 2008-09-22 23:41:30
|
Hi Mike, yes, this topic has been raised before, and it is just an issue on which we cannot all agree. There are good arguments on both sides and in the end the decision was to stick with the base64 encoding. Above all else, we didn't want to have more than one way to do it, so we had to pick one. The current topic may also be one on which we cannot all agree, but we will pick one way based on the feedback we get and move on. Regards, Eric > -----Original Message----- > From: Mike Coleman [mailto:tu...@gm...] > Sent: Monday, September 22, 2008 3:54 PM > To: Mass spectrometry standard development > Cc: Eric Deutsch > Subject: Re: [Psidev-ms-dev] Nailing down NativeID > > On Mon, Sep 22, 2008 at 1:01 PM, Eric Deutsch > <ede...@sy...> wrote: > > ... I can envision the day when I want to scan > > through an mzML file as a text file to find a particular scan as part of > an > > attempt to figure out "what went wrong" somewhere. I don't feel strongly > > about this, just seems like a good idea. > > > > What do others think? > > I agree that the capability for users/developers to scan though the > spectrum file as a text file is a good idea. I'd go further and say > that it would make a significant difference in terms of reliability > and ease of development of the software that reads and writes these > files. > > I do think, though, that the non-textual encoding of peak and > intensity information detracts significantly from this capability. I > still wonder if an alternate, wholly textual encoding would be useful, > for shops that wish to avoid binary (or effectively binary) formats. > > Mike |
From: Mike C. <tu...@gm...> - 2008-09-22 22:53:50
|
On Mon, Sep 22, 2008 at 1:01 PM, Eric Deutsch <ede...@sy...> wrote: > ... I can envision the day when I want to scan > through an mzML file as a text file to find a particular scan as part of an > attempt to figure out "what went wrong" somewhere. I don't feel strongly > about this, just seems like a good idea. > > What do others think? I agree that the capability for users/developers to scan through the spectrum file as a text file is a good idea. I'd go further and say that it would make a significant difference in terms of reliability and ease of development of the software that reads and writes these files. I do think, though, that the non-textual encoding of peak and intensity information detracts significantly from this capability. I still wonder if an alternate, wholly textual encoding would be useful, for shops that wish to avoid binary (or effectively binary) formats. Mike |
From: Darren K. <Dar...@cs...> - 2008-09-22 21:19:56
|
Hi all, I think human readability is most definitely a requirement of mzML (and its predecessors mzData and mzXML), even if this was not explicitly stated. One of the benefits of using XML is that it is easy to look at it and immediately understand the meaning of the data. If conciseness had been deemed to be more important than readability, an open binary format would be much more appropriate for representing MS data. I agree with Eric that we did a good job in making mzML human readable, and I think we should make the nativeIDs human readable as well. There is no cost to this -- the space increase is negligible compared to the base64 bloat and CV params, and it's not any harder to parse or write. Darren On Sep 22, 2008, at 11:01 AM, Eric Deutsch wrote: I'm not sure we ever set this as a requirement. But, I think we did a nice job making mzML developer-readable in general and it seems pleasing to me at least to extend that to nativeID. I can envision the day when I want to scan through an mzML file as a text file to find a particular scan as part of an attempt to figure out "what went wrong" somewhere. I don't feel strongly about this, just seems like a good idea. What do others think? > -----Original Message----- > From: Matthew Chambers [mailto:mat...@va...] > Sent: Monday, September 22, 2008 7:24 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] Nailing down NativeID > > Where did the requirement (in a design sense) of human readability for > nativeID come from? I don't see the need for that requirement, and > in the > absence of it, conciseness seems most appropriate. Even in the extreme > case of a fully verbose identifier, e.g. > "sample1,period1,cycle123,experiment5", someone who is not familiar > with > the WIFF format and its ids will have no idea what those words mean. > In > the worst case scenario it will just confuse human readers. 
On the > other > hand, someone who is familiar with the WIFF format knows to expect > the ids > to occur in a certain order and what each id means. > > As a compromise, we can recommend that implementors place an XML > comment > containing the nativeID definition used for a file in the header, > e.g.: > <!-- WIFF spectrum identifier: sample > number=xsd:nonNegativeInteger,period > number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment > number=xsd:positiveInteger --> > or more readable without types: > <!-- WIFF spectrum identifier: sample number,period number,cycle > number,experiment number --> > > Or we can make the nativeID term name itself like "WIFF spectrum > identifier: sample number,period number,cycle number,experiment > number" > and then only give the types in the definition, like: > > [Term] > id: MS:x > name: WIFF spectrum identifier (sample number, period number, cycle > number, experiment number) > def: "sample number=xsd:nonNegativeInteger,period > number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment > number=xsd:positiveInteger" [PSI:MS] > is_a: MS:x ! native spectrum identifier > > That way the nativeID specifier in the file header would always > provide > the names. > > -Matt > > > > Eric Deutsch wrote: >> Hi everyone, indeed this is a good discussion, thanks for bringing it > back >> to the fore. I can't do this coming Monday, but let's have another > telecon >> on Monday Sep 29 at the usual time, 9am PDT. >> >> So it seems we have three proposals on the table: >> >> A) Thermo: "con0,scan1" or WIFF: "sam0,per1,cyc1,exp2" >> >> B) Thermo: "C0,S1" or WIFF: "M0,P1,Y1,E2" >> >> C) Thermo: "0,1" or WIFF: "0,1,1,2" >> >> Or close variants thereof. Shall we hold open the floor for a little > more >> debate and then take a poll? >> >> My opinion is that A is the clearest to look at and while a little > verbose, >> it is embedded in XML and thus a drop in the bucket. 
C is very >> concise, > but >> not easily human interpretable. Conciseness doesn't seem like a great >> advantage here. B seems undesirable to me as there are multiple >> possible >> words beginning with P and S, so that just creates confusion. So I >> kinda >> like A myself. >> >> What do y'all think? >> >> Thanks, >> Eric >> >> >> >> >>> -----Original Message----- >>> From: psi...@li... >>> [mailto:psi...@li...] On >>> Behalf Of Matt Chambers >>> Sent: Thursday, September 18, 2008 4:45 PM >>> To: Mass spectrometry standard development >>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID >>> >>> I prefer nativeIDs without the labels. Labels work better and can be >>> verbose in the arbitrary string 'id'; nativeID is provided >>> primarily for >>> machine readability and guaranteed formatting so to me it just makes >>> more sense to "KISS" (keep it small and simple). :) >>> >>> Since the two types of ids co-exist, human interpretation of the >>> nativeID is not an issue. >>> >>> This is good discussion though, we just need more of it - even >>> it's a >>> simple assent to the proposal (or the alternatives). :) >>> >>> Thanks, >>> -Matt >>> >>> >>> Darren Kessner wrote: >>> >>>> I think Fredrik has good points, and I like his idea of >>>> >>> using short >>> >>>> labels. >>>> >>>> An alternative to consider is 3-4 letter abbreviations >>>> >>> (using Matt's >>> >>>> examples): >>>> >>>> Thermo: >>>> "con0 scan1" >>>> "scan2" >>>> >>>> Waters: >>>> "fun1 proc0 scan1" >>>> >>>> WIFF: >>>> "sam0 per1 cyc1 exp2" >>>> >>>> >>>> Darren >>>> >>>> >>>> On Sep 18, 2008, at 12:18 PM, Fredrik Levander wrote: >>>> >>>> >>>> >>>>> Hi Matt, >>>>> >>>>> I agree that the Native ID is a very important feature of >>>>> >>> the format >>> >>>>> and >>>>> that it needs to be settled. 
Your solution is elegant, I >>>>> >>> can see two >>> >>>>> disadvantages though: >>>>> 1) It is not straightforward to intepret the nativeID by visual >>>>> inspection, since you need to look in the CV to find out >>>>> >>> what order >>> >>>>> the >>>>> numbers are in. >>>>> 2) If the number in one axis is unknown or irrelevant for >>>>> >>> the setup, >>> >>>>> it >>>>> could be a problem to have it as required. One could imagine just >>>>> specifying an empty field instead of a number in that situation >>>>> though. >>>>> >>>>> An alternative is to have reserved characters in the native id: >>>>> S = scan >>>>> F = function >>>>> C = controller >>>>> P = process >>>>> Cy (or maybe Y) = Cycle >>>>> E = Experiment >>>>> Pe = Period >>>>> Other reserved letters can be added as needed. >>>>> >>>>> Then one can specify these as required for the instrumental setup. >>>>> Scan 1 would be "S1" >>>>> Function1, Scan 1 would be "F1S1" or "S1F1" or "S1,F1", >>>>> >>> the later if >>> >>>>> comma separation is wanted. >>>>> If a certain order of the axes is wanted this can be >>>>> >>> imposed by regex. >>> >>>>> A problem with this solution could be if an axis needs to contain >>>>> letters instead of numbers, but it is doable, at least with comma >>>>> separation. >>>>> >>>>> A combination of the CV approach and initiating letters >>>>> >>> could maybe >>> >>>>> also >>>>> be an alternative: >>>>> >>>>> [Term] >>>>> id: MS:x >>>>> name: Waters RAW spectrum identifier >>>>> def: "F:function number=xsd:positiveInteger (optional),P:process >>>>> number=xsd:nonNegativeInteger (optional),S:scan >>>>> number=xsd:positiveInteger" >>>>> >>>>> Valid nativeIDs are: "F1,S1" and "F1,P1,S1", but not "F1" >>>>> >>>>> It would be good to have some input on what is required to report >>>>> for the rest of the vendor instruments too, but I think the >>>>> nativeID format should be settled soon. 
>>>>> >>>>> Fredrik >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Matthew Chambers skrev: >>>>> >>>>> >>>>>> It's been 4 months since we released the format and we >>>>>> >>> still can't >>> >>>>>> point >>>>>> implementors to documentation specifying what nativeIDs >>>>>> >>> must look >>> >>>>>> like. >>>>>> Can we please comment on my proposal or get other proposals to >>>>>> discuss? >>>>>> I am not averse to initially leaving out the terms that I >>>>>> >>> couldn't >>> >>>>>> come >>>>>> up with well-defined formats for (Bruker, PKL, ABI >>>>>> >>> Oracle, Shimadzu). >>> >>>>>> -Matt >>>>>> >>>>>> >>>>>> -------- Original Message -------- >>>>>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID >>>>>> Date: Tue, 22 Jul 2008 21:28:34 -0500 >>>>>> From: Matt Chambers <mat...@va...> >>>>>> Reply-To: Mass spectrometry standard development >>>>>> <psi...@li...> >>>>>> To: Mass spectrometry standard development >>>>>> <psi...@li...> >>>>>> References: <488...@va...> >>>>>> >>>>>> >>> <5BE...@he...> >>> >>>>>> >>>>>> Hi Eric, >>>>>> >>>>>> Of course, sorry I should have realized that the axis >>>>>> >>> name concept >>> >>>>>> would >>>>>> confuse matters. The axis names are just there so that a machine >>>>>> reading >>>>>> the format specification can associate each comma >>>>>> >>> delimited section >>> >>>>>> (what I'm calling an "axis") with a logical name. 
>>>>>> >>>>>> Thermo: >>>>>> 0,1 (controller 0, scan 1) >>>>>> 0,2 >>>>>> 0,3 >>>>>> 1,1 (controller 1, scan 1) >>>>>> >>>>>> Waters: >>>>>> 1,0,1 (function 1, process 0, scan 1) >>>>>> 1,0,2 >>>>>> 1,0,3 >>>>>> 2,0,1 (function 2, process 0, scan 1) >>>>>> 2,0,2 >>>>>> 2,0,3 >>>>>> >>>>>> WIFF: >>>>>> 0,1,1,2 (sample 0, period 1, cycle 1, experiment 2) >>>>>> 0,1,1,3 >>>>>> 0,1,2,2 >>>>>> 0,1,2,3 >>>>>> 0,1,2,4 >>>>>> 0,1,3,2 >>>>>> 0,1,3,3 >>>>>> 0,1,3,2 >>>>>> 0,1,4,2 >>>>>> 1,1,1,2 >>>>>> 1,1,1,3 >>>>>> >>>>>> When a machine reads the WIFF definition, it will know that the >>>>>> fields >>>>>> mean (in order) "sample #", "period #", "cycle #", >>>>>> >>> "experiment #". >>> >>>>>> The >>>>>> detailed meaning of those names won't be covered by the format >>>>>> definition, but it's conceivable that we define those names in >>>>>> detail as >>>>>> separate CV terms. Remember the main idea for nativeID is to >>>>>> map a >>>>>> spectrum back to a source file in a way that is more >>>>>> >>> intuitive than a >>> >>>>>> simple index, so being able to use them to look up the >>>>>> >>> spectrum via a >>> >>>>>> native interface is important. >>>>>> >>>>>> I think we can safely require that the nativeIDs always >>>>>> >>> use all the >>> >>>>>> fields even if for an entire run all of a particular axis >>>>>> >>> has the >>> >>>>>> same >>>>>> value. For example, in Thermo data the controller number is >>>>>> almost >>>>>> always going to be the number corresponding with the MS >>>>>> controller >>>>>> (although the actual number is not guaranteed to be 0). >>>>>> >>> For backwards >>> >>>>>> compatibility with tools which expect Thermo ids to be >>>>>> >>> scan numbers >>> >>>>>> with >>>>>> an implicit assumption about the controller, it is very >>>>>> >>> reasonable to >>> >>>>>> require those tools to simply parse the id. 
Parsing a >>>>>> >>> comma-delimited >>> >>>>>> pair is far easier than all the other crap one must do to >>>>>> >>> get proper >>> >>>>>> mzML support. ;) In particular for you Eric and other TPP >>>>>> >>> users, the >>> >>>>>> RAMP adapter that pwiz uses will pass only the scan >>>>>> >>> number (and make >>> >>>>>> sure the spectrum is a mass spectrum). >>>>>> >>>>>> -Matt >>>>>> >>>>>> >>>>>> Eric Deutsch wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Hi Matt, thanks, this looks well thought out, although I'm not >>>>>>> sure I >>>>>>> fully understand the syntax you're proposing. Can you >>>>>>> >>> provide one >>> >>>>>>> or two >>>>>>> examples of each type? >>>>>>> >>>>>>> Thanks! >>>>>>> Eric >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: psi...@li... >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> [mailto:psidev-ms-dev- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> bo...@li...] On Behalf Of Matthew Chambers >>>>>>>> Sent: Tuesday, July 22, 2008 3:15 PM >>>>>>>> To: Mass spectrometry standard development >>>>>>>> Subject: [Psidev-ms-dev] Nailing down NativeID >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I think it's overdue that we get this part of mzML formally >>>>>>>> specified >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> - >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> at least for the vendors and generic formats. I am proposing a >>>>>>>> draft >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> of >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> nativeID formats, the place to put the formats in the >>>>>>>> >>> specification >>> >>>>>>>> documents, and to have mzML instance documents >>>>>>>> >>> explicitly reference >>> >>>>>>>> >>>>>>>> >>>>>>> the >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> format they are using. 
This explicit reference should >>>>>>>> >>> be required >>> >>>>>>>> for >>>>>>>> semantic validation, but I'd also recommend that mzML >>>>>>>> >>> readers that >>> >>>>>>>> >>>>>>>> >>>>>>> don't >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> find or ignore the nativeID format term specified >>>>>>>> >>> simply treat the >>> >>>>>>>> nativeID as a free string (rendering it pretty useless, but at >>>>>>>> least >>>>>>>> there would be a defined way to handle it). The terms would be >>>>>>>> placed >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> in >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> the fileContent element to define the format for all >>>>>>>> >>> nativeIDs in >>> >>>>>>>> the >>>>>>>> file. >>>>>>>> >>>>>>>> I propose that the nativeID formats become CV terms, >>>>>>>> >>> and that the >>> >>>>>>>> term >>>>>>>> definitions define the formats unambiguously in a machine- >>>>>>>> readable way >>>>>>>> that a semantic validator can use to validate the >>>>>>>> >>> nativeIDs. I >>> >>>>>>>> will >>>>>>>> list my format drafts in OBO format. Each specific native >>>>>>>> format >>>>>>>> definition is a comma-delimited list of key-value pairs, where >>>>>>>> the key >>>>>>>> is the axis name (e.g. "scan number") and the value >>>>>>>> >>> specifies the >>> >>>>>>>> >>>>>>>> >>>>>>> format >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> of the axis in one of two ways: >>>>>>>> 1) a Perl-style regular expression that can provide semantic/ >>>>>>>> logical >>>>>>>> choices for strings (e.g. "controller type" can be >>>>>>>> >>> either "MS" or >>> >>>>>>>> >>>>>>>> >>>>>>> "PDA" >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> or "UV" etc.) >>>>>>>> 2) an XSD type that can specify unrestricted strings or >>>>>>>> >>> a numeric >>> >>>>>>>> type >>>>>>>> (possibly with semantic restrictions) >>>>>>>> >>>>>>>> I didn't actually need to use a regex for any of the >>>>>>>> >>> formats below, >>> >>>>>>>> >>>>>>>> >>>>>>> but >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> I can see their usefulness. 
For example, they would be >>>>>>>> >>> needed if >>> >>>>>>>> I'm >>>>>>>> wrong about Xcalibur and it makes more sense for Thermo >>>>>>>> >>> spectra >>> >>>>>>>> to use >>>>>>>> controller names instead of controller numbers. >>>>>>>> >>>>>>>> Obviously the syntax of the format definitions is flexible if >>>>>>>> people >>>>>>>> have better ideas (ideally one that could combine the power of >>>>>>>> regex >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> and >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> XSD; "infinite cosmic power, itty bitty living space!"). >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: native spectrum identifier >>>>>>>> def: "References a spectrum in a native (non-mzML) >>>>>>>> >>> spectrum source >>> >>>>>>>> according to a strict format. The format is dependent >>>>>>>> >>> on the type >>> >>>>>>>> of >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> the >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> spectra source." [PSI:MS] >>>>>>>> is_a: MS:1000524 ! data file content >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: native chromatogram identifier >>>>>>>> def: "References a chromatogram in a native (non-mzML) >>>>>>>> >>> chromatogram >>> >>>>>>>> source according to a strict format. The format is >>>>>>>> >>> dependent on the >>> >>>>>>>> >>>>>>>> >>>>>>> type >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> of the chromatogram source." [PSI:MS] >>>>>>>> is_a: MS:1000524 ! data file content >>>>>>>> ! note: I don't have any instances of native chromatogram >>>>>>>> identifiers, >>>>>>>> but I can conceive of the possibilities! >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: Thermo RAW spectrum identifier >>>>>>>> def: "controller type=xsd:nonNegativeInteger,scan >>>>>>>> number=xsd:positiveInteger" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> ! note to Jim: apparently, Xcalibur can handle multiple >>>>>>>> controllers of >>>>>>>> the same type, so is a choice between strings still >>>>>>>> appropriate? 
>>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: Waters RAW spectrum identifier >>>>>>>> def: "function number=xsd:positiveInteger,process >>>>>>>> number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> [PSI:MS] >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> ! note: is process number ever non-zero? >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: WIFF spectrum identifier >>>>>>>> def: "sample number=xsd:nonNegativeInteger,period >>>>>>>> number=xsd:positiveInteger,cycle >>>>>>>> number=xsd:positiveInteger,experiment >>>>>>>> number=xsd:positiveInteger" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: ABI Oracle database spectrum identifier >>>>>>>> def: "" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> ! note: need expertise here; alternatively, we could lump these >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> spectra >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> in with DTA/PKL nativeIDs (see below) when they are >>>>>>>> >>> extracted to >>> >>>>>>>> T2Ds >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: Bruker spectrum identifier >>>>>>>> def: "" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> ! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID >>>>>>>> spectrum >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> is >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> natively a single file, so that seems to make nativeID >>>>>>>> >>> irrelevant >>> >>>>>>>> and >>>>>>>> sourceFile[Ref] critical >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: Shimadzu spectrum identifier >>>>>>>> def: "" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> ! note: need expertise here >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: MGF spectrum identifier >>>>>>>> def: "index=xsd:nonNegativeInteger" [PSI:MS] >>>>>>>> is_a: MS:x ! 
native spectrum identifier >>>>>>>> ! note: TITLE attributes are optional, so the index >>>>>>>> >>> into the file >>> >>>>>>>> is >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> the >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> only reliable source (TITLE can be used for the string id if >>>>>>>> present) >>>>>>>> >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: mzData/mzXML/MS2 spectrum identifier >>>>>>>> def: "scan number=xsd:positiveInteger" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> [Term] >>>>>>>> id: MS:x >>>>>>>> name: PKL/DTA spectrum identifier >>>>>>>> def: "" [PSI:MS] >>>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>>> ! note: like Bruker, a PKL or DTA could be standalone >>>>>>>> >>> so AFAIK the >>> >>>>>>>> >>>>>>>> >>>>>>> only >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> way to reliably reference it is via sourceFileRef >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>> -------------------------------------------------------------- >>> ----------- >>> This SF.Net email is sponsored by the Moblin Your Move >>> Developer's challenge >>> Build the coolest Linux based applications with Moblin SDK & >>> win great prizes >>> Grand prize is a trip for two to an Open Source event >>> anywhere in the world >>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>> _______________________________________________ >>> Psidev-ms-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev >>> >>> >> >> >> ------------------------------------------------------------------------ > - >> This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge >> Build the coolest Linux based applications with Moblin SDK & win >> great > prizes >> Grand prize is a trip for two to an Open Source event anywhere in the > world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> _______________________________________________ >> Psidev-ms-dev mailing list >> Psi...@li... 
>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
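For comparison, the three proposals quoted in this thread (A: short word labels, B: single-letter labels, C: bare numbers) can be rendered from the same axis values. This is an illustrative sketch; the label tables are copied from the Thermo and WIFF examples in the thread, and the function name is hypothetical.

```python
# Labels per option and source, taken from the examples in the thread.
LABELS = {
    "A": {"thermo": ("con", "scan"), "wiff": ("sam", "per", "cyc", "exp")},
    "B": {"thermo": ("C", "S"), "wiff": ("M", "P", "Y", "E")},
}


def render_native_id(source, values, option):
    """Format axis values as a nativeID string under option 'A', 'B' or 'C'."""
    if option == "C":
        # Option C carries no labels; meaning comes from position alone.
        return ",".join(str(value) for value in values)
    labels = LABELS[option][source]
    if len(labels) != len(values):
        raise ValueError("expected %d axis values" % len(labels))
    return ",".join("%s%d" % (label, value)
                    for label, value in zip(labels, values))
```

For example, `render_native_id("wiff", (0, 1, 1, 2), "A")` gives "sam0,per1,cyc1,exp2", while option C for the same values gives "0,1,1,2", which is only interpretable with the format definition at hand.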
From: Eric D. <ede...@sy...> - 2008-09-22 18:29:52
|
I'm not sure we ever set this as a requirement. But, I think we did a nice job making mzML developer-readable in general and it seems pleasing to me at least to extend that to nativeID. I can envision the day when I want to scan through an mzML file as a text file to find a particular scan as part of an attempt to figure out "what went wrong" somewhere.

I don't feel strongly about this, just seems like a good idea. What do others think?

> -----Original Message-----
> From: Matthew Chambers [mailto:mat...@va...]
> Sent: Monday, September 22, 2008 7:24 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>
> Where did the requirement (in a design sense) of human readability for
> nativeID come from? I don't see the need for that requirement, and in the
> absence of it, conciseness seems most appropriate. Even in the extreme
> case of a fully verbose identifier, e.g.
> "sample1,period1,cycle123,experiment5", someone who is not familiar with
> the WIFF format and its ids will have no idea what those words mean. In
> the worst case scenario it will just confuse human readers. On the other
> hand, someone who is familiar with the WIFF format knows to expect the ids
> to occur in a certain order and what each id means.
|
From: Matthew C. <mat...@va...> - 2008-09-22 14:36:48
|
I tried to send this on Friday but the mailing list never bounced it back or copied me on the broadcast so I suppose it got lost in cyberspace.

-------- Original Message --------

Where did the requirement (in a design sense) of human readability for nativeID come from? I don't see the need for that requirement, and in the absence of it, conciseness seems most appropriate. Even in the extreme case of a fully verbose identifier, e.g. "sample1,period1,cycle123,experiment5", someone who is not familiar with the WIFF format and its ids will have no idea what those words mean. In the worst case scenario it will just confuse human readers. On the other hand, someone who is familiar with the WIFF format knows to expect the ids to occur in a certain order and what each id means.

As a compromise, we can recommend that implementors place an XML comment containing the nativeID definition used for a file in the header, e.g.:

<!-- WIFF spectrum identifier: sample number=xsd:nonNegativeInteger,period number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment number=xsd:positiveInteger -->

or more readable without types:

<!-- WIFF spectrum identifier: sample number,period number,cycle number,experiment number -->

Or we can make the nativeID term name itself like "WIFF spectrum identifier: sample number,period number,cycle number,experiment number" and then only give the types in the definition, like:

[Term]
id: MS:x
name: WIFF spectrum identifier (sample number, period number, cycle number, experiment number)
def: "sample number=xsd:nonNegativeInteger,period number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment number=xsd:positiveInteger" [PSI:MS]
is_a: MS:x ! native spectrum identifier

That way the nativeID specifier in the file header would always provide the names.

-Matt


Eric Deutsch wrote:
> Hi everyone, indeed this is a good discussion, thanks for bringing it back
> to the fore. I can't do this coming Monday, but let's have another telecon
> on Monday Sep 29 at the usual time, 9am PDT.
>
> So it seems we have three proposals on the table:
>
> A) Thermo: "con0,scan1" or WIFF: "sam0,per1,cyc1,exp2"
>
> B) Thermo: "C0,S1" or WIFF: "M0,P1,Y1,E2"
>
> C) Thermo: "0,1" or WIFF: "0,1,1,2"
>
> Or close variants thereof. Shall we hold open the floor for a little more
> debate and then take a poll?
>
> My opinion is that A is the clearest to look at and while a little verbose,
> it is embedded in XML and thus a drop in the bucket. C is very concise, but
> not easily human interpretable. Conciseness doesn't seem like a great
> advantage here. B seems undesirable to me as there are multiple possible
> words beginning with P and S, so that just creates confusion. So I kinda
> like A myself.
>
> What do y'all think?
>
> Thanks,
> Eric
>
>> -----Original Message-----
>> From: psi...@li...
>> [mailto:psi...@li...] On Behalf Of Matt Chambers
>> Sent: Thursday, September 18, 2008 4:45 PM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>>
>> I prefer nativeIDs without the labels. Labels work better and can be
>> verbose in the arbitrary string 'id'; nativeID is provided primarily for
>> machine readability and guaranteed formatting so to me it just makes
>> more sense to "KISS" (keep it small and simple). :)
>>
>> Since the two types of ids co-exist, human interpretation of the
>> nativeID is not an issue.
>>
>> This is good discussion though, we just need more of it - even if it's a
>> simple assent to the proposal (or the alternatives). :)
>>
>> Thanks,
>> -Matt
>>
>>
>> Darren Kessner wrote:
>>> I think Fredrik has good points, and I like his idea of using short
>>> labels.
>>>
>>> An alternative to consider is 3-4 letter abbreviations (using Matt's
>>> examples):
>>>
>>> Thermo:
>>> "con0 scan1"
>>> "scan2"
>>>
>>> Waters:
>>> "fun1 proc0 scan1"
>>>
>>> WIFF:
>>> "sam0 per1 cyc1 exp2"
>>>
>>>
>>> Darren
>>>
>>>
>>> On Sep 18, 2008, at 12:18 PM, Fredrik Levander wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> I agree that the Native ID is a very important feature of the format and
>>>> that it needs to be settled. Your solution is elegant, I can see two
>>>> disadvantages though:
>>>> 1) It is not straightforward to interpret the nativeID by visual
>>>> inspection, since you need to look in the CV to find out what order the
>>>> numbers are in.
>>>> 2) If the number in one axis is unknown or irrelevant for the setup, it
>>>> could be a problem to have it as required. One could imagine just
>>>> specifying an empty field instead of a number in that situation though.
>>>>
>>>> An alternative is to have reserved characters in the native id:
>>>> S = scan
>>>> F = function
>>>> C = controller
>>>> P = process
>>>> Cy (or maybe Y) = Cycle
>>>> E = Experiment
>>>> Pe = Period
>>>> Other reserved letters can be added as needed.
>>>>
>>>> Then one can specify these as required for the instrumental setup.
>>>> Scan 1 would be "S1"
>>>> Function1, Scan 1 would be "F1S1" or "S1F1" or "S1,F1", the latter if
>>>> comma separation is wanted.
>>>> If a certain order of the axes is wanted this can be imposed by regex.
>>>> A problem with this solution could be if an axis needs to contain
>>>> letters instead of numbers, but it is doable, at least with comma
>>>> separation.
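
The reserved-letter scheme proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not an agreed format: the function name parse_labeled_id and the LABELS table are hypothetical, the table covers only the letters listed in the proposal, and the sketch assumes every axis value is a plain non-negative integer (the letters-as-values problem mentioned above is not handled).

```python
import re

# Reserved labels from the proposal. "Cy" and "Pe" are tried before the
# single letters so that they are not read as "C" or "P".
LABELS = {
    "Cy": "cycle",
    "Pe": "period",
    "S": "scan",
    "F": "function",
    "C": "controller",
    "P": "process",
    "Y": "cycle",
    "E": "experiment",
}

TOKEN = re.compile(r"(Cy|Pe|[SFCPYE])([0-9]+)")


def parse_labeled_id(native_id):
    """Return {axis name: number} for ids such as "F1S1" or "S1,F1"."""
    fields = {}
    pos = 0
    while pos < len(native_id):
        if native_id[pos] == ",":  # commas between fields are optional
            pos += 1
            continue
        match = TOKEN.match(native_id, pos)
        if match is None:
            raise ValueError(f"unrecognised text at offset {pos} in {native_id!r}")
        axis = LABELS[match.group(1)]
        if axis in fields:
            raise ValueError(f"axis {axis!r} given twice in {native_id!r}")
        fields[axis] = int(match.group(2))
        pos = match.end()
    return fields
```

Because each field carries its own label, "F1S1", "S1F1" and "S1,F1" all parse to the same result, which is the order-independence the proposal describes; a fixed order would need an extra check on top of this.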
>>>>
>>>> A combination of the CV approach and initiating letters could maybe also
>>>> be an alternative:
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: Waters RAW spectrum identifier
>>>> def: "F:function number=xsd:positiveInteger (optional),P:process
>>>> number=xsd:nonNegativeInteger (optional),S:scan
>>>> number=xsd:positiveInteger"
>>>>
>>>> Valid nativeIDs are: "F1,S1" and "F1,P1,S1", but not "F1"
>>>>
>>>> It would be good to have some input on what is required to report
>>>> for the rest of the vendor instruments too, but I think the
>>>> nativeID format should be settled soon.
>>>>
>>>> Fredrik
>>>>
>>>>
>>>> Matthew Chambers wrote:
>>>>> It's been 4 months since we released the format and we still can't point
>>>>> implementors to documentation specifying what nativeIDs must look like.
>>>>> Can we please comment on my proposal or get other proposals to discuss?
>>>>> I am not averse to initially leaving out the terms that I couldn't come
>>>>> up with well-defined formats for (Bruker, PKL, ABI Oracle, Shimadzu).
>>>>>
>>>>> -Matt
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>>>>> Date: Tue, 22 Jul 2008 21:28:34 -0500
>>>>> From: Matt Chambers <mat...@va...>
>>>>> Reply-To: Mass spectrometry standard development <psi...@li...>
>>>>> To: Mass spectrometry standard development <psi...@li...>
>>>>> References: <488...@va...> <5BE...@he...>
>>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Of course, sorry I should have realized that the axis name concept would
>>>>> confuse matters. The axis names are just there so that a machine reading
>>>>> the format specification can associate each comma delimited section
>>>>> (what I'm calling an "axis") with a logical name.
>>>>>
>>>>> Thermo:
>>>>> 0,1 (controller 0, scan 1)
>>>>> 0,2
>>>>> 0,3
>>>>> 1,1 (controller 1, scan 1)
>>>>>
>>>>> Waters:
>>>>> 1,0,1 (function 1, process 0, scan 1)
>>>>> 1,0,2
>>>>> 1,0,3
>>>>> 2,0,1 (function 2, process 0, scan 1)
>>>>> 2,0,2
>>>>> 2,0,3
>>>>>
>>>>> WIFF:
>>>>> 0,1,1,2 (sample 0, period 1, cycle 1, experiment 2)
>>>>> 0,1,1,3
>>>>> 0,1,2,2
>>>>> 0,1,2,3
>>>>> 0,1,2,4
>>>>> 0,1,3,2
>>>>> 0,1,3,3
>>>>> 0,1,3,2
>>>>> 0,1,4,2
>>>>> 1,1,1,2
>>>>> 1,1,1,3
>>>>>
>>>>> When a machine reads the WIFF definition, it will know that the fields
>>>>> mean (in order) "sample #", "period #", "cycle #", "experiment #". The
>>>>> detailed meaning of those names won't be covered by the format
>>>>> definition, but it's conceivable that we define those names in detail as
>>>>> separate CV terms. Remember the main idea for nativeID is to map a
>>>>> spectrum back to a source file in a way that is more intuitive than a
>>>>> simple index, so being able to use them to look up the spectrum via a
>>>>> native interface is important.
>>>>>
>>>>> I think we can safely require that the nativeIDs always use all the
>>>>> fields even if for an entire run all of a particular axis has the same
>>>>> value. For example, in Thermo data the controller number is almost
>>>>> always going to be the number corresponding with the MS controller
>>>>> (although the actual number is not guaranteed to be 0). For backwards
>>>>> compatibility with tools which expect Thermo ids to be scan numbers with
>>>>> an implicit assumption about the controller, it is very reasonable to
>>>>> require those tools to simply parse the id. Parsing a comma-delimited
>>>>> pair is far easier than all the other crap one must do to get proper
>>>>> mzML support. ;) In particular for you Eric and other TPP users, the
>>>>> RAMP adapter that pwiz uses will pass only the scan number (and make
>>>>> sure the spectrum is a mass spectrum).
>>>>>
>>>>> -Matt
>>>>>
>>>>>
>>>>> Eric Deutsch wrote:
>>>>>> Hi Matt, thanks, this looks well thought out, although I'm not sure I
>>>>>> fully understand the syntax you're proposing. Can you provide one
>>>>>> or two examples of each type?
>>>>>>
>>>>>> Thanks!
>>>>>> Eric
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: psi...@li...
>>>>>>> [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
>>>>>>> Sent: Tuesday, July 22, 2008 3:15 PM
>>>>>>> To: Mass spectrometry standard development
>>>>>>> Subject: [Psidev-ms-dev] Nailing down NativeID
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I think it's overdue that we get this part of mzML formally specified -
>>>>>>> at least for the vendors and generic formats. I am proposing a draft of
>>>>>>> nativeID formats, the place to put the formats in the specification
>>>>>>> documents, and to have mzML instance documents explicitly reference the
>>>>>>> format they are using. This explicit reference should be required for
>>>>>>> semantic validation, but I'd also recommend that mzML readers that don't
>>>>>>> find or ignore the nativeID format term specified simply treat the
>>>>>>> nativeID as a free string (rendering it pretty useless, but at least
>>>>>>> there would be a defined way to handle it). The terms would be placed in
>>>>>>> the fileContent element to define the format for all nativeIDs in the
>>>>>>> file.
>>>>>>>
>>>>>>> I propose that the nativeID formats become CV terms, and that the term
>>>>>>> definitions define the formats unambiguously in a machine-readable way
>>>>>>> that a semantic validator can use to validate the nativeIDs. I will
>>>>>>> list my format drafts in OBO format. Each specific native format
>>>>>>> definition is a comma-delimited list of key-value pairs, where the key
>>>>>>> is the axis name (e.g. "scan number") and the value specifies the format
>>>>>>> of the axis in one of two ways:
>>>>>>> 1) a Perl-style regular expression that can provide semantic/logical
>>>>>>> choices for strings (e.g. "controller type" can be either "MS" or "PDA"
>>>>>>> or "UV" etc.)
>>>>>>> 2) an XSD type that can specify unrestricted strings or a numeric type
>>>>>>> (possibly with semantic restrictions)
>>>>>>>
>>>>>>> I didn't actually need to use a regex for any of the formats below, but
>>>>>>> I can see their usefulness. For example, they would be needed if I'm
>>>>>>> wrong about Xcalibur and it makes more sense for Thermo spectra to use
>>>>>>> controller names instead of controller numbers.
>>>>>>>
>>>>>>> Obviously the syntax of the format definitions is flexible if people
>>>>>>> have better ideas (ideally one that could combine the power of regex and
>>>>>>> XSD; "infinite cosmic power, itty bitty living space!").
>>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: native spectrum identifier >>>>>>> def: "References a spectrum in a native (non-mzML) >>>>>>> >> spectrum source >> >>>>>>> according to a strict format. The format is dependent >>>>>>> >> on the type >> >>>>>>> of >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> the >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> spectra source." [PSI:MS] >>>>>>> is_a: MS:1000524 ! data file content >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: native chromatogram identifier >>>>>>> def: "References a chromatogram in a native (non-mzML) >>>>>>> >> chromatogram >> >>>>>>> source according to a strict format. The format is >>>>>>> >> dependent on the >> >>>>>>> >>>>>>> >>>>>> type >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> of the chromatogram source." [PSI:MS] >>>>>>> is_a: MS:1000524 ! data file content >>>>>>> ! note: I don't have any instances of native chromatogram >>>>>>> identifiers, >>>>>>> but I can conceive of the possibilities! >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: Thermo RAW spectrum identifier >>>>>>> def: "controller type=xsd:nonNegativeInteger,scan >>>>>>> number=xsd:positiveInteger" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note to Jim: apparently, Xcalibur can handle multiple >>>>>>> controllers of >>>>>>> the same type, so is a choice between strings still appropriate? >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: Waters RAW spectrum identifier >>>>>>> def: "function number=xsd:positiveInteger,process >>>>>>> number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> [PSI:MS] >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note: is process number ever non-zero? 
>>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: WIFF spectrum identifier >>>>>>> def: "sample number=xsd:nonNegativeInteger,period >>>>>>> number=xsd:positiveInteger,cycle >>>>>>> number=xsd:positiveInteger,experiment >>>>>>> number=xsd:positiveInteger" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: ABI Oracle database spectrum identifier >>>>>>> def: "" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note: need expertise here; alternatively, we could lump these >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> spectra >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> in with DTA/PKL nativeIDs (see below) when they are >>>>>>> >> extracted to >> >>>>>>> T2Ds >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: Bruker spectrum identifier >>>>>>> def: "" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID >>>>>>> spectrum >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> is >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> natively a single file, so that seems to make nativeID >>>>>>> >> irrelevant >> >>>>>>> and >>>>>>> sourceFile[Ref] critical >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: Shimadzu spectrum identifier >>>>>>> def: "" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note: need expertise here >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: MGF spectrum identifier >>>>>>> def: "index=xsd:nonNegativeInteger" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note: TITLE attributes are optional, so the index >>>>>>> >> into the file >> >>>>>>> is >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> the >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> only reliable source (TITLE can be used for the string id if >>>>>>> present) >>>>>>> >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: mzData/mzXML/MS2 spectrum identifier >>>>>>> def: "scan number=xsd:positiveInteger" [PSI:MS] >>>>>>> is_a: MS:x ! 
native spectrum identifier >>>>>>> [Term] >>>>>>> id: MS:x >>>>>>> name: PKL/DTA spectrum identifier >>>>>>> def: "" [PSI:MS] >>>>>>> is_a: MS:x ! native spectrum identifier >>>>>>> ! note: like Bruker, a PKL or DTA could be standalone >>>>>>> >> so AFAIK the >> >>>>>>> >>>>>>> >>>>>> only >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> way to reliably reference it is via sourceFileRef >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >> -------------------------------------------------------------- >> ----------- >> This SF.Net email is sponsored by the Moblin Your Move >> Developer's challenge >> Build the coolest Linux based applications with Moblin SDK & >> win great prizes >> Grand prize is a trip for two to an Open Source event >> anywhere in the world >> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >> _______________________________________________ >> Psidev-ms-dev mailing list >> Psi...@li... >> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev >> >> > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Psidev-ms-dev mailing list > Psi...@li... > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev > > |
From: Eric D. <ede...@sy...> - 2008-09-18 22:45:10
|
Hi everyone, indeed this is a good discussion, thanks for bringing it back to the fore. I can't do this coming Monday, but let's have another telecon on Monday Sep 29 at the usual time, 9am PDT.

So it seems we have three proposals on the table:

A) Thermo: "con0,scan1" or WIFF: "sam0,per1,cyc1,exp2"
B) Thermo: "C0,S1" or WIFF: "M0,P1,Y1,E2"
C) Thermo: "0,1" or WIFF: "0,1,1,2"

Or close variants thereof. Shall we hold open the floor for a little more debate and then take a poll?

My opinion is that A is the clearest to look at and while a little verbose, it is embedded in XML and thus a drop in the bucket. C is very concise, but not easily human interpretable. Conciseness doesn't seem like a great advantage here. B seems undesirable to me as there are multiple possible words beginning with P and S, so that just creates confusion. So I kinda like A myself.

What do y'all think?

Thanks,
Eric

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Matt Chambers
> Sent: Thursday, September 18, 2008 4:45 PM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>
> I prefer nativeIDs without the labels. Labels work better and can be
> verbose in the arbitrary string 'id'; nativeID is provided primarily for
> machine readability and guaranteed formatting so to me it just makes
> more sense to "KISS" (keep it small and simple). :)
>
> Since the two types of ids co-exist, human interpretation of the
> nativeID is not an issue.
>
> This is good discussion though, we just need more of it - even if it's a
> simple assent to the proposal (or the alternatives). :)
>
> Thanks,
> -Matt
>
>
> Darren Kessner wrote:
> > I think Fredrik has good points, and I like his idea of using short
> > labels.
|
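To make the comparison between proposals A, B and C concrete, here is a minimal Python sketch that reads the same Thermo or WIFF identifier in any of the three candidate syntaxes and returns the same axis values. The lookup tables and the function name `parse_native_id` are hypothetical and exist only for illustration; the labels are copied from the examples in this thread, not from any released specification.

```python
import re

# Hypothetical axis order per source type, following the examples in this thread.
AXES = {
    "Thermo": ["controller", "scan"],
    "WIFF": ["sample", "period", "cycle", "experiment"],
}

# Proposal A: short word labels ("con0,scan1").
LABELS_A = {"con": "controller", "scan": "scan", "sam": "sample",
            "per": "period", "cyc": "cycle", "exp": "experiment"}

# Proposal B: single letters, which only make sense per source type ("C0,S1").
LABELS_B = {
    "Thermo": {"C": "controller", "S": "scan"},
    "WIFF": {"M": "sample", "P": "period", "Y": "cycle", "E": "experiment"},
}

# An optional alphabetic label followed by a non-negative integer.
_FIELD = re.compile(r"([A-Za-z]*)(\d+)")


def parse_native_id(source, native_id):
    """Return {axis name: number} for a proposal A, B or C style identifier."""
    axes = AXES[source]
    fields = native_id.split(",")
    if len(fields) != len(axes):
        raise ValueError(f"expected {len(axes)} fields, got {len(fields)}")
    values = {}
    for axis, field in zip(axes, fields):
        match = _FIELD.fullmatch(field)
        if match is None:
            raise ValueError(f"malformed field: {field!r}")
        label, number = match.groups()
        if label:
            # A labelled field must name the axis expected at this position.
            name = LABELS_A.get(label) or LABELS_B[source].get(label)
            if name != axis:
                raise ValueError(f"label {label!r} does not match axis {axis!r}")
        # Proposal C has no label, so position alone decides the axis.
        values[axis] = int(number)
    return values
```

In this sketch the unlabelled form (C) is the only one that depends purely on field order, while A and B let the parser cross-check the label against the expected position, which is the readability versus conciseness trade-off discussed above.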
From: Matt C. <mat...@va...> - 2008-09-18 17:36:20
|
Ah, you're right. No whitespace or colons, but pretty much anything else? The XML standard seems to give a huge list of hexadecimal Unicode characters or character ranges to specify what's valid in xs:ID. Wilfred H Tang wrote: > > Currently, 'id' is not an arbitrary string. Rather, it is designated > as an XML-type ID, which means there are significant restrictions on > what is allowed. > > > > *Matt Chambers <mat...@va...>* > Sent by: psi...@li... > > 09/18/2008 04:44 PM > Please respond to > Mass spectrometry standard development > <psi...@li...> > > > > To > Mass spectrometry standard development > <psi...@li...> > cc > > Subject > Re: [Psidev-ms-dev] Nailing down NativeID > > > > > > > > > > I prefer nativeIDs without the labels. Labels work better and can be > verbose in the arbitrary string 'id'; nativeID is provided primarily for > machine readability and guaranteed formatting so to me it just makes > more sense to "KISS" (keep it small and simple). :) > > Since the two types of ids co-exist, human interpretation of the > nativeID is not an issue. > > This is good discussion though, we just need more of it - even it's a > simple assent to the proposal (or the alternatives). :) > > Thanks, > -Matt > > > Darren Kessner wrote: > > I think Fredrik has good points, and I like his idea of using short > > labels. > > > > An alternative to consider is 3-4 letter abbreviations (using Matt's > > examples): > > > > Thermo: > > "con0 scan1" > > "scan2" > > > > Waters: > > "fun1 proc0 scan1" > > > > WIFF: > > "sam0 per1 cyc1 exp2" > > > > > > Darren > > > > > > On Sep 18, 2008, at 12:18 PM, Fredrik Levander wrote: > > > > > >> Hi Matt, > >> > >> I agree that the Native ID is a very important feature of the format > >> and > >> that it needs to be settled. 
Your solution is elegant, I can see two > >> disadvantages though: > >> 1) It is not straightforward to intepret the nativeID by visual > >> inspection, since you need to look in the CV to find out what order > >> the > >> numbers are in. > >> 2) If the number in one axis is unknown or irrelevant for the setup, > >> it > >> could be a problem to have it as required. One could imagine just > >> specifying an empty field instead of a number in that situation > >> though. > >> > >> An alternative is to have reserved characters in the native id: > >> S = scan > >> F = function > >> C = controller > >> P = process > >> Cy (or maybe Y) = Cycle > >> E = Experiment > >> Pe = Period > >> Other reserved letters can be added as needed. > >> > >> Then one can specify these as required for the instrumental setup. > >> Scan 1 would be "S1" > >> Function1, Scan 1 would be "F1S1" or "S1F1" or "S1,F1", the later if > >> comma separation is wanted. > >> If a certain order of the axes is wanted this can be imposed by regex. > >> A problem with this solution could be if an axis needs to contain > >> letters instead of numbers, but it is doable, at least with comma > >> separation. > >> > >> A combination of the CV approach and initiating letters could maybe > >> also > >> be an alternative: > >> > >> [Term] > >> id: MS:x > >> name: Waters RAW spectrum identifier > >> def: "F:function number=xsd:positiveInteger (optional),P:process > >> number=xsd:nonNegativeInteger (optional),S:scan > >> number=xsd:positiveInteger" > >> > >> Valid nativeIDs are: "F1,S1" and "F1,P1,S1", but not "F1" > >> > >> It would be good to have some input on what is required to report > >> for the rest of the vendor instruments too, but I think the > >> nativeID format should be settled soon. 
> >> > >> Fredrik > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> Matthew Chambers skrev: > >> > >>> It's been 4 months since we released the format and we still can't > >>> point > >>> implementors to documentation specifying what nativeIDs must look > >>> like. > >>> Can we please comment on my proposal or get other proposals to > >>> discuss? > >>> I am not averse to initially leaving out the terms that I couldn't > >>> come > >>> up with well-defined formats for (Bruker, PKL, ABI Oracle, Shimadzu). > >>> > >>> -Matt > >>> > >>> > >>> -------- Original Message -------- > >>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID > >>> Date: Tue, 22 Jul 2008 21:28:34 -0500 > >>> From: Matt Chambers <mat...@va...> > >>> Reply-To: Mass spectrometry standard development > >>> <psi...@li...> > >>> To: Mass spectrometry standard development > >>> <psi...@li...> > >>> References: <488...@va...> > >>> <5BE...@he...> > >>> > >>> > >>> > >>> Hi Eric, > >>> > >>> Of course, sorry I should have realized that the axis name concept > >>> would > >>> confuse matters. The axis names are just there so that a machine > >>> reading > >>> the format specification can associate each comma delimited section > >>> (what I'm calling an "axis") with a logical name. > >>> > >>> Thermo: > >>> 0,1 (controller 0, scan 1) > >>> 0,2 > >>> 0,3 > >>> 1,1 (controller 1, scan 1) > >>> > >>> Waters: > >>> 1,0,1 (function 1, process 0, scan 1) > >>> 1,0,2 > >>> 1,0,3 > >>> 2,0,1 (function 2, process 0, scan 1) > >>> 2,0,2 > >>> 2,0,3 > >>> > >>> WIFF: > >>> 0,1,1,2 (sample 0, period 1, cycle 1, experiment 2) > >>> 0,1,1,3 > >>> 0,1,2,2 > >>> 0,1,2,3 > >>> 0,1,2,4 > >>> 0,1,3,2 > >>> 0,1,3,3 > >>> 0,1,3,2 > >>> 0,1,4,2 > >>> 1,1,1,2 > >>> 1,1,1,3 > >>> > >>> When a machine reads the WIFF definition, it will know that the > >>> fields > >>> mean (in order) "sample #", "period #", "cycle #", "experiment #". 
> >>> The > >>> detailed meaning of those names won't be covered by the format > >>> definition, but it's conceivable that we define those names in > >>> detail as > >>> separate CV terms. Remember the main idea for nativeID is to map a > >>> spectrum back to a source file in a way that is more intuitive than a > >>> simple index, so being able to use them to look up the spectrum via a > >>> native interface is important. > >>> > >>> I think we can safely require that the nativeIDs always use all the > >>> fields even if for an entire run all of a particular axis has the > >>> same > >>> value. For example, in Thermo data the controller number is almost > >>> always going to be the number corresponding with the MS controller > >>> (although the actual number is not guaranteed to be 0). For backwards > >>> compatibility with tools which expect Thermo ids to be scan numbers > >>> with > >>> an implicit assumption about the controller, it is very reasonable to > >>> require those tools to simply parse the id. Parsing a comma-delimited > >>> pair is far easier than all the other crap one must do to get proper > >>> mzML support. ;) In particular for you Eric and other TPP users, the > >>> RAMP adapter that pwiz uses will pass only the scan number (and make > >>> sure the spectrum is a mass spectrum). > >>> > >>> -Matt > >>> > >>> > >>> Eric Deutsch wrote: > >>> > >>> > >>>> Hi Matt, thanks, this looks well thought out, although I'm not > >>>> sure I > >>>> fully understand the syntax you're proposing. Can you provide one > >>>> or two > >>>> examples of each type? > >>>> > >>>> Thanks! > >>>> Eric > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: psi...@li... > >>>>> > >>>>> > >>>>> > >>>> [mailto:psidev-ms-dev- > >>>> > >>>> > >>>> > >>>>> bo...@li...] 
On Behalf Of Matthew Chambers
> >>>>> Sent: Tuesday, July 22, 2008 3:15 PM
> >>>>> To: Mass spectrometry standard development
> >>>>> Subject: [Psidev-ms-dev] Nailing down NativeID
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I think it's overdue that we get this part of mzML formally specified -
> >>>>> at least for the vendors and generic formats. I am proposing a draft of
> >>>>> nativeID formats, the place to put the formats in the specification
> >>>>> documents, and to have mzML instance documents explicitly reference the
> >>>>> format they are using. This explicit reference should be required for
> >>>>> semantic validation, but I'd also recommend that mzML readers that don't
> >>>>> find or ignore the nativeID format term specified simply treat the
> >>>>> nativeID as a free string (rendering it pretty useless, but at least
> >>>>> there would be a defined way to handle it). The terms would be placed in
> >>>>> the fileContent element to define the format for all nativeIDs in the
> >>>>> file.
> >>>>>
> >>>>> I propose that the nativeID formats become CV terms, and that the term
> >>>>> definitions define the formats unambiguously in a machine-readable way
> >>>>> that a semantic validator can use to validate the nativeIDs. I will
> >>>>> list my format drafts in OBO format. Each specific native format
> >>>>> definition is a comma-delimited list of key-value pairs, where the key
> >>>>> is the axis name (e.g. "scan number") and the value specifies the format
> >>>>> of the axis in one of two ways:
> >>>>> 1) a Perl-style regular expression that can provide semantic/logical
> >>>>> choices for strings (e.g. "controller type" can be either "MS" or "PDA"
> >>>>> or "UV" etc.)
> >>>>> 2) an XSD type that can specify unrestricted strings or a numeric type
> >>>>> (possibly with semantic restrictions)
> >>>>>
> >>>>> I didn't actually need to use a regex for any of the formats below, but
> >>>>> I can see their usefulness. For example, they would be needed if I'm
> >>>>> wrong about Xcalibur and it makes more sense for Thermo spectra to use
> >>>>> controller names instead of controller numbers.
> >>>>>
> >>>>> Obviously the syntax of the format definitions is flexible if people
> >>>>> have better ideas (ideally one that could combine the power of regex and
> >>>>> XSD; "infinite cosmic power, itty bitty living space!").
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: native spectrum identifier
> >>>>> def: "References a spectrum in a native (non-mzML) spectrum source according to a strict format. The format is dependent on the type of the spectra source." [PSI:MS]
> >>>>> is_a: MS:1000524 ! data file content
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: native chromatogram identifier
> >>>>> def: "References a chromatogram in a native (non-mzML) chromatogram source according to a strict format. The format is dependent on the type of the chromatogram source." [PSI:MS]
> >>>>> is_a: MS:1000524 ! data file content
> >>>>> ! note: I don't have any instances of native chromatogram identifiers, but I can conceive of the possibilities!
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: Thermo RAW spectrum identifier
> >>>>> def: "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note to Jim: apparently, Xcalibur can handle multiple controllers of the same type, so is a choice between strings still appropriate?
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: Waters RAW spectrum identifier
> >>>>> def: "function number=xsd:positiveInteger,process number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note: is process number ever non-zero?
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: WIFF spectrum identifier
> >>>>> def: "sample number=xsd:nonNegativeInteger,period number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment number=xsd:positiveInteger" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: ABI Oracle database spectrum identifier
> >>>>> def: "" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note: need expertise here; alternatively, we could lump these spectra in with DTA/PKL nativeIDs (see below) when they are extracted to T2Ds
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: Bruker spectrum identifier
> >>>>> def: "" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID spectrum is natively a single file, so that seems to make nativeID irrelevant and sourceFile[Ref] critical
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: Shimadzu spectrum identifier
> >>>>> def: "" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note: need expertise here
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: MGF spectrum identifier
> >>>>> def: "index=xsd:nonNegativeInteger" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note: TITLE attributes are optional, so the index into the file is the only reliable source (TITLE can be used for the string id if present)
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: mzData/mzXML/MS2 spectrum identifier
> >>>>> def: "scan number=xsd:positiveInteger" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>>
> >>>>> [Term]
> >>>>> id: MS:x
> >>>>> name: PKL/DTA spectrum identifier
> >>>>> def: "" [PSI:MS]
> >>>>> is_a: MS:x ! native spectrum identifier
> >>>>> ! note: like Bruker, a PKL or DTA could be standalone so AFAIK the only way to reliably reference it is via sourceFileRef
|
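The draft definitions quoted above can be read mechanically. As a minimal sketch, not part of the proposal itself, the Python below splits a definition string into (axis name, XSD type) pairs and checks a comma-delimited nativeID against it. The names `parse_format` and `validate_native_id` are hypothetical, only the two XSD integer types that appear in the drafts are handled, and the regular-expression alternative is left out.

```python
# Hypothetical helper names; only the two XSD types used in the drafts are covered.
XSD_MINIMUM = {
    "xsd:nonNegativeInteger": 0,
    "xsd:positiveInteger": 1,
}


def parse_format(definition):
    """Split 'axis name=xsd:type,...' into a list of (axis name, xsd type)."""
    axes = []
    for pair in definition.split(","):
        name, _, xsd_type = pair.partition("=")
        axes.append((name.strip(), xsd_type.strip()))
    return axes


def validate_native_id(native_id, definition):
    """Return {axis name: int} if native_id fits the definition, else None."""
    axes = parse_format(definition)
    fields = native_id.split(",")
    if len(fields) != len(axes):
        return None  # every axis is required in this draft
    result = {}
    for field, (name, xsd_type) in zip(fields, axes):
        if xsd_type not in XSD_MINIMUM:
            return None  # type not covered by this sketch
        if not (field.isascii() and field.isdigit()):
            return None
        value = int(field)
        if value < XSD_MINIMUM[xsd_type]:
            return None
        result[name] = value
    return result


WATERS = ("function number=xsd:positiveInteger,"
          "process number=xsd:nonNegativeInteger,"
          "scan number=xsd:positiveInteger")

print(validate_native_id("1,0,1", WATERS))
print(validate_native_id("0,0,1", WATERS))  # function number must be >= 1
```

A reader that does not recognise the format term would skip this step and treat the nativeID as a free string, as the proposal suggests.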
From: Wilfred H T. <Ta...@ap...> - 2008-09-18 17:03:31
|
Currently, 'id' is not an arbitrary string. Rather, it is designated as an XML-type ID, which means there are significant restrictions on what is allowed.

Matt Chambers <mat...@va...>
Sent by: psi...@li...
09/18/2008 04:44 PM
Please respond to Mass spectrometry standard development <psi...@li...>
To: Mass spectrometry standard development <psi...@li...>
cc:
Subject: Re: [Psidev-ms-dev] Nailing down NativeID

I prefer nativeIDs without the labels. Labels work better and can be verbose in the arbitrary string 'id'; nativeID is provided primarily for machine readability and guaranteed formatting, so to me it just makes more sense to "KISS" (keep it small and simple). :) Since the two types of ids co-exist, human interpretation of the nativeID is not an issue.

This is good discussion though, we just need more of it - even if it's a simple assent to the proposal (or the alternatives). :)

Thanks,
-Matt
|
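To illustrate the restriction on 'id' raised above: an XML Schema ID must be an NCName, so it cannot start with a digit and cannot contain commas, spaces, or colons. The sketch below is a rough check under one stated assumption: it is an ASCII-only approximation of the NCName rule (the real rule also admits many non-ASCII letters), and the helper name `is_ascii_ncname` is hypothetical.

```python
import re

# ASCII-only approximation of the XML NCName production (an assumption of
# this sketch): a letter or underscore, then letters, digits, '.', '-', '_'.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value):
    """Return True if value could be used as an xs:ID under the ASCII subset."""
    return _ASCII_NCNAME.fullmatch(value) is not None


for candidate in ["0,1", "S1", "F1S1", "F1,S1", "con0 scan1"]:
    print(candidate, is_ascii_ncname(candidate))
```

Under this check, the comma-delimited and space-delimited forms discussed in the thread would not be usable as 'id' values, while a compact labeled form such as "F1S1" would be.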
From: Matt C. <mat...@va...> - 2008-09-18 16:44:42
|
I prefer nativeIDs without the labels. Labels work better and can be verbose in the arbitrary string 'id'; nativeID is provided primarily for machine readability and guaranteed formatting, so to me it just makes more sense to "KISS" (keep it small and simple). :) Since the two types of ids co-exist, human interpretation of the nativeID is not an issue.

This is good discussion though, we just need more of it - even if it's a simple assent to the proposal (or the alternatives). :)

Thanks,
-Matt

Darren Kessner wrote:
> I think Fredrik has good points, and I like his idea of using short
> labels.
>
> An alternative to consider is 3-4 letter abbreviations (using Matt's
> examples):
>
> Thermo:
> "con0 scan1"
> "scan2"
>
> Waters:
> "fun1 proc0 scan1"
>
> WIFF:
> "sam0 per1 cyc1 exp2"
>
> Darren
>
> On Sep 18, 2008, at 12:18 PM, Fredrik Levander wrote:
>
>> Hi Matt,
>>
>> I agree that the Native ID is a very important feature of the format and
>> that it needs to be settled. Your solution is elegant, I can see two
>> disadvantages though:
>> 1) It is not straightforward to interpret the nativeID by visual
>> inspection, since you need to look in the CV to find out what order the
>> numbers are in.
>> 2) If the number in one axis is unknown or irrelevant for the setup, it
>> could be a problem to have it as required. One could imagine just
>> specifying an empty field instead of a number in that situation though.
>>
>> An alternative is to have reserved characters in the native id:
>> S = scan
>> F = function
>> C = controller
>> P = process
>> Cy (or maybe Y) = Cycle
>> E = Experiment
>> Pe = Period
>> Other reserved letters can be added as needed.
>>
>> Then one can specify these as required for the instrumental setup.
>> Scan 1 would be "S1"
>> Function1, Scan 1 would be "F1S1" or "S1F1" or "S1,F1", the latter if
>> comma separation is wanted.
>> If a certain order of the axes is wanted this can be imposed by regex.
>> A problem with this solution could be if an axis needs to contain
>> letters instead of numbers, but it is doable, at least with comma
>> separation.
>>
>> A combination of the CV approach and initiating letters could maybe also
>> be an alternative:
>>
>> [Term]
>> id: MS:x
>> name: Waters RAW spectrum identifier
>> def: "F:function number=xsd:positiveInteger (optional),P:process number=xsd:nonNegativeInteger (optional),S:scan number=xsd:positiveInteger"
>>
>> Valid nativeIDs are: "F1,S1" and "F1,P1,S1", but not "F1"
>>
>> It would be good to have some input on what is required to report
>> for the rest of the vendor instruments too, but I think the
>> nativeID format should be settled soon.
>>
>> Fredrik
>>
>> Matthew Chambers wrote:
>>
>>> It's been 4 months since we released the format and we still can't point
>>> implementors to documentation specifying what nativeIDs must look like.
>>> Can we please comment on my proposal or get other proposals to discuss?
>>> I am not averse to initially leaving out the terms that I couldn't come
>>> up with well-defined formats for (Bruker, PKL, ABI Oracle, Shimadzu).
>>>
>>> -Matt
>>>
>>> -------- Original Message --------
>>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>>> Date: Tue, 22 Jul 2008 21:28:34 -0500
>>> From: Matt Chambers <mat...@va...>
>>> Reply-To: Mass spectrometry standard development <psi...@li...>
>>> To: Mass spectrometry standard development <psi...@li...>
>>> References: <488...@va...> <5BE...@he...>
>>>
>>> Hi Eric,
>>>
>>> Of course, sorry I should have realized that the axis name concept would
>>> confuse matters. The axis names are just there so that a machine reading
>>> the format specification can associate each comma delimited section
>>> (what I'm calling an "axis") with a logical name.
>>>
>>> Thermo:
>>> 0,1 (controller 0, scan 1)
>>> 0,2
>>> 0,3
>>> 1,1 (controller 1, scan 1)
>>>
>>> Waters:
>>> 1,0,1 (function 1, process 0, scan 1)
>>> 1,0,2
>>> 1,0,3
>>> 2,0,1 (function 2, process 0, scan 1)
>>> 2,0,2
>>> 2,0,3
>>>
>>> WIFF:
>>> 0,1,1,2 (sample 0, period 1, cycle 1, experiment 2)
>>> 0,1,1,3
>>> 0,1,2,2
>>> 0,1,2,3
>>> 0,1,2,4
>>> 0,1,3,2
>>> 0,1,3,3
>>> 0,1,3,2
>>> 0,1,4,2
>>> 1,1,1,2
>>> 1,1,1,3
>>>
>>> When a machine reads the WIFF definition, it will know that the fields
>>> mean (in order) "sample #", "period #", "cycle #", "experiment #". The
>>> detailed meaning of those names won't be covered by the format
>>> definition, but it's conceivable that we define those names in detail as
>>> separate CV terms. Remember the main idea for nativeID is to map a
>>> spectrum back to a source file in a way that is more intuitive than a
>>> simple index, so being able to use them to look up the spectrum via a
>>> native interface is important.
>>>
>>> I think we can safely require that the nativeIDs always use all the
>>> fields even if for an entire run all of a particular axis has the same
>>> value. For example, in Thermo data the controller number is almost
>>> always going to be the number corresponding with the MS controller
>>> (although the actual number is not guaranteed to be 0). For backwards
>>> compatibility with tools which expect Thermo ids to be scan numbers with
>>> an implicit assumption about the controller, it is very reasonable to
>>> require those tools to simply parse the id. Parsing a comma-delimited
>>> pair is far easier than all the other crap one must do to get proper
>>> mzML support. ;) In particular for you Eric and other TPP users, the
>>> RAMP adapter that pwiz uses will pass only the scan number (and make
>>> sure the spectrum is a mass spectrum).
>>>
>>> -Matt
>>>
>>> Eric Deutsch wrote:
>>>
>>>> Hi Matt, thanks, this looks well thought out, although I'm not sure I
>>>> fully understand the syntax you're proposing. Can you provide one or two
>>>> examples of each type?
>>>>
>>>> Thanks!
>>>> Eric
|
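Fredrik's combined form for Waters, quoted above, has optional F and P axes and a required S axis, so "F1,S1" and "F1,P1,S1" are valid but "F1" is not. A minimal sketch of how that rule could be checked follows. It assumes comma separation and a fixed F, P, S order, neither of which the thread settles, and the name `parse_waters_labeled` is hypothetical.

```python
import re

# Assumed comma-separated labeled form of the Waters draft, in fixed order:
# F (optional, >= 1), P (optional, >= 0), S (required, >= 1).
WATERS_LABELED = re.compile(r"(?:F(\d+),)?(?:P(\d+),)?S(\d+)", re.ASCII)


def parse_waters_labeled(native_id):
    """Return a dict of the axes present, or None if the id is not valid."""
    match = WATERS_LABELED.fullmatch(native_id)
    if match is None:
        return None
    function, process, scan = match.groups()
    axes = {"scan": int(scan)}
    if function is not None:
        axes["function"] = int(function)
    if process is not None:
        axes["process"] = int(process)
    # xsd:positiveInteger axes must be at least 1
    if axes["scan"] < 1 or axes.get("function", 1) < 1:
        return None
    return axes


for candidate in ["F1,S1", "F1,P1,S1", "F1"]:
    print(candidate, parse_waters_labeled(candidate))
```

The unlabeled form Matt prefers needs no per-axis letters, at the cost of requiring every field in every nativeID; the labeled form trades a longer string for optional axes.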