From: Randy J. <rkj...@in...> - 2008-02-20 14:02:36
Perhaps this has already been raised, but if we have a sampleList with an unbounded number of samples, and an instrumentList with an unbounded number of instrument descriptions, then we have most of what we need to accommodate the use case of multiple samples in a single file. I know this was discussed and dropped, but it is a common mode for SciEx instruments.

If we made the spectrumList unbounded, then the ID would still be a unique identifier within the file, and the index would represent the sequence within a spectrumList (and would not have to be unique). SciEx instruments don't just collect multiple samples into one file; they also represent different "experiments" (meaning different acquisition parameter settings) as separate spectrumList-like things within the WIFF file. SciEx still calls a multi-injection acquisition a single "run", so there would be no need to change the <run> element, except to add a count for the number of spectrumList elements. Also, since (as in the SciEx case) these separate lists are not always different injections, I don't think we need an additional list container between run and spectrumList.

If we don't want to tackle this now, then the only argument for having both ID and index is that ID does not have to be numeric (just unique), while index must be numeric.

What was the rationale behind the multiplicity on sample if there is a single spectrumList? Was it to allow multiple samples to be represented?

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Rune Schjellerup Philosof
Sent: Wednesday, February 20, 2008 7:37 AM
To: Mass spectrometry standard development
Subject: [Psidev-ms-dev] ScanNumber/index and id

I don't see the point in having two unique keys for a spectrum. One ought to be enough. What is the purpose of scanNumber and of id?

--
Rune

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

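Randy's multi-list proposal amounts to two different key rules: the id is unique across the whole file, while the index is meaningful only within one spectrumList. The Python sketch below checks both rules on a hypothetical in-memory model (one list of (id, index) pairs per spectrumList). The function name, the data model, and the choice of a 0-based contiguous index (the semantics Matt Chambers describes for an index attribute in the "Unique scan numbers" thread) are assumptions for illustration, not part of any mzML schema.

```python
from collections import Counter


def check_spectrum_lists(spectrum_lists):
    """Check keys for a hypothetical file containing several spectrumLists.

    spectrum_lists: one list per spectrumList, each holding (id, index) pairs.
    Assumed rules (taken from the proposal, not from a published schema):
      * id is unique across the whole file;
      * index is the 0-based sequence within its own spectrumList, so the
        same index value may recur in different lists.
    Returns a list of problem descriptions (empty if none were found).
    """
    problems = []

    # Rule 1: ids must not repeat anywhere in the file.
    id_counts = Counter(spectrum_id for lst in spectrum_lists for spectrum_id, _ in lst)
    for spectrum_id, count in sorted(id_counts.items()):
        if count > 1:
            problems.append(f"duplicate id {spectrum_id!r} ({count} occurrences)")

    # Rule 2: within each list, index must run 0, 1, 2, ... without gaps.
    for list_no, lst in enumerate(spectrum_lists):
        indexes = [index for _, index in lst]
        if indexes != list(range(len(lst))):
            problems.append(f"spectrumList {list_no}: index is not 0..{len(lst) - 1}")

    return problems


# Two "experiments" from one acquisition: index restarts at 0, ids do not repeat.
print(check_spectrum_lists([[("exp1.s1", 0), ("exp1.s2", 1)],
                            [("exp2.s1", 0), ("exp2.s2", 1)]]))  # []
```

Keeping the two checks separate makes it clear which rule a converter broke; with a single spectrumList the two rules collapse into the current situation, where both keys are unique file-wide.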
From: Fredrik L. <Fre...@im...> - 2008-02-20 13:57:09
Hi All,

I may be completely wrong here, but I think that the selectionWindowList / selectionWindow elements are a bit misleading, judging by how they are used in the example files. They seem to be used to define scan ranges, so scanWindow might be a better name. For me, the selection window is the m/z window that is let through the quadrupole, which means that it is the same as the precursor ion selection window. In some experiments you would let a large mass range through the quadrupole instead of the typical few mass units, so there it would be nice to be able to define the ionSelection as a window (instead of a single m/z value), as I think Rune has already requested. Isn't that what 'selectionWindow' should be used for?

Second question: Are there any mzML precursor ion scan examples? I don't see how to specify that the first quadrupole is scanning while the last one is held fixed in a triple quadrupole instrument. Maybe the spectrum type could be set to Parent scan spectrum / Precursor scan spectrum (is there a CV term? I cannot find one), and the scanWindow / selectionWindow could be used to define the scan range of Q1, but where would the Q3 m/z go? Would that be a second selectionWindow/scanWindow that defines Q3? In that case, how would one know which selectionWindow is the Q1 scan range and which is Q3? Or would the Q3 m/z be put under precursor ionSelection (even though in this case it is a product) just to distinguish the two?

Can anyone sort this out for me?

Fredrik

From: Angel P. <an...@ma...> - 2008-02-20 13:26:18
On Wed, Feb 20, 2008 at 6:42 AM, Randy Julian <rkj...@in...> wrote:
> Does the xs:ID constraint solve this problem?

Maybe, since I think we had agreed that mzML REFs are within-file only?

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736

From: Rune S. P. <ru...@ph...> - 2008-02-20 12:37:42
I don't see the point in having two unique keys for a spectrum. One ought to be enough. What is the purpose of scanNumber and of id?

--
Rune

From: Fredrik L. <Fre...@im...> - 2008-02-20 12:28:42
I like this proposal. This way there is only one place to look for scan numbers, and that will be in the acquisitionList. This will, however, mean that mzML files with unprocessed data, like the MassWolf output, will have to add an acquisitionList with one acquisition element for every spectrum (scan), and that the description of acquisitionType will have to be edited to reflect that it is also usable for spectra (and not just peak lists). It makes sense to rename the current scanNumber to "index", which is what it would be. And yes, acqNumber should be acquisitionNumber, or maybe even better just 'number' (like the sourceFile attribute 'sourceFileName', which will change to 'name').

I've uploaded an edited peak list mzML file which has some of these changes, as an example for discussion:
http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/FF_070504_MSMS_5B_edited.mzML

At rows 110-126 there is also an experiment with different ways to refer to the source for scans, including external referencing using URI.

Fredrik

Randy Julian wrote:
> Based on today's discussion, I think the goal people had for scanNumber
> is better achieved using the acqNumber element. Actually, what people
> seemed to want was a place to put the Thermo-specific 'scan number'
> which allows you to go back to a raw file and see the scan in the vendor
> file. There is no reason why you couldn't put the scan number from
> Thermo as an 'index' (the acqNumber element would still allow a more
> accurate representation of the source of the spectrum, since it can
> handle the summation or averaging of spectra), and having a
> 'positiveInteger' for this makes great sense.
>
> The value of using a scan number as an index comes from the problem of
> trying to order spectra from non-LC experiments where acquisition time
> is meaningless. Thermo's 'scan number' works great for this, and we
> need something like this for the other vendor formats too.
>
> We should have more discussion on this, but it will probably help to
> have some examples - which should be available from the group soon.
>
> Randy
>
> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...] On Behalf Of
> Matthew Chambers
> Sent: Tuesday, February 19, 2008 3:54 PM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>
> There is no requirement that an ascending lexicographical sort would
> produce the same order that a sort by ascending scanNumber would (that
> would be far too restrictive), but there is a requirement that the
> spectrum elements in the file be stored in ascending order by
> scanNumber.
>
> We had quite a bit of discussion in the teleconference today about
> removing scanNumber as a primary key and replacing it with an index
> attribute; such an attribute would probably have different semantics
> (0-based and contiguous). I think Eric is probably preparing to post a
> good summary of the discussion.
>
> My proposal for positiveInteger for scanNumber was accepted but then
> rendered moot by the proposal to get rid of the scanNumber attribute. ;)
>
> "positiveInteger" has no schematic upper limit, so perhaps if we switch
> to an index attribute we should make it "unsignedLong" (schematically
> defined as a 64-bit unsigned integer).
>
> -Matt
>
> Coleman, Michael wrote:
>> Is there a requirement that scanNumber and id be "co-ordered" (if I sort
>> a file's spectra by one of these keys, the other key will then
>> necessarily also be sorted)?
>>
>> Is there a requirement that the spectra in all mzML files be ordered by
>> one or both of these keys?
>>
>> If scanNumber is being used for ordering in one of these ways, I agree
>> that lexicographic ordering should be specified. If not, I'm wondering
>> whether there is any other reason for specifying the ordering.
>>
>> If scanNumbers were specified to be contiguous, I'd say we ought to
>> allow 0 as a scan number, since essentially all modern programming
>> languages use 0-based arrays. But if I understand correctly,
>> scanNumbers need not be contiguous (and thus programmers should not
>> assume that they can be directly used for array indexing).
>>
>> Are scan numbers up to at least 2**62 or so allowed, to prepare for the
>> coming ten-billion-spectrum runs? :-)
>>
>> Mike
>>
>>> -----Original Message-----
>>> From: psi...@li...
>>> [mailto:psi...@li...] On
>>> Behalf Of Matthew Chambers
>>> Sent: Tuesday, February 19, 2008 10:07 AM
>>> To: Mass spectrometry standard development
>>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>>
>>> Hi Michael,
>>>
>>> As it currently stands, both scanNumber and id are unique keys to a
>>> spectrum - they need not be combined to create a unique key. Id is a
>>> string and as such should be compared on a lexicographical basis (if
>>> that isn't stated in the spec, it should be), and scanNumber
>>> is an integer:
>>> <xs:attribute name="scanNumber" type="xs:int" use="required">
>>>
>>> By the way, I think we should change that type to be
>>> xs:positiveInteger so that the range is schematically limited to
>>> [1-infinity). 0 shouldn't be a valid scan number (if 0 is allowed then
>>> Michael's point about the -0 and 0 issue should be addressed, although
>>> that might be done by the XML Schema specification).
>>>
>>> -Matt
>>>
>>> Coleman, Michael wrote:
>>>> I don't understand the issues involved in this particular question,
>>>> but it reminds me of this key requirement:
>>>>
>>>> - There has to be a way of generating a unique key for each spectrum
>>>> (i.e., unique across all spectra in the file) that will work for all
>>>> mzML files.
>>>>
>>>> In the example below, it looks like that key is the 2-tuple "(id,
>>>> scanNumber)". (Whatever the key is, it should be specified as such in
>>>> the standard.)
>>>>
>>>> If the key includes any numeric fields, it needs to be specified
>>>> whether or not (say) "0010" is equal to "10", whether or not "1.0" is
>>>> equal to "1", and whether or not "-0" is equal to "0". Hopefully
>>>> either (a) the former is simply disallowed in all of these cases or
>>>> (b) all fields are to be treated as strings, rather than numbers, and
>>>> comparison done on that basis.
>>>>
>>>> Mike
>>>>
>>>>> -----Original Message-----
>>>>> From: psi...@li...
>>>>> [mailto:psi...@li...] On
>>>>> Behalf Of Matthew Chambers
>>>>> Sent: Tuesday, February 19, 2008 9:37 AM
>>>>> To: Mass spectrometry standard development
>>>>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>>>>
>>>>> How do you feel about generating arbitrary unique scan numbers and
>>>>> then using the id attribute to preserve the original filename and
>>>>> scan number:
>>>>> <spectrum id="function1.1" scanNumber="1" ...>
>>>>> <spectrum id="function1.2" scanNumber="2" ...>
>>>>> ...
>>>>> <spectrum id="function2.1" scanNumber="100" ...>
>>>>> <spectrum id="function2.2" scanNumber="101" ...>
>>>>> ...
>>>>>
>>>>> Or probably more intuitive would be to store the parallel spectra
>>>>> sequentially (assuming that the same scan number from each function
>>>>> is correlated):
>>>>> <spectrum id="function1.1" scanNumber="1" ...>
>>>>> <spectrum id="function2.1" scanNumber="2" ...>
>>>>> ...
>>>>> <spectrum id="function1.2" scanNumber="100" ...>
>>>>> <spectrum id="function2.2" scanNumber="101" ...>
>>>>> ...
>>>>>
>>>>> It's either that or store each function in a separate mzML file,
>>>>> because mzML doesn't support multiple runs in the same file.
>>>>>
>>>>> -Matt
>>>>>
>>>>> Fredrik Levander wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> In QTOF files from Waters with mixed MS1 and MS2 data we have
>>>>>> several parallel 'functions' with data being recorded into separate
>>>>>> files. The scan numbers are only unique within each function. In the
>>>>>> raw data folder we thus have several different spectra with the same
>>>>>> scan number (but different source files). When converting this into
>>>>>> an mzML file it would be good to keep the original scan numbers,
>>>>>> which are useful for traceability, but to generate unique spectrum
>>>>>> ids. I thus propose that the requirement for unique scanNumbers
>>>>>> within an mzML file is removed. However, spectra should not be
>>>>>> repeated within the file, so this would NOT be applicable to the dta
>>>>>> to mzML conversion use case.
>>>>>> Would such a change generate problems for the readers?
>>>>>> How is this solved in MassWolf?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Fredrik

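Michael Coleman's question about whether "0010" equals "10" (and "-0" equals "0"), together with Matt's note that ids compare lexicographically, is easy to see in a few lines of Python. This sketch sorts the same hypothetical attribute values as strings and as integers, and shows one possible way to remove the ambiguity by accepting only a canonical decimal form. That canonical-form rule and the helper names are assumptions for illustration, not something the schema specifies; the 64-bit limit is the xs:unsignedLong range Matt mentions.

```python
import re

# Hypothetical scan-number strings as they might appear in XML attributes.
raw = ["10", "9", "0010", "100", "-0", "0"]

# String (lexicographic) order compares character by character.
lexicographic = sorted(raw)

# Numeric order compares parsed values; "0010" ties with "10" and "-0" with "0".
# sorted() is stable, so tied values keep their original relative order.
numeric = sorted(raw, key=int)

# One way to remove the ambiguity (an assumption, not a schema rule):
# accept only decimal text with no sign and no leading zeros.
CANONICAL = re.compile(r"0|[1-9][0-9]*")


def is_canonical(text):
    return CANONICAL.fullmatch(text) is not None


# xs:unsignedLong is a 64-bit unsigned integer.
UNSIGNED_LONG_MAX = 2**64 - 1


def fits_unsigned_long(value):
    return 0 <= value <= UNSIGNED_LONG_MAX


print(lexicographic)  # ['-0', '0', '0010', '10', '100', '9']
print(numeric)        # ['-0', '0', '9', '10', '0010', '100']
```

Under such a canonical-form rule, string equality and numeric equality agree, which is one way of getting option (a) from Mike's message; 2**62, the size he asks about, fits comfortably within xs:unsignedLong.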
From: Lennart M. <len...@gm...> - 2008-02-20 12:14:37
Dear PSI-MS Enthusiasts,

Here is a new snapshot version that should incorporate the things we discussed during our telephone conference yesterday (Tuesday, 19 February 2008) and more or less agreed on so far.

Changelog: http://www.ebi.ac.uk/~lmartens/mzML/20080220_changelog.txt
Schema link: http://www.ebi.ac.uk/~lmartens/mzML/20080220_mzML0.99.9_SNAPSHOT.xsd
Example instance doc: http://www.ebi.ac.uk/~lmartens/mzML/20080220_example_mzML0.99.9_SNAPSHOT.mzML

Enjoy :).

Cheers,

lnnrt.

From: Randy J. <rkj...@in...> - 2008-02-20 12:03:31
It's been a while, but early in some of the MS discussions, we made a move to expand all the abbreviated names of elements for better clarity. There is now only one abbreviation left in mzML:

acqNumber (attribute in acquisitionType)

Since this is where I propose putting the actual 'scanNumber' from the vendor file (which is really an "acquisition" for TOF and other non-scanning instruments like FT), can we change this to the non-abbreviated form?

acquisitionNumber

This change would (I think) eliminate all of the abbreviations from element and attribute names. Thoughts?

Randy

Randall K Julian, Jr. Ph.D.
President
Indigo BioSystems, Inc.
(317) 536-2736 x101
(317) 306-5447 mobile
www.indigobio.com <http://www.indigobio.com/>

NOTICE: This message may contain confidential or privileged information that is for the sole use of the intended recipient. Any unauthorized review, use, disclosure, copying or distribution is strictly prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

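If the rename were adopted, existing files would need the attribute renamed. A minimal Python sketch using the standard library's ElementTree, assuming the attribute is unqualified (as XML attributes normally are) and using a hypothetical acquisitionList fragment rather than a real mzML file:

```python
import xml.etree.ElementTree as ET


def rename_attribute(xml_text, old="acqNumber", new="acquisitionNumber"):
    """Return xml_text with every `old` attribute renamed to `new`.

    Walks every element regardless of its namespace, because unprefixed
    attributes carry no namespace even when their element does.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if old in element.attrib:
            # pop() removes the old name; set() re-adds the same value under the new one.
            element.set(new, element.attrib.pop(old))
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment, not taken from a real mzML instance document.
print(rename_attribute('<acquisitionList><acquisition acqNumber="5"/></acquisitionList>'))
```

ElementTree drops the XML declaration and serialises namespaced documents with generated prefixes such as ns0: unless ET.register_namespace is called first, so a production converter would need more care than this sketch.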
From: Randy J. <rkj...@in...> - 2008-02-20 11:47:24
This is why I thought we should place the non-unique scan number in the acquisition description section and change the meaning of the 'scanNumber' attribute, since it cannot be used correctly with all instrument brands.

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Rune Schjellerup Philosof
Sent: Wednesday, February 20, 2008 2:32 AM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Unique scan numbers

Fredrik Levander wrote:
> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> parallel 'functions' with data being recorded into separate files. The
> scan numbers are only unique within each function. I thus propose that
> the requirement for unique scanNumbers within an mzML file is removed.
> How is this solved in MassWolf?

The information is not saved. New scan numbers are assigned. That is, you have to use the time (and precursor with ms^2) information in order to locate the original scan.

--
Regards
Rune

From: Randy J. <rkj...@in...> - 2008-02-20 11:42:51
|
Does the xs:ID constraint solve this problem?

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matt Chambers
Sent: Tuesday, February 19, 2008 10:34 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Relative URIs and RFC-2396

After a little more thought, absolute instances of xs:anyURI will not always work as a fragment identifier. If a spectrum's id attribute was xs:anyURI in file "foo.mzML":

<spectrum id="file://foo.1.1.1.dta" /> <!-- this is a valid (absolute) xs:anyURI -->

And in something like pepXML or analysisXML:

<spectrumQuery spectrumRef="file://foo.mzML#file://foo.1.1.1.dta" /> <!-- not a valid xs:anyURI! -->

Unless I'm missing something, using xs:anyURI for fragment identifiers would actually make the schema less safe. Valid mzML ids would be potentially unusable in an external URI unless URL-encoded, which would defeat the point of using xs:anyURI in the first place.

-Matt
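One way to check whether xs:ID sidesteps the fragment-identifier problem raised here: xs:ID values must be XML NCNames, which cannot contain ':' or '/' and cannot begin with a digit. The sketch below uses an ASCII-only approximation of that rule; the helper name `is_ascii_ncname` is hypothetical, and the real NCName production also admits many non-ASCII characters that this sketch ignores.

```python
import re

# Hypothetical helper: an ASCII-only approximation of the XML NCName rule
# that every xs:ID value must satisfy. The full rule also admits many
# non-ASCII letters and marks, which this sketch ignores.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if value is lexically acceptable as an xs:ID (ASCII subset)."""
    return _NCNAME_ASCII.fullmatch(value) is not None


for candidate in ["s555", "function1.2", "1", "file://foo.1.1.1.dta"]:
    print(f"{candidate!r}: {is_ascii_ncname(candidate)}")
```

Under this rule the URI-shaped id from the example is rejected, but so is a bare integer such as "1", even though "1" is a valid relative URI under RFC 2396; ids would need a leading letter or underscore.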
From: Rune S. P. <ru...@ph...> - 2008-02-20 07:32:25
Fredrik Levander wrote:
> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> parallel 'functions' with data being recorded into separate files. The
> scan numbers are only unique within each function. I thus propose that
> the requirement for unique scanNumbers within an mzML file is removed.
> How is this solved in MassWolf?

The information is not saved. New scan numbers are assigned. That is, you have to use the time (and precursor with ms^2) information in order to locate the original scan.

--
Regards
Rune
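Fredrik's problem and the renumbering Rune describes can be sketched as follows, assuming each raw scan is known by its function, its per-function scan number and a retention time. The `SourceScan` structure and `renumber` helper are hypothetical and are not MassWolf's code. Unlike a plain renumbering, the sketch keeps the original pair in an id of the form `function<F>.<S>`, the convention Matt suggests elsewhere in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceScan:
    """One scan as read from the raw data folder (hypothetical structure)."""
    function: int  # Waters 'function' the scan was recorded in
    scan: int      # original scan number, unique only within its function
    rt: float      # retention time in seconds


def renumber(scans: list[SourceScan]) -> list[tuple[int, str, SourceScan]]:
    """Assign new file-wide scan numbers (1, 2, 3, ...) in retention-time order.

    Each result also carries an id of the form 'function<F>.<S>' so the
    original (function, scan) pair stays traceable after renumbering.
    Ties in retention time are broken by function, then by original scan.
    """
    ordered = sorted(scans, key=lambda s: (s.rt, s.function, s.scan))
    return [
        (new_number, f"function{s.function}.{s.scan}", s)
        for new_number, s in enumerate(ordered, start=1)
    ]


scans = [
    SourceScan(function=1, scan=1, rt=0.0),
    SourceScan(function=1, scan=2, rt=1.0),
    SourceScan(function=2, scan=1, rt=0.5),
    SourceScan(function=2, scan=2, rt=1.5),
]
for number, spectrum_id, _ in renumber(scans):
    print(number, spectrum_id)
```

Sorting by retention time interleaves the functions; breaking ties on function and original scan keeps the numbering deterministic when two functions report the same time.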
From: Matt C. <mat...@va...> - 2008-02-20 04:21:12
After a little more thought, absolute instances of xs:anyURI will not always work as a fragment identifier. If a spectrum's id attribute was xs:anyURI in file "foo.mzML":

<spectrum id="file://foo.1.1.1.dta" /> <!-- this is a valid (absolute) xs:anyURI -->

And in something like pepXML or analysisXML:

<spectrumQuery spectrumRef="file://foo.mzML#file://foo.1.1.1.dta" /> <!-- not a valid xs:anyURI! -->

Unless I'm missing something, using xs:anyURI for fragment identifiers would actually make the schema less safe. Valid mzML ids would be potentially unusable in an external URI unless URL-encoded, which would defeat the point of using xs:anyURI in the first place.

-Matt

Randy Julian wrote:
> All we are trying to achieve with the anyURI is safety for use within a
> URI. Any xlink-safe way of doing this will work. So if xs:ID is
> supported better by the validating parsers, it would do what we want.
>
> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...] On Behalf Of
> Matthew Chambers
> Sent: Tuesday, February 19, 2008 4:25 PM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Relative URIs and RFC-2396
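Matt's caveat that ids may be unusable in an external URI unless URL-encoded can be illustrated with Python's `urllib.parse`: percent-encode the id when building `document#fragment`, and decode it when reading the reference back. `make_ref` and `split_ref` are hypothetical helper names; this is a sketch of the encoding round trip, not a statement of what the mzML schema requires.

```python
from __future__ import annotations

from urllib.parse import quote, unquote, urldefrag


def make_ref(document_uri: str, spectrum_id: str) -> str:
    """Append an id to a document URI as a percent-encoded fragment.

    quote(..., safe="") leaves letters, digits and '_.-~' untouched and
    encodes everything else, so ':' becomes %3A and '/' becomes %2F.
    """
    return f"{document_uri}#{quote(spectrum_id, safe='')}"


def split_ref(ref: str) -> tuple[str, str]:
    """Split a reference back into (document URI, decoded id)."""
    document_uri, fragment = urldefrag(ref)
    return document_uri, unquote(fragment)


ref = make_ref("foo.mzML", "file://foo.1.1.1.dta")
print(ref)             # the id's ':' and '/' are now percent-encoded
print(split_ref(ref))  # the original id is recovered
```

A plain id such as "s555" passes through unchanged, so the encoding step only matters for ids that contain reserved characters.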
From: Randy J. <rkj...@in...> - 2008-02-20 02:21:29
Based on today's discussion, I think the goal people had for scanNumber is better achieved using the acqNumber element. Actually, what people seemed to want was a place to put the Thermo-specific 'scan number' which allows you to go back to a raw file and see the scan in the vendor file.

There is no reason why you couldn't put the scan number from Thermo as an 'index' (the acqNumber element would still allow a more accurate representation of the source of the spectrum, since it can handle the summation or averaging of spectra), and having a 'positiveInteger' for this makes great sense. The value of using a scan number as an index comes from the problem of trying to order spectra from non-LC experiments where acquisition time is meaningless. Thermo's 'scan number' works great for this, and we need something like this for the other vendor formats too.

We should have more discussion on this, but it will probably help to have some examples - which should be available from the group soon.

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
Sent: Tuesday, February 19, 2008 3:54 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Unique scan numbers

There is no requirement that an ascending lexicographical sort would produce the same order that a sort by ascending scanNumber would (that would be far too restrictive), but there is a requirement that the spectrum elements in the file be stored in ascending order by scanNumber.

We had quite a bit of discussion in the teleconference today about removing scanNumber as a primary key and replacing it with an index attribute; such an attribute would probably have different semantics (0-based and contiguous). I think Eric is probably preparing to post a good summary of the discussion.

My proposal for positiveInteger for scanNumber was accepted but then rendered moot by the proposal to get rid of the scanNumber attribute. ;) "positiveInteger" has no schematic upper limit, so perhaps if we switch to an index attribute we should make it "unsignedLong" (schematically defined as a 64-bit unsigned integer).

-Matt
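A minimal sketch of the 0-based, contiguous index discussed here, assuming spectra must be stored in ascending scanNumber order while the scanNumbers themselves may have gaps. `assign_indices` is a hypothetical name, and the ordering check reflects the requirement as Matt states it, not final mzML text.

```python
from __future__ import annotations


def assign_indices(scan_numbers: list[int]) -> list[tuple[int, int]]:
    """Pair each stored spectrum's scanNumber with a 0-based, contiguous index.

    scan_numbers is the scanNumber of each spectrum in file order. Gaps are
    allowed, but the sequence must be strictly ascending (ascending order plus
    uniqueness); otherwise a ValueError names the first offending pair.
    """
    for previous, current in zip(scan_numbers, scan_numbers[1:]):
        if current <= previous:
            raise ValueError(
                f"scanNumber {current} is stored after {previous}; "
                "expected ascending order"
            )
    # The index is just the position in the file: 0, 1, 2, ... with no gaps.
    return list(enumerate(scan_numbers))


print(assign_indices([1, 2, 5, 9]))
```

Because the index is simply the file position, it can be used for array indexing even when scan numbers are sparse, which is the distinction Mike raises in this thread.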
From: Randy J. <rkj...@in...> - 2008-02-20 02:12:03
All we are trying to achieve with the anyURI is safety for use within a URI. Any xlink-safe way of doing this will work. So if xs:ID is supported better by the validating parsers, it would do what we want.

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
Sent: Tuesday, February 19, 2008 4:25 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Relative URIs and RFC-2396

You are right Randy, we were forgetting about relative URIs, which can simply refer to a resource's name with no path at all ("1" is certainly a valid resource name). However, I think anyURI is still a bad idea for any attribute which is not intended to be able to refer to something in a remote location (i.e. not in the current file). The "id" attribute in the XML namespace has type "xs:ID", which has semantics more along the lines of what I think you want. If I understand the use case correctly, it is desirable to be able to link to certain mzML elements from external documents with a URI, like:

file://data_source.mzML#s555

This is an example absolute URI reference to a spectrum in a file at "data_source.mzML" where the spectrum's id attribute is "s555". It wouldn't make sense for the id itself to be a URI, although the reference to it can (and should) be.

So:
1) for id attributes which can be referred to externally or internally, use the type "xs:ID"
2) for references to external or internal resources by their id attribute, use the type "xs:anyURI"

This would have the problem of the Xerces-C parser not validating relative URIs correctly, but that seems to be wrong on their part. :/ Anyway, users of Xerces-C can turn off the validation feature to work around it.

Also, Ref attributes in mzML could use anyURI for consistency reasons even though we don't currently know of a use case where such references would be made to an external file.

-Matt
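Under the split proposed here (xs:ID on the element, xs:anyURI on the reference), a consumer has to tell same-document references from external ones. A sketch using `urllib.parse.urldefrag`; `resolve_ref` is a hypothetical helper, and the external branch only returns the document URI because opening another file is out of scope.

```python
from __future__ import annotations

from urllib.parse import urldefrag


def resolve_ref(ref: str, local_ids: set[str]) -> tuple[str | None, str]:
    """Classify an xs:anyURI-style reference to an element id.

    '#s555' is a same-document reference: the id is checked against
    local_ids and (None, id) is returned. 'file://data_source.mzML#s555'
    points into another document, which this sketch cannot open, so it
    returns (document_uri, id) unchecked.
    """
    document_uri, fragment = urldefrag(ref)
    if not fragment:
        raise ValueError(f"reference {ref!r} has no fragment identifier")
    if not document_uri:
        if fragment not in local_ids:
            raise KeyError(f"no element with id {fragment!r} in this file")
        return None, fragment
    return document_uri, fragment


print(resolve_ref("#s555", {"s555"}))
print(resolve_ref("file://data_source.mzML#s555", {"s555"}))
```

Keeping the lookup on the reference side means the id itself stays a plain name, which matches the argument that the reference, not the id, should be the URI.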
From: Matthew C. <mat...@va...> - 2008-02-19 21:25:37
You are right Randy, we were forgetting about relative URIs, which can simply refer to a resource's name with no path at all ("1" is certainly a valid resource name). However, I think anyURI is still a bad idea for any attribute which is not intended to be able to refer to something in a remote location (i.e. not in the current file). The "id" attribute in the XML namespace has type "xs:ID", which has semantics more along the lines of what I think you want. If I understand the use case correctly, it is desirable to be able to link to certain mzML elements from external documents with a URI, like:

file://data_source.mzML#s555

This is an example absolute URI reference to a spectrum in a file at "data_source.mzML" where the spectrum's id attribute is "s555". It wouldn't make sense for the id itself to be a URI, although the reference to it can (and should) be.

So:
1) for id attributes which can be referred to externally or internally, use the type "xs:ID"
2) for references to external or internal resources by their id attribute, use the type "xs:anyURI"

This would have the problem of the Xerces-C parser not validating relative URIs correctly, but that seems to be wrong on their part. :/ Anyway, users of Xerces-C can turn off the validation feature to work around it.

Also, Ref attributes in mzML could use anyURI for consistency reasons even though we don't currently know of a use case where such references would be made to an external file.

-Matt

Randy Julian wrote:
>
> Per our conversation today, the relevant specification is RFC-2396:
>
> http://www.ietf.org/rfc/rfc2396.txt
>
> Section 5 talks about relative URIs. They do not need to include the
> protocol and their syntax would include all integers:
>
> The syntax for relative URI takes advantage of the <hier_part> syntax
> of <absoluteURI> (Section 3) in order to express a reference that is
> relative to the namespace of another hierarchical URI.
>
> relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
>
> A relative reference beginning with two slash characters is termed a
> network-path reference, as defined by <net_path> in Section 3. Such
> references are rarely used.
>
> A relative reference beginning with a single slash character is
> termed an absolute-path reference, as defined by <abs_path> in
> Section 3.
>
> A relative reference that does not begin with a scheme name or a
> slash character is termed a relative-path reference.
>
> rel_path = rel_segment [ abs_path ]
>
> rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
>
> That means that you don’t need the net_path part or the abs_path part
> but can use the rel_path part alone. The rel_path part can have only
> the rel_segment part which is required to have one or more unreserved
> characters (includes all the integers) and/or any of the above special
> characters or escaped characters.
>
> The point of using it in mzML for IDs is that you can be assured of it
> being a valid relative path when extended by all the other components
> needed to navigate to a referenced document (protocol, absolute path,
> etc.).
>
> We can achieve this by convention by saying in the mzML spec doc (and
> possibly putting the required pattern in the schema), that the string
> for ID must conform to RFC-2396.
>
> Randy
>
> Randall K Julian, Jr. Ph.D.
> President
> Indigo BioSystems, Inc.
> (317) 536-2736 x101
> (317) 306-5447 mobile
>
> www.indigobio.com <http://www.indigobio.com/>
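The rel_segment-only form Randy describes (no net_path or abs_path) can be checked with a regular expression built directly from the RFC 2396 productions quoted above. A sketch in Python; `is_rel_segment` is a hypothetical name, and note that RFC 2396 has since been obsoleted by RFC 3986, whose grammar differs in detail.

```python
import re

# Character classes from RFC 2396 (sections 2.3, 2.4.1 and 5):
#   unreserved  = alphanum | mark
#   mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
#   escaped     = "%" hex hex
#   rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
_REL_SEGMENT = re.compile(
    r"(?:[A-Za-z0-9\-_.!~*'();@&=+$,]"  # unreserved or one of the listed specials
    r"|%[0-9A-Fa-f]{2})+"               # or a %HH escape
)


def is_rel_segment(value: str) -> bool:
    """True if value matches RFC 2396 rel_segment (one or more allowed chars)."""
    return _REL_SEGMENT.fullmatch(value) is not None


for candidate in ["1", "s555", "foo.1.1.1.dta", "a%20b", "a/b", "file://x", "%zz", ""]:
    print(f"{candidate!r}: {is_rel_segment(candidate)}")
```

Every decimal integer matches, as Randy notes, while ':' and '/' do not, so an absolute URI such as "file://x" is not a rel_segment.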
From: Matthew C. <mat...@va...> - 2008-02-19 20:54:21
|
There is no requirement that an ascending lexicographical sort would produce the same order that a sort by ascending scanNumber would (that would be far too restrictive), but there is a requirement that the spectrum elements in the file be stored in ascending order by scanNumber. We had quite a bit of discussion in the teleconference today about removing scanNumber as a primary key and replacing it with an index attribute; such an attribute would probably have different semantics (0-based and contiguous). I think Eric is probably preparing to post a good summary of the discussion. My proposal for positiveInteger for scanNumber was accepted but then rendered moot by the proposal to get rid of the scanNumber attribute. ;) "positiveInteger" has no schematic upper limit, so perhaps if we switch to an index attribute we should make it "unsignedLong" (schematically defined as a 64-bit unsigned integer). -Matt Coleman, Michael wrote: > Is there a requirement that scanNumber and id be "co-ordered" (if I sort > a file's spectra by one of these keys, the other key will then > necessarily also be sorted)? > > Is there a requirement that the spectra in all mzML files be ordered by > one or both of these keys? > > If scanNumber is being used for ordering in one of these ways, I agree > that lexicographic ordering should be specified. If not, I'm wondering > whether there is any other reason for specifying the ordering. > > > If scanNumbers were specified to be contiguous, I'd say we ought to > allow 0 as a scan number, since essentially all modern programming > languages use 0-based arrays. But if I understand correctly, > scanNumbers need not be contiguous (and thus programmers should not > assume that they can be directly used for array indexing). > > Are scan numbers up to at least 2**62 or so allowed, to prepare for the > coming ten-billion-spectrum runs? :-) > > > Mike > > > > >> -----Original Message----- >> From: psi...@li... >> [mailto:psi...@li...] 
On >> Behalf Of Matthew Chambers >> Sent: Tuesday, February 19, 2008 10:07 AM >> To: Mass spectrometry standard development >> Subject: Re: [Psidev-ms-dev] Unique scan numbers >> >> >> Hi Michael, >> >> As it currently stands, both scanNumber and id are unique keys to a >> spectrum - they need not be combined to create a unique key. Id is a >> string and as such should be compared on a lexicographical basis (if >> that isn't stated in the spec, it should be), and scanNumber >> is an integer: >> <xs:attribute name="scanNumber" type="xs:int" use="required"> >> >> By the way, I think we should change that type to be >> xs:positiveInteger >> so that the range is schematically limited to [1-infinity). 0 >> shouldn't >> be a valid scan number (if 0 is allowed then Michael's point >> about the >> -0 and 0 issue should be addressed, although that might be >> done by the >> XML Schema specification). >> >> -Matt >> >> >> Coleman, Michael wrote: >> >>> I don't understand the issues involved in this particular >>> >> question, but >> >>> it reminds me of this key requirement: >>> >>> - There has to be a way of generating a unique key for each spectrum >>> (i.e., unique across all spectra in the file) that will work for all >>> mzML files. >>> >>> In the example below, it looks like that key is the 2-tuple "(id, >>> scanNumber)". (Whatever the key is, it should be specified >>> >> as such in >> >>> the standard.) >>> >>> >>> If the key includes any numeric fields, it needs to be >>> >> specified whether >> >>> or not (say) "0010" is equal to "10", whether or not "1.0" >>> >> is equal to >> >>> "1", and whether or not "-0" is equal to "0". Hopefully >>> >> either (a) the >> >>> former is simply disallowed in all of these cases or (b) >>> >> all fields are >> >>> to be treated as strings, rather than numbers, and >>> >> comparison done on >> >>> that basis. >>> >>> Mike >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: psi...@li... >>>> [mailto:psi...@li...] 
On >>>> Behalf Of Matthew Chambers >>>> Sent: Tuesday, February 19, 2008 9:37 AM >>>> To: Mass spectrometry standard development >>>> Subject: Re: [Psidev-ms-dev] Unique scan numbers >>>> >>>> >>>> How do you feel about generating arbitrary unique scan >>>> numbers and then >>>> using the id attribute to preserve the original filename and >>>> scan number: >>>> <spectrum id="function1.1" scanNumber="1" ...> >>>> <spectrum id="function1.2" scanNumber="2" ...> >>>> ... >>>> <spectrum id="function2.1" scanNumber="100" ...> >>>> <spectrum id="function2.2" scanNumber="101" ...> >>>> ... >>>> >>>> Or probably more intuitive would be to store the parallel spectra >>>> sequentially (assuming that the same scan number from each >>>> function is >>>> correlated): >>>> <spectrum id="function1.1" scanNumber="1" ...> >>>> <spectrum id="function2.1" scanNumber="2" ...> >>>> ... >>>> <spectrum id="function1.2" scanNumber="100" ...> >>>> <spectrum id="function2.2" scanNumber="101" ...> >>>> ... >>>> >>>> It's either that or store each function in a separate mzML >>>> file, because >>>> mzML doesn't support multiple runs in the same file. >>>> >>>> -Matt >>>> >>>> >>>> Fredrik Levander wrote: >>>> >>>> >>>>> Hi All, >>>>> >>>>> In QTOF files from Waters with mixed MS1 and MS2 data we >>>>> >>>>> >>>> have several >>>> >>>> >>>>> parallel 'functions' with data being recorded into separate >>>>> >>>>> >>>> files. The >>>> >>>> >>>>> scan numbers are only unique within each function. In the >>>>> >> raw data >> >>>>> folder we thus have several different spectra with the same >>>>> >>>>> >>>> scan number >>>> >>>> >>>>> (but different source files). When converting this into an >>>>> >>>>> >>>> mzML file it >>>> >>>> >>>>> would be good to keep the original scan numbers which are >>>>> >>>>> >>>> useful for >>>> >>>> >>>>> traceability, but to generate unique spectrum ids. 
I thus >>>>> >>>>> >>>> propose that >>>> >>>> >>>>> the requirement for unique scanNumbers within an mzML file >>>>> >>>>> >>>> is removed. >>>> >>>> >>>>> However, spectra should not be repeated within the file, so >>>>> >>>>> >>>> this would >>>> >>>> >>>>> NOT be applicable to the dta to mzML conversion use case. >>>>> Would such a change generate problems for the readers? >>>>> How is this solved in MassWolf? >>>>> >>>>> >>>>> Regards >>>>> >>>>> Fredrik >>>>> >>>>> >>>>> >>>>> >>>> -------------------------------------------------------------- >>>> ----------- >>>> >>>> >>>>> This SF.net email is sponsored by: Microsoft >>>>> Defy all challenges. Microsoft(R) Visual Studio 2008. >>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ >>>>> _______________________________________________ >>>>> Psidev-ms-dev mailing list >>>>> Psi...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
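Matt's second layout in the quoted thread stores the parallel functions interleaved, assigns new file-wide scan numbers, and keeps the original function and scan number in the id. The Python sketch below illustrates that idea under stated assumptions: the input is a hypothetical dict mapping each function number to its list of original scan numbers, scans are paired across functions by their position in those lists (i.e. the i-th scan of each function is assumed to be correlated), ids follow the "function<f>.<scan>" pattern from the example, and the new scan numbers run consecutively from 1 instead of reproducing the gaps (1, 2, ..., 100, 101) shown in the example. `interleave_functions` is a hypothetical helper, not part of any mzML tool.

```python
from itertools import zip_longest


def interleave_functions(functions):
    """Assign unique, file-wide scan numbers to spectra from parallel functions.

    functions: dict mapping function number -> list of original scan numbers
    (hypothetical input shape). Returns (id, scanNumber) pairs in file order:
    the first scan of every function, then the second scan of every function, etc.
    """
    ordered = sorted(functions)  # deterministic function order
    columns = [functions[f] for f in ordered]
    result = []
    next_scan = 1  # positive, strictly increasing, unique within the file
    for row in zip_longest(*columns):
        for func, orig_scan in zip(ordered, row):
            if orig_scan is None:  # this function has fewer scans than the others
                continue
            result.append((f"function{func}.{orig_scan}", next_scan))
            next_scan += 1
    return result


for spectrum_id, scan in interleave_functions({1: [1, 2], 2: [1, 2]}):
    print(f'<spectrum id="{spectrum_id}" scanNumber="{scan}" ...>')
```

With this layout the new scanNumbers are unique and increasing, and the original Waters function/scan pair survives only in the id, which is the trade-off Matt describes.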
From: Randy J. <rkj...@in...> - 2008-02-19 19:11:22
|
Per our conversation today, the relevant specification is RFC-2396: http://www.ietf.org/rfc/rfc2396.txt

Section 5 talks about relative URIs. They do not need to include the protocol and their syntax would include all integers:

   The syntax for relative URI takes advantage of the <hier_part> syntax of <absoluteURI> (Section 3) in order to express a reference that is relative to the namespace of another hierarchical URI.

      relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]

   A relative reference beginning with two slash characters is termed a network-path reference, as defined by <net_path> in Section 3. Such references are rarely used.

   A relative reference beginning with a single slash character is termed an absolute-path reference, as defined by <abs_path> in Section 3.

   A relative reference that does not begin with a scheme name or a slash character is termed a relative-path reference.

      rel_path    = rel_segment [ abs_path ]

      rel_segment = 1*( unreserved | escaped |
                        ";" | "@" | "&" | "=" | "+" | "$" | "," )

That means that you don't need the net_path part or the abs_path part but can use the rel_path part alone. The rel_path part can have only the rel_segment part, which is required to have one or more unreserved characters (includes all the integers) and/or any of the above special characters or escaped characters.

The point of using it in mzML for IDs is that you can be assured of it being a valid relative path when extended by all the other components needed to navigate to a referenced document (protocol, absolute path, etc.). We can achieve this by convention by saying in the mzML spec doc (and possibly putting the required pattern in the schema) that the string for ID must conform to RFC-2396.

Randy

Randall K Julian, Jr. Ph.D.
President
Indigo BioSystems, Inc.
(317) 536-2736 x101 (317) 306-5447 mobile www.indigobio.com |
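Randy's proposal amounts to a lexical check on the id string. As a minimal Python sketch, assuming the character classes of the RFC 2396 grammar (unreserved = alphanum | mark; mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"; escaped = "%" hex hex), the function below tests the stricter rel_segment production quoted above. A full rel_path would additionally allow an abs_path (with "/" separators) after the first segment, which this sketch rejects. The name `is_rel_segment` is hypothetical, and a schema-level pattern would need to be rewritten in XML Schema's own regular-expression dialect.

```python
import re

# Character classes from the RFC 2396 grammar:
#   unreserved  = alphanum | mark
#   mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
#   escaped     = "%" hex hex
#   rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
UNRESERVED = r"[A-Za-z0-9\-_.!~*'()]"
ESCAPED = r"%[0-9A-Fa-f]{2}"
EXTRA = r"[;@&=+$,]"
REL_SEGMENT = re.compile(rf"(?:{UNRESERVED}|{ESCAPED}|{EXTRA})+")


def is_rel_segment(spectrum_id: str) -> bool:
    """Return True if the whole id matches the RFC 2396 rel_segment rule."""
    return REL_SEGMENT.fullmatch(spectrum_id) is not None


for candidate in ["12345", "function1.1", "scan 1", "a/b", "%2F"]:
    print(candidate, is_rel_segment(candidate))
```

Plain integers such as "12345" pass, as Randy notes, while an id containing a space or a "/" does not.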
From: Coleman, M. <MK...@St...> - 2008-02-19 16:26:07
|
Is there a requirement that scanNumber and id be "co-ordered" (if I sort a file's spectra by one of these keys, the other key will then necessarily also be sorted)? Is there a requirement that the spectra in all mzML files be ordered by one or both of these keys? If scanNumber is being used for ordering in one of these ways, I agree that lexicographic ordering should be specified. If not, I'm wondering whether there is any other reason for specifying the ordering. If scanNumbers were specified to be contiguous, I'd say we ought to allow 0 as a scan number, since essentially all modern programming languages use 0-based arrays. But if I understand correctly, scanNumbers need not be contiguous (and thus programmers should not assume that they can be directly used for array indexing). Are scan numbers up to at least 2**62 or so allowed, to prepare for the coming ten-billion-spectrum runs? :-) Mike > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Matthew Chambers > Sent: Tuesday, February 19, 2008 10:07 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] Unique scan numbers > > > Hi Michael, > > As it currently stands, both scanNumber and id are unique keys to a > spectrum - they need not be combined to create a unique key. Id is a > string and as such should be compared on a lexicographical basis (if > that isn't stated in the spec, it should be), and scanNumber > is an integer: > <xs:attribute name="scanNumber" type="xs:int" use="required"> > > By the way, I think we should change that type to be > xs:positiveInteger > so that the range is schematically limited to [1-infinity). 0 > shouldn't > be a valid scan number (if 0 is allowed then Michael's point > about the > -0 and 0 issue should be addressed, although that might be > done by the > XML Schema specification). 
> > -Matt > > > Coleman, Michael wrote: > > I don't understand the issues involved in this particular > question, but > > it reminds me of this key requirement: > > > > - There has to be a way of generating a unique key for each spectrum > > (i.e., unique across all spectra in the file) that will work for all > > mzML files. > > > > In the example below, it looks like that key is the 2-tuple "(id, > > scanNumber)". (Whatever the key is, it should be specified > as such in > > the standard.) > > > > > > If the key includes any numeric fields, it needs to be > specified whether > > or not (say) "0010" is equal to "10", whether or not "1.0" > is equal to > > "1", and whether or not "-0" is equal to "0". Hopefully > either (a) the > > former is simply disallowed in all of these cases or (b) > all fields are > > to be treated as strings, rather than numbers, and > comparison done on > > that basis. > > > > Mike > > > > > > > > > >> -----Original Message----- > >> From: psi...@li... > >> [mailto:psi...@li...] On > >> Behalf Of Matthew Chambers > >> Sent: Tuesday, February 19, 2008 9:37 AM > >> To: Mass spectrometry standard development > >> Subject: Re: [Psidev-ms-dev] Unique scan numbers > >> > >> > >> How do you feel about generating arbitrary unique scan > >> numbers and then > >> using the id attribute to preserve the original filename and > >> scan number: > >> <spectrum id="function1.1" scanNumber="1" ...> > >> <spectrum id="function1.2" scanNumber="2" ...> > >> ... > >> <spectrum id="function2.1" scanNumber="100" ...> > >> <spectrum id="function2.2" scanNumber="101" ...> > >> ... > >> > >> Or probably more intuitive would be to store the parallel spectra > >> sequentially (assuming that the same scan number from each > >> function is > >> correlated): > >> <spectrum id="function1.1" scanNumber="1" ...> > >> <spectrum id="function2.1" scanNumber="2" ...> > >> ... 
> >> <spectrum id="function1.2" scanNumber="100" ...> > >> <spectrum id="function2.2" scanNumber="101" ...> > >> ... > >> > >> It's either that or store each function in a separate mzML > >> file, because > >> mzML doesn't support multiple runs in the same file. > >> > >> -Matt > >> > >> > >> Fredrik Levander wrote: > >> > >>> Hi All, > >>> > >>> In QTOF files from Waters with mixed MS1 and MS2 data we > >>> > >> have several > >> > >>> parallel 'functions' with data being recorded into separate > >>> > >> files. The > >> > >>> scan numbers are only unique within each function. In the > raw data > >>> folder we thus have several different spectra with the same > >>> > >> scan number > >> > >>> (but different source files). When converting this into an > >>> > >> mzML file it > >> > >>> would be good to keep the original scan numbers which are > >>> > >> useful for > >> > >>> traceability, but to generate unique spectrum ids. I thus > >>> > >> propose that > >> > >>> the requirement for unique scanNumbers within an mzML file > >>> > >> is removed. > >> > >>> However, spectra should not be repeated within the file, so > >>> > >> this would > >> > >>> NOT be applicable to the dta to mzML conversion use case. > >>> Would such a change generate problems for the readers? > >>> How is this solved in MassWolf? > >>> > >>> > >>> Regards > >>> > >>> Fredrik > >>> > >>> > >>> > >> -------------------------------------------------------------- > >> ----------- > >> > >>> This SF.net email is sponsored by: Microsoft > >>> Defy all challenges. Microsoft(R) Visual Studio 2008. > >>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > >>> _______________________________________________ > >>> Psidev-ms-dev mailing list > >>> Psi...@li... 
> >>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
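Michael asks whether id and scanNumber are "co-ordered" and whether scan numbers as large as 2**62 would be allowed. The Python sketch below, assuming spectra are given as hypothetical (id, scanNumber) pairs, checks co-ordering directly and shows why the choice of ordering matters: lexicographic and numeric sorts of the same digit strings disagree. It also compares 2**62 with the maximum of xs:int, which XML Schema defines as a 32-bit signed integer (maximum 2147483647); xs:long is the 64-bit type. `co_ordered` is a hypothetical helper.

```python
# Value-space limits from XML Schema Part 2 datatypes.
XS_INT_MAX = 2**31 - 1   # xs:int, 32-bit signed
XS_LONG_MAX = 2**63 - 1  # xs:long, 64-bit signed


def co_ordered(spectra):
    """True if sorting (id, scanNumber) pairs by id also sorts them by scanNumber.

    ids are compared as strings (lexicographically); scanNumbers are assumed
    unique, so the check requires strictly increasing values.
    """
    by_id = sorted(spectra, key=lambda s: s[0])
    scans = [scan for _, scan in by_id]
    return all(a < b for a, b in zip(scans, scans[1:]))


digits = ["9", "10", "100"]
print(sorted(digits))           # lexicographic: ['10', '100', '9']
print(sorted(digits, key=int))  # numeric: ['9', '10', '100']
print(2**62 <= XS_INT_MAX, 2**62 <= XS_LONG_MAX)  # False True

interleaved = [("function1.1", 1), ("function2.1", 2), ("function1.2", 3)]
print(co_ordered(interleaved))  # False
```

With the interleaved layout discussed in this thread, sorting by id no longer sorts by scanNumber, so co-ordering would have to be stated explicitly if the standard wanted to require it.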
From: Darren K. <dar...@cs...> - 2008-02-19 16:20:05
|
> I'm not inclined to change it because I'm lazy and because I suspect it > will make Darren and Brian sad. I won't be sad ;) I hadn't noticed before now that the <scan> element is optional. I don't like the idea of bogus scanNumber and msLevel if there is no <scan>. For current applications (e.g. RAMP and converted mzXML files) this won't make a significant difference in parsing the mzML. I would vote for consistency over the very slight easing of parsing. Darren |
From: Darren K. <dar...@cs...> - 2008-02-19 16:19:30
|
Actually, my comment about dataProcessing was limited to the uses of the software during processing. I think the addition of the cvParam for the general software type is useful (and in fact I'm using it in the latest msdata code). If nothing else, it provides for a much more straightforward translation from mzXML. Without it, encoding the mzXML software type is much more awkward. Darren On Tue, 19 Feb 2008 4:20 am, Lennart Martens wrote: > Hi Eric, hi PSI MS Enthousiast, > > >> I read that this discussion was deemed moot. Play-by-play below. >> Lennart, should we remove your new cvParam entry location to remove >> temptation to use it, or leave it in? > > I'll schedule it for removal, and will do so in the version I'll try to > build after the phone con tonight (or this morning :)). > > > Cheers, > > lnnrt. |
From: Matthew C. <mat...@va...> - 2008-02-19 16:06:55
|
Hi Michael, As it currently stands, both scanNumber and id are unique keys to a spectrum - they need not be combined to create a unique key. Id is a string and as such should be compared on a lexicographical basis (if that isn't stated in the spec, it should be), and scanNumber is an integer: <xs:attribute name="scanNumber" type="xs:int" use="required"> By the way, I think we should change that type to be xs:positiveInteger so that the range is schematically limited to [1-infinity). 0 shouldn't be a valid scan number (if 0 is allowed then Michael's point about the -0 and 0 issue should be addressed, although that might be done by the XML Schema specification). -Matt Coleman, Michael wrote: > I don't understand the issues involved in this particular question, but > it reminds me of this key requirement: > > - There has to be a way of generating a unique key for each spectrum > (i.e., unique across all spectra in the file) that will work for all > mzML files. > > In the example below, it looks like that key is the 2-tuple "(id, > scanNumber)". (Whatever the key is, it should be specified as such in > the standard.) > > > If the key includes any numeric fields, it needs to be specified whether > or not (say) "0010" is equal to "10", whether or not "1.0" is equal to > "1", and whether or not "-0" is equal to "0". Hopefully either (a) the > former is simply disallowed in all of these cases or (b) all fields are > to be treated as strings, rather than numbers, and comparison done on > that basis. > > Mike > > > > >> -----Original Message----- >> From: psi...@li... >> [mailto:psi...@li...] 
On >> Behalf Of Matthew Chambers >> Sent: Tuesday, February 19, 2008 9:37 AM >> To: Mass spectrometry standard development >> Subject: Re: [Psidev-ms-dev] Unique scan numbers >> >> >> How do you feel about generating arbitrary unique scan >> numbers and then >> using the id attribute to preserve the original filename and >> scan number: >> <spectrum id="function1.1" scanNumber="1" ...> >> <spectrum id="function1.2" scanNumber="2" ...> >> ... >> <spectrum id="function2.1" scanNumber="100" ...> >> <spectrum id="function2.2" scanNumber="101" ...> >> ... >> >> Or probably more intuitive would be to store the parallel spectra >> sequentially (assuming that the same scan number from each >> function is >> correlated): >> <spectrum id="function1.1" scanNumber="1" ...> >> <spectrum id="function2.1" scanNumber="2" ...> >> ... >> <spectrum id="function1.2" scanNumber="100" ...> >> <spectrum id="function2.2" scanNumber="101" ...> >> ... >> >> It's either that or store each function in a separate mzML >> file, because >> mzML doesn't support multiple runs in the same file. >> >> -Matt >> >> >> Fredrik Levander wrote: >> >>> Hi All, >>> >>> In QTOF files from Waters with mixed MS1 and MS2 data we >>> >> have several >> >>> parallel 'functions' with data being recorded into separate >>> >> files. The >> >>> scan numbers are only unique within each function. In the raw data >>> folder we thus have several different spectra with the same >>> >> scan number >> >>> (but different source files). When converting this into an >>> >> mzML file it >> >>> would be good to keep the original scan numbers which are >>> >> useful for >> >>> traceability, but to generate unique spectrum ids. I thus >>> >> propose that >> >>> the requirement for unique scanNumbers within an mzML file >>> >> is removed. >> >>> However, spectra should not be repeated within the file, so >>> >> this would >> >>> NOT be applicable to the dta to mzML conversion use case. 
>>> Would such a change generate problems for the readers? >>> How is this solved in MassWolf? >>> >>> >>> Regards >>> >>> Fredrik >>> >>> >>> >> -------------------------------------------------------------- >> ----------- >> >>> This SF.net email is sponsored by: Microsoft >>> Defy all challenges. Microsoft(R) Visual Studio 2008. >>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ >>> _______________________________________________ >>> Psidev-ms-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
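Matt's answer treats id and scanNumber as two independent unique keys: id compared as a string, scanNumber as an integer that he proposes to restrict to xs:positiveInteger. A minimal Python sketch of that reading, assuming the attribute values arrive as raw strings in hypothetical (id, scanNumber) pairs, also answers Michael's equality examples: as strings, "0010" and "10" differ, while as integers they are equal, as are "-0" and "0". A value such as "1.0" is not a valid lexical form for xs:int, and Python's int() likewise rejects it with a ValueError. All function names here are hypothetical.

```python
def key_equal_as_strings(a: str, b: str) -> bool:
    # String identity: "0010" and "10" are different keys.
    return a == b


def key_equal_as_ints(a: str, b: str) -> bool:
    # Numeric identity: leading zeros and the sign of zero are ignored.
    return int(a) == int(b)


def check_unique_keys(spectra):
    """Validate (id, scanNumber) attribute strings as two independent keys.

    id is treated as an opaque string; scanNumber is parsed as an integer
    and, under the proposed xs:positiveInteger type, must be >= 1.
    Returns a list of problem descriptions (empty if there are none).
    """
    problems = []
    seen_ids, seen_scans = set(), set()
    for spectrum_id, scan_text in spectra:
        scan = int(scan_text)
        if scan < 1:
            problems.append(f"{spectrum_id}: scanNumber {scan_text} is not positive")
        if spectrum_id in seen_ids:
            problems.append(f"duplicate id {spectrum_id!r}")
        if scan in seen_scans:
            problems.append(f"duplicate scanNumber {scan} (from {scan_text!r})")
        seen_ids.add(spectrum_id)
        seen_scans.add(scan)
    return problems


print(key_equal_as_strings("0010", "10"), key_equal_as_ints("0010", "10"))  # False True
print(check_unique_keys([("a", "1"), ("b", "01"), ("c", "0")]))
```

Because the sketch compares parsed integer values, "01" collides with "1", which is the behaviour Matt's integer typing implies; comparing the raw strings instead would let both through.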
From: Coleman, M. <MK...@St...> - 2008-02-19 16:00:15
|
To clarify, I think having the converter be able to keep or discard the multiple charge information is fine. What I'm against is *forcing* this information to be discarded, which is how I interpret option (a) below. Mike > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Matthew Chambers > Sent: Tuesday, February 19, 2008 9:44 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > > > Eh, I think we should leave it up to the implementor of the > converter. > Ideally the converter would be configurable to either keep the charge > state information or discard it. In either case, the scan > number would > only appear as a single element. > > -Matt > > > Coleman, Michael wrote: > > I'm strongly in favor of (b), i.e., keeping that charge state > > information. If the instrument software, or some other software > > upstream of the search engine has reason to believe that > the charge for > > a particular spectrum is +2 or +3 but not +1, or +2 but not > +1 or +3, or > > whatever, the search engine ought to be able to make use of this > > information. > > > > As a practical matter, the spectrum format we currently use > here (ms2, > > very similar to dta) efficiently encodes this information, > so not having > > it in mzML would be at least a minor argument for not > converting. (We > > could, of course, simply duplicate the entire spectrum in > this case, but > > this would further bloat the output, and still lose some important > > information.) > > > > Mike > > > > > > > > > > > >> -----Original Message----- > >> From: psi...@li... > >> [mailto:psi...@li...] On > >> Behalf Of Fredrik Levander > >> Sent: Tuesday, February 19, 2008 9:04 AM > >> To: Mass spectrometry standard development > >> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > >> > >> > >> Hi dta fans, > >> > >> I agree completely with 1 and 2. 
For 3 (several possible > >> charge states), > >> there seems to be two possibilities: > >> a) Do not write the chargestate at all into the mzML in > cases where > >> there are multiple guesses. > >> b) Put all the proposed values into one precursor. See line > >> 206-207 at: > >> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/ADH0 > >> 71030_002.mzML?rev=26 > >> > >> Anyone else who would prefer either of a or b? At least > some search > >> engines would try both 2+ and 3+ if there is no charge > state given in > >> the file, so maybe solution a is better? Or does b have advantages? > >> > >> Fredrik > >> > >> Eric Deutsch wrote: > >> > >>> Hi everyone, regarding list dta to mzML conversion, here are my > >>> thoughts: > >>> > >>> 1) The current rule is that scanNumbers must be unique > >>> > >> within a file and > >> > >>> always increasing, although not necessarily sequentially. > >>> > >> IDs must be > >> > >>> unique within a file. I don't think should change for > >>> > >> conversion from > >> > >>> dta. > >>> > >>> 2) I would only encode the spectrum once, since as you say > >>> > >> it is just > >> > >>> one spectrum. > >>> > >>> 3) I don't even see why you need two precursors. When we > >>> > >> convert dta to > >> > >>> mzXML, duplicates were dropped and the actual observed > >>> > >> precursor mass > >> > >>> was put in the mzXML. Thus you are "losing" the > information that the > >>> spectrum could be charge 2 or 3. However, this information > >>> > >> was guessed > >> > >>> in the first place, and most software I know that extracts > >>> > >> a spectrum > >> > >>> with no charge information will apply some rules to decide on what > >>> charges to search. So, I suggest that the conversion from > >>> > >> dta to mzML is > >> > >>> just the reverse of mzML to dta. One spectrum per scan. If > >>> > >> only 1 charge > >> > >>> (dta file) is provided, encode it at the user's discretion. 
> >>> > >> If more than > >> > >>> 1 charge (dta file) is provided, encode the spectrum > >>> > >> without any charge > >> > >>> information. For LCQ data, it would probably be reasonable > >>> > >> to not encode > >> > >>> *any* charge information in the mzML file at all. Because > it doesn't > >>> come with any in the first place. > >>> > >>> We will be adding the functionality for multiple precursors > >>> > >> anyway for > >> > >>> the case when you have multiple peaks in your selection > >>> > >> window as seen, > >> > >>> e.g., in an orbitrap. I suppose there's no reason you > couldn't take > >>> advantage of that to encode both the 2+ and 3+ although I wouldn't > >>> recommend it. > >>> > >>> Eric > >>> > >>> > >>> > >>> > >>> > >>>> -----Original Message----- > >>>> From: psi...@li... > >>>> > >>>> > >>> [mailto:psidev-ms-dev- > >>> > >>> > >>>> bo...@li...] On Behalf Of Fredrik Levander > >>>> Sent: Thursday, February 14, 2008 9:55 AM > >>>> To: Mass spectrometry standard development > >>>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion > >>>> > >>>> Hi Matt and Rune, > >>>> > >>>> Thanks for the comments. I agree that the important > >>>> > >> information is the > >> > >>>> scan number, since this is what you would like to look up > >>>> > >> in the raw > >> > >>>> data file. And it doesn't make much sense to have the > scan repeated > >>>> twice in the file, so I think we'll go for solution 2 > and just keep > >>>> > >>>> > >>> the > >>> > >>> > >>>> sourceFileRef to one of the files. > >>>> However, since we do have unique spectrum ids there should > >>>> > >> not be any > >> > >>>> real need to stick to the unique scan number requirement > >>>> > >> from what I > >> > >>>> > >>>> > >>> got > >>> > >>> > >>>> from the indexing discussion, even if it is still in the > specs (?). > >>>> Couldn't there be cases when data is collected in > >>>> > >> different channels > >> > >>>> where the scan numbers are the same in different channels? 
> >>>> > >>>> Regards > >>>> > >>>> Fredrik > >>>> > >>>> Matthew Chambers skrev: > >>>> > >>>> > >>>>> Hi Fredrik, > >>>>> > >>>>> Our group has a converter that does this conversion (to mzXML or > >>>>> > >>>>> > >>> mzData > >>> > >>> > >>>>> currently, not yet mzML, but they all have the same uniqueness > >>>>> constraints on scan numbers and they all support multiple > >>>>> > >> precursors > >> > >>>>> > >>>>> > >>> at > >>> > >>> > >>>>> least in theory); we went with solution 2 because solution 1 is > >>>>> > >>>>> > >>> invalid > >>> > >>> > >>>>> for all the XML formats (i.e. it would need a schema > >>>>> > >> change and that > >> > >>>>> change isn't likely to happen, whereas multiple > >>>>> > >> sourceFileRefs would > >> > >>>>> > >>>>> > >>> be > >>> > >>> > >>>>> understandable). As I understand it, sourceFileRef is optional > >>>>> ("<xs:attribute name="sourceFileRef" type="xs:anyURI" > >>>>> > >>>>> > >>> use="optional">"), > >>> > >>> > >>>>> so if you can't or don't want to encode it correctly, just don't > >>>>> > >>>>> > >>> include > >>> > >>> > >>>>> it. Our converter doesn't even bother to include the > >>>>> > >> sourceFileRefs > >> > >>>>> > >>>>> > >>> to > >>> > >>> > >>>>> the DTAs, it's not helpful information IMO. As long as the > >>>>> > >>>>> > >>> conversion is > >>> > >>> > >>>>> done without data loss, get it over with and then have > >>>>> > >> mercy on your > >> > >>>>> filesystem by deleting the DTAs. ;) > >>>>> > >>>>> -Matt > >>>>> > >>>>> > >>>>> Fredrik Levander wrote: > >>>>> > >>>>> > >>>>> > >>>>>> Hi All, > >>>>>> > >>>>>> In the Proteios platform we're including converters from > >>>>>> > >> some peak > >> > >>>>>> > >>>>>> > >>> list > >>> > >>> > >>>>>> formats to mzData, and now also to mzML. It is clearly > >>>>>> > >> not optimal > >> > >>>>>> > >>>>>> > >>> with > >>> > >>> > >>>>>> such conversion since instrument settings etcetera are lost. 
> >>>>>> > >>>>>> > >>> However, I > >>> > >>> > >>>>>> guess there will be need for such converters if > someone wants to > >>>>>> > >>>>>> > >>> use > >>> > >>> > >>>>>> their old instruments with manufacturer peak picking > algorithms. > >>>>>> > >>>>>> There are sample files generated from DTAs and > ProteinLynx by the > >>>>>> converters (0.99.1) at: > >>>>>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML > >>>>>> > >>>>>> The converters will be part of the new release of the Proteios > >>>>>> > >>>>>> > >>> Software > >>> > >>> > >>>>>> Environment, but if anyone would like to try them with > >>>>>> > >> their files, > >> > >>>>>> there is a standalone package (mzMLconverters.zip) at > the address > >>>>>> > >>>>>> > >>> above > >>> > >>> > >>>>>> which should work under Windows/Linux/OSX with Java 1.5 > >>>>>> > >> or higher. > >> > >>>>>> Please notice that the output files are not schematically valid > >>>>>> > >>>>>> > >>> since > >>> > >>> > >>>>>> some terms are still missing in the CV. > >>>>>> > >>>>>> For the conversion of multiple DTA files to one mzML > >>>>>> > >> file there is > >> > >>>>>> > >>>>>> > >>> a > >>> > >>> > >>>>>> small problem which is related to how lcq_dta generates > >>>>>> > >> dta files: > >> > >>>>>> > >>>>>> > >>> If > >>> > >>> > >>>>>> the charge state of the precursor can not be determined, > >>>>>> > >> a spectrum > >> > >>>>>> > >>>>>> > >>> can > >>> > >>> > >>>>>> result in two DTA files which are identical apart from the > >>>>>> > >>>>>> > >>> precursor. > >>> > >>> > >>>>>> There are two solutions on how to handle this: > >>>>>> 1) Two spectra, with the same scanNumber but different > >>>>>> > >> spectrum Ids > >> > >>>>>> > >>>>>> > >>>> (The > >>>> > >>>> > >>>>>> solution used by the current converter) > >>>>>> 2) One spectrum, two precursors. 
However, this will > not work with > >>>>>> > >>>>>> > >>> the > >>> > >>> > >>>>>> current schema since there can only be one sourceFileRef for a > >>>>>> > >>>>>> > >>>> spectrum. > >>>> > >>>> > >>>>>> Do you all think solution 1 is fine, or is there a > >>>>>> > >> better solution? > >> > >>>>>> Solution 2 seems to need schema changes. > >>>>>> Other comments are also welcome > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Fredrik > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >> -------------------------------------------------------------- > >> --------- > >> > >>> > >>> > >>>> -- > >>>> > >>>> > >>>>>> This SF.net email is sponsored by: Microsoft > >>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008. > >>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > >>>>>> _______________________________________________ > >>>>>> Psidev-ms-dev mailing list > >>>>>> Psi...@li... > >>>>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev |
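Eric's rule in the quoted thread (one spectrum per scan; keep the charge if there is only one DTA file, omit it if there are several) and Fredrik's option (b) (keep every proposed charge) differ only in what happens to multiple charge guesses, so Matt's suggestion of a configurable converter can be sketched with a single flag. This Python sketch assumes the usual DTA convention that each file's header gives the singly protonated precursor mass [M+H]+ and a charge, and takes hypothetical (scan_number, charge, mh) tuples as input; `merge_dta_entries` and the record keys are illustrative, not Proteios or mzML names. The precursor m/z is recovered as (MH + (z - 1) * 1.007276) / z, so DTA files generated from the same scan with different charges give the same value.

```python
from collections import defaultdict

PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_from_mh(mh: float, charge: int) -> float:
    """Convert a singly protonated mass [M+H]+ to m/z at the given charge."""
    return (mh + (charge - 1) * PROTON_MASS) / charge


def merge_dta_entries(entries, keep_candidate_charges=False):
    """Collapse DTA entries that share a scan number into one spectrum each.

    entries: iterable of (scan_number, charge, mh) tuples, one per DTA file.
    Returns {scan_number: record}. Each record holds the precursor m/z plus
    "charge" (single guess), "possible_charges" (several guesses with
    keep_candidate_charges=True, option b) or no charge at all (option a).
    """
    by_scan = defaultdict(list)
    for scan, charge, mh in entries:
        by_scan[scan].append((charge, mh))

    merged = {}
    for scan, candidates in sorted(by_scan.items()):
        first_charge, first_mh = candidates[0]
        # All candidates derive from one observed precursor, so any of them
        # yields the same m/z; use the first one.
        record = {"precursor_mz": mz_from_mh(first_mh, first_charge)}
        charges = sorted({charge for charge, _ in candidates})
        if len(charges) == 1:
            record["charge"] = charges[0]
        elif keep_candidate_charges:
            record["possible_charges"] = charges
        merged[scan] = record
    return merged


dtas = [(100, 2, 1001.0), (100, 3, 1500.996362), (101, 2, 901.0)]
print(merge_dta_entries(dtas))
print(merge_dta_entries(dtas, keep_candidate_charges=True))
```

Either way the scan appears only once, which matches the point of agreement in the thread; only the handling of the 2+/3+ ambiguity is left to the person running the converter.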
From: Darren K. <dke...@ya...> - 2008-02-19 15:59:33
|
> I'm not inclined to change it because I'm lazy and because I suspect it
> will make Darren and Brian sad.

I won't be sad ;)

I hadn't noticed before now that the <scan> element is optional. I don't like the idea of bogus scanNumber and msLevel if there is no <scan>. For current applications (e.g. RAMP and converted mzXML files) this won't make a significant difference in parsing the mzML. I would vote for consistency over the very slight easing of parsing.

Darren
|
From: Coleman, M. <MK...@St...> - 2008-02-19 15:57:43
|
I don't understand the issues involved in this particular question, but it reminds me of this key requirement:

- There has to be a way of generating a unique key for each spectrum (i.e., unique across all spectra in the file) that will work for all mzML files.

In the example below, it looks like that key is the 2-tuple "(id, scanNumber)". (Whatever the key is, it should be specified as such in the standard.)

If the key includes any numeric fields, it needs to be specified whether or not (say) "0010" is equal to "10", whether or not "1.0" is equal to "1", and whether or not "-0" is equal to "0". Hopefully either (a) the former is simply disallowed in all of these cases or (b) all fields are to be treated as strings, rather than numbers, and comparison done on that basis.

Mike

> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...] On
> Behalf Of Matthew Chambers
> Sent: Tuesday, February 19, 2008 9:37 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>
> How do you feel about generating arbitrary unique scan numbers and then
> using the id attribute to preserve the original filename and scan number:
> <spectrum id="function1.1" scanNumber="1" ...>
> <spectrum id="function1.2" scanNumber="2" ...>
> ...
> <spectrum id="function2.1" scanNumber="100" ...>
> <spectrum id="function2.2" scanNumber="101" ...>
> ...
>
> Or probably more intuitive would be to store the parallel spectra
> sequentially (assuming that the same scan number from each function is
> correlated):
> <spectrum id="function1.1" scanNumber="1" ...>
> <spectrum id="function2.1" scanNumber="2" ...>
> ...
> <spectrum id="function1.2" scanNumber="100" ...>
> <spectrum id="function2.2" scanNumber="101" ...>
> ...
>
> It's either that or store each function in a separate mzML file, because
> mzML doesn't support multiple runs in the same file.
>
> -Matt
>
> Fredrik Levander wrote:
> > Hi All,
> >
> > In QTOF files from Waters with mixed MS1 and MS2 data we have several
> > parallel 'functions' with data being recorded into separate files. The
> > scan numbers are only unique within each function. In the raw data
> > folder we thus have several different spectra with the same scan number
> > (but different source files). When converting this into an mzML file it
> > would be good to keep the original scan numbers which are useful for
> > traceability, but to generate unique spectrum ids. I thus propose that
> > the requirement for unique scanNumbers within an mzML file is removed.
> > However, spectra should not be repeated within the file, so this would
> > NOT be applicable to the dta to mzML conversion use case.
> > Would such a change generate problems for the readers?
> > How is this solved in MassWolf?
> >
> > Regards
> >
> > Fredrik
|
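Mike's requirement can be made concrete with a small Python sketch. It assumes, as he reads Matt's example, that the key is the pair (id, scanNumber); the names `check_canonical`, `spectrum_key` and `index_spectra` are hypothetical and not part of any mzML API. The sketch enforces option (a) by rejecting spellings such as "0010", "1.0" and "-0"; once only canonical spellings remain, comparing the fields as plain strings (option b) gives the same answer as comparing them as numbers.

```python
import re
from typing import Dict, Iterable, Tuple

# Option (a): a numeric key field must be spelled canonically, i.e. a
# non-negative integer with no sign, no leading zeros and no decimal point.
_CANONICAL_INT = re.compile(r"0|[1-9][0-9]*")


def check_canonical(field: str) -> str:
    """Return the field unchanged, or reject spellings like "0010", "1.0", "-0"."""
    if not _CANONICAL_INT.fullmatch(field):
        raise ValueError(f"non-canonical numeric field: {field!r}")
    return field


def spectrum_key(spectrum_id: str, scan_number: str) -> Tuple[str, str]:
    """Hypothetical (id, scanNumber) key; both parts compared as strings (option b)."""
    return (spectrum_id, check_canonical(scan_number))


def index_spectra(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Map every key to its position in the file, failing on a duplicate key."""
    index: Dict[Tuple[str, str], int] = {}
    for position, (spectrum_id, scan_number) in enumerate(pairs):
        key = spectrum_key(spectrum_id, scan_number)
        if key in index:
            raise ValueError(f"duplicate spectrum key: {key}")
        index[key] = position
    return index
```

For example, `index_spectra([("function1.1", "1"), ("function2.1", "2")])` maps each key to its position in the file, while `spectrum_key("function1.1", "0010")` raises `ValueError` instead of silently treating "0010" as "10".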
From: Matthew C. <mat...@va...> - 2008-02-19 15:44:27
|
Eh, I think we should leave it up to the implementor of the converter. Ideally the converter would be configurable to either keep the charge state information or discard it. In either case, the scan number would only appear as a single element.

-Matt

Coleman, Michael wrote:
> I'm strongly in favor of (b), i.e., keeping that charge state
> information. If the instrument software, or some other software
> upstream of the search engine has reason to believe that the charge for
> a particular spectrum is +2 or +3 but not +1, or +2 but not +1 or +3, or
> whatever, the search engine ought to be able to make use of this
> information.
>
> As a practical matter, the spectrum format we currently use here (ms2,
> very similar to dta) efficiently encodes this information, so not having
> it in mzML would be at least a minor argument for not converting. (We
> could, of course, simply duplicate the entire spectrum in this case, but
> this would further bloat the output, and still lose some important
> information.)
>
> Mike
>
>> -----Original Message-----
>> From: psi...@li...
>> [mailto:psi...@li...] On
>> Behalf Of Fredrik Levander
>> Sent: Tuesday, February 19, 2008 9:04 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>>
>> Hi dta fans,
>>
>> I agree completely with 1 and 2. For 3 (several possible charge states),
>> there seems to be two possibilities:
>> a) Do not write the chargestate at all into the mzML in cases where
>> there are multiple guesses.
>> b) Put all the proposed values into one precursor. See line 206-207 at:
>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/ADH071030_002.mzML?rev=26
>>
>> Anyone else who would prefer either of a or b? At least some search
>> engines would try both 2+ and 3+ if there is no charge state given in
>> the file, so maybe solution a is better? Or does b have advantages?
>>
>> Fredrik
>>
>> Eric Deutsch wrote:
>>
>>> Hi everyone, regarding list dta to mzML conversion, here are my
>>> thoughts:
>>>
>>> 1) The current rule is that scanNumbers must be unique within a file and
>>> always increasing, although not necessarily sequentially. IDs must be
>>> unique within a file. I don't think should change for conversion from
>>> dta.
>>>
>>> 2) I would only encode the spectrum once, since as you say it is just
>>> one spectrum.
>>>
>>> 3) I don't even see why you need two precursors. When we convert dta to
>>> mzXML, duplicates were dropped and the actual observed precursor mass
>>> was put in the mzXML. Thus you are "losing" the information that the
>>> spectrum could be charge 2 or 3. However, this information was guessed
>>> in the first place, and most software I know that extracts a spectrum
>>> with no charge information will apply some rules to decide on what
>>> charges to search. So, I suggest that the conversion from dta to mzML is
>>> just the reverse of mzML to dta. One spectrum per scan. If only 1 charge
>>> (dta file) is provided, encode it at the user's discretion. If more than
>>> 1 charge (dta file) is provided, encode the spectrum without any charge
>>> information. For LCQ data, it would probably be reasonable to not encode
>>> *any* charge information in the mzML file at all. Because it doesn't
>>> come with any in the first place.
>>>
>>> We will be adding the functionality for multiple precursors anyway for
>>> the case when you have multiple peaks in your selection window as seen,
>>> e.g., in an orbitrap. I suppose there's no reason you couldn't take
>>> advantage of that to encode both the 2+ and 3+ although I wouldn't
>>> recommend it.
>>>
>>> Eric
>>>
>>>> -----Original Message-----
>>>> From: psi...@li...
>>>> [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Fredrik Levander
>>>> Sent: Thursday, February 14, 2008 9:55 AM
>>>> To: Mass spectrometry standard development
>>>> Subject: Re: [Psidev-ms-dev] DTA to mzML conversion
>>>>
>>>> Hi Matt and Rune,
>>>>
>>>> Thanks for the comments. I agree that the important information is the
>>>> scan number, since this is what you would like to look up in the raw
>>>> data file. And it doesn't make much sense to have the scan repeated
>>>> twice in the file, so I think we'll go for solution 2 and just keep the
>>>> sourceFileRef to one of the files.
>>>> However, since we do have unique spectrum ids there should not be any
>>>> real need to stick to the unique scan number requirement from what I got
>>>> from the indexing discussion, even if it is still in the specs (?).
>>>> Couldn't there be cases when data is collected in different channels
>>>> where the scan numbers are the same in different channels?
>>>>
>>>> Regards
>>>>
>>>> Fredrik
>>>>
>>>> Matthew Chambers wrote:
>>>>
>>>>> Hi Fredrik,
>>>>>
>>>>> Our group has a converter that does this conversion (to mzXML or mzData
>>>>> currently, not yet mzML, but they all have the same uniqueness
>>>>> constraints on scan numbers and they all support multiple precursors at
>>>>> least in theory); we went with solution 2 because solution 1 is invalid
>>>>> for all the XML formats (i.e. it would need a schema change and that
>>>>> change isn't likely to happen, whereas multiple sourceFileRefs would be
>>>>> understandable).
>>>>> As I understand it, sourceFileRef is optional
>>>>> ("<xs:attribute name="sourceFileRef" type="xs:anyURI" use="optional">"),
>>>>> so if you can't or don't want to encode it correctly, just don't include
>>>>> it. Our converter doesn't even bother to include the sourceFileRefs to
>>>>> the DTAs, it's not helpful information IMO. As long as the conversion is
>>>>> done without data loss, get it over with and then have mercy on your
>>>>> filesystem by deleting the DTAs. ;)
>>>>>
>>>>> -Matt
>>>>>
>>>>> Fredrik Levander wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> In the Proteios platform we're including converters from some peak list
>>>>>> formats to mzData, and now also to mzML. It is clearly not optimal with
>>>>>> such conversion since instrument settings etcetera are lost. However, I
>>>>>> guess there will be need for such converters if someone wants to use
>>>>>> their old instruments with manufacturer peak picking algorithms.
>>>>>>
>>>>>> There are sample files generated from DTAs and ProteinLynx by the
>>>>>> converters (0.99.1) at:
>>>>>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML
>>>>>>
>>>>>> The converters will be part of the new release of the Proteios Software
>>>>>> Environment, but if anyone would like to try them with their files,
>>>>>> there is a standalone package (mzMLconverters.zip) at the address above
>>>>>> which should work under Windows/Linux/OSX with Java 1.5 or higher.
>>>>>> Please notice that the output files are not schematically valid since
>>>>>> some terms are still missing in the CV.
>>>>>>
>>>>>> For the conversion of multiple DTA files to one mzML file there is a
>>>>>> small problem which is related to how lcq_dta generates dta files: If
>>>>>> the charge state of the precursor can not be determined, a spectrum can
>>>>>> result in two DTA files which are identical apart from the precursor.
>>>>>> There are two solutions on how to handle this:
>>>>>> 1) Two spectra, with the same scanNumber but different spectrum Ids (The
>>>>>> solution used by the current converter)
>>>>>> 2) One spectrum, two precursors. However, this will not work with the
>>>>>> current schema since there can only be one sourceFileRef for a spectrum.
>>>>>> Do you all think solution 1 is fine, or is there a better solution?
>>>>>> Solution 2 seems to need schema changes.
>>>>>> Other comments are also welcome
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Fredrik
|
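Taken together, the rules in this thread (one spectrum per scan; no charge written when lcq_dta produced several guesses, as Eric and Fredrik's option a suggest; or all guesses kept when the converter is configured to, as Mike prefers and Matt proposes) can be sketched in Python. Everything here is hypothetical: `DtaRecord`, `Spectrum`, `merge_dtas` and `keep_charges` are illustrative names, the sketch assumes each DTA header supplies MH+ and an assumed charge, it always keeps a single unambiguous charge (Eric leaves that case to the user), and it builds in-memory objects rather than writing mzML.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PROTON_MASS = 1.00727646688  # Da, used to turn a DTA's MH+ back into m/z

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class DtaRecord:
    """One DTA file: scan number, assumed charge, MH+ and peak list."""
    scan: int
    charge: int
    mh_plus: float
    peaks: List[Peak]


@dataclass
class Spectrum:
    """One output spectrum; an empty `charges` list means no charge is written."""
    scan: int
    precursor_mz: float
    charges: List[int] = field(default_factory=list)
    peaks: List[Peak] = field(default_factory=list)


def mh_to_mz(mh_plus: float, charge: int) -> float:
    """Observed precursor m/z implied by a DTA's MH+ and assumed charge."""
    return (mh_plus + (charge - 1) * PROTON_MASS) / charge


def merge_dtas(records: List[DtaRecord], keep_charges: bool = False) -> List[Spectrum]:
    """Collapse DTAs that share a scan number into one spectrum per scan."""
    by_scan: Dict[int, List[DtaRecord]] = {}
    for rec in records:
        by_scan.setdefault(rec.scan, []).append(rec)

    spectra: List[Spectrum] = []
    for scan in sorted(by_scan):  # increasing scan numbers, each used once
        group = by_scan[scan]
        charges = sorted({rec.charge for rec in group})
        if len(charges) > 1 and not keep_charges:
            charges = []  # several guesses: write no charge (option a / Eric)
        first = group[0]  # the peak lists are identical, so any copy will do
        spectra.append(Spectrum(
            scan=scan,
            precursor_mz=mh_to_mz(first.mh_plus, first.charge),
            charges=charges,
            peaks=list(first.peaks),
        ))
    return spectra
```

Because both lcq_dta guesses describe the same observed ion, recovering m/z from either copy's MH+ and charge should give the same value, which is the "actual observed precursor mass" Eric describes keeping. The single `keep_charges` flag is one way to get the configurable behaviour Matt suggests.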
From: Matthew C. <mat...@va...> - 2008-02-19 15:37:35
|
How do you feel about generating arbitrary unique scan numbers and then using the id attribute to preserve the original filename and scan number:

<spectrum id="function1.1" scanNumber="1" ...>
<spectrum id="function1.2" scanNumber="2" ...>
...
<spectrum id="function2.1" scanNumber="100" ...>
<spectrum id="function2.2" scanNumber="101" ...>
...

Or probably more intuitive would be to store the parallel spectra sequentially (assuming that the same scan number from each function is correlated):

<spectrum id="function1.1" scanNumber="1" ...>
<spectrum id="function2.1" scanNumber="2" ...>
...
<spectrum id="function1.2" scanNumber="100" ...>
<spectrum id="function2.2" scanNumber="101" ...>
...

It's either that or store each function in a separate mzML file, because mzML doesn't support multiple runs in the same file.

-Matt

Fredrik Levander wrote:
> Hi All,
>
> In QTOF files from Waters with mixed MS1 and MS2 data we have several
> parallel 'functions' with data being recorded into separate files. The
> scan numbers are only unique within each function. In the raw data
> folder we thus have several different spectra with the same scan number
> (but different source files). When converting this into an mzML file it
> would be good to keep the original scan numbers which are useful for
> traceability, but to generate unique spectrum ids. I thus propose that
> the requirement for unique scanNumbers within an mzML file is removed.
> However, spectra should not be repeated within the file, so this would
> NOT be applicable to the dta to mzML conversion use case.
> Would such a change generate problems for the readers?
> How is this solved in MassWolf?
>
> Regards
>
> Fredrik
|
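Matt's two layouts differ only in the order in which the (function, scan) pairs are numbered. Below is a minimal Python sketch of both, assuming the Waters input arrives as (function, original scan) integer pairs and reusing the `function<n>.<scan>` id pattern from his example; `assign_scan_numbers` and `check_scan_numbers` are hypothetical helpers. It numbers scans consecutively from 1 (his example jumps to 100, which the current rule also allows), then checks the rule quoted earlier in the thread that scanNumbers be unique and increasing and ids be unique.

```python
from typing import List, Tuple

# (function number, original scan number within that function)
RawScan = Tuple[int, int]


def assign_scan_numbers(raw: List[RawScan], interleave: bool = False) -> List[Tuple[str, int]]:
    """Return (id, scanNumber) pairs with file-unique scan numbers.

    The id keeps the original function and scan number for traceability;
    scanNumber is an arbitrary counter that increases in file order.
    """
    if interleave:
        # Parallel spectra stored next to each other: order by original scan,
        # then by function (assumes equal scan numbers are correlated).
        ordered = sorted(raw, key=lambda fs: (fs[1], fs[0]))
    else:
        # All spectra of function 1, then all of function 2, and so on.
        ordered = sorted(raw)
    return [(f"function{func}.{scan}", new_scan)
            for new_scan, (func, scan) in enumerate(ordered, start=1)]


def check_scan_numbers(spectra: List[Tuple[str, int]]) -> None:
    """Raise if ids repeat or scanNumbers do not strictly increase in file order."""
    ids = [spectrum_id for spectrum_id, _ in spectra]
    if len(set(ids)) != len(ids):
        raise ValueError("spectrum ids are not unique")
    numbers = [n for _, n in spectra]
    if any(b <= a for a, b in zip(numbers, numbers[1:])):
        raise ValueError("scanNumbers must be unique and increasing")
```

With `raw = [(1, 1), (1, 2), (2, 1), (2, 2)]`, the default call yields ids in the order function1.1, function1.2, function2.1, function2.2, and `interleave=True` yields function1.1, function2.1, function1.2, function2.2, in both cases with scanNumbers 1 to 4. Because a strictly increasing sequence cannot repeat a value, one check covers both the uniqueness and the ordering parts of the rule.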