From: Eric D. <ede...@sy...> - 2008-03-04 09:07:36
|
Hi everyone, here is a reminder of the call coming up in 8 hr. Dial-in information is:

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)
access code: 297427

The agenda will be to discuss all the items that have come up recently. I list them as:

- Latest nativeScanReference proposal
- using defaultArrayLength attribute in <spectrum> but allow override in binaryDataArray
- Allow new "possible charge state" term in <ionSelection> multiple times
- Fredrik's suggestion that selectionWindow should really be scanWindow. selectionWindow is something different, and by the way, where is it, we really need that, too.
- MS^E example
- msLevel
- chromatograms

The schedule is still:

Schedule:
-----------------------
Jan 25: mzML reviews returned. Official community review complete.
Feb 5: mzML telecon 9:00am PST
Feb 19: mzML telecon 9:00am PST
Mar 4: mzML telecon 9:00am PST
Mar 17: US HUPO meeting
Mar 25: mzML telecon 9:00am PST
Apr 8: mzML telecon 9:00am PST
Apr 23: PSI meeting in Toledo
May
Jun 1-5: ASMS - Must be done and advertising it here!
|
From: Eric D. <ede...@sy...> - 2008-03-04 18:26:05
|
Hi everyone, here are my notes from the telecon. Thank you for your participation. Please let me know if I forgot/misunderstood anything:

Meeting minutes 2008-03-04 9:00am PST

Present: Darren, Jim, Matt, Lennart, Josh, Eric

- Eric's most recent nativeID proposal is accepted except for the format of the value, which should be:
    nativeID="19"
    nativeID="2,6,5"
    nativeID="run1,3,4"
    nativeID="&quot;run1,2&quot;,3,4"
- possible charge state suggestion is fine with folks:
    - 0-1 "charge state"
      XOR
    - 2-N "possible charge state"
- Come up with an example of how we want summed spectra to look:
    - one file that includes original scan and a summed spectrum
    - in old mzXML, we can have different start_scan and end_scan. How do we encode that?
- defaultArrayLength proposal is fine
- Jim has some examples of files with PDA data. He will come up with some examples of this. Perhaps encoded as the MS spectrum as usual
    <spectrum index="27" nativeID="19">
  and then another
    <spectrum index="28" nativeID="PDA19">
  with a spectrum type "PDA spectrum"
- Jim will also come up with MALDI data examples for us
- agreed that we rename selectionWindow to scanWindow as this is a misnomer
- add optionally to <ionSelection> two new cvParams, "selection window m/z lower limit" and "selection window m/z upper limit", for the *true* fragmentation selection window start and stop
- agreed change acqNumber to acquisitionNumber
- What to do with msLevel? agreed make it a cvParam next to spectrum type instead of an attribute of <spectrum> because for example a PDA spectrum will have no msLevel
- state that index should be for seeking rather than cataloguing. So no msLevel in the index. Index should be *complete* so it could be used for a list of items to seek to but should contain only identifiers, not attributes/metadata
    - Start a thread on this to discuss more
- Matt will send <chromatogram> example since he is working on these now
- Next meeting indeed March 25 as on schedule. This is *3* weeks from now, because of US HUPO
- Maybe a few of us could meet and chat at US HUPO although unlikely anything official.
|
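For illustration only, the comma-separated nativeID values listed in the minutes could be split as follows once the XML parser has already unescaped the attribute. This is a minimal sketch that assumes CSV-style double quoting; the helper name `split_native_id` is hypothetical, and how to escape a literal double quote inside an item was still undecided on the list.

```python
import csv
from typing import List


def split_native_id(value: str) -> List[str]:
    """Split an already XML-unescaped nativeID value into its components.

    Assumption: items containing commas are wrapped in double quotes,
    as in the fourth example from the minutes.
    """
    # csv.reader handles the quoted-item-with-commas case for us.
    return next(csv.reader([value]))


print(split_native_id('"run1,2",3,4'))
```

Using the standard `csv` module keeps the sketch short, but it also silently adopts the doubled-quote convention for embedded quotes, which is only one of the options discussed in this thread.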
From: Fredrik L. <Fre...@im...> - 2008-03-05 13:24:02
|
Not just 'number' since in <acquisition>?

Fredrik

> - agreed change acqNumber to acquisitionNumber
|
From: Coleman, M. <MK...@St...> - 2008-03-06 20:25:18
|
> nativeID="19"
> nativeID="2,6,5"
> nativeID="run1,3,4"
> nativeID="&quot;run1,2&quot;,3,4"

I don't understand what the meaning would be in these four cases. The fourth would seem to be

    "run1,2",3,4

Is this just covering a corner case to show that the value of 'nativeID' is a comma-separated list of items, each of which can include commas if quoted? (and if so, can an item also include a double-quote character?)

> - 0-1 "charge state"
> XOR
> - 2-N "possible charge state"

By specifying "2-N", it seems like you're implicitly saying that by declaring that +2 or +3 is "possible" that all other charges are not possible.

The alternative would be to specify "1-N", in which case +1 could mean "I think the charge is +1, but it might be something else". Unless this interpretation is ruled out, it seems like "1-N" ought to be specified.

> Index should be *complete* so it could be used for a list of items to seek to

Might also specify that it's one-to-one: that is, every index entry must actually point to one spectrum (that is actually present in the file). Is it specified that the order of the index entries is the same as the order of the spectra in the file?

Mike
|
From: Matthew C. <mat...@va...> - 2008-03-06 21:42:47
|
Hi Michael,

Coleman, Michael wrote:
>> nativeID="19"
>> nativeID="2,6,5"
>> nativeID="run1,3,4"
>> nativeID="&quot;run1,2&quot;,3,4"
>>
> I don't understand what the meaning would be in these four cases. The fourth would seem to be
>
> "run1,2",3,4
>
> Is this just covering a corner case to show that the value of 'nativeID' is a comma-separated list of items, each of which can include commas if quoted? (and if so, can an item also include a double-quote character?)
>

Yes, it is covering that case, although I don't know if I would call it a corner case. I think strings inside native IDs are moderately likely (for example, for IDs from ABI's 4000 series instruments), and they might contain commas. Good catch about the double quote character, though. We should allow for it, but I don't know how to escape it! Perhaps we should URL encode the string components instead of using XML-escaped double quotes? So the last example would be:

    %22run1%2C2%22,3,4

That looks awful, but otherwise the string "foo" (including the quotes as part of the string) would have to be encoded like

    &quot;&quot;&quot;foo&quot;&quot;&quot;

(the implicit rule being that a pair of &quot; should be treated as a double-escaped quotation, i.e. it's part of the id string instead of delimiting the string itself).

>> - 0-1 "charge state"
>> XOR
>> - 2-N "possible charge state"
>>
>
> By specifying "2-N", it seems like you're implicitly saying that by declaring that +2 or +3 is "possible" that all other charges are not possible.
>
> The alternative would be to specify "1-N", in which case +1 could mean "I think the charge is +1, but it might be something else". Unless this interpretation is ruled out, it seems like "1-N" ought to be specified.
>

I'm not sure if you're confused about the meaning of 2-N in this case or if I'm confused about your meaning. The 2-N means the number of elements of that type that there can be. If the "possible charge state" element is used, there must be at least two proposed possible charge states or else it makes no sense to qualify the charge state assessment with the word "possible." In any case, if "possible charge state" was 1-N, it would be confusing for implementors as to which one to use when they are trying to figure out what to write. However, perhaps we should rename the term "probable charge state" since technically any charge state is possible? :)

>> Index should be *complete* so it could be used for a list of items to seek to
>>
>
> Might also specify that it's one-to-one: that is, every index entry must actually point to one spectrum (that is actually present in the file). Is it specified that the order of the index entries is the same as the order of the spectra in the file?
>

I think both these things are already specified. If not, I agree that they should be.

-Matt
|
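The URL-encoding idea floated above can be sketched in a few lines. This is an assumption-laden illustration, not an agreed format: the function names `encode_native_id` and `decode_native_id` are hypothetical, and the sketch percent-encodes each bare component rather than a component wrapped in quotes as in Matt's example.

```python
from typing import List
from urllib.parse import quote, unquote


def encode_native_id(components: List[str]) -> str:
    """Percent-encode each component so that commas only ever act as separators."""
    # safe="" forces encoding of every reserved character, including ',' and '"'.
    return ",".join(quote(c, safe="") for c in components)


def decode_native_id(value: str) -> List[str]:
    """Reverse encode_native_id: split on commas, then percent-decode each part."""
    return [unquote(part) for part in value.split(",")]


print(encode_native_id(["run1,2", "3", "4"]))
```

The appeal of this variant is that the encoded value contains no characters that XML treats specially, so no second layer of escaping is needed; the cost is readability.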
From: Coleman, M. <MK...@St...> - 2008-03-06 23:30:47
|
> From: psi...@li...
> [mailto:psi...@li...] On
> Behalf Of Matthew Chambers

> > > "run1,2",3,4

> Good catch about the double quote character,
> though. We should allow for it, but I don't know how to escape it!
> Perhaps we should URL encode the string components instead of using
> XML-escaped double quotes? So the last example would be:
> %22run1%2C2%22,3,4
>
> That looks awful, but otherwise the string "foo" (including the quotes
> as part of the string) would have to be encoded like
> &quot;&quot;&quot;foo&quot;&quot;&quot; (the implicit rule being that a pair of
> &quot; should be treated as a double-escaped quotation, i.e. it's part
> of the id string instead of delimiting the string itself.

A point of confusion here is that the quoting within "run1,2" is operating at a different level than the XML quoting. Perhaps the simplest regime would be to simply specify '\"' for '"' and '\\' for '\'. Ugly, but everyone would be familiar with it (from C, sh, etc), and I think it won't collide with XML escaping.

(The "" for " thing reminds me of VMS. Does Windows do this, too?)

> If the "possible charge state" element
> is used, there must be at least two proposed possible charge states or
> else it makes no sense to qualify the charge state assessment with the
> word "possible."

I guess I'm overthinking this one. The thing I was wondering about was whether it would be useful/necessary to be able to distinguish between "I think the charge of this spectrum is +1, but it might be something else" versus "I'm sure that the charge of this spectrum is +1".

--Mike
|
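The backslash regime suggested above ('\"' for '"' and '\\' for '\') can be illustrated with a small encoder and decoder for the inner, pre-XML layer. This is only a sketch under that assumption; `encode_list` and `decode_list` are hypothetical names, and every item is quoted for simplicity.

```python
from typing import List


def encode_list(items: List[str]) -> str:
    """Quote every item, escaping backslashes first and then double quotes."""
    quoted = []
    for item in items:
        escaped = item.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return ",".join(quoted)


def decode_list(text: str) -> List[str]:
    """Split on unquoted commas and undo the backslash escapes."""
    items, buf, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if in_quotes and ch == "\\" and i + 1 < len(text):
            buf.append(text[i + 1])  # take the escaped character literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # delimiter, not part of the item
        elif ch == "," and not in_quotes:
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    items.append("".join(buf))
    return items


print(encode_list(['run"with"biz,arre"name', "3"]))
```

Note that this layer knows nothing about XML; the resulting string would still have to be XML-escaped before being written as an attribute value.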
From: Matthew C. <mat...@va...> - 2008-03-10 14:50:34
|
Coleman, Michael wrote:
>> From: psi...@li...
>> [mailto:psi...@li...] On
>> Behalf Of Matthew Chambers
>>
>>> "run1,2",3,4
>>>
>> Good catch about the double quote character,
>> though. We should allow for it, but I don't know how to escape it!
>> Perhaps we should URL encode the string components instead of using
>> XML-escaped double quotes? So the last example would be:
>> %22run1%2C2%22,3,4
>>
>> That looks awful, but otherwise the string "foo" (including the quotes
>> as part of the string) would have to be encoded like
>> &quot;&quot;&quot;foo&quot;&quot;&quot; (the implicit rule being that a pair of
>> &quot; should be treated as a double-escaped quotation, i.e. it's part
>> of the id string instead of delimiting the string itself.
>>
> A point of confusion here is that the quoting within "run1,2" is operating at a different level than the XML quoting. Perhaps the simplest regime would be to simply specify '\"' for '"' and '\\' for '\'. Ugly, but everyone would be familiar with it (from C, sh, etc), and I think it won't collide with XML escaping.
>
> (The "" for " thing reminds me of VMS. Does Windows do this, too?)
>

" and ' cannot appear in an XML file except as quotes for attributes. So it would have to be "\&quot;". "\\" would still work to escape backslash, but I think sticking with the XML way of escaping things is more consistent. I doubt everyone will be familiar with the C-style escape convention, even if it's shared by some *nix shells. I think Windows does use "" to escape quotes, I'm not sure though.

I'm not married to the weird convention that I suggested though. In fact, since I currently only expect that strings in nativeIDs will be for exporting ABI 4000 series data, and that comes from a database instead of a file, there are other issues surrounding that. I will start a thread asking for some example ABI series data.

>> If the "possible charge state" element
>> is used, there must be at least two proposed possible charge states or
>> else it makes no sense to qualify the charge state assessment with the
>> word "possible."
>>
> I guess I'm overthinking this one. The thing I was wondering about was whether it would be useful/necessary to be able to distinguish between "I think the charge of this spectrum is +1, but it might be something else" versus "I'm sure that the charge of this spectrum is +1".
>

If a file writer wants to convey the semantics of "I think the charge of this spectrum is +1, but it might be something else," then I think they should include the "something else" in the list of possible/probable charge states. The only other reasonable alternative I can see is to always treat it like a list and give each charge a probability (which I expect is what Darren's group would like to see. :) )

-Matt
|
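The cardinality rule under discussion (0-1 "charge state" XOR 2-N "possible charge state") is easy to state as a check. The following is a sketch of that rule as recorded in the minutes, not a piece of any existing validator; the function name `charge_terms_valid` is hypothetical.

```python
def charge_terms_valid(n_charge_state: int, n_possible_charge_state: int) -> bool:
    """Check the proposed rule for one <ionSelection>.

    Either at most one "charge state" term and no "possible charge state"
    terms, or no "charge state" term and at least two "possible charge
    state" terms.
    """
    if n_possible_charge_state == 0:
        return n_charge_state <= 1
    return n_charge_state == 0 and n_possible_charge_state >= 2


# A single "possible charge state" is rejected: the writer should either
# commit to "charge state" or list every candidate.
print(charge_terms_valid(0, 1))
```

Under this rule, the "+1 but maybe something else" case is expressed by listing +1 together with the other candidates as "possible charge state" terms.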
From: Coleman, M. <MK...@St...> - 2008-03-10 16:18:42
|
Matthew Chambers:
> >>> "run1,2",3,4
> >> %22run1%2C2%22,3,4
> >> &quot;&quot;&quot;foo&quot;&quot;&quot; (the implicit rule being
> >> that a pair of &quot;

> > A point of confusion here is that the quoting within
> "run1,2" is operating at a different level than the XML
> quoting. Perhaps the simplest regime would be to simply
> specify '\"' for '"' and '\\' for '\'. Ugly, but everyone
> would be familiar with it (from C, sh, etc), and I think it
> won't collide with XML escaping.

> " and ' cannot appear in an XML file except as quotes for
> attributes. So
> it would have to be "\&quot;". "\\" would still work to escape
> backslash, but I think sticking with the XML way of escaping things is
> more consistent. I doubt everyone will be familiar with the C-style
> escape convention, even if it's shared by some *nix shells. I think
> Windows does use "" to escape quotes, I'm not sure though.

I was referring, with my examples, to the inner-level of quoting. There's so much going on here that it's difficult to even talk about, I think. Working from the inside out:

1. I can have a particular "run name" within the comma-separated list. For example:

    run"with"bizarre"name

2. Within the list, this might look like

    run"with"biz,arre"name,another"biz,arre"name&quot;&lt;

except that that won't work as is, because we can't tell the commas in the names from the commas separating the names. We need to escape the commas within names somehow, together with escaping for the escaping, so that we will still be able to form names that ultimately contain any character sequence.

It seems like there are two basic approaches here: (a) use an XML-ish escape mechanism, or (b) use something completely different. For (b) I'll use the C-ish backslash idea. I'm in favor of (b) because (a) will make everyone's head explode. (Note that (a) is *not* straight XML escaping--it can't be. Rather, it'll have to be a matter of running the string in question through XML (or XML-ish) escape interpretation a second time.)

Assuming (b), we might have

    "run\"with\"biz,arre\"name","another\"biz,arre\"name&quot;&lt;"

or

    'run"with"biz,arre"name','another"biz,arre"name&quot;&lt;'

if we decide that this comma-separated format can also use single-quotes instead of double quotes (as XML and Python do). Note carefully, however, that none of this is yet XML! We're still "inside".

3. Moving outward with the first of those two, now we will XML-escape it, so that it is a valid XML attribute:

    <yyy zzz="&quot;run\&quot;with\&quot;biz,arre\&quot;name&quot;,&quot;another\&quot;biz,arre\&quot;name&quot;&lt;&quot;">

That's not enough, though, because we "captured" that string '&quot;&lt;' that looks like XML, but is actually part of the name. We have to be sure to escape the ampersands, too:

    <yyy zzz="&quot;run\&quot;with\&quot;biz,arre\&quot;name&quot;,&quot;another\&quot;biz,arre\&quot;name&amp;quot;&amp;lt;&quot;">

This is indeed pretty awful, but it's difficult to see what would be better. If you want to try (a) above (which is I think what you mean when you say "stick with the XML way of quoting"), I'd be curious to see that worked out in the same way. If it's going into the standard, I definitely think that an example like this should be worked to make sure that everyone understands how things are supposed to work.

What leaps out at me is how ugly and complex this is. (It also reminds me of why I don't like XML.) Hopefully most mzML producers will not generate stuff like this, but every consumer will need to correctly interpret it. The chances that everyone will be able to implement this stuff correctly seem very low.

I think that an XML person would look at this and say that all of this is a sign that the whole inner structure of this comma separated list is too complex and needs to be broken out with something like

    <yyy>
      <zzzlist>
        <zzz> run"with"biz,arre"name </zzz>
        <zzz> another"biz,arre"name&amp;quot;&amp;lt; </zzz>
      </zzzlist>
      ...

I don't necessarily agree with that, but I do think that we're kind of torturing XML by trying to squeeze all of that information into one attribute value.

Another alternative that I think should be seriously considered is to just give up and restrict run names to a small set of characters like

    letters (upper and lower)
    digits
    these four characters: .-_:

or perhaps a subset of these--that is, roughly the characters used in identifiers in typical programming languages. Would this be a terrible hardship?

Mike
|
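The outer layer of the worked example above (XML-escaping a value that has already been backslash-escaped) can be checked mechanically. The sketch below assumes approach (b) for the inner layer and uses Python's standard library for the XML layer; the element and attribute names `yyy` and `zzz` come from the message, while `to_element` and `from_element` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import List
from xml.sax.saxutils import quoteattr


def to_element(names: List[str]) -> str:
    """Backslash-escape each name, join with commas, then XML-escape the result."""
    inner = ",".join(
        '"' + n.replace("\\", "\\\\").replace('"', '\\"') + '"' for n in names
    )
    # quoteattr escapes '&', '<' and '>' and picks a safe quote character.
    return "<yyy zzz=%s/>" % quoteattr(inner)


def from_element(xml_text: str) -> str:
    """Return the inner (still backslash-escaped) list after XML unescaping."""
    return ET.fromstring(xml_text).get("zzz")


# The second name contains the literal text '&quot;&lt;', as in the example.
names = ['run"with"biz,arre"name', 'another"biz,arre"name&quot;&lt;']
print(to_element(names))
```

Because the XML library escapes the ampersands itself, the literal `&quot;&lt;` inside the name survives the round trip; a consumer then only has to undo the inner backslash layer.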
From: Eric D. <ede...@sy...> - 2008-03-25 06:49:16
|
Hi everyone, the call is coming up in 9 hr.

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)
access code: 297427

During the call, let's discuss:

- synching Darren's code and the xsd and example files and validator to current state
- Various todo's:
    o JimS will provide some PDA spectrum examples
    o Matt will send <chromatogram> example
    o Pierre-Alain will finish mzML <--> MIAPE mapping/checking
- Figure out cvParam datatype validation plan
- Runup to Toledo meeting
- Address cvParam category name issue, as described in spec doc. Do we want:

    A) <cvParam cvRef="MS" accession="MS:1000583" name="SRM spectrum"/>

    C) <cvParam cvRef="MS" accession="MS:1000583" name="SRM spectrum" categoryAccession="MS:1000035"/>

    (where MS:1000035="spectrum type")

The category accession is not needed or used under normal circumstances by Darren's reader and presumably most readers. But it could be useful in cases where a term not known to the reader is used. Under scenario A, if the reader software predates MS:1000583, the reader will be very hard pressed to know what to do with this term. In theory, under the same scenario with option C, the reader could more easily determine what to do with this piece of information even though the exact term is unknown. Maybe "cannot determine spectrum type" should be a terminal error, while "unrecognized spectrum type" might merely be a warning. Darren and I discussed for a while at US HUPO. Let's see if we can come to a decision.

Thanks!
Eric
|
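To make the difference between options A and C concrete, here is a rough sketch of how a reader that predates a term might react. It is illustrative only: `classify_term` and the `known_terms` argument are hypothetical, and the three outcomes are just one reading of the warning-versus-error idea in the message above.

```python
from typing import Optional, Set

SPECTRUM_TYPE = "MS:1000035"  # "spectrum type", as stated in the message


def classify_term(
    accession: str,
    known_terms: Set[str],
    category_accession: Optional[str] = None,
) -> str:
    """Decide how a reader could treat a cvParam it may not recognize."""
    if accession in known_terms:
        return "known"
    if category_accession == SPECTRUM_TYPE:
        # Option C: the term is new, but we can tell what kind of thing it is.
        return "warning: unrecognized spectrum type"
    # Option A: nothing in the file says what this term describes.
    return "unknown term"


# A reader whose vocabulary predates MS:1000583:
print(classify_term("MS:1000583", set(), category_accession=SPECTRUM_TYPE))
```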
From: Angel P. <an...@ma...> - 2008-03-25 15:24:35
|
On Tue, Mar 25, 2008 at 2:49 AM, Eric Deutsch <ede...@sy...> wrote:
>
> - Address cvParam category name issue, as described in spec doc. Do we
> want:
>
> A) <cvParam cvRef="MS" accession="MS:1000583" name="SRM spectrum"/>
>
> C) <cvParam cvRef="MS" accession="MS:1000583" name="SRM spectrum"
> categoryAccession="MS:1000035"/>
>
> (where MS:1000035="spectrum type")
>
> The category accession is not needed or used under normal circumstances by
> Darren's reader and presumably most readers. But it could be useful in cases
> where a term not known to the reader is used. Under scenario A, if the
> reader software predates MS:1000583, the reader will be very hard pressed to
> know what to do with this term. In theory, under the same scenario with
> option C, the reader could more easily determine what to do with this piece
> of information even though the exact term is unknown. Maybe "cannot
> determine spectrum type" should be a terminal error, while "unrecognized
> spectrum type" might merely be a warning. Darren and I discussed for a while
> at US HUPO. Let's see if we can come to a decision.
>

Can't attend the call, but I really can't keep quiet about this. categoryAccession, or any other method of scoping terms (or essentially defining new terms) within XML instances that do not live within the referenced CV itself, is a *bad* idea.

Here's the thing about cvRef: you can use more than one source within an XML instance. Here is the thing about CVs and ontologies: they can reference and inherit from each other. So option (D) is born:

    D) <cvParam cvRef="MyMS" accession="ME:0000501" name="SRM spectrum"/>

and within the MyMS CV, we would have

    ME:0000501 IS_A MS:1000035

And the world is at peace and one with itself. Or at least that is the way it should be.
|
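Option (D) moves the category lookup out of the instance document and into the CV itself. A reader would then resolve an unfamiliar term by walking IS_A links, roughly as sketched below. This is a simplified illustration: the `is_a` helper is hypothetical, the parent table holds only the one relationship from the message, and each term is assumed to have a single parent, which real ontologies do not guarantee.

```python
from typing import Dict

# Relationship taken from the message: ME:0000501 IS_A MS:1000035.
PARENTS: Dict[str, str] = {"ME:0000501": "MS:1000035"}


def is_a(term: str, ancestor: str, parents: Dict[str, str]) -> bool:
    """Follow IS_A links upward until the ancestor is found or the chain ends."""
    seen = set()
    current = term
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)  # guards against accidental cycles in the table
        current = parents.get(current)
    return False


# The extension term resolves to "spectrum type" without any extra attribute.
print(is_a("ME:0000501", "MS:1000035", PARENTS))
```

The trade-off is that the reader needs access to the referenced CV files at read time, whereas option C carries the category inside the document.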