From: Matthew C. <mat...@va...> - 2009-06-15 15:08:20
I'm not familiar with JAXB, but I would expect a reasonable serializer to/from base64 to work from binary byte[], not from base64 byte[], i.e. the serialization should transparently encode/decode the base64 for you. Perhaps changing to String is forcibly changing its behavior.

This might be lost in translation, but I really don't understand these sentences:

"But what I am having trouble with is the data I read from an existing mzML file. We used the default JAXB generated mapping and the data stored by the Unmarshaller in the mapped byte[] variable does not seem to be base64 encoded."

What do you mean you "used" the default mapping? The past tense doesn't make sense to me there, unless you're talking about the validity of the existing mzML file - and if the file doesn't have valid base64, I don't know why you'd want to read it. The present tense makes sense, i.e. what mapping are you currently using to read the existing file?

-Matt
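The behaviour Matt expects can be sketched without JAXB. The example below is only an illustration under stated assumptions: the class and method names are hypothetical, java.util.Base64 stands in for whatever decoding the unmarshaller performs, and the values are read as little-endian 64-bit floats. If the mapped byte[] already holds decoded bytes, a second base64 decode would be applied to binary data, which may explain the unexpected results.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of jmzml or JAXB.
final class Base64BinaryDemo {

    /** Interprets already-decoded bytes as little-endian 64-bit floats. */
    static double[] toDoubles(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[raw.length / Double.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getDouble();
        }
        return out;
    }

    /** Builds the base64 text a writer would place in the binary element. */
    static String encode(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        String xmlText = encode(new double[] {100.5, 200.25});

        // Assumption: a base64-aware unmarshaller hands back these raw bytes.
        byte[] raw = Base64.getDecoder().decode(xmlText);
        System.out.println(Arrays.toString(toDoubles(raw)));

        // Decoding again treats binary data as base64 text and is rejected here.
        try {
            Base64.getDecoder().decode(raw);
        } catch (IllegalArgumentException e) {
            System.out.println("second decode rejected: " + e.getMessage());
        }
    }
}
```

If the bytes coming out of the default mapping convert cleanly with toDoubles, no schema change should be needed for reading.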
From: Florian R. <fl...@eb...> - 2009-06-15 14:46:16
Hi Matt,

maybe I did not express myself well enough.

We do have methods that take care of converting the data from the base64 encoded XML data into a more convenient double[] in Java. Taking into account the CVParams for precision and compression, this will actually hold double or float values. The user can then check if the actual data contained in the array is in double or float values... (I still have to cater for the int and String representations, though)

But what I am having trouble with is the data I read from an existing mzML file. We used the default JAXB generated mapping and the data stored by the Unmarshaller in the mapped byte[] variable does not seem to be base64 encoded. It seems the data in the XML is not correctly converted into the Java byte[], which renders my attempts to base64-decode it useless.

With the changes I mentioned to the XML schema, the mapping changes and the data is now unmarshalled into a String variable. The byte[] obtained from this String is base64 encoded and if converted into double/float I get reasonable values...

Cheers,
Florian
From: Matt C. <mat...@va...> - 2009-06-15 13:09:55
Hi Florian,

You might do better to customize the interface for binaryDataHandler in order to provide the array data as an Object that may be a Float[], Double[], Int32[], Int64[], and now String[]. Relying on your users to convert byte[] to the desired array type is a bit harsh, especially if it's compressed. But just having them check the type of the array before doing anything with it is easy and convenient. :)

-Matt
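A minimal sketch of the interface Matt describes, assuming the precision and compression flags have already been read from the CVParams. The class name, method names, and boolean parameters are hypothetical and not taken from jmzml; only the float and double cases are covered, and primitive arrays are returned rather than boxed ones.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical decoder, for illustration only.
final class BinaryArrayDecoder {

    /** Inflates a zlib stream; malformed input is reported as IllegalArgumentException. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Returns a double[] or float[] so callers only need to check the array type. */
    static Object decode(byte[] raw, boolean is64Bit, boolean zlibCompressed) {
        byte[] data = zlibCompressed ? inflate(raw) : raw;
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (is64Bit) {
            double[] values = new double[data.length / Double.BYTES];
            buf.asDoubleBuffer().get(values);
            return values;
        }
        float[] values = new float[data.length / Float.BYTES];
        buf.asFloatBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        byte[] raw = ByteBuffer.allocate(Double.BYTES)
                               .order(ByteOrder.LITTLE_ENDIAN)
                               .putDouble(445.12)
                               .array();
        Object array = decode(raw, true, false);
        if (array instanceof double[]) {
            System.out.println("double array, first value " + ((double[]) array)[0]);
        } else if (array instanceof float[]) {
            System.out.println("float array, first value " + ((float[]) array)[0]);
        }
    }
}
```

The caller never touches the byte[] or the compression step; integer and string arrays would be further branches of the same method.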
From: Florian R. <fl...@eb...> - 2009-06-15 10:58:27
Hi,

due to requests for jmzml, the Java API for mzML we developed in the PRIDE team, we had to deal with the 'binary' data elements in the BinaryDataArray.

The default JAXB mapping of the xsd:base64Binary type to Java results in a byte[], which seems to cause trouble when trying to convert with standard base64 algorithms, as it does not produce the expected result.

I had to add an extension to the mzML schema defining the binary type to be of type "text/plain" (see below), so the Java mapping would result in a String representation of the binary data instead of a byte[]. The byte[] obtained from that String can then be converted to/from base64binary without problem.

The change in the schema is as follows:

original:
<xs:element name="binary" type="xs:base64Binary"/>

modified:
<xs:element name="binary" type="xs:base64Binary" xmime:expectedContentTypes="text/plain"
xmlns:xmime="http://www.w3.org/2005/05/xmlmime"/>

Now, I was wondering if someone else has experienced similar issues or if someone can see issues with this modification of the schema (is there a potential issue with the schema specs? or should it be defined in more detail in case there might be issues with other tools as well?).

Also, I am not an expert in XML/Java mapping, so if there is an alternative/preferred way to handle this type of data, I would be happy to know about it.

Thanks,
Florian
From: Matthew C. <mat...@va...> - 2009-06-12 15:10:00
Fixed.

Matt Chambers wrote:
> Yes, looks like I forgot about neutral change spectra while getting so ambitious with the mapping file changes. Either we need some basic logic in the mapping file (if spectrum is this type, then use this rule, else another rule), or this is another one of those special rules. I'll fix this later today.
>
> -Matt
>
> Fredrik Levander wrote:
>> Hi, I am afraid that it will not be possible to make the neutral loss spectra valid with a required target mass for precursor and product, since there are none in those spectra (just two moving isolation windows). So I guess these new rules will have to be changed back to MAY again, even if they should be required in most other types of spectra. The other exception is precursor ion spectra, where the precursor isolation window is also moving.
>>
>> Fredrik
>>
>> Marc Sturm wrote:
>>> Hi all,
>>>
>>> good news from the example files. I just validated all examples from the website. Only two files contain minor errors:
>>>
>>> -------
>>>
>>> file name: MzMLFile_PDA.mzML
>>> file type: mzML
>>>
>>> Validating mzML file against XML schema version 1.10
>>> Success: the file is valid!
>>>
>>> Semantically validating mzML file:
>>> Error: CV term must have a unit: MS:1000515 - intensity array
>>> Error: CV term must have a unit: MS:1000515 - intensity array
>>> Failed: errors are listed above!
>>>
>>> -------
>>>
>>> file name: neutral_loss_example_1.1.0.mzML
>>> file type: mzML
>>>
>>> Validating mzML file against XML schema version 1.10
>>> Success: the file is valid!
>>>
>>> Semantically validating mzML file:
>>> Error: Violated mapping rule 'precursor_isolationwindow_must' at element '/mzML/run/spectrumList/spectrum/precursorList/precursor/isolationWindow', 1 should be present, 0 found!
>>> Error: Violated mapping rule 'product_isolationwindow_must' at element '/mzML/run/spectrumList/spectrum/productList/product/isolationWindow', 1 should be present, 0 found!
>>> Failed: errors are listed above!
>>>
>>> Best,
>>> Marc
From: Matthew C. <mat...@va...> - 2009-06-12 15:07:30
Stein, Stephen E. Dr. wrote:
> Matt,
>
> Resolution depends on instrument, tuning and settings - I don't know the current state of reporting such information (or its reliability) in current instruments.

Right, that's my understanding. But without knowing this information, rounding m/z values in ASCII or binary is dangerously lossy.

> We have long held all of our data in ASCII form (not just MS) - if you want flexibility and accuracy, this is the only path without inventing a new data structure. Error limits and annotation can be added as we like (peak labeling, for example).

This is what I was refuting below. Assuming 15 or fewer base10 digits are needed, a double precision float is a better representation than ASCII in every way except human readability. Do you have examples of reference data that uses more than 15 digits in ASCII? Peak annotations can be added to both the XML comments and in a non-standard data array for each spectrum: null terminated strings are a new binary data type we agreed to support after this week's conference call.

> We will consider using comments - but I suspect no one will know they are there but us.

And how is that different than no one knowing whether an optional ASCII data representation is in the file? I guarantee you that the XML comment will be more human readable than the ASCII representations that have been proposed so far. And unless you can demonstrate that you need more than 15 digits of precision in your data, human readability is the only reason for ASCII representation.

-Matt

> Note that our focus is quite different from others - we are dealing with data that we have processed, perhaps heavily. I still ask for an optional ASCII data representation for reference data.
>
> -Steve
>
> -----Original Message-----
> From: Matt Chambers [mailto:mat...@va...]
> Sent: Friday, June 12, 2009 9:22 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>
> Now this I can agree with, especially with ppm representation when appropriate. But doesn't the instrument's mass resolution and related CV terms convey this information? And if someone doesn't write those at all or can't write them in a machine-readable numeric representation, it seems unlikely they will have done a proper job of rounding m/z values. This is kind of the reason I was opposed to using strings to represent mass resolution, but I was overruled. Perhaps we should revisit that? It makes sense to me because it's a less redundant placement of this precision information.
>
> Steve, do you agree with using XML comments to actually show human-readable peak lists in the mzML? That seems like an orthogonal issue to the precision one.
>
> -Matt
>
> Stein, Stephen E. Dr. wrote:
>> that would be a nice addition - also allow ppm representation - more complex precision representations can be delayed for future versions.
>>
>> -----Original Message-----
>> From: Fredrik Levander [mailto:Fre...@im...]
>> Sent: Friday, June 12, 2009 8:28 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>>
>> Wouldn't it make sense to add an optional CV term for the number of significant digits in a binary array? This way it would be easy to get back to the ASCII representation if a peak list with x number of decimals was converted to mzML. It might not be so useful for conversion of raw data, but if a peak list has been rounded to a certain number of decimals, that's information which shouldn't be thrown away when converting to mzML. The info could also be used for a viewer to show the right number of decimals.
>>
>> Fredrik
>>
>> Pierre-Alain Binz wrote:
>>> One question to Steve and others.
>>> Reading mzML, as well as any other files, has to be done with an editor, being a simple text editor or a more elaborate viewer.
>>>
>>> Would a more elaborate XML viewer/editor that knows how to read binary data and round it if needed not be an ideal "straight" reader of mzML instead of using a more plain text viewer?
>>> I know and myself also like to "call back" values with a defined number of digits, as they were entered. And it's up to the software design to "not interpret" what I have entered. But today, it's relatively easy to get an XML reader that could "translate" the binary arrays into a "mz Intensity" two column format with appropriate rounding if necessary, so that it looks exactly as if it was an ascii table (don't forget that in mzML the mz and intensity arrays are separate and anyway have to be interpreted to look like a 2 column ascii table).
>>> If the answer is OK, then we could stay with binary format, taking care of the "precision issue" via the graphical view, and be therefore compatible with the ascii precision.
>>>
>>> This sounds like a way to bring the technical question to a more philosophical, "ergonomic" one, but probably worth it at that stage.
>>>
>>> Pierre-Alain
>>>
>>> Matthew Chambers wrote:
>>>> No measurements I'm aware of in proteomic mass spec use more than 15 base 10 digits, which is the number of digits that double precision floats can represent without precision loss. That means that even if a value goes in as 1.5 (which can't be represented exactly), then as long as we round to the 15th digit we don't lose precision. As others have said, we can thus "round-trip" 15 digits. We get this high degree of fidelity to the source data without all the assumptions involved with the ASCII representation: if I use doubles consistently then I'm always providing 15 significant digits. And if we did need more than 15, then ASCII is still a very inefficient encoding. You'd want to use arbitrary precision fixed or floating point binary types, which can't be computed on very easily or efficiently, but they are the Right Way to achieve arbitrary precision (i.e. no unspecified assumptions, well defined byte width, fast parsing).
>>>>
>>>> So in fact, you can preserve this "poor person's" significant digits encoding: if the software is doing its job, then it will go out the same way it came in! The real nastiness with floating point is when the precision loss accumulates every time an arithmetic operation happens on a cumulative sum or product.
>>>>
>>>> -Matt
>>>>
>>>> Stein, Stephen E. Dr. wrote:
>>>>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement).
>>>>>
>>>>> It is true that it only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it.
>>>>>
>>>>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it.
>>>>>
>>>>> -Steve
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mike Coleman [mailto:tu...@gm...]
>>>>> Sent: Thursday, June 11, 2009 4:41 PM
>>>>> To: Mass spectrometry standard development
>>>>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>>>>>
>>>>> I took it to mean that with "1", "1.5", "1.50", one gets an implied level of precision. That is, "1.5" is generally understood to mean 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied about the precision of this value, unless it's explicitly stated elsewhere. (If you have a whole set of these, then you probably can work out the equivalent precision, but this is a bit of a stretch.)
>>>>>
>>>>> Mike
>>>>>
>>>>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote:
>>>>>> Is your question whether we can successfully round-trip the numbers? E.g. go from an ascii format to mzML back to originating ascii format and get the same exact numbers? I believe that when we pack the numbers and unpack them (at least in my non-validating ruby implementations) the numbers and significance are completely the same. E.g. 1.005 === 1.005 and not 1.005000000000001
>>>>>> -angel
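The round-trip argument in this thread can be checked with a short sketch. It assumes values with at most 15 significant decimal digits, and it treats the number of decimals as separately stored metadata, in the spirit of the optional CV term Fredrik suggests; no such term is assumed to exist, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of ASCII -> double -> bytes -> double -> ASCII.
final class RoundTripDemo {

    /** Packs a double into 8 little-endian bytes and unpacks it again. */
    static double throughBinary(double v) {
        byte[] packed = ByteBuffer.allocate(Double.BYTES)
                                  .order(ByteOrder.LITTLE_ENDIAN)
                                  .putDouble(v)
                                  .array();
        return ByteBuffer.wrap(packed).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    }

    /** Re-creates the ASCII form from the exact binary value and a stored decimal count. */
    static String withDecimals(double v, int decimals) {
        return new BigDecimal(v).setScale(decimals, RoundingMode.HALF_EVEN).toPlainString();
    }

    public static void main(String[] args) {
        String[] ascii = {"1.005", "1.50", "445.120000"};
        for (String text : ascii) {
            // Count the decimals of the source text; this is the metadata to keep.
            int decimals = text.length() - text.indexOf('.') - 1;
            double stored = throughBinary(Double.parseDouble(text));
            System.out.println(text + " -> " + withDecimals(stored, decimals));
        }
    }
}
```

Packing and unpacking is bit-exact, so the only information the double drops is the count of trailing digits, which is Steve's and Mike's point about "1.5" versus "1.50"; carrying that count alongside the array is enough to reproduce the original text.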
From: Marc S. <st...@in...> - 2009-06-12 15:04:32
|
Hi all, I think the way we store the data is widely accepted and we should not change it. If want your format human-readable, you can store the m/z values as comments or use text files. Another possiblity with mzML is to annotate each peak with a string containing the ascii representation of the m/z value. It's not human-readable because it is Base64 encoded, perhaps even zipped, but you can store the information like that if you want to. Best, Marc > Matt, > > Resolution depends on instrument, tuning and settings - I don't know the current state of reporting such information (or its reliability) in current instruments. > > We have long held all of our data in ASCII form (not just MS) - if you want flexibility and accuracy, this is the only path without inventing a new data structure. Error limits and annotation can be added as we like (peak labeling, for example). > > We will consider using comments - but I suspect no one will know they are there but us. > > Note that our focus is quite different from others - we are dealing with data that we have processed, perhaps heavily. I still ask for an optional ASCII data representation for reference data. > > -Steve > > -----Original Message----- > From: Matt Chambers [mailto:mat...@va...] > Sent: Friday, June 12, 2009 9:22 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder > > Now this I can agree with, especially with ppm representation when > appropriate. But doesn't the instrument's mass resolution and related CV > terms convey this information? And if someone doesn't write those at all > or can't write them in a machine-readable numeric representation, it > seems unlikely they will have done a proper job of rounding m/z values. > This is kind of the reason I was opposed to using strings to represent > mass resolution, but I was overruled. Perhaps we should revisit that? 
It > makes sense to me because it's a less redundant placement of this > precision information. > > Steve, do you agree with using XML comments to actually show > human-readable peak lists in the mzML? That seems like an orthogonal > issue to the precision one. > > -Matt > > > Stein, Stephen E. Dr. wrote: > >> that would be a nice addition - also allow ppm representation - more complex precision representations can be delayed for future versions. >> >> -----Original Message----- >> From: Fredrik Levander [mailto:Fre...@im...] >> Sent: Friday, June 12, 2009 8:28 AM >> To: Mass spectrometry standard development >> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder >> >> Wouldn't it make sense to add an optional CV term for the number of >> significant digits in a binary array? This way it would be easy to get >> back to the ASCII representation if a peak list with x number of >> decimals was converted to mzML. It might not be so useful for conversion >> of raw data, but if a peak list have been rounded to a certain number of >> decimals, that's information which shouldn't been thrown away when >> converting to mzML. The info could also be used for a viewer to show the >> right number of decimals. >> >> Fredrik >> >> Pierre-Alain Binz wrote: >> >> >>> One question to Steve and others. >>> reading mzML, as well as any othe files, has to be done with an >>> editor, being a simple text editor or a more elaborated viewer. >>> >>> Would a more elaborated XML viewer/editor that knows how to read >>> binary data and round it if needed not be an ideal "straight" reader >>> of mzML instead of using a more plain text viewer? >>> I know and myself also like to "call back" values with a defined >>> number of digits, as they were entered. And it's up to the software >>> design to "not interpret" what I have entered. 
But today, it's >>> relatively easy to get a XML reader that could "translate" the binary >>> arrays in a "mz Intensity" two column format with appropriate rounding >>> if necessary, so that it looks exactly as if it was an ascii table >>> (don't forget that in mzML the mz and intensity arrays are separate >>> and anyway have to be interpreted to look like a 2 column ascii table. >>> If the answer is OK, then we could stay with binary format, taking >>> care of the "precision issue" via the graphical view, and be therefore >>> compatible with the ascii precision. >>> >>> This sounds like a way to bring the technical question to a more >>> phylosophical, "ergonomic" one, but probably worth at that stage. >>> >>> Pierre-Alain >>> >>> Matthew Chambers wrote: >>> >>> >>>> No measurements I'm aware of in proteomic mass spec use more than 15 >>>> base 10 digits, which is the number of digits that double precision >>>> floats can represent without precision loss. That means that even if a >>>> value goes in as 1.5 (which can't be represented exactly), then as long >>>> as we round to the 15th digit we don't lose precision. As others have >>>> said, we can thus "round-trip" 15 digits. We get this high degree of >>>> fidelity to the source data without all the assumptions involved with >>>> the ASCII representation: I use doubles consistently then I'm always >>>> providing 15 significant digits. And if we did need more than 15, then >>>> ASCII is still a very inefficient encoding. You'd want to use arbitrary >>>> precision fixed or floating point binary types, which can't be computed >>>> on very easily or efficiently, but they are the Right Way to achieve >>>> arbitrary precision (i.e. no unspecified assumptions, well defined byte >>>> width, fast parsing). >>>> >>>> So in fact, you can preserve this "poor person's" significant digits >>>> encoding: if the software is doing its job, then it will go out the same >>>> way it came in! 
The real nastiness with floating point is when the >>>> precision loss accumulates every time an arithmetic operation happens on >>>> a cumulative sum or product. >>>> >>>> -Matt >>>> >>>> >>>> Stein, Stephen E. Dr. wrote: >>>> >>>> >>>> >>>>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement). >>>>> >>>>> It is true that if only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it. >>>>> >>>>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it. >>>>> >>>>> -Steve >>>>> >>>>> -----Original Message----- >>>>> From: Mike Coleman [mailto:tu...@gm...] >>>>> Sent: Thursday, June 11, 2009 4:41 PM >>>>> To: Mass spectrometry standard development >>>>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder >>>>> >>>>> I took it to mean that with "1", "1.5", "1.50", one gets an implied >>>>> level of precision. That is, "1.5" is generally understood to mean >>>>> 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied >>>>> about the precision of this value, unless it's explicitly stated >>>>> elsewhere. (If you have a whole set of these, then you probably can >>>>> work out the equivalent precision, but this is a bit of a stretch.) >>>>> >>>>> Mike >>>>> >>>>> >>>>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Is your question whether we can successfully round-trip the numbers? Eg. go >>>>>> from an ascii format to mzML back to originating ascii format and get the >>>>>> same exact numbers? 
I believe that when we pack the numbers and unpack them >>>>>> (at least in my non-validating ruby implementations) the numbers and >>>>>> significance are completely the same. E.g. 1.005 === 1.005 and not >>>>>> 1.005000000000001 >>>>>> -angel |
From: Stein, S. E. Dr. <ste...@ni...> - 2009-06-12 14:53:35
|
Matt, Resolution depends on instrument, tuning and settings - I don't know the current state of reporting such information (or its reliability) in current instruments. We have long held all of our data in ASCII form (not just MS) - if you want flexibility and accuracy, this is the only path without inventing a new data structure. Error limits and annotation can be added as we like (peak labeling, for example). We will consider using comments - but I suspect no one will know they are there but us. Note that our focus is quite different from others - we are dealing with data that we have processed, perhaps heavily. I still ask for an optional ASCII data representation for reference data. -Steve -----Original Message----- From: Matt Chambers [mailto:mat...@va...] Sent: Friday, June 12, 2009 9:22 AM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder Now this I can agree with, especially with ppm representation when appropriate. But doesn't the instrument's mass resolution and related CV terms convey this information? And if someone doesn't write those at all or can't write them in a machine-readable numeric representation, it seems unlikely they will have done a proper job of rounding m/z values. This is kind of the reason I was opposed to using strings to represent mass resolution, but I was overruled. Perhaps we should revisit that? It makes sense to me because it's a less redundant placement of this precision information. Steve, do you agree with using XML comments to actually show human-readable peak lists in the mzML? That seems like an orthogonal issue to the precision one. -Matt Stein, Stephen E. Dr. wrote: > that would be a nice addition - also allow ppm representation - more complex precision representations can be delayed for future versions. > > -----Original Message----- > From: Fredrik Levander [mailto:Fre...@im...] 
> Sent: Friday, June 12, 2009 8:28 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder > > Wouldn't it make sense to add an optional CV term for the number of > significant digits in a binary array? This way it would be easy to get > back to the ASCII representation if a peak list with x number of > decimals was converted to mzML. It might not be so useful for conversion > of raw data, but if a peak list have been rounded to a certain number of > decimals, that's information which shouldn't been thrown away when > converting to mzML. The info could also be used for a viewer to show the > right number of decimals. > > Fredrik > > Pierre-Alain Binz wrote: > >> One question to Steve and others. >> reading mzML, as well as any othe files, has to be done with an >> editor, being a simple text editor or a more elaborated viewer. >> >> Would a more elaborated XML viewer/editor that knows how to read >> binary data and round it if needed not be an ideal "straight" reader >> of mzML instead of using a more plain text viewer? >> I know and myself also like to "call back" values with a defined >> number of digits, as they were entered. And it's up to the software >> design to "not interpret" what I have entered. But today, it's >> relatively easy to get a XML reader that could "translate" the binary >> arrays in a "mz Intensity" two column format with appropriate rounding >> if necessary, so that it looks exactly as if it was an ascii table >> (don't forget that in mzML the mz and intensity arrays are separate >> and anyway have to be interpreted to look like a 2 column ascii table. >> If the answer is OK, then we could stay with binary format, taking >> care of the "precision issue" via the graphical view, and be therefore >> compatible with the ascii precision. >> >> This sounds like a way to bring the technical question to a more >> phylosophical, "ergonomic" one, but probably worth at that stage. 
>> >> Pierre-Alain >> >> Matthew Chambers wrote: >> >>> No measurements I'm aware of in proteomic mass spec use more than 15 >>> base 10 digits, which is the number of digits that double precision >>> floats can represent without precision loss. That means that even if a >>> value goes in as 1.5 (which can't be represented exactly), then as long >>> as we round to the 15th digit we don't lose precision. As others have >>> said, we can thus "round-trip" 15 digits. We get this high degree of >>> fidelity to the source data without all the assumptions involved with >>> the ASCII representation: I use doubles consistently then I'm always >>> providing 15 significant digits. And if we did need more than 15, then >>> ASCII is still a very inefficient encoding. You'd want to use arbitrary >>> precision fixed or floating point binary types, which can't be computed >>> on very easily or efficiently, but they are the Right Way to achieve >>> arbitrary precision (i.e. no unspecified assumptions, well defined byte >>> width, fast parsing). >>> >>> So in fact, you can preserve this "poor person's" significant digits >>> encoding: if the software is doing its job, then it will go out the same >>> way it came in! The real nastiness with floating point is when the >>> precision loss accumulates every time an arithmetic operation happens on >>> a cumulative sum or product. >>> >>> -Matt >>> >>> >>> Stein, Stephen E. Dr. wrote: >>> >>> >>>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement). >>>> >>>> It is true that if only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it. 
>>>> >>>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it. >>>> >>>> -Steve >>>> >>>> -----Original Message----- >>>> From: Mike Coleman [mailto:tu...@gm...] >>>> Sent: Thursday, June 11, 2009 4:41 PM >>>> To: Mass spectrometry standard development >>>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder >>>> >>>> I took it to mean that with "1", "1.5", "1.50", one gets an implied >>>> level of precision. That is, "1.5" is generally understood to mean >>>> 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied >>>> about the precision of this value, unless it's explicitly stated >>>> elsewhere. (If you have a whole set of these, then you probably can >>>> work out the equivalent precision, but this is a bit of a stretch.) >>>> >>>> Mike >>>> >>>> >>>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote: >>>> >>>> >>>> >>>>> Is your question whether we can successfully round-trip the numbers? Eg. go >>>>> from an ascii format to mzML back to originating ascii format and get the >>>>> same exact numbers? I believe that when we pack the numbers and unpack them >>>>> (at least in my non-validating ruby implementations) the numbers and >>>>> significance are completely the same. E.g. 1.005 === 1.005 and not >>>>> 1.005000000000001 >>>>> -angel |
From: Steffen N. <sne...@ip...> - 2009-06-12 14:03:28
|
On Fri, 2009-06-12 at 11:34 +0200, Marc Sturm wrote:
> Error: CV term must have a unit: MS:1000515 - intensity array
> Error: CV term must have a unit: MS:1000515 - intensity array

The PDA is measuring absorbance of light by the sample. Our machine software reports "milli-AU". My first guesses do not validate:

Semantically validating mzML file:
Error: Unit CV term not allowed: UO:0000186 - dimensionless unit of term MS:1000515 - intensity array
Error: Unit CV term not allowed: UO:0000157 - light unit of term MS:1000515 - intensity array
Error: Unit CV term not allowed: UO:0000187 - percent of term MS:1000515 - intensity array
Failed: errors are listed above!

So I propose something like:

[Term]
id: XX:XXXXXXX
name: Absorbance Units
def: "A dimensionless logarithmic unit to measure the absorbance of light transmitted through a partially absorbing substance."
is_a: UO:0000186 ! dimensionless unit

maybe is_a:
ID: PSI:1000460
Name: Unit

where I am unsure whether that should go into MS:XXXXXXX or UO:XXXXXXX.

Once I have some CV Term I can use, I'll fix the MzMLFile_PDA.xml

Yours,
Steffen

* http://en.wikipedia.org/wiki/Absorbance
  Although absorbance does not have true units, it is quite often reported in "Absorbance Units" or AU

* http://www.unc.edu/~rowlett/units/dictA.html
  absorbance unit (AU)
  a logarithmic unit used to measure optical density, the absorbance of light transmitted through a partially absorbing substance. If T is the percentage of light transmitted, then the absorbance is defined to be -log10 T absorbance units. An increase in absorbance of 1.0 AU corresponds to a reduction in transmittance by a factor of 10. If the absorbance is 1.0 AU then 10% of the light is transmitted; at 2.0 AU only 1% of the light is transmitted, and so on.

BTW, there are two different "percent" in the Ontologies:

ID: PSI:1000138
Name: Percent

and

ID: UO:0000187
Name: percent

Yours,
Steffen

--
IPB Halle                   AG Massenspektrometrie & Bioinformatik
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3                  http://msbi.bic-gh.de
06120 Halle                 Tel. +49 (0) 345 5582 - 1470
                            +49 (0) 345 5582 - 0
sneumann(at)IPB-Halle.DE    Fax. +49 (0) 345 5582 - 1409
|
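The absorbance definition quoted above can be checked with a few lines of Python. This is only a sketch: it assumes T is given as a fraction between 0 and 1 rather than as a percentage, and the function names are made up for illustration (they are not part of any mzML tool or CV).

```python
import math


def absorbance_from_transmittance(transmitted_fraction: float) -> float:
    """Return absorbance in AU for a transmitted fraction T in (0, 1].

    Assumption: T is a fraction (0.10 means 10% of the light is transmitted).
    """
    if not 0.0 < transmitted_fraction <= 1.0:
        raise ValueError("transmitted fraction must be in (0, 1]")
    return -math.log10(transmitted_fraction)


def milli_au_to_au(milli_au: float) -> float:
    """Convert a milli-AU readout (as reported by the PDA software) to AU."""
    return milli_au / 1000.0


if __name__ == "__main__":
    # 10% transmitted is about 1 AU, 1% transmitted is about 2 AU.
    for fraction in (0.10, 0.01):
        print(fraction, absorbance_from_transmittance(fraction))
    print(milli_au_to_au(250.0))
```

Since the value is a logarithm of a ratio, it carries no physical dimension, which is consistent with proposing the new term as a child of "dimensionless unit".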
From: Steffen N. <sne...@ip...> - 2009-06-12 14:03:16
|
On Fri, 2009-06-12 at 14:27 +0200, Fredrik Levander wrote:
> Wouldn't it make sense to add an optional CV term for the number of
> significant digits in a binary array?

Couldn't one express significant digits via

ID: PSI:1000014
Name: Accuracy

This is currently geared towards m/z in ppm. Should this be modified to be applied to time or intensities in other binary arrays as well?

Yours,
Steffen

--
IPB Halle                   AG Massenspektrometrie & Bioinformatik
Dr. Steffen Neumann         http://www.IPB-Halle.DE
Weinberg 3                  http://msbi.bic-gh.de
06120 Halle                 Tel. +49 (0) 345 5582 - 1470
                            +49 (0) 345 5582 - 0
sneumann(at)IPB-Halle.DE    Fax. +49 (0) 345 5582 - 1409
|
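For readers comparing the two ways of stating precision, a ppm accuracy and an absolute m/z tolerance are related by a simple scaling. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here is defined by the CV.

```python
def ppm_to_absolute(mz: float, ppm: float) -> float:
    """Absolute m/z tolerance that corresponds to a relative accuracy in ppm."""
    return mz * ppm * 1e-6


def absolute_to_ppm(mz: float, delta: float) -> float:
    """Relative accuracy in ppm for an absolute m/z tolerance `delta`."""
    return delta / mz * 1e6


if __name__ == "__main__":
    # 5 ppm at m/z 1000 is about 0.005 m/z units.
    print(ppm_to_absolute(1000.0, 5.0))
    # +/- 0.005 at m/z 500 is about 10 ppm.
    print(absolute_to_ppm(500.0, 0.005))
```

A ppm accuracy scales with m/z, whereas a fixed number of decimals implies the same absolute tolerance across the whole array, so the two descriptions are not interchangeable for a wide m/z range.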
From: Matt C. <mat...@va...> - 2009-06-12 13:23:59
|
Now this I can agree with, especially with ppm representation when appropriate. But doesn't the instrument's mass resolution and related CV terms convey this information? And if someone doesn't write those at all or can't write them in a machine-readable numeric representation, it seems unlikely they will have done a proper job of rounding m/z values. This is kind of the reason I was opposed to using strings to represent mass resolution, but I was overruled. Perhaps we should revisit that? It makes sense to me because it's a less redundant placement of this precision information. Steve, do you agree with using XML comments to actually show human-readable peak lists in the mzML? That seems like an orthogonal issue to the precision one. -Matt Stein, Stephen E. Dr. wrote: > that would be a nice addition - also allow ppm representation - more complex precision representations can be delayed for future versions. > > -----Original Message----- > From: Fredrik Levander [mailto:Fre...@im...] > Sent: Friday, June 12, 2009 8:28 AM > To: Mass spectrometry standard development > Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder > > Wouldn't it make sense to add an optional CV term for the number of > significant digits in a binary array? This way it would be easy to get > back to the ASCII representation if a peak list with x number of > decimals was converted to mzML. It might not be so useful for conversion > of raw data, but if a peak list have been rounded to a certain number of > decimals, that's information which shouldn't been thrown away when > converting to mzML. The info could also be used for a viewer to show the > right number of decimals. > > Fredrik > > Pierre-Alain Binz wrote: > >> One question to Steve and others. >> reading mzML, as well as any othe files, has to be done with an >> editor, being a simple text editor or a more elaborated viewer. 
>> >> Would a more elaborated XML viewer/editor that knows how to read >> binary data and round it if needed not be an ideal "straight" reader >> of mzML instead of using a more plain text viewer? >> I know and myself also like to "call back" values with a defined >> number of digits, as they were entered. And it's up to the software >> design to "not interpret" what I have entered. But today, it's >> relatively easy to get a XML reader that could "translate" the binary >> arrays in a "mz Intensity" two column format with appropriate rounding >> if necessary, so that it looks exactly as if it was an ascii table >> (don't forget that in mzML the mz and intensity arrays are separate >> and anyway have to be interpreted to look like a 2 column ascii table. >> If the answer is OK, then we could stay with binary format, taking >> care of the "precision issue" via the graphical view, and be therefore >> compatible with the ascii precision. >> >> This sounds like a way to bring the technical question to a more >> phylosophical, "ergonomic" one, but probably worth at that stage. >> >> Pierre-Alain >> >> Matthew Chambers wrote: >> >>> No measurements I'm aware of in proteomic mass spec use more than 15 >>> base 10 digits, which is the number of digits that double precision >>> floats can represent without precision loss. That means that even if a >>> value goes in as 1.5 (which can't be represented exactly), then as long >>> as we round to the 15th digit we don't lose precision. As others have >>> said, we can thus "round-trip" 15 digits. We get this high degree of >>> fidelity to the source data without all the assumptions involved with >>> the ASCII representation: I use doubles consistently then I'm always >>> providing 15 significant digits. And if we did need more than 15, then >>> ASCII is still a very inefficient encoding. 
You'd want to use arbitrary >>> precision fixed or floating point binary types, which can't be computed >>> on very easily or efficiently, but they are the Right Way to achieve >>> arbitrary precision (i.e. no unspecified assumptions, well defined byte >>> width, fast parsing). >>> >>> So in fact, you can preserve this "poor person's" significant digits >>> encoding: if the software is doing its job, then it will go out the same >>> way it came in! The real nastiness with floating point is when the >>> precision loss accumulates every time an arithmetic operation happens on >>> a cumulative sum or product. >>> >>> -Matt >>> >>> >>> Stein, Stephen E. Dr. wrote: >>> >>> >>>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement). >>>> >>>> It is true that if only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it. >>>> >>>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it. >>>> >>>> -Steve >>>> >>>> -----Original Message----- >>>> From: Mike Coleman [mailto:tu...@gm...] >>>> Sent: Thursday, June 11, 2009 4:41 PM >>>> To: Mass spectrometry standard development >>>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder >>>> >>>> I took it to mean that with "1", "1.5", "1.50", one gets an implied >>>> level of precision. That is, "1.5" is generally understood to mean >>>> 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied >>>> about the precision of this value, unless it's explicitly stated >>>> elsewhere. 
(If you have a whole set of these, then you probably can >>>> work out the equivalent precision, but this is a bit of a stretch.) >>>> >>>> Mike >>>> >>>> >>>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote: >>>> >>>> >>>> >>>>> Is your question whether we can successfully round-trip the numbers? Eg. go >>>>> from an ascii format to mzML back to originating ascii format and get the >>>>> same exact numbers? I believe that when we pack the numbers and unpack them >>>>> (at least in my non-validating ruby implementations) the numbers and >>>>> significance are completely the same. E.g. 1.005 === 1.005 and not >>>>> 1.005000000000001 >>>>> -angel >>>>> >>>>> |
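The round-trip argument in this thread can be tried out directly. The following is a rough sketch, assuming 64-bit little-endian doubles that are base64-encoded (the helper names are invented for this example and do not come from any mzML library). It shows both sides of the discussion: the numeric values survive the trip when printed with 15 significant digits, but a trailing zero such as the one in "1.50" is not recoverable from the double alone.

```python
import base64
import struct


def encode_doubles(values):
    """Pack floats as little-endian 64-bit doubles and base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text):
    """Inverse of encode_doubles: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# ASCII peak values as they might appear in a text peak list (made-up numbers).
ascii_values = ["1.005", "445.12", "1.50"]
original = [float(s) for s in ascii_values]

decoded = decode_doubles(encode_doubles(original))

# Printing with 15 significant digits recovers the decimal value that went in ...
as_text = [format(v, ".15g") for v in decoded]

if __name__ == "__main__":
    print(decoded == original)  # the binary round trip itself is exact
    print(as_text)              # ... but "1.50" comes back as "1.5"
```

So the value is preserved, while the implied precision carried by the trailing zero has to be stored somewhere else if it matters.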
From: Stein, S. E. Dr. <ste...@ni...> - 2009-06-12 13:14:40
|
that would be a nice addition - also allow ppm representation - more complex precision representations can be delayed for future versions. -----Original Message----- From: Fredrik Levander [mailto:Fre...@im...] Sent: Friday, June 12, 2009 8:28 AM To: Mass spectrometry standard development Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder Wouldn't it make sense to add an optional CV term for the number of significant digits in a binary array? This way it would be easy to get back to the ASCII representation if a peak list with x number of decimals was converted to mzML. It might not be so useful for conversion of raw data, but if a peak list have been rounded to a certain number of decimals, that's information which shouldn't been thrown away when converting to mzML. The info could also be used for a viewer to show the right number of decimals. Fredrik Pierre-Alain Binz wrote: > One question to Steve and others. > reading mzML, as well as any othe files, has to be done with an > editor, being a simple text editor or a more elaborated viewer. > > Would a more elaborated XML viewer/editor that knows how to read > binary data and round it if needed not be an ideal "straight" reader > of mzML instead of using a more plain text viewer? > I know and myself also like to "call back" values with a defined > number of digits, as they were entered. And it's up to the software > design to "not interpret" what I have entered. But today, it's > relatively easy to get a XML reader that could "translate" the binary > arrays in a "mz Intensity" two column format with appropriate rounding > if necessary, so that it looks exactly as if it was an ascii table > (don't forget that in mzML the mz and intensity arrays are separate > and anyway have to be interpreted to look like a 2 column ascii table. 
> If the answer is OK, then we could stay with binary format, taking > care of the "precision issue" via the graphical view, and be therefore > compatible with the ascii precision. > > This sounds like a way to bring the technical question to a more > phylosophical, "ergonomic" one, but probably worth at that stage. > > Pierre-Alain > > Matthew Chambers wrote: >> No measurements I'm aware of in proteomic mass spec use more than 15 >> base 10 digits, which is the number of digits that double precision >> floats can represent without precision loss. That means that even if a >> value goes in as 1.5 (which can't be represented exactly), then as long >> as we round to the 15th digit we don't lose precision. As others have >> said, we can thus "round-trip" 15 digits. We get this high degree of >> fidelity to the source data without all the assumptions involved with >> the ASCII representation: I use doubles consistently then I'm always >> providing 15 significant digits. And if we did need more than 15, then >> ASCII is still a very inefficient encoding. You'd want to use arbitrary >> precision fixed or floating point binary types, which can't be computed >> on very easily or efficiently, but they are the Right Way to achieve >> arbitrary precision (i.e. no unspecified assumptions, well defined byte >> width, fast parsing). >> >> So in fact, you can preserve this "poor person's" significant digits >> encoding: if the software is doing its job, then it will go out the same >> way it came in! The real nastiness with floating point is when the >> precision loss accumulates every time an arithmetic operation happens on >> a cumulative sum or product. >> >> -Matt >> >> >> Stein, Stephen E. Dr. wrote: >> >>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). 
It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement). >>> >>> It is true that if only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it. >>> >>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it. >>> >>> -Steve >>> >>> -----Original Message----- >>> From: Mike Coleman [mailto:tu...@gm...] >>> Sent: Thursday, June 11, 2009 4:41 PM >>> To: Mass spectrometry standard development >>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder >>> >>> I took it to mean that with "1", "1.5", "1.50", one gets an implied >>> level of precision. That is, "1.5" is generally understood to mean >>> 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied >>> about the precision of this value, unless it's explicitly stated >>> elsewhere. (If you have a whole set of these, then you probably can >>> work out the equivalent precision, but this is a bit of a stretch.) >>> >>> Mike >>> >>> >>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote: >>> >>> >>>> Is your question whether we can successfully round-trip the numbers? Eg. go >>>> from an ascii format to mzML back to originating ascii format and get the >>>> same exact numbers? I believe that when we pack the numbers and unpack them >>>> (at least in my non-validating ruby implementations) the numbers and >>>> significance are completely the same. E.g. 
1.005 === 1.005 and not >>>> 1.005000000000001 >>>> -angel |
From: Matt C. <mat...@va...> - 2009-06-12 13:10:20
|
Yes, looks like I forgot about neutral change spectra while getting so ambitious with the mapping file changes. Either we need some basic logic in the mapping file (if the spectrum is this type, then use this rule, else another rule), or this is another one of those special rules. I'll fix this later today.

-Matt

Fredrik Levander wrote:
> Hi, I am afraid that it will not be possible to make the neutral loss
> spectra valid with a required target mass for precursor and product,
> since there are none in such spectra (just two moving isolation
> windows). So I guess these new rules will have to be changed back to MAY
> again, even if they should be required in most other types of spectra.
> The other exception is precursor ion spectra, where the precursor
> isolation window is also moving.
>
> Fredrik
>
> Marc Sturm wrote:
>
>> Hi all,
>>
>> good news from the example files. I just validated all examples from the
>> website. Only two files contain minor errors:
>>
>> -------
>>
>> file name: MzMLFile_PDA.mzML
>> file type: mzML
>>
>> Validating mzML file against XML schema version 1.10
>> Success: the file is valid!
>>
>> Semantically validating mzML file:
>> Error: CV term must have a unit: MS:1000515 - intensity array
>> Error: CV term must have a unit: MS:1000515 - intensity array
>> Failed: errors are listed above!
>>
>> -------
>>
>> file name: neutral_loss_example_1.1.0.mzML
>> file type: mzML
>>
>> Validating mzML file against XML schema version 1.10
>> Success: the file is valid!
>>
>> Semantically validating mzML file:
>> Error: Violated mapping rule 'precursor_isolationwindow_must' at element
>> '/mzML/run/spectrumList/spectrum/precursorList/precursor/isolationWindow',
>> 1 should be present, 0 found!
>> Error: Violated mapping rule 'product_isolationwindow_must' at element
>> '/mzML/run/spectrumList/spectrum/productList/product/isolationWindow', 1
>> should be present, 0 found!
>> Failed: errors are listed above!
>>
>> Best,
>> Marc
>>
|
From: Fredrik L. <Fre...@im...> - 2009-06-12 13:01:41
|
Wouldn't it make sense to add an optional CV term for the number of significant digits in a binary array? This way it would be easy to get back to the ASCII representation if a peak list with x number of decimals was converted to mzML. It might not be so useful for conversion of raw data, but if a peak list has been rounded to a certain number of decimals, that's information which shouldn't be thrown away when converting to mzML. The info could also be used by a viewer to show the right number of decimals. Fredrik Pierre-Alain Binz wrote: > One question to Steve and others. > reading mzML, as well as any other files, has to be done with an > editor, being a simple text editor or a more elaborated viewer. > > Would a more elaborated XML viewer/editor that knows how to read > binary data and round it if needed not be an ideal "straight" reader > of mzML instead of using a more plain text viewer? > I know and myself also like to "call back" values with a defined > number of digits, as they were entered. And it's up to the software > design to "not interpret" what I have entered. But today, it's > relatively easy to get a XML reader that could "translate" the binary > arrays in a "mz Intensity" two column format with appropriate rounding > if necessary, so that it looks exactly as if it was an ascii table > (don't forget that in mzML the mz and intensity arrays are separate > and anyway have to be interpreted to look like a 2 column ascii table). > If the answer is OK, then we could stay with binary format, taking > care of the "precision issue" via the graphical view, and be therefore > compatible with the ascii precision. > > This sounds like a way to bring the technical question to a more > philosophical, "ergonomic" one, but probably worth at that stage. 
> > Pierre-Alain > > Matthew Chambers wrote: >> No measurements I'm aware of in proteomic mass spec use more than 15 >> base 10 digits, which is the number of digits that double precision >> floats can represent without precision loss. That means that even if a >> value goes in as 1.5 (which can't be represented exactly), then as long >> as we round to the 15th digit we don't lose precision. As others have >> said, we can thus "round-trip" 15 digits. We get this high degree of >> fidelity to the source data without all the assumptions involved with >> the ASCII representation: I use doubles consistently then I'm always >> providing 15 significant digits. And if we did need more than 15, then >> ASCII is still a very inefficient encoding. You'd want to use arbitrary >> precision fixed or floating point binary types, which can't be computed >> on very easily or efficiently, but they are the Right Way to achieve >> arbitrary precision (i.e. no unspecified assumptions, well defined byte >> width, fast parsing). >> >> So in fact, you can preserve this "poor person's" significant digits >> encoding: if the software is doing its job, then it will go out the same >> way it came in! The real nastiness with floating point is when the >> precision loss accumulates every time an arithmetic operation happens on >> a cumulative sum or product. >> >> -Matt >> >> >> Stein, Stephen E. Dr. wrote: >> >>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement). >>> >>> It is true that if only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it. 
>>> >>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it. >>> >>> -Steve >>> >>> -----Original Message----- >>> From: Mike Coleman [mailto:tu...@gm...] >>> Sent: Thursday, June 11, 2009 4:41 PM >>> To: Mass spectrometry standard development >>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder >>> >>> I took it to mean that with "1", "1.5", "1.50", one gets an implied >>> level of precision. That is, "1.5" is generally understood to mean >>> 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied >>> about the precision of this value, unless it's explicitly stated >>> elsewhere. (If you have a whole set of these, then you probably can >>> work out the equivalent precision, but this is a bit of a stretch.) >>> >>> Mike >>> >>> >>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote: >>> >>> >>>> Is your question whether we can successfully round-trip the numbers? Eg. go >>>> from an ascii format to mzML back to originating ascii format and get the >>>> same exact numbers? I believe that when we pack the numbers and unpack them >>>> (at least in my non-validating ruby implementations) the numbers and >>>> significance are completely the same. E.g. 1.005 === 1.005 and not >>>> 1.005000000000001 >>>> -angel >>>> >>>> >>> ------------------------------------------------------------------------------ >>> Crystal Reports - New Free Runtime and 30 Day Trial >>> Check out the new simplified licensing option that enables unlimited >>> royalty-free distribution of the report engine for externally facing >>> server and web deployment. >>> http://p.sf.net/sfu/businessobjects >>> _______________________________________________ >>> Psidev-ms-dev mailing list >>> Psi...@li... 
>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev >>> >>> ------------------------------------------------------------------------------ >>> Crystal Reports - New Free Runtime and 30 Day Trial >>> Check out the new simplified licensing option that enables unlimited >>> royalty-free distribution of the report engine for externally facing >>> server and web deployment. >>> http://p.sf.net/sfu/businessobjects >>> _______________________________________________ >>> Psidev-ms-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev >>> >>> >> >> ------------------------------------------------------------------------------ >> Crystal Reports - New Free Runtime and 30 Day Trial >> Check out the new simplified licensing option that enables unlimited >> royalty-free distribution of the report engine for externally facing >> server and web deployment. >> http://p.sf.net/sfu/businessobjects >> _______________________________________________ >> Psidev-ms-dev mailing list >> Psi...@li... >> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev >> >> |
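To make the suggestion concrete, here is a minimal Python sketch of how a reader might use such a term. The `decimals` argument stands in for the value of the proposed CV term, which is hypothetical (no accession is assumed to exist), and the array encoding is assumed to be uncompressed little-endian 64-bit floats in base64.

```python
import base64
import struct


def encode_doubles(values):
    """Pack floats as little-endian 64-bit doubles and base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text):
    """Reverse of encode_doubles: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


def to_ascii(values, decimals):
    """Format values with a fixed number of decimals.

    `decimals` is the value a reader would take from the proposed
    (hypothetical) CV term.
    """
    return ["%.*f" % (decimals, v) for v in values]


# A peak list that was rounded to two decimals before conversion.
peaks = ["445.12", "446.10", "1021.50"]
encoded = encode_doubles([float(p) for p in peaks])

# With the number of decimals stored alongside the array, the original
# text, including trailing zeros, can be reproduced.
restored = to_ascii(decode_doubles(encoded), decimals=2)
print(restored)
```

Without the stored number of decimals a reader would have to guess how many digits to print, and trailing zeros such as the one in "446.10" would be lost.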
From: Fredrik L. <Fre...@im...> - 2009-06-12 12:59:11
|
Hi,

I am afraid that it will not be possible to make the neutral loss spectra valid with a required target mass for precursor and product, since there are none in those spectra (just two moving isolation windows). So I guess these new rules will have to be changed back to MAY again, even if they should be required in most other types of spectra. The other exception is precursor ion spectra, where the precursor isolation window is also moving.

Fredrik |
From: Pierre-Alain B. <pie...@is...> - 2009-06-12 10:04:36
|
One question to Steve and others.

Reading mzML, like any other file, has to be done with an editor, whether a simple text editor or a more elaborate viewer.

Would a more elaborate XML viewer/editor that knows how to read binary data and round it if needed not be an ideal "straight" reader of mzML, instead of a plain text viewer? I know, and I myself also like, to "call back" values with a defined number of digits, as they were entered, and it's up to the software design to "not interpret" what I have entered. But today it's relatively easy to get an XML reader that could "translate" the binary arrays into a two-column "mz intensity" format with appropriate rounding if necessary, so that it looks exactly as if it were an ASCII table (don't forget that in mzML the m/z and intensity arrays are separate and have to be interpreted anyway to look like a two-column ASCII table).

If the answer is OK, then we could stay with the binary format, taking care of the "precision issue" via the graphical view, and therefore be compatible with the ASCII precision.

This sounds like a way to turn the technical question into a more philosophical, "ergonomic" one, but it is probably worth doing at this stage.

Pierre-Alain |
From: Marc S. <stu...@gm...> - 2009-06-12 09:33:52
|
Hi all,

good news from the example files. I just validated all examples from the website. Only two files contain minor errors:

-------

file name: MzMLFile_PDA.mzML
file type: mzML

Validating mzML file against XML schema version 1.10
Success: the file is valid!

Semantically validating mzML file:
Error: CV term must have a unit: MS:1000515 - intensity array
Error: CV term must have a unit: MS:1000515 - intensity array
Failed: errors are listed above!

-------

file name: neutral_loss_example_1.1.0.mzML
file type: mzML

Validating mzML file against XML schema version 1.10
Success: the file is valid!

Semantically validating mzML file:
Error: Violated mapping rule 'precursor_isolationwindow_must' at element '/mzML/run/spectrumList/spectrum/precursorList/precursor/isolationWindow', 1 should be present, 0 found!
Error: Violated mapping rule 'product_isolationwindow_must' at element '/mzML/run/spectrumList/spectrum/productList/product/isolationWindow', 1 should be present, 0 found!
Failed: errors are listed above!

Best,
Marc |
From: Marc S. <stu...@gm...> - 2009-06-12 09:22:10
|
Hi all,

(1) I'd like to add the following CV terms:

[Term]
id: MS:1001483
name: intensity normalization
def: "Normalization of data point intensities." [PSI:MS]
is_a: MS:1000543 ! data processing action

[Term]
id: MS:1001484
name: m/z calibration
def: "Calibration of data point m/z positions." [PSI:MS]
is_a: MS:1000543 ! data processing action

[Term]
id: MS:1001485
name: data filtering
def: "Filtering out part of the data." [PSI:MS]
is_a: MS:1000543 ! data processing action

(2) 'data filtering' should then be the parent term of 'low intensity data point removal' and 'high intensity data point removal'. I'll move them if no one objects.

Best,
Marc |
From: Matthew C. <mat...@va...> - 2009-06-11 22:00:16
|
No measurements I'm aware of in proteomic mass spec use more than 15 base 10 digits, which is the number of digits that double precision floats can represent without precision loss. That means that even if a value goes in as a decimal that can't be represented exactly in binary (1.005, for example), then as long as we round to the 15th digit we don't lose precision. As others have said, we can thus "round-trip" 15 digits. We get this high degree of fidelity to the source data without all the assumptions involved with the ASCII representation: if I use doubles consistently then I'm always providing 15 significant digits. And if we did need more than 15, then ASCII is still a very inefficient encoding. You'd want to use arbitrary precision fixed or floating point binary types, which can't be computed on very easily or efficiently, but they are the Right Way to achieve arbitrary precision (i.e. no unspecified assumptions, well defined byte width, fast parsing).

So in fact, you can preserve this "poor person's" significant digits encoding: if the software is doing its job, then it will go out the same way it came in! The real nastiness with floating point is when the precision loss accumulates every time an arithmetic operation happens on a cumulative sum or product.

-Matt |
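The 15-digit round trip and the accumulation effect can both be checked with a few lines of Python. This is only an illustrative sketch: the sample values are made up, and `round_trip` is a hypothetical helper, not part of any mzML library.

```python
def round_trip(text):
    """Parse a decimal string into a double and print it back with 15 significant digits."""
    return "%.15g" % float(text)


# Decimal strings with at most 15 significant digits come back unchanged,
# even though most of them have no exact binary representation.
samples = ["1.005", "12345.678901", "0.1", "445.120025634766"]
round_tripped = [round_trip(s) for s in samples]
print(round_tripped)

# The number of trailing zeros is not part of the stored value, so "1.50"
# and "1.5" are indistinguishable after conversion to a double.
trailing = round_trip("1.50")
print(trailing)

# Precision loss accumulates once arithmetic is involved: adding 0.1 ten
# times does not give exactly 1.0.
total = sum([0.1] * 10)
print(total == 1.0)
```

The first part supports the round-trip argument for values. The second shows what is not preserved without extra metadata: the count of digits the author originally wrote.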
From: Stein, S. E. Dr. <ste...@ni...> - 2009-06-11 21:27:53
|
Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement).

It is true that it only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values into those floating formats loses some of it.

Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it.

-Steve |
From: Brian P. <bri...@in...> - 2009-06-11 21:17:01
|
The goal of "round trip" is best served by the binary representation. Keep in mind that these values come off the machine as IEEE floats, not tidy human readable representations. The value you think of as "1.5" is actually a bit pattern that may well have the value 1.5000001, but (assuming a chain of conversion that never attempts a human readable representation) it's the same bit pattern that came off the mass spec, so it's the "right" one.

- Brian |
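A short Python sketch of the bit-pattern argument, assuming the instrument delivers 32-bit IEEE floats. The reading 1.1 is a made-up value, chosen because it has no exact binary representation.

```python
import struct


def float32_bits(value):
    """Return the 4-byte little-endian IEEE 754 single-precision pattern of a value."""
    return struct.pack("<f", value)


reading = 1.1                           # hypothetical value reported by an instrument
bits = float32_bits(reading)            # what a binary format would store
stored = struct.unpack("<f", bits)[0]   # the value those bits actually encode

# Writing the decoded value out again yields exactly the same bytes, so a
# binary-to-binary chain is lossless.
bits_again = float32_bits(stored)
print(bits == bits_again)

# The decoded value is not the "tidy" decimal, but rounding to 7 significant
# digits (about what single precision can carry) recovers it.
print(repr(stored), "%.7g" % stored)
```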
From: Matthew C. <mat...@va...> - 2009-06-11 21:09:29
|
With binary data, the same representation works for centroided and profile data points. I really hope you're not suggesting ASCII storage of profile mode data, where 9-12 bytes per X sample (12345.678901) would not be unusual? It's all the overhead of double precision floats (a constant 8 bytes) without the vastly higher dynamic range, and it takes much longer to parse.

Using mzML to store the raw data for the libraries would be a great improvement over the status quo (assorted custom relational databases and ASCII archives?). There actually would be a standard representation. :) Coming up with a standard representation for application-specific annotations would be another challenge for a possibly separate format, but the raw data we can already handle. And with a reasonably optimized representation for profile mode, storing consensus profile spectra could become a reasonable approach for spectral libraries.

Although I do wish there was an XML-friendly 8-bit text encoding standard, like the yenc encoding used on news://alt.bin.*, which we could choose instead of base64 to achieve practically no encoding bloat.

-Matt |
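The size argument can be made concrete with a rough Python comparison. The 1000 m/z values below are synthetic, and the ASCII layout (six decimals, one space between values) is just one plausible assumption about how a text peak list would be written.

```python
import base64
import struct

# Synthetic profile-mode m/z axis: 1000 samples around m/z 12345.
mz_values = [12345.678901 + 0.001 * i for i in range(1000)]

# ASCII: 12 characters per value plus a single-space separator.
ascii_text = " ".join("%.6f" % v for v in mz_values)
ascii_bytes = len(ascii_text.encode("ascii"))

# Binary: a constant 8 bytes per double.
raw = struct.pack("<%dd" % len(mz_values), *mz_values)
binary_bytes = len(raw)

# base64 turns every 3 raw bytes into 4 text characters (about 33% bloat).
base64_bytes = len(base64.b64encode(raw))

print(ascii_bytes, binary_bytes, base64_bytes)
```

With these assumptions base64-encoded doubles are still smaller than the ASCII text, and the gap to raw binary is the encoding bloat mentioned above.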
From: Mike C. <tu...@gm...> - 2009-06-11 20:53:03
|
On Thu, Jun 11, 2009 at 2:41 PM, Matthew Chambers<mat...@va...> wrote:
> The many internal references in mzML to me means that it shouldn't be
> considered a light-weight format that simple scripts could parse:
> reading mzML with software takes a substantial API.

I hope this is not true--I would be quite disappointed if mzML could not be easily parsed by simple scripts.

> Thus the only
> remaining benefit for ASCII peak representation (AFAIK) is human
> readability of peak lists [...]

If one starts with the assumption that mzML is in its best form and should not be changed, this conclusion follows directly. But if we're trying to decide whether mzML should be changed, this seems a bit like begging the question.

> However, NIST library folks have a quite straight-forward way to meet
> the "human readability" requirement: XML comments. There's no reason you
> can't put what looks like an MGF peak list in an XML comment with every
> mzML spectrum (although presumably not profile-mode ones!).

I think this would be worse than the status quo. If this change is to be made, though, may I suggest that the ASCII peaks be used in the "real" XML and that the binary peaks go in the comments? :-)

Mike |
From: Mike C. <tu...@gm...> - 2009-06-11 20:40:32
|
I took it to mean that with "1", "1.5", "1.50", one gets an implied level of precision. That is, "1.5" is generally understood to mean 1.5 +/- 0.05. If I give you the IEEE float 1.5, much less is implied about the precision of this value, unless it's explicitly stated elsewhere. (If you have a whole set of these, then you probably can work out the equivalent precision, but this is a bit of a stretch.)

Mike |
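The implied-precision convention can be written down in a few lines of Python using the standard `decimal` module. The helper name `implied_half_width` is made up for this sketch; it simply returns half a unit in the last written digit.

```python
from decimal import Decimal


def implied_half_width(text):
    """Half a unit in the last written decimal place of a number string."""
    exponent = Decimal(text).as_tuple().exponent   # -1 for "1.5", -2 for "1.50"
    return Decimal(5).scaleb(exponent - 1)


for text in ("1", "1.5", "1.50"):
    print(text, "+/-", implied_half_width(text))

# Converted to IEEE floats, "1.5" and "1.50" become the same value, so the
# distinction above is no longer recoverable from the number alone.
print(float("1.5") == float("1.50"))
```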
From: Angel P. <an...@ma...> - 2009-06-11 20:23:59
|
Hi Steve,

Is your question whether we can successfully round-trip the numbers? E.g. go from an ASCII format to mzML and back to the originating ASCII format and get exactly the same numbers? I believe that when we pack the numbers and unpack them (at least in my non-validating Ruby implementations) the numbers and significance are completely the same. E.g. 1.005 === 1.005 and not 1.005000000000001

-angel

On Thu, Jun 11, 2009 at 11:33 AM, Stein, Stephen E. Dr. <ste...@ni...> wrote:

> Congrats on a new version...
>
> However, I wanted to again state what I think is a defect in the standard -
> the inability to accept an ASCII peak list. This prevents us from using mzML
> as the format for libraries or reference data.
>
> --- 1 is different than 1.0 and different than 1.00 ...
>
> This difference, to some, is non-trivial and changes the meaning of
> reference data when converted to binary.
>
> Also, the ability to read the data is nice for those who want to do it.
>
> I suppose its addition would do too much damage to add to 1.2 - but I just
> felt that I should bring it up again as our needs have not changed.
>
> -Steve Stein
>
> ------------------------------
>
> From: Eric Deutsch [mailto:ede...@sy...]
> Sent: Tuesday, June 09, 2009 12:02 PM
> To: 'Mass spectrometry standard development'
> Cc: 'Eric Deutsch'
> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>
> Present: Marc, Jim, Matt, Eric, Lennart, Pierre-Alain
>
> 1) mzML 1.1.0
> - Released!
>   + The whitespace issue in the xsd was resolved before
> - Allowed binarydata data types
>   + Add back in 32- and 64-bit integer. Those terms should be unobsoleted. They were there
>   + Add string array: null-terminated array of strings. Must have as many nulls as elements.
>   + Matt will add these to the CV
>   + It will be implemented somehow in ProteoWizard and OpenMS
>   + Matt will add another CV type which is binarydatatype and then annotate mzArray, IntensityArray, and chargeArray with the appropriate types
> - ASMS
>   + There is a group that just put together a "unified" format for ion mobility mass spec. Matt and Eric met him, and we will follow up
>   + Also had discussion with ANiML. Being done through ASTM
> - ASMS might help out with CV
>   + David Sparkman may help us out.
>   + Eric will update MSS WG page
>   + Eric will email Juan Antonio about top page
>   + Marc will double check with Andreas Römpp on units addition and then add them as is
>
> 2) TraML development
> - Feedback from ASMS
> - Implementations
>   + ProteoWizard has some implementation. OpenMS does as well by Andreas. Jim is working on something
> - cvParams vs attributes
>   + Problem with attributes is lack of units specification
>   + Problem with attributes is default value ambiguity in C++
>   + Change transition name to id of type xsd:string
>   + Apply rule: any attribute that is not an id or a Ref should be switched to cvParam
>   + How does mzIdentML handle b9-18^2 ? Try to do the same?
>   + What about string values?
>   + Matt is advocating more cvParams, Pierre-Alain as well. Jim as well.
>   + Make normalizationStandard should be cvParams H-PINS
>   + Eric will make another rev based on this.
>
> + Meet again next week same time
>
> ------------------------------
>
> From: Eric Deutsch [mailto:ede...@sy...]
> Sent: Monday, June 08, 2009 3:41 PM
> To: 'Mass spectrometry standard development'
> Cc: 'Eric Deutsch'
> Subject: PSI-MSS WG Tuesday call reminder
>
> Hi everyone, the next PSI Mass Spectrometry Standards Working Group call
> will be Tuesday 8am PDT:
>
> http://www.timeanddate.com/worldclock/fixedtime.html?day=09&month=6&year=2009&hour=16&min=0&sec=0&p1=136
>
> 08:00 San Francisco
> 11:00 New York
> 16:00 London
> 17:00 Geneva
>
> + Germany: 08001012079
> + Switzerland: 0800000860
> + UK: 08081095644
> + USA: 1-866-314-3683
> Generic international: +44 2083222500 (UK number)
>
> access code: 297427
>
> Agenda:
>
> 1) mzML 1.1.0
> - Released!
> - Allowed binarydata data types
>
> 2) TraML development
> - Feedback from ASMS
> - Implementations
> - cvParams vs attributes

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |