Thread: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder (Page 4)

Hi all,

I think the way we store the data is widely accepted and we should not 
change it.

If you want your format to be human-readable, you can store the m/z values as 
comments or use text files.

Another possibility with mzML is to annotate each peak with a string 
containing the ascii representation of the m/z value. It's not 
human-readable because it is Base64 encoded, perhaps even zipped, but 
you can store the information like that if you want to.
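
A rough sketch of what that string annotation could look like, assuming the labels are stored as null-terminated ASCII strings, optionally zlib-compressed, then Base64 encoded. The function names are made up for illustration and are not part of any mzML library.

```python
import base64
import zlib


def encode_labels(labels, compress=False):
    """Join ASCII labels as null-terminated strings, then Base64-encode them."""
    raw = b"".join(s.encode("ascii") + b"\x00" for s in labels)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_labels(text, compressed=False):
    """Reverse encode_labels and return the original list of strings."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    # The final terminator leaves one empty chunk at the end; drop it.
    return [chunk.decode("ascii") for chunk in raw.split(b"\x00")[:-1]]


labels = ["445.120", "445.1203", "1001.0"]
encoded = encode_labels(labels, compress=True)
decoded = decode_labels(encoded, compressed=True)
```

The decoded strings keep their trailing zeros, which is exactly the information a double cannot carry.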

Best,
  Marc
> Matt,
>
> Resolution depends on instrument, tuning and settings - I don't know the current state of reporting such information (or its reliability) in current instruments. 
>
> We have long held all of our data in ASCII form (not just MS) - if you want flexibility and accuracy, this is the only path without inventing a new data structure. Error limits and annotation can be added as we like (peak labeling, for example).
>
> We will consider using comments - but I suspect no one will know they are there but us.
>
> Note that our focus is quite different from others - we are dealing with data that we have processed, perhaps heavily. I still ask for an optional ASCII data representation for reference data.
>
> -Steve
>
> -----Original Message-----
> From: Matt Chambers [mailto:mat...@va...] 
> Sent: Friday, June 12, 2009 9:22 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>
> Now this I can agree with, especially with ppm representation when 
> appropriate. But doesn't the instrument's mass resolution and related CV 
> terms convey this information? And if someone doesn't write those at all 
> or can't write them in a machine-readable numeric representation, it 
> seems unlikely they will have done a proper job of rounding m/z values. 
> This is kind of the reason I was opposed to using strings to represent 
> mass resolution, but I was overruled. Perhaps we should revisit that? It 
> makes sense to me because it's a less redundant placement of this 
> precision information.
>
> Steve, do you agree with using XML comments to actually show 
> human-readable peak lists in the mzML? That seems like an orthogonal 
> issue to the precision one.
>
> -Matt
>
>
> Stein, Stephen E. Dr. wrote:
>   
>> that would be a nice addition - also allow ppm representation - more complex precision representations can be delayed for future versions.
>>
>> -----Original Message-----
>> From: Fredrik Levander [mailto:Fre...@im...] 
>> Sent: Friday, June 12, 2009 8:28 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>>
>> Wouldn't it make sense to add an optional CV term for the number of 
>> significant digits in a binary array? This way it would be easy to get 
>> back to the ASCII representation if a peak list with x number of 
>> decimals was converted to mzML. It might not be so useful for conversion 
>> of raw data, but if a peak list has been rounded to a certain number of 
>> decimals, that's information which shouldn't be thrown away when 
>> converting to mzML. The info could also be used for a viewer to show the 
>> right number of decimals.
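
A minimal sketch of this idea, assuming the peak list was written with a fixed number of decimals and that the count is kept next to the binary array. The `decimals` value stands in for the proposed CV term, which is hypothetical here, as are the function names.

```python
import struct


def to_binary(peak_strings):
    """Pack ASCII m/z values as little-endian doubles and record the decimals used."""
    decimals = max(len(s.partition(".")[2]) for s in peak_strings)
    values = [float(s) for s in peak_strings]
    packed = struct.pack("<%dd" % len(values), *values)
    return packed, decimals


def to_ascii(packed, decimals):
    """Unpack the doubles and format them with the recorded number of decimals."""
    values = struct.unpack("<%dd" % (len(packed) // 8), packed)
    return ["%.*f" % (decimals, v) for v in values]


original = ["445.120", "445.121", "1001.005"]
packed, decimals = to_binary(original)
restored = to_ascii(packed, decimals)
```

A viewer could use the same stored count to decide how many decimals to display.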
>>
>> Fredrik
>>
>> Pierre-Alain Binz wrote:
>>   
>>     
>>> One question to Steve and others.
>>> Reading mzML, as well as any other files, has to be done with an 
>>> editor, being a simple text editor or a more elaborated viewer.
>>>
>>> Would a more elaborated XML viewer/editor that knows how to read 
>>> binary data and round it if needed not be an ideal "straight" reader 
>>> of mzML instead of using a more plain text viewer?
>>> I know and myself also like to "call back" values with a defined 
>>> number of digits, as they were entered. And it's up to the software 
>>> design to "not interpret" what I have entered. But today, it's 
>>> relatively easy to get a XML reader that could "translate" the binary 
>>> arrays in a "mz Intensity" two column format with appropriate rounding 
>>> if necessary, so that it looks exactly as if it was an ascii table 
>>> (don't forget that in mzML the mz and intensity arrays are separate 
>>> and anyway have to be interpreted to look like a 2 column ascii table). 
>>> If the answer is OK, then we could stay with binary format, taking 
>>> care of the "precision issue" via the graphical view, and be therefore 
>>> compatible with the ascii precision.
>>>
>>> This sounds like a way to turn the technical question into a more 
>>> philosophical, "ergonomic" one, but it is probably worthwhile at this stage.
>>>
>>> Pierre-Alain
>>>
>>> Matthew Chambers wrote:
>>>     
>>>       
>>>> No measurements I'm aware of in proteomic mass spec use more than 15 
>>>> base 10 digits, which is the number of digits that double precision 
>>>> floats can represent without precision loss. That means that even if a 
>>>> value goes in as 1.5 (which can't be represented exactly), then as long 
>>>> as we round to the 15th digit we don't lose precision. As others have 
>>>> said, we can thus "round-trip" 15 digits. We get this high degree of 
>>>> fidelity to the source data without all the assumptions involved with 
>>>> the ASCII representation: if I use doubles consistently, then I'm always 
>>>> providing 15 significant digits. And if we did need more than 15, then 
>>>> ASCII is still a very inefficient encoding. You'd want to use arbitrary 
>>>> precision fixed or floating point binary types, which can't be computed 
>>>> on very easily or efficiently, but they are the Right Way to achieve 
>>>> arbitrary precision (i.e. no unspecified assumptions, well defined byte 
>>>> width, fast parsing).
>>>>
>>>> So in fact, you can preserve this "poor person's" significant digits 
>>>> encoding: if the software is doing its job, then it will go out the same 
>>>> way it came in! The real nastiness with floating point is when the 
>>>> precision loss accumulates every time an arithmetic operation happens on 
>>>> a cumulative sum or product.
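
A small sketch of both points, assuming values are packed as IEEE 754 doubles and printed back with 15 significant digits; `round_trip` is an illustrative helper only. Note that 1.5 itself is exact in binary (it is 3/2), so 1.005 is the more telling example of a value that is stored inexactly yet still prints back unchanged.

```python
import math
import struct


def round_trip(text):
    """Parse a decimal string, pack it as a double, unpack it, print 15 digits."""
    value = struct.unpack("<d", struct.pack("<d", float(text)))[0]
    # %.15g drops trailing zeros, so the value survives but "1.50" comes back as "1.5".
    return "%.15g" % value


samples = ["1.5", "1.005", "445.120300000001"]
survived = [round_trip(s) == s for s in samples]

# Asking for 17 digits exposes that 1.005 is not stored exactly.
seventeen_digits = "%.17g" % 1.005

# Accumulated error: repeated addition drifts, a compensated sum does not.
naive_sum = sum([0.1] * 10)
careful_sum = math.fsum([0.1] * 10)
```

Here every entry of `survived` is true, `naive_sum` differs from 1.0, and `careful_sum` equals 1.0.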
>>>>
>>>> -Matt
>>>>
>>>>
>>>> Stein, Stephen E. Dr. wrote:
>>>>   
>>>>       
>>>>         
>>>>> Yes, that is what I had in mind - you get drilled in that when you take a lab course in Chemistry or Physics (maybe it has been dropped in recent years). It is a poor person's way of providing error limits (the lowest significant figure contains the precision of measurement). 
>>>>>
>>>>> It is true that it only affects 10% of values, but that's enough for me to be concerned. I suppose we could put ASCII in a comment field, but physical quantities do have precisions, and stuffing measured values in those floating formats loses some of it.
>>>>>
>>>>> Sorry to say, this problem generally affects binary representations of measured values - one reason why I have liked the ASCII nature of XML - and hate to lose it.
>>>>>  
>>>>> -Steve
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mike Coleman [mailto:tu...@gm...] 
>>>>> Sent: Thursday, June 11, 2009 4:41 PM
>>>>> To: Mass spectrometry standard development
>>>>> Subject: Re: [Psidev-ms-dev] PSI-MSS WG Tuesday call reminder
>>>>>
>>>>> I took it to mean that with "1", "1.5", "1.50", one gets an implied
>>>>> level of precision.  That is, "1.5" is generally understood to mean
>>>>> 1.5 +/- 0.05.  If I give you the IEEE float 1.5, much less is implied
>>>>> about the precision of this value, unless it's explicitly stated
>>>>> elsewhere.  (If you have a whole set of these, then you probably can
>>>>> work out the equivalent precision, but this is a bit of a stretch.)
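
To make the convention concrete, here is a short sketch that reads the implied error limit off the written form, assuming the usual rule of half a unit in the last digit shown. `implied_uncertainty` is a made-up helper name.

```python
from decimal import Decimal


def implied_uncertainty(text):
    """Return half a unit in the last written digit, e.g. "1.5" gives 0.05."""
    exponent = Decimal(text).as_tuple().exponent
    return Decimal(5).scaleb(exponent - 1)


limits = {s: implied_uncertainty(s) for s in ["1", "1.5", "1.50"]}

# Once parsed as floats, the two written forms can no longer be told apart.
same_float = float("1.5") == float("1.50")
```

The three strings give limits of 0.5, 0.05 and 0.005, while `same_float` is true, which is the information that gets lost in the binary form.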
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>> On Thu, Jun 11, 2009 at 3:23 PM, Angel Pizarro<an...@ma...> wrote:
>>>>>   
>>>>>     
>>>>>         
>>>>>           
>>>>>> Is your question whether we can successfully round-trip the numbers? Eg. go
>>>>>> from an ascii format to mzML back to originating ascii format and get the
>>>>>> same exact numbers? I believe that when we pack the numbers and unpack them
>>>>>> (at least in my non-validating ruby implementations) the numbers and
>>>>>> significance are completely the same. E.g. 1.005 === 1.005 and not
>>>>>> 1.005000000000001
>>>>>> -angel
>>>>>>     
>>>>>>       
>>>>>>             
>
>
> ------------------------------------------------------------------------------
> Crystal Reports - New Free Runtime and 30 Day Trial
> Check out the new simplified licensing option that enables unlimited
> royalty-free distribution of the report engine for externally facing 
> server and web deployment.
> http://p.sf.net/sfu/businessobjects
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>