From: Coleman, M. <MK...@St...> - 2006-09-20 18:17:27
> Angel Pizarro:
> I am cringing as I write this, since I really think you should not go
> this route, but look at the supplementary data tags.

I am cringing with you. :-) Abusing the supplementary tags for this purpose is definitely out--this is an even more unpleasant option than going with a home-grown mzData extension.

> ah, yes, but most probably you have to either zcat the file or unzip it
> in order to read the floats, then zip the whole file back again once
> finished, a situation not unlike decoding byte arrays and base64
> strings....

Yes, having zipped files does imply having gzip/etc around, and you are correct that this is in some ways similar. A notable difference is that zip tools are already ubiquitous, standard, reliable, and well-understood by users. The scripts I'll have to write to decode mzData won't be. (Note, too, that it is not necessary to unzip and rezip in order to just read a compressed file. The 'zcat' program and its variations (there's surely a Ruby module, for instance) can read the file without disturbing it.)

> I'll add to those arguments that we should look at the computational
> costs of un/zipping whole files as opposed to stream en/decoding
> individual mzData spectra.

I agree that zip'ing will have a greater cost than generating base64. I don't think the cost is great, and in any case, zip'ing isn't necessary unless you're hurting for disk space. Disk is cheap. If I zip'ed these files, it would be as much to get the checksumming as to save the disk space.

> 1) it can handle encoding of integers, single and double precision
> float arrays without loss of information

As far as I know, a textual representation can also do this perfectly.

> 2) comparable compression with zipped plain text of the same precision

I agree that they're similar, within the bounds that I care about (2-3x).

> 3) better performance with respect to accessing individual spectra vs.
> compressed plain text

If you mean that you can easily seek to a particular spectrum in a file (presuming that some index is already present), I agree that this is simpler and much faster. As far as I know, seeking in a zip file isn't really efficient. If I thought I was going to need to do this, I'd want to store the files uncompressed. (As a practical matter, I can't think of a reason we'd need to do this here.)

Mike
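[For reference, the decoding step Mike describes really is a short script. This is a minimal sketch, not code from any mzData toolkit; the function name and parameter conventions are illustrative. It decodes a base64-encoded array of IEEE-754 floats of the kind mzData stores:]

```python
import base64
import struct

def decode_peak_array(b64_text, precision=32, byte_order="little"):
    """Decode an mzData-style base64 binary array into a list of floats."""
    raw = base64.b64decode(b64_text)
    fmt_char = "f" if precision == 32 else "d"
    prefix = "<" if byte_order == "little" else ">"
    count = len(raw) // struct.calcsize(fmt_char)
    return list(struct.unpack(f"{prefix}{count}{fmt_char}", raw))

# Round-trip three m/z values through the 32-bit encoding.
mzs = [445.12, 445.35, 446.01]
encoded = base64.b64encode(struct.pack("<3f", *mzs)).decode("ascii")
print(encoded)
print(decode_peak_array(encoded))
```

[Note that the decoded values are float32 approximations of the inputs, which is precisely the precision issue discussed later in this thread.]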
From: Brian P. <bri...@in...> - 2006-09-20 17:14:17
Hello All,

> This works quite nicely!
> <-snip->
> <MyList>1.1 1.2 1.3</MyList>

Sure, but in practice it's not really all that readable: make that list some realistic length and you're going to need to snork it up into a table so that you can find the n'th item in the list to match it with the n'th item in some other list. At that point, you have once again passed the file through a software tool and may as well reap the benefits of base64 encoding.

On the topic of software support tools, the TPP (and the IPP) furnish a fairly broad set of tools that read mzData, including the ability to dump it to ASCII.

Excellent point by Randy about ASCII representations giving a false sense of computational precision. We can't ever forget that under the hood these boxen are base 2.

BTW, if there really are integer data to be had, then mzData/mzXML ought to be able to hold those data as integers. In the converters I've worked with I don't recall seeing any such scan data, though. (AFAIK "ion counts" are really just inferred from a digitized analog sensor signal - there's not actually anything in there going "I see one ion, two ions, three ions..." - but I'm no MS hardware expert.)

Brian Pratt
www.insilicos.com/IPP

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Randy Julian
> Sent: Wednesday, September 20, 2006 8:23 AM
> To: 'Geer, Lewis (NIH/NLM/NCBI) [E]'; psi...@li...
> Subject: Re: [Psidev-ms-dev] Why base64?
>
> Hi,
>
> This works quite nicely!
>
> <-snip->
From: Jimmy E. <jk...@gm...> - 2006-09-20 17:12:23
I believe base64 encoding makes more sense for some large class of applications that will hopefully be digesting these files, but I'm sure everyone can see the obvious benefits of plain-text encoding of peak lists. The question I have is regarding the representation of space-delimited lists as Lewis and Randy have drawn up. Does this address the needs of Michael, Steve, Akhilesh, and others? Hopefully they'll all chime in.

My concern would be that having a horizontal, space-separated list of numbers, where m/z and intensity will possibly be written as separate lists of floats and ints, doesn't really serve the notion of readability. Lots of folks are used to looking at lists of peaks as ordered in .mgf or .dta files, and I'm not sure a horizontal list of numbers (especially if it's 2 lists, one for m/z and one for intensity) gives you that same sense of readability. I don't really see any regular use-case scenarios where people would be scrolling over to the 68th m/z in the list and then somehow counting over to the location of the 68th intensity to get its value.

So _if_ this really doesn't address the needs of the folks who have concerns about the base64 encoding and would like to see plain text, speak up. The last thing the format needs is more complexity in the form of another optional way of representing the data that only a handful of people will ever end up using.

- Jimmy

On 9/20/06, Randy Julian <rkj...@in...> wrote:
> Hi,
>
> This works quite nicely!
>
> <-snip->
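[Jimmy's readability concern is essentially that two parallel xs:list arrays have to be re-zipped by software before a human can read them as peak rows. A minimal sketch of that step; the element names here are illustrative toys, not taken from any mzData draft:]

```python
import xml.etree.ElementTree as ET

# A toy document in the style of the xs:list proposal, with m/z and
# intensity stored as two parallel whitespace-separated lists.
doc = """<spectrum>
  <mzArray>445.12 445.35 446.01</mzArray>
  <intensityArray>120.0 85.5 940.2</intensityArray>
</spectrum>"""

root = ET.fromstring(doc)
mzs = [float(x) for x in root.findtext("mzArray").split()]
intensities = [float(x) for x in root.findtext("intensityArray").split()]
assert len(mzs) == len(intensities), "parallel arrays must have equal length"

# Re-zip the parallel lists into .dta-style rows, the layout readers expect.
for mz, inten in zip(mzs, intensities):
    print(f"{mz:.2f}\t{inten:.1f}")
```

[The point being: the plain-text form is parseable without a decoder, but a tool is still needed before the peaks read like an .mgf or .dta listing.]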
From: Randy J. <rkj...@in...> - 2006-09-20 15:27:55
Hi, This works quite nicely! <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:element name="root"> <xs:complexType> <xs:sequence> <xs:element name="MyList"> <xs:simpleType> <xs:list itemType="xs:float"/> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema> Validates: <?xml version="1.0" encoding="UTF-8"?> <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="list.xsd"> <MyList>1.1 1.2 1.3</MyList> </root> Any thoughts about the use of this in the schema? Randy -----Original Message----- From: Geer, Lewis (NIH/NLM/NCBI) [E] [mailto:le...@nc...] Sent: Wednesday, September 20, 2006 10:27 AM To: Randy Julian; psi...@li... Subject: RE: [Psidev-ms-dev] Why base64? Hi, XML-schema does allow space delimited lists: <xsd:simpleType name="listOfMyIntType"> <xsd:list itemType="integer"/> </xsd:simpleType> <listOfMyInt>20003 15037 95977 95945</listOfMyInt> Lewis > -----Original Message----- > From: Randy Julian [mailto:rkj...@in...] > Sent: Wednesday, September 20, 2006 10:12 AM > To: psi...@li... > Subject: Re: [Psidev-ms-dev] Why base64? > > This is a very interesting question which has come up several > times before. > As we work to develop dataXML (mzData 2.0) we should take all of these > concerns into consideration. > > Originally, mzData had both a binary and regular XML notation > for both data > vectors. The XML-schema data types where tested by most of > the vendors who > did not see the file size compression benefits you mention > because they did > not feel they had the ability to round either of the vectors > in the way you > suggest. 
Since the use case: 'user opens mzData file with > notepad and see > peaks' was not viewed as a major request, the vendors > unanimously voted the > non-binary arrays out for size and performance reasons (see > the meeting > notes from the PSI meeting in Nice). > > The loss of readability may now have larger consequences than > we considered > back then. Steve Stein's comments are good ones. I we now have broad > enough adoption that we want to be able to open the file and > see the numbers > written out in XML, then we should reconsider the validity of > the use case. > To do this with mzData 1.05 you would have to use the > supplemental data > vector (the alternative Angel suggested). > > The supplemental data vectors hold any type of XSD data type including > normal XML. However in mzData 1.05, the binary vectors are > not optional, so > you have to populate them to comply with the spec - even if > you repeat the > information in the supplemental vector. > > The suggested 'white space separated list' is not a valid XML > data type, so > if we want to keep with the XSD standard for validation, the > peak lists have > to be in markup like: > > <peak> > <mz> > <float>0.1</float> > </mz> > <inten> > <float>100.1</float> > </inten> > </peak> > > or something similar. Other semantics could reduce the > verbosity, but the > basic idea is that we can only use valid XSD data types. > > As we move to dataXML, we will need to store other data > objects besides mass > spectra (MRM chromatograms for example), so we will have to > come up with a > more general data section regardless of the data types > allowed. During this > design phase we should decide what data types we want. > > As a historical note, the previous (current) LC-MS standard > format uses > netCDF as the data representation which is fully binary and utterly > unreadable in any respect without an API. Thus this > situation has existed > in mass spectrometry for quite some time. 
The readability of > these files > has never been viewed as a serious weakness, although the > 1.5-2x increase in > file size over the original vendor file was the source of constant > complaint. > > Just as a note for your comment #3, this is not so straight > forward. If the > instrument collects data using an Intel chip, floating-point > raw data will > most likely have a IEEE-754 representation. So any time you > have a number > in a file like 0.1, the internal representation was > originally different > (0.1 cannot be exactly represented in IEEE-754). When you > read from the file > into an IEEE standard format, it will not be 0.1 in any of > the math you do. > > Let the PSI-MS team know what requirements you would like to > see the HUPO > standards meet. If there is strong user support for missing > features, the > team will include them in the development roadmap. > > Let's keep the discussion of improvements going! > > Randy > > > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Coleman, > Michael > Sent: Tuesday, September 19, 2006 4:39 PM > To: psi...@li... > Subject: [Psidev-ms-dev] Why base64? > > Hi, > > Does anyone know why base64 encoding is being used for peak mz and > intensity values in the mzData format? It appears to me that > there are > three significant disadvantages to doing so: > > 1. Loss of readability. One of the primary reasons to use XML in the > first place is that it is human-readable--one can in principle inspect > and understand its contents with any text editor. > Base64-encoding peak > data destroys this transparency. (It also makes it more difficult to > write scripts to process the data.) > > 2. Increased file size. At least for our spectra, it appears that a > compressed (gzip/etc) ms2 file is about 15% smaller than the > equivalent > mzData file with the single-precision (32-bit) encoding, and > 22% smaller > than the double-precision version. 
The *uncompressed* > single-precision > mzData file is about 15% smaller than the uncompressed ms2 file; > the double-precision version is almost twice as large. (These figures > are for 'gzip' default compression.) > > (Currently our ms2 files have mz values rounded to one > decimal place and > intensity values with about 4-5 significant places.) > > 3. Potential loss of precision information. For example, with > single-precision encoding, a value originally given as > 12345.1 might be > encoded as 12345.0996. It's not easy to see from that > encoding that the > original value was given with one decimal place. Worse still, if the > original value is significant to more than 7-or-so digits and it gets > 32-bit encoded, precision will be lost, probably in a way not > immediately apparent to the user. (32-bit encoding will probably be a > temptation, given the size of the 64-bit encoding.) > > Even if base64-encoding cannot be dropped at this point, it seems like > it would be useful to add a "no encode" option, which would > present peak > data as the obvious whitespace-separated list of numeric values. > > Am I missing something here? I could not find any discussion of this > issue on the list. > > --Mike > > > Mike Coleman, Scientific Programmer, +1 816 926 4419 > Stowers Institute for Biomedical Research > 1000 E. 50th St., Kansas City, MO 64110, USA > > -------------------------------------------------------------- > ----------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the > chance to share your > opinions on IT & business topics through brief surveys -- and > earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge > &CID=DEVDEV > _______________________________________________ > Psidev-ms-dev mailing list > Psi...@li... 
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev > |
From: Geer, L. (NIH/NLM/NCBI) [E] <le...@nc...> - 2006-09-20 14:27:04
|
Hi, XML-schema does allow space delimited lists: <xsd:simpleType name="listOfMyIntType"> <xsd:list itemType="integer"/> </xsd:simpleType> <listOfMyInt>20003 15037 95977 95945</listOfMyInt> Lewis > -----Original Message----- > From: Randy Julian [mailto:rkj...@in...] > Sent: Wednesday, September 20, 2006 10:12 AM > To: psi...@li... > Subject: Re: [Psidev-ms-dev] Why base64? > > [...] |
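Lewis's point is easy to demonstrate: an element whose content is an xsd:list is just whitespace-separated text, so a consumer can recover the values with a plain split. A minimal sketch in Python; the element names here are illustrative only, not taken from the mzData 1.05 schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical "no encode" spectrum using xsd:list-style content.
doc = """<spectrum>
  <mzArray>100.5 101.2 250.75</mzArray>
  <intenArray>12 340 56</intenArray>
</spectrum>"""

root = ET.fromstring(doc)
# xsd:list values are whitespace-separated, so str.split() is enough.
mz = [float(x) for x in root.findtext("mzArray").split()]
inten = [int(x) for x in root.findtext("intenArray").split()]
assert mz == [100.5, 101.2, 250.75]
assert inten == [12, 340, 56]
```

The same split-based parsing would work in any scripting language, which is the transparency Mike is asking for.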
From: Randy J. <rkj...@in...> - 2006-09-20 14:13:06
|
This is a very interesting question which has come up several times before. As we work to develop dataXML (mzData 2.0) we should take all of these concerns into consideration. Originally, mzData had both a binary and regular XML notation for both data vectors. The XML-schema data types were tested by most of the vendors who did not see the file size compression benefits you mention because they did not feel they had the ability to round either of the vectors in the way you suggest. Since the use case: 'user opens mzData file with notepad and see peaks' was not viewed as a major request, the vendors unanimously voted the non-binary arrays out for size and performance reasons (see the meeting notes from the PSI meeting in Nice). The loss of readability may now have larger consequences than we considered back then. Steve Stein's comments are good ones. If we now have broad enough adoption that we want to be able to open the file and see the numbers written out in XML, then we should reconsider the validity of the use case. To do this with mzData 1.05 you would have to use the supplemental data vector (the alternative Angel suggested). The supplemental data vectors hold any type of XSD data type including normal XML. However in mzData 1.05, the binary vectors are not optional, so you have to populate them to comply with the spec - even if you repeat the information in the supplemental vector. The suggested 'white space separated list' is not a valid XML data type, so if we want to keep with the XSD standard for validation, the peak lists have to be in markup like: <peak> <mz> <float>0.1</float> </mz> <inten> <float>100.1</float> </inten> </peak> or something similar. Other semantics could reduce the verbosity, but the basic idea is that we can only use valid XSD data types. 
As we move to dataXML, we will need to store other data objects besides mass spectra (MRM chromatograms for example), so we will have to come up with a more general data section regardless of the data types allowed. During this design phase we should decide what data types we want. As a historical note, the previous (current) LC-MS standard format uses netCDF as the data representation which is fully binary and utterly unreadable in any respect without an API. Thus this situation has existed in mass spectrometry for quite some time. The readability of these files has never been viewed as a serious weakness, although the 1.5-2x increase in file size over the original vendor file was the source of constant complaint. Just as a note for your comment #3, this is not so straightforward. If the instrument collects data using an Intel chip, floating-point raw data will most likely have an IEEE-754 representation. So any time you have a number in a file like 0.1, the internal representation was originally different (0.1 cannot be exactly represented in IEEE-754). When you read from the file into an IEEE standard format, it will not be 0.1 in any of the math you do. Let the PSI-MS team know what requirements you would like to see the HUPO standards meet. If there is strong user support for missing features, the team will include them in the development roadmap. Let's keep the discussion of improvements going! Randy -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Coleman, Michael Sent: Tuesday, September 19, 2006 4:39 PM To: psi...@li... Subject: [Psidev-ms-dev] Why base64? Hi, Does anyone know why base64 encoding is being used for peak mz and intensity values in the mzData format? It appears to me that there are three significant disadvantages to doing so: 1. Loss of readability. 
One of the primary reasons to use XML in the first place is that it is human-readable--one can in principle inspect and understand its contents with any text editor. Base64-encoding peak data destroys this transparency. (It also makes it more difficult to write scripts to process the data.) 2. Increased file size. At least for our spectra, it appears that a compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent mzData file with the single-precision (32-bit) encoding, and 22% smaller than the double-precision version. The *uncompressed* single-precision mzData file is about 15% smaller than the uncompressed ms2 file; the double-precision version is almost twice as large. (These figures are for 'gzip' default compression.) (Currently our ms2 files have mz values rounded to one decimal place and intensity values with about 4-5 significant places.) 3. Potential loss of precision information. For example, with single-precision encoding, a value originally given as 12345.1 might be encoded as 12345.0996. It's not easy to see from that encoding that the original value was given with one decimal place. Worse still, if the original value is significant to more than 7-or-so digits and it gets 32-bit encoded, precision will be lost, probably in a way not immediately apparent to the user. (32-bit encoding will probably be a temptation, given the size of the 64-bit encoding.) Even if base64-encoding cannot be dropped at this point, it seems like it would be useful to add a "no encode" option, which would present peak data as the obvious whitespace-separated list of numeric values. Am I missing something here? I could not find any discussion of this issue on the list. --Mike Mike Coleman, Scientific Programmer, +1 816 926 4419 Stowers Institute for Biomedical Research 1000 E. 50th St., Kansas City, MO 64110, USA |
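Both points in this exchange - Randy's note that 0.1 has no exact IEEE-754 representation, and Mike's observation that 12345.1 comes back as 12345.0996 after 32-bit encoding - can be reproduced in a couple of lines. A quick Python check (struct is used here just to force the 32-bit round trip; the values themselves come from the thread):

```python
import struct

# 0.1 has no exact IEEE-754 representation: the stored double is only
# the nearest representable value, as Randy notes.
print('%.20f' % 0.1)   # not exactly 0.1

# Forcing 12345.1 through a 32-bit float, as a single-precision peak
# array would, yields the value Mike describes:
f32 = struct.unpack('<f', struct.pack('<f', 12345.1))[0]
print(f32)  # 12345.099609375
```

So the precision question is not introduced by base64 itself but by the width of the binary representation chosen before encoding.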
From: Angel P. <an...@ma...> - 2006-09-20 13:27:12
|
On Wednesday 20 September 2006 07:53, Steve Stein wrote: > All, > > I also have the concerns expressed by Michael - transparency is important > to us, but precision even more so. We have long stored our data in ASCII to > avoid the problem, even though some judgement is sometimes necessary. As we > know 1.0000 and 0.9999 are very different things, usually the former is > really meant to be an integer. Also, abundances, since derived from ion > counts, are 'naturally' integral, as m/z values are real - of course data > systems need not conform to nature. I have dealt with MS formats where > everything is, in effect, integral. > > In our library, for example, we want the users to see the values that we > put there, so we use ASCII. It would be very desirable for us if the same > were offered in the XML's - otherwise we will have to go non-standard. > > Perhaps the ultimate answer is some way of associating uncertainty with > values, but I suppose this is a long way off. > Hmmm...... well the XML schema base64Binary type can encode integer arrays, but in mzData 1.05 we have defined the arrays as floats in the specification, but not the schema, hence this is not actually enforced. One could encode the intenArrayBinary data as ints, but it would still be a non-standard usage. It would be better to supply the integer intensity in the supDataArrayBinary and describe the array in the supDataDesc tag. So what I am getting at is that your use case is handled by mzData, but the consumer of the data would have to know to use the supplementary data arrays as the intensity values. Note that you would still have to specify the intensity values in the intenArrayBinary as floats, since this is a requirement of the schema. angel > -Steve Stein > > p.s. (this is NOT NIST speaking, just one of its employees). > > At 9/19/2006 07:56 PM Tuesday, Brian Pratt wrote: > >Oh, and I forgot one extremely important thing: performance. 
It's > >expensive converting those base 10 representations back to base 2 > >for number crunching, visualization etc. It's much cheaper to read them > >directly as binary, even with the overhead of base64 > >decoding. > > > >Brian Pratt > >www.insilicos.com/IPP > > > > > -----Original Message----- > > > From: psi...@li... > > > [mailto:psi...@li...] On > > > Behalf Of Brian Pratt > > > Sent: Tuesday, September 19, 2006 4:31 PM > > > To: psi...@li... > > > Subject: Re: [Psidev-ms-dev] Why base64? > > > > > > > > > When we developed the mzXML format we went through the same > > > questions. This is how I understood things: > > > > > > Readability: We as developers are an unusual use case. The > > > more likely use case for these formats is in visualization or > > > automated > > > processing, neither of which require direct eyeballing of the > > > peak lists under normal circumstances. Or at least that's how we saw > > > it. If you do really need to eyeball the peak lists there > > > are lots of tools available that will do the translation for you. > > > > > > Accuracy: Mass spec data in its raw form is generally stored > > > in binary formats, since mass specs are front ended by binary > > > computers. Conversion to and from base 10 human readable > > > representations introduces error. It's best to hold the data at its > > > original precision and translate out to human readable format > > > at whatever precision is deemed useful for eyeballing. > > > > > > File size: Sure, you can make files smaller by throwing away > > > precision, but as you begin to desire higher precision base64 quickly > > > becomes much more efficient. An excellent way to reduce file > > > size is to compress the peaklists before base64'ing them, as is done > > > in mzXML 3.0, and you do not sacrifice precision. > > > > > > Potential loss of precision information: That information > > > wasn't ever there, really. 
Again, mass specs are front ended > > > by binary > > > computers, so that base 10 precision information (does > > > '12345.099923123' mean '12345.1' or '12345.10' > > > or'12345.100'?) wasn't ever in > > > the datastream in the first place. The mass spec just wrote > > > a bunch of 32 or 64 bit binary numbers to the best of its (base 2) > > > ability. Looking at the bit patterns would be more revealing > > > of the precision, and base64 preserves them. As a developer, you > > > should be pleased that you don't have to wonder how many > > > digits of that value are for real and not just an artifact of > > > the base 2 to > > > base 10 formatting conversion - with base64 binary values > > > you're working with the original raw data, so those artifacts > > > aren't an > > > issue. > > > > > > Hope this helps, > > > > > > Brian Pratt > > > www.insilicos.com/IPP > > > > > > > -----Original Message----- > > > > From: psi...@li... > > > > [mailto:psi...@li...] On > > > > Behalf Of Coleman, Michael > > > > Sent: Tuesday, September 19, 2006 3:58 PM > > > > To: Angel Pizarro; psi...@li... > > > > Subject: Re: [Psidev-ms-dev] Why base64? > > > > > > > > > From: Angel Pizarro > > > > > > > > > > > 1. Loss of readability. ... > > > > > > > > > > There actually is a space for "human readable spectra" in the > > > > > mzData format, > > > > > > > > I'm glad to hear that. I looked for this, but I did not > > > > > > see it in the > > > > > > > spec here > > > > > > http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData > > > > > > > I was looking for something like a 'mzArray' and 'intenArray' tags, > > > > which would be the textual alternatives to 'mzArrayBinary' and > > > > 'intenArrayBinary'. Can you point me to an example? > > > > > > > > > but really who reads individual mz and intensity values? > > > > > > > > Well--I do. 
As a programmer I don't think it's an > > > > exaggeration to say > > > > that I'm looking at the peak lists in our ms2 files every > > > > day. I find > > > > being able to see at a glance that the peaks are basically sane, and > > > > their gross attributes (precision, count, etc.) very useful. > > > > > > > > Of course, as a programmer I can easily whip up a script to > > > > decode this > > > > file format. I suspect most users would be stymied, though, > > > > and I think > > > > that that would be unfortunate. Since these files are part > > > > of a chain > > > > of scientific argument, I think that as much as possible > > > > they ought to > > > > be transparent and as open as possible to verification by > > > > eyeball (mine > > > > and those of our scientists) and alternative pieces of software. > > > > > > > > I'm not saying that this transparency is an absolute good. > > > > Perhaps it > > > > is worth impairing so that we can have X, Y, and Z, which are > > > > considered > > > > more valuable. I'm not seeing what X, Y, and Z are, though. > > > > > > > > > > 2. Increased file size. ... > > > > > > > > > > Not a fair comparison. Most of the space in an mzData file is > > > > > actually taken up by the human-readable parameters and parameter > > > > > values of the spectra. > > > > > > > > Sorry, I should have been clearer. The numbers I gave were > > > > just for the > > > > peak lists (base64 vs text) and nothing else--no tags, no other > > > > metadata. The rest of the mzData fields would add more > > > > overhead, but I > > > > have no objection about that part. > > > > > > > > If we implemented mzData here today, our files would be bigger if we > > > > used the base64 encoding than if we used the textual > > > > numbers (as they > > > > are in our ms2 files). > > > > > > > > > 3. Potential loss of precision information. ... 
> > > > > > > > > > Actually the situation may be reversed. Thermofinnigan, for > > > > > example, stores measured values coming off of the instrument > > > > > as double precision floats, later formatting the numbers as > > > > > needed with respect to the specific instrument's limit of > > > > > > detection. > > > > > > > > 12345.1 may have originally been 12345.099923123 in the vendor's > > > > > proprietary format. > > > > > > > > Okay, but isn't '12345.1' what I really want to see in this case > > > > (assuming that the vendor is correct about the instrument's > > > > > > accuracy)? > > > > > > > For this particular instance, the string '12345.1' tells me > > > > what I need > > > > to know, and a double-precision floating point value (e.g., > > > > 12345.10000000000036379) would sort of let me guess it (since > > > > double-precision has significantly more significant figures). But a > > > > single-precision value would leave me in a sort of gray area. > > > > That is, > > > > does '12345.099923123' mean '12345.1' or '12345.10' or > > > > '12345.100', for > > > > example? > > > > > > > > > I wrote an email a few days ago showing how to translate in ruby > > > > > the base64 arrays > > > > > > > > I saw it and it was quite useful to me. Part of the reason > > > > I'm asking > > > > these questions is that I noticed in your examples that the > > > > base64-encoded values actually took more space than the > > > > original data. > > > > > > > > Just to reiterate my main question, it looks like using > > > > base64 will make > > > > mzData less usable and more complex, as compared to straight > > > > text. What > > > > benefits come with it that offset these drawbacks? > > > > > > > > Mike -- Angel Pizarro Director, Bioinformatics Facility Institute for Translational Medicine and Therapeutics University of Pennsylvania 806 BRB II/III 421 Curie Blvd. Philadelphia, PA 19104-6160 P: 215-573-3736 F: 215-573-9004 E: an...@ma... |
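Angel's suggestion - required float arrays plus an exact integer copy in a supplemental array - comes down to how the base64 data elements are packed. A sketch of that packing in Python: the precision and endian handling mirrors the attributes mzData carries on its data elements, but the function names here are my own, not part of any API:

```python
import base64
import struct

def encode_peaks(values, precision=32, endian="little"):
    """Pack a numeric vector mzData-style: IEEE-754 floats at the given
    precision, then base64. (Integer arrays could be packed the same way
    with an integer struct format.)"""
    fmt = ('<' if endian == "little" else '>') + \
          ('f' if precision == 32 else 'd') * len(values)
    return base64.b64encode(struct.pack(fmt, *values)).decode('ascii')

def decode_peaks(text, precision=32, endian="little"):
    """Reverse the packing above."""
    raw = base64.b64decode(text)
    width = 4 if precision == 32 else 8
    fmt = ('<' if endian == "little" else '>') + \
          ('f' if precision == 32 else 'd') * (len(raw) // width)
    return list(struct.unpack(fmt, raw))

# Values exactly representable at 32 bits round-trip without loss:
mz = [100.5, 250.25, 1024.125]
assert decode_peaks(encode_peaks(mz)) == mz
```

Note that the lossless round trip above holds only because those values are exactly representable at 32 bits; a value like 12345.1 would come back as its nearest single-precision neighbour, which is exactly the precision concern raised in this thread.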
From: Angel P. <an...@ma...> - 2006-09-20 13:18:38
|
On Tuesday 19 September 2006 18:58, Coleman, Michael wrote: > > I'm glad to hear that. I looked for this, but I did not see it in the > spec here > > http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData > I am cringing as I write this, since I really think you should not go this route, but look at the supplementary data tags. > > Well--I do. As a programmer I don't think it's an exaggeration to say > that I'm looking at the peak lists in our ms2 files every day. I find > being able to see at a glance that the peaks are basically sane, and > their gross attributes (precision, count, etc.) very useful. ah, yes, but most probably you have to either zcat the file or unzip it in order to read the floats, then zip the whole file back again once finished, a situation not unlike decoding byte arrays and base64 strings.... > > Of course, as a programmer I can easily whip up a script to decode this > file format. I suspect most users would be stymied, though, and I think > that that would be unfortunate. Since these files are part of a chain > of scientific argument, I think that as much as possible they ought to > be transparent and as open as possible to verification by eyeball (mine > and those of our scientists) and alternative pieces of software. > This is really where mzData has failed the end user, namely in the set of tools that support it. Even basic marshal/unmarshal scripts are lacking. The "Specify it and they will come" development hasn't panned out for us, sadly, so I am starting a development cycle here at UPenn to address these needs. Specifically, a reasonably fast ruby framework for dealing with mzData (akin to some aspects of the TPP) starting off based on some code written by John Prince @ UTexas, called mspire. > Sorry, I should have been clearer. The numbers I gave were just for the > peak lists (base64 vs text) and nothing else--no tags, no other > metadata. 
The rest of the mzData fields would add more overhead, but I > have no objection about that part. > > If we implemented mzData here today, our files would be bigger if we > used the base64 encoding than if we used the textual numbers (as they > are in our ms2 files). Point taken. See Brian Pratt's responses as to why base64 is the way both mzData and mzXML are going (irrespective of the planned merge of the formats). I'll add to those arguments that we should look at the computational costs of un/zipping whole files as opposed to stream en/decoding individual mzData spectra. > > > > 3. Potential loss of precision information. ... Brian Pratt addressed these issues much more eloquently than me in his reply. > > Just to reiterate my main question, it looks like using base64 will make > mzData less usable and more complex, as compared to straight text. What > benefits come with it that offset these drawbacks? 1) it can handle encoding of integers, single and double precision float arrays without loss of information 2) comparable compression with zipped plain text of the same precision 3) better performance with respect to accessing individual spectra vs. compressed plain text -angel |
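Angel's points (2) and (3) describe the scheme mzXML 3.0 adopted: compress each peak list before base64-encoding it, so nothing is lost and each spectrum still decodes on its own without unzipping the whole file. A rough sketch of that idea; the byte order and 64-bit width here are illustrative choices, not taken from either specification:

```python
import base64
import struct
import zlib

def pack_spectrum(values):
    """Lossless per-spectrum storage in the style of mzXML 3.0's
    compressed peak lists: pack as 64-bit floats, zlib-compress,
    then base64 the result for embedding in XML."""
    raw = struct.pack('<%dd' % len(values), *values)
    return base64.b64encode(zlib.compress(raw)).decode('ascii')

def unpack_spectrum(text):
    """Reverse pack_spectrum: base64-decode, decompress, unpack."""
    raw = zlib.decompress(base64.b64decode(text))
    return list(struct.unpack('<%dd' % (len(raw) // 8), raw))

peaks = [400.0 + i * 0.25 for i in range(1000)]
encoded = pack_spectrum(peaks)
assert unpack_spectrum(encoded) == peaks   # no precision lost
# Each spectrum decodes independently, unlike gzipping the whole file.
```

This is what makes random access to individual spectra cheap: only the one encoded string has to be decompressed, not the surrounding document.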
From: Steve S. <ste...@ni...> - 2006-09-20 11:54:17
All, I also share the concerns Michael expressed: transparency is important to us, but precision even more so. We have long stored our data in ASCII to avoid the problem, even though some judgement is sometimes necessary. As we know, 1.0000 and 0.9999 are very different things; usually the former is really meant to be an integer. Abundances, being derived from ion counts, are 'naturally' integral, whereas m/z values are real -- though of course data systems need not conform to nature. I have dealt with MS formats where everything is, in effect, integral. In our library, for example, we want users to see exactly the values that we put there, so we use ASCII. It would be very desirable for us if the XML formats offered the same option; otherwise we will have to go non-standard. Perhaps the ultimate answer is some way of associating an uncertainty with each value, but I suppose this is a long way off.

-Steve Stein

p.s. This is NOT NIST speaking, just one of its employees.

At 9/19/2006 07:56 PM Tuesday, Brian Pratt wrote:

> Oh, and I forgot one extremely important thing: performance. It's
> expensive converting those base 10 representations back to base 2 for
> number crunching, visualization, etc. It's much cheaper to read them
> directly as binary, even with the overhead of base64 decoding.
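Steve's point about ASCII preserving the curator's intended digits can be sketched in a few lines of Python (an illustration, not anything from the thread): an ASCII field keeps exactly the digits that were written, while a binary float records only the nearest representable value, with no notion of how many digits were meant.

```python
import struct

# A library entry stored as ASCII keeps exactly the digits the curator chose.
recorded = "1.0000"  # four deliberate decimal places: "really an integer"

# Round-trip through a 32-bit float, as a binary peak-list encoding would:
f32 = struct.unpack('<f', struct.pack('<f', float(recorded)))[0]
print(recorded)   # prints 1.0000 -- the curated presentation survives
print(repr(f32))  # prints 1.0    -- the formatting (and the intent) is gone

# Many short decimals are not even exactly representable in binary,
# so the round-trip can perturb the value itself:
g = struct.unpack('<f', struct.pack('<f', 0.9999))[0]
print(g == 0.9999)  # False
```

This is the sense in which "1.0000 and 0.9999 are very different things": the first collapses to an exact binary integer, the second cannot be represented exactly at all.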
From: Brian P. <bri...@in...> - 2006-09-19 23:56:30
Oh, and I forgot one extremely important thing: performance. It's expensive converting those base 10 representations back to base 2 for number crunching, visualization, etc. It's much cheaper to read them directly as binary, even with the overhead of base64 decoding.

Brian Pratt
www.insilicos.com/IPP
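Brian's performance claim is easy to check with a rough, synthetic benchmark (illustrative only; absolute timings depend on the runtime, and real peak lists differ from random data):

```python
import base64
import random
import struct
import time

# A hypothetical peak list of 100k m/z values.
random.seed(1)
peaks = [random.uniform(200.0, 2000.0) for _ in range(100_000)]

# The two encodings under discussion: base 10 text vs base64-packed doubles.
text_form = " ".join(f"{p:.4f}" for p in peaks)
b64_form = base64.b64encode(struct.pack(f"<{len(peaks)}d", *peaks))

t0 = time.perf_counter()
from_text = [float(s) for s in text_form.split()]
t_text = time.perf_counter() - t0

t0 = time.perf_counter()
raw = base64.b64decode(b64_form)
from_binary = struct.unpack(f"<{len(raw) // 8}d", raw)
t_binary = time.perf_counter() - t0

print(f"parse text:      {t_text * 1000:.1f} ms")
print(f"base64 + unpack: {t_binary * 1000:.1f} ms")

# The binary path also reproduces the original doubles bit-for-bit:
print(list(from_binary) == peaks)  # True
```

Whatever the exact ratio on a given machine, the binary path does no per-character numeric parsing, which is where the text path spends its time.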
From: Brian P. <bri...@in...> - 2006-09-19 23:31:56
When we developed the mzXML format we went through the same questions. This is how I understood things:

Readability: We as developers are an unusual use case. The more likely use for these formats is visualization or automated processing, neither of which requires direct eyeballing of the peak lists under normal circumstances. Or at least that's how we saw it. If you really do need to eyeball the peak lists, there are lots of tools available that will do the translation for you.

Accuracy: Mass spec data in its raw form is generally stored in binary formats, since mass specs are front-ended by binary computers. Conversion to and from base 10 human-readable representations introduces error. It's best to hold the data at its original precision and translate out to human-readable form, at whatever precision is deemed useful, only for eyeballing.

File size: Sure, you can make files smaller by throwing away precision, but as you begin to desire higher precision, base64 quickly becomes much more efficient. An excellent way to reduce file size without sacrificing precision is to compress the peak lists before base64'ing them, as is done in mzXML 3.0.

Potential loss of precision information: That information wasn't ever there, really. Again, mass specs are front-ended by binary computers, so that base 10 precision information (does '12345.099923123' mean '12345.1' or '12345.10' or '12345.100'?) wasn't ever in the datastream in the first place. The mass spec just wrote a bunch of 32- or 64-bit binary numbers to the best of its (base 2) ability. Looking at the bit patterns would be more revealing of the precision, and base64 preserves them. As a developer, you should be pleased that you don't have to wonder how many digits of a value are real rather than an artifact of the base 2 to base 10 formatting conversion: with base64 binary values you're working with the original raw data, so those artifacts aren't an issue.

Hope this helps,

Brian Pratt
www.insilicos.com/IPP
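The file-size point can be made concrete with a small sketch (synthetic data; real peak lists, which are far less random, compress much better under the mzXML 3.0 compress-then-base64 approach):

```python
import base64
import random
import struct
import zlib

random.seed(2)
peaks = [random.uniform(200.0, 2000.0) for _ in range(10_000)]

# Text with 4 decimal places vs base64 of packed 32/64-bit floats,
# plus zlib-compressed 64-bit floats as in mzXML 3.0.
text = " ".join(f"{p:.4f}" for p in peaks).encode()
b64_f32 = base64.b64encode(struct.pack(f"<{len(peaks)}f", *peaks))
b64_f64 = base64.b64encode(struct.pack(f"<{len(peaks)}d", *peaks))
b64_f64_z = base64.b64encode(zlib.compress(struct.pack(f"<{len(peaks)}d", *peaks)))

# base64 emits 4 output bytes per 3 input bytes, so 32-bit floats
# (~5.3 bytes/value) beat ~9-10 characters of text per value,
# while uncompressed 64-bit floats (~10.7 bytes/value) do not.
print("text:           ", len(text))
print("base64 float32: ", len(b64_f32))
print("base64 float64: ", len(b64_f64))
print("base64 zlib f64:", len(b64_f64_z))
```

This matches both sides of the thread: Mike's observation that base64 doubles are larger than his one-decimal text, and Brian's point that the comparison flips as the precision demanded of the text goes up.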
From: Akhilesh P. <pa...@jh...> - 2006-09-19 23:02:12
I agree with Mike about the human-readable part and the size issues. I insist in our lab that all files to be manipulated be 'scanned' before 'crunching.' If there are no compelling reasons for base64, I do not see why this decision should not be reconsidered.

Akhilesh Pandey
From: Coleman, M. <MK...@St...> - 2006-09-19 22:58:23
> From: Angel Pizarro
>
> > 1. Loss of readability. ...
>
> There actually is a space for "human readable spectra" in the
> mzData format,

I'm glad to hear that. I looked for this, but I did not see it in the spec here:

http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData

I was looking for something like 'mzArray' and 'intenArray' tags, which would be the textual alternatives to 'mzArrayBinary' and 'intenArrayBinary'. Can you point me to an example?

> but really who reads individual mz and intensity values?

Well--I do. As a programmer, I don't think it's an exaggeration to say that I'm looking at the peak lists in our ms2 files every day. I find being able to see at a glance that the peaks are basically sane, along with their gross attributes (precision, count, etc.), very useful.

Of course, as a programmer I can easily whip up a script to decode this file format. I suspect most users would be stymied, though, and I think that would be unfortunate. Since these files are part of a chain of scientific argument, I think that as much as possible they ought to be transparent and open to verification by eyeball (mine and those of our scientists) and by alternative pieces of software.

I'm not saying that this transparency is an absolute good. Perhaps it is worth impairing so that we can have X, Y, and Z, which are considered more valuable. I'm not seeing what X, Y, and Z are, though.

> > 2. Increased file size. ...
>
> Not a fair comparison. Most of the space in an mzData file is
> actually taken up by the human-readable parameters and parameter
> values of the spectra.

Sorry, I should have been clearer. The numbers I gave were just for the peak lists (base64 vs text) and nothing else--no tags, no other metadata. The rest of the mzData fields would add more overhead, but I have no objection to that part.

If we implemented mzData here today, our files would be bigger if we used the base64 encoding than if we used the textual numbers (as they are in our ms2 files).

> > 3. Potential loss of precision information. ...
>
> Actually the situation may be reversed. Thermofinnigan, for
> example, stores measured values coming off of the instrument as
> double precision floats, later formatting the numbers as needed
> with respect to the specific instrument's limit of detection.
> 12345.1 may have originally been 12345.099923123 in the vendor's
> proprietary format.

Okay, but isn't '12345.1' what I really want to see in this case (assuming that the vendor is correct about the instrument's accuracy)? For this particular instance, the string '12345.1' tells me what I need to know, and a double-precision floating point value (e.g., 12345.10000000000036379) would sort of let me guess it (since double precision carries considerably more significant figures). But a single-precision value would leave me in a sort of gray area. That is, does '12345.099923123' mean '12345.1' or '12345.10' or '12345.100', for example?

> I wrote an email a few days ago showing how to translate in ruby
> the base64 arrays

I saw it, and it was quite useful to me. Part of the reason I'm asking these questions is that I noticed in your examples that the base64-encoded values actually took more space than the original data.

Just to reiterate my main question: it looks like using base64 will make mzData less usable and more complex, as compared to straight text. What benefits come with it that offset these drawbacks?

Mike
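Mike's single- vs double-precision "gray area" is easy to reproduce (a sketch, not from the thread): the nearest 32-bit float to 12345.1 is 12345.099609375, which no longer reveals the intended one decimal place, while a 64-bit float carries enough digits for a shortest-representation formatter to recover '12345.1'.

```python
import struct

v = 12345.1  # the value as the instrument software might display it

# Round-trip through a 32-bit float, viewed back as a double:
f32 = struct.unpack('<f', struct.pack('<f', v))[0]
print(f32)  # 12345.099609375

# A 64-bit float keeps ~15-16 significant decimal digits, so the
# intended '12345.1' survives a double-precision round trip:
f64 = struct.unpack('<d', struct.pack('<d', v))[0]
print(repr(f64))  # 12345.1
```

So both sides are right in part: the binary value preserves exactly what the 32-bit encoder saw, but the reader can no longer tell how many of those base 10 digits were ever meaningful.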
From: Angel P. <an...@ma...> - 2006-09-19 21:38:56
|
Hi Mike, I have some answers that may or may not explain all of your concerns. On Tuesday 19 September 2006 16:39, Coleman, Michael wrote: > Hi, > > Does anyone know why base64 encoding is being used for peak mz and > intensity values in the mzData format? It appears to me that there are > three significant disadvantages to doing so: > > 1. Loss of readability. One of the primary reasons to use XML in the > first place is that it is human-readable--one can in principle inspect > and understand its contents with any text editor. Base64-encoding peak > data destroys this transparency. (It also makes it more difficult to > write scripts to process the data.) There actually is a space for "human readable spectra" in the mzData format, but really who reads individual mz and intensity values? The situation is akin to microarray data, does anyone really need to see each individual probe value? The normal usage of this data is to load the entire result set into a processing or search algorithm, or turn it into a nice spectra graph, all of which are handled by software which does not have a problem with decoding the strings. > > 2. Increased file size. At least for our spectra, it appears that a > compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent > mzData file with the single-precision (32-bit) encoding, and 22% smaller > than the double-precision version. The *uncompressed* single-precision > mzData file is about about 15% smaller than the uncompressed ms2 file; > the double-precision version is almost twice as large. (These figures > are for 'gzip' default compression.) > Not a fair comparison. Most of the space in an mzData file is actually taken up by the human-readable parameters and parameter values of the spectra. I'll have to do some tests to see the actual space taken by spectra, but my "feeling" is that the byte and base64 encoding is actually a better compression of the data than gzipped XML with space delimitted floats. 
> (Currently our ms2 files have mz values rounded to one decimal place and
> intensity values with about 4-5 significant places.)
>
> 3. Potential loss of precision information. For example, with
> single-precision encoding, a value originally given as 12345.1 might be
> encoded as 12345.0996. It's not easy to see from that encoding that the
> original value was given with one decimal place. Worse still, if the
> original value is significant to more than 7-or-so digits and it gets
> 32-bit encoded, precision will be lost, probably in a way not
> immediately apparent to the user. (32-bit encoding will probably be a
> temptation, given the size of the 64-bit encoding.)

Actually the situation may be reversed. ThermoFinnigan, for example, stores measured values coming off of the instrument as double-precision floats, later formatting the numbers as needed with respect to the specific instrument's limit of detection. 12345.1 may have originally been 12345.099923123 in the vendor's proprietary format.

> Even if base64-encoding cannot be dropped at this point, it seems like
> it would be useful to add a "no encode" option, which would present peak
> data as the obvious whitespace-separated list of numeric values.

See my remark about who really needs to see the raw numbers. I wrote an email a few days ago showing how to translate the base64 arrays in Ruby, and there is also a Java example posted with the mzData specification.

> Am I missing something here? I could not find any discussion of this
> issue on the list.
>
> --Mike
>
> Mike Coleman, Scientific Programmer, +1 816 926 4419
> Stowers Institute for Biomedical Research
> 1000 E. 50th St., Kansas City, MO 64110, USA
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
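Angel's "feeling" about the relative compression of the two encodings is easy to check empirically. A minimal sketch, using synthetic peak data rather than real spectra (the percentages will vary with the actual values and gzip settings, so treat the printed numbers as illustrative only):

```ruby
require 'zlib'

# Synthetic peak list: 1000 (mz, intensity) pairs. Real spectra will
# compress differently; this only illustrates the bookkeeping.
srand(42)
mz        = Array.new(1000) { 100.0 + rand * 1900.0 }
intensity = Array.new(1000) { rand * 1.0e6 }
floats    = mz.zip(intensity).flatten

# Text encoding, roughly what a "no encode" option would store
text = floats.map { |f| f.to_s }.join(' ')

# 32-bit little-endian binary, then base64 (pack('m0') = base64, no newlines)
binary = floats.pack('e*')
b64    = [binary].pack('m0')

puts "text:          #{text.bytesize} bytes (#{Zlib::Deflate.deflate(text).bytesize} deflated)"
puts "base64 32-bit: #{b64.bytesize} bytes (#{Zlib::Deflate.deflate(b64).bytesize} deflated)"
```

Base64-encoded 32-bit floats cost a fixed ~5.4 bytes per value, while full-precision decimal text typically needs 15-20 characters per value; compression narrows the gap considerably, which is consistent with the gzip figures Mike reports above.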
From: Coleman, M. <MK...@St...> - 2006-09-19 20:39:31
Hi,

Does anyone know why base64 encoding is being used for peak mz and intensity values in the mzData format? It appears to me that there are three significant disadvantages to doing so:

1. Loss of readability. One of the primary reasons to use XML in the first place is that it is human-readable--one can in principle inspect and understand its contents with any text editor. Base64-encoding peak data destroys this transparency. (It also makes it more difficult to write scripts to process the data.)

2. Increased file size. At least for our spectra, it appears that a compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent mzData file with the single-precision (32-bit) encoding, and 22% smaller than the double-precision version. The *uncompressed* single-precision mzData file is about 15% smaller than the uncompressed ms2 file; the double-precision version is almost twice as large. (These figures are for 'gzip' default compression.)

(Currently our ms2 files have mz values rounded to one decimal place and intensity values with about 4-5 significant places.)

3. Potential loss of precision information. For example, with single-precision encoding, a value originally given as 12345.1 might be encoded as 12345.0996. It's not easy to see from that encoding that the original value was given with one decimal place. Worse still, if the original value is significant to more than 7-or-so digits and it gets 32-bit encoded, precision will be lost, probably in a way not immediately apparent to the user. (32-bit encoding will probably be a temptation, given the size of the 64-bit encoding.)

Even if base64-encoding cannot be dropped at this point, it seems like it would be useful to add a "no encode" option, which would present peak data as the obvious whitespace-separated list of numeric values.

Am I missing something here? I could not find any discussion of this issue on the list.
--Mike

Mike Coleman, Scientific Programmer, +1 816 926 4419
Stowers Institute for Biomedical Research
1000 E. 50th St., Kansas City, MO 64110, USA
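Mike's precision example (point 3) is easy to reproduce. A small sketch of what a 32-bit round trip does to a value like 12345.1, using Ruby's pack formats ('e' is a little-endian 32-bit float, 'E' a little-endian 64-bit double):

```ruby
value = 12345.1

# Round-trip through a 32-bit (single-precision) encoding
as_f32 = [value].pack('e').unpack1('e')

# Round-trip through a 64-bit (double-precision) encoding
as_f64 = [value].pack('E').unpack1('E')

puts "original: #{value}"
puts "32-bit:   #{as_f32}"   # roughly 12345.0996, as described above
puts "64-bit:   #{as_f64}"   # a Ruby Float is a double, so this survives unchanged
```

This shows both halves of the argument: the 32-bit encoding visibly perturbs the value, while the 64-bit encoding preserves it at the cost of twice the bytes.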
From: Angel P. <an...@ma...> - 2006-09-14 12:52:41
Folks, in the interest of a smaller example, and to show how one does a complete round trip of float array <--> byte-encoded string <--> base64binary, here is an irb (interactive Ruby) session showing the steps, fully commented:

# require the base64 encoding library
irb(main):002:0> require 'base64'
=> true

# create the array of floats
irb(main):003:0> a = [123.45, 124.5634, 1234.34121]
=> [123.45, 124.5634, 1234.34121]

# Encode the array as double-precision (i.e. 64-bit) floats in a byte string
# in little-endian order. A lowercase "e" would encode it as single-precision
# (32-bit) floats. "G" and "g" would correspond to double and single precision
# in big-endian byte order. See the Ruby API docs for the pack and unpack
# methods for more info on the types of byte encoding and what the "*"
# actually means ;)
irb(main):004:0> s = a.pack('E*')
=> "\315\314\314\314\314\334^@@\244\337\276\016$_@F|'f]I\223@"

# encode in base64
irb(main):006:0> sb64 = Base64.encode64(s)
=> "zczMzMzcXkBApN++DiRfQEZ8J2ZdSZNA\n"

# decode to byte string again
irb(main):007:0> s2 = Base64.decode64(sb64)
=> "\315\314\314\314\314\334^@@\244\337\276\016$_@F|'f]I\223@"

# unpack the byte string into the float array
irb(main):008:0> a2 = s2.unpack('E*')
=> [123.45, 124.5634, 1234.34121]

And that is the long and short of it. Cheers!

-angel

On Wednesday 13 September 2006 11:25, Angel Pizarro wrote:
> Hello all,
>
> In the hopes of fostering mzData as a format, I am putting into the
> docstore an example of decoding the mzData base64binary float arrays for
> m/z and intensity using Ruby, my new language of choice.
>
> The docstore path is Documents/PSI_MS/mzData/decode_base64.rb
> Here is the URL:
>
> http://psidev.sourceforge.net/docstore/view.php?sess=0&parent=7&expand=1&order=name&sortname=ASC&id=104&action=file_details
>
> The code and comments should be self-explanatory, but if not, send me an
> email and I will be happy to augment the file. Look for a Ruby API to
> read/write mzData sometime soon!
> (airware at the moment)
>
> -angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
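The round trip above can be extended to pull a float array straight out of an mzData document. A sketch follows; note that the element and attribute names (mzArrayBinary/data with precision, endian and length attributes) follow my reading of the mzData schema and should be verified against the published mzdata.xsd before relying on them:

```ruby
require 'rexml/document'

# Pick the pack/unpack format from the mzData "precision" and "endian"
# attributes: E/e = little-endian 64/32-bit, G/g = big-endian 64/32-bit.
def mzdata_format(precision, endian)
  little = (endian == 'little')
  precision.to_i == 64 ? (little ? 'E*' : 'G*') : (little ? 'e*' : 'g*')
end

# Build a tiny mzData-like fragment around a freshly encoded array, then
# decode it back out. The element/attribute names here are an assumption
# taken from the mzData schema as I read it -- check against mzdata.xsd.
mz  = [123.45, 124.5634, 1234.34121]
b64 = [mz.pack('E*')].pack('m0')
xml = <<XML
<spectrum id="1">
  <mzArrayBinary>
    <data precision="64" endian="little" length="#{mz.length}">#{b64}</data>
  </mzArrayBinary>
</spectrum>
XML

doc    = REXML::Document.new(xml)
data   = REXML::XPath.first(doc, '//mzArrayBinary/data')
fmt    = mzdata_format(data.attributes['precision'], data.attributes['endian'])
floats = data.text.strip.unpack1('m0').unpack(fmt)
puts floats.inspect
```

Since the precision and endianness travel as attributes on the data element, a reader never has to guess the byte layout; it just selects the matching unpack format.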
From: Angel P. <an...@ma...> - 2006-09-13 15:28:03
Hello all,

In the hopes of fostering mzData as a format, I am putting into the docstore an example of decoding the mzData base64binary float arrays for m/z and intensity using Ruby, my new language of choice.

The docstore path is Documents/PSI_MS/mzData/decode_base64.rb
Here is the URL:

http://psidev.sourceforge.net/docstore/view.php?sess=0&parent=7&expand=1&order=name&sortname=ASC&id=104&action=file_details

The code and comments should be self-explanatory, but if not, send me an email and I will be happy to augment the file. Look for a Ruby API to read/write mzData sometime soon! (airware at the moment)

-angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
From: Andy J. <aj...@cs...> - 2006-09-12 11:00:20
Hi all,

Apologies if you receive multiple copies. In the Gel group, we are shortly going to start writing the spec document for GelML. Before starting I would like to have a quick discussion about the format of the document, because it would be good if we adopt a similar format across all documents produced from now on. I see there are various current specs that ought to be considered.

- Autogenerated docs for MI: http://psidev.sourceforge.net/mi/rel25/doc/
- Draft of mzData spec: http://psidev.sourceforge.net/docstore/browse.php?sess=593acb54e566e0e2e0808278349f8294&parent=7&expand=1&order=name&sortname=ASC
- We've also just put out the internal draft of the FuGE v1 specs: http://fuge.sourceforge.net/Version1Candidate/

The FuGE spec is divided into two documents:

1. A user guide which has general tasks, class diagrams and a general description of the model.
2. A reference manual (UML + XML Schema), which is autogenerated by an AndroMDA cartridge I've written.

The difference between MI/mzData and, say, GelML is that we have primarily developed a UML model with a defined mapping to the XML Schema. Therefore the specs are likely to focus more closely on the object model. I would like to propose that for models developed on top of FuGE, it might be advantageous to adopt a similar format to the FuGE specs, whereby:

i) The reference manual covers both the UML and XML Schema, and it is autogenerated, therefore minimal work.
ii) I can produce a template "user guide" which can be populated manually.

All the models extending from FuGE would then have consistent specs. The user guide could include some of the facets of the mzData spec, such as Goals, Requirements and an Appendix on CV usage.
Could group chairs take a look at the draft FuGE specs and comment on the suitability of this format (sections 3 to 5 of the FuGE user guide would not be applicable)? I guess we could adopt different policies for FuGE-based and non-FuGE-based formats if necessary?

Best wishes,
Andy
From: Randy J. <rj...@pu...> - 2006-09-11 17:55:02
The minutes from the 22 August teleconference are available from the PSI docstore: http://psidev.sourceforge.net/docstore/download.php?&id=103

The next teleconference is TOMORROW (12 September 2006) at 1500 GMT:

1. Examples from dataXML model
2. Organization at the Washington PSI meeting

----

HUPO Proteomics Standards Initiative meeting, to be held from the 25th to 27th of September 2006 in Washington DC, at the headquarters of the American Chemical Society. All meeting details, including the draft program and registration form, can be found at: http://psidev.sourceforge.net/meetings/2006-09/

The call details are:

PSI MS WG Teleconference
12 September, 1600 British Summer (London) Time
http://www.timeanddate.com/worldclock/fixedtime.html?year=2006&month=9&day=12&hour=15&min=0&sec=0

Please dial the most convenient number to access the teleconference:

UK: +44 870 240 7821 or +44 207 819 3600
US: West coast: +1 4089616553 East coast: +1 7183541169
Switzerland: +41 1800 9449

(If you require access from any other location, please contact Phil Jones or Lennart Martens.)

The passcode to access the conversation is: 8885686#
From: Randy J. <rkj...@in...> - 2006-08-23 12:16:18
To improve the accessibility of the mzData documents (schema, specification, CV), hard URLs were created to the latest versions:

http://psidev.sourceforge.net/ms/xml/mzdata/psi-ms-cv-latest.obo

The current release version is 1.7.2 and is stored as a specifically named file:

http://psidev.sourceforge.net/ms/xml/mzdata/psi-ms-cv-1.7.2.obo

Additions, corrections and changes to the CV

This directory also has the latest version of the specification document. Appendix A contains cvParam allowed values by section. The specification document is not complete yet, but should be useful - please help with this document by sending edits to rkj...@in... or anyone in the PSI-MS working group.

http://psidev.sourceforge.net/ms/xml/mzdata/mzdata_spec.doc

The schema and the HTML documentation of the schema are also there:

http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.xsd
http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html

Please let us know if there are any problems, suggestions, comments on any of the mzData documents.

Thanks,
Randy
From: <dh...@md...> - 2006-07-18 19:09:57
I will be out of the office starting 07/15/2006 and will not return until 08/02/2006. I will respond to your message when I return.
From: Randy J. <rkj...@in...> - 2006-07-18 11:52:32
There have been a large number of people out over the last month (me included), but we need to get back on the line to set the schedule for the next steps in the mzData merger and forward progress on the MS ontology. Phil has graciously agreed to start the call.

For today's PSI-MS teleconference:

http://www.timeanddate.com/worldclock/fixedtime.html?month=7&day=18&year=2006&hour=16&min=0&sec=0&p1=136
(16:00 British Summer Time)

Please dial the most convenient number to access the teleconference:

UK: +44 870 240 7821 or +44 207 819 3600
US: West coast: +1 4089616553 East coast: +1 7183541169
Switzerland: +41 1800 9449

(If you require access from any other location, please let me know.)

The passcode to access the conversation is: 8885686#

This is 1500 GMT (1600 BST) and you can look on timeanddate.com for your location. The agenda is to talk about the next steps and set the schedule and agenda for upcoming calls and meetings - essentially, to get back to work.

Thanks,
Randy
From: Chris T. <chr...@eb...> - 2006-07-11 10:05:39
Hi all. Just a heads up, for all involved in data standardization.

The quote below comes from the 'opportunities' list generated by the FDA under their 'Critical Path Initiative' (aimed at getting novel therapies to patients quicker, but without increasing risk, by addressing bottlenecks). The opportunities document is linked from the main page for the initiative, and has lots of other interesting stuff in it: http://www.fda.gov/oc/initiatives/criticalpath/

One major objective might be to get this tool -- http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/ -- to speak FuGE..? That'd then make it much simpler to leverage some of the offspring of FuGE (MAGE2, GelML, ultimately MS and now NMR formats) in that tool and others FDA might develop in the future (for example, for proteomics and metabolomics data).

Anyway, the quote:

"44. Development of Data Standards. Currently, clinical investigators, clinical study personnel, data managers, and FDA reviewers must cope with a plethora of data formats and conventions. Some clinical investigators report the presence of many different computer systems for data entry at their sites (for various trials), each of which uses different data conventions. Lack of standardization is not only inefficient, it multiplies the potential for error. Important standards work is underway, but much remains before the promise of shared data standards for clinical trials is realized. CDISC is paving the way by developing its Study Data Tabulation Model for describing observations in drug trials [1]. That model could someday encompass observations needed for other types of trials. Health Level 7 and CDISC are working to create standards that can be used for the exchange, management, and integration of electronic healthcare information to increase the effectiveness and efficiency of healthcare delivery [2].

In addition to improving and expanding the Model, sponsors and the FDA must undertake the hard work of retooling hardware and software to apply the new standards. This retooling includes training researchers to collect and FDA reviewers to expect data in these formats. Standardizing data archiving conventions would also enable the creation of shared data repositories, facilitating meta-analyses, data mining, and modeling to improve clinical trial design and analysis."

[1] For more on CDISC (the Clinical Data Interchange Standards Consortium), see http://www.cdisc.org/.
[2] See also http://www.hl7.org/.

Sorry to those who get this several times...

Cheers, Chris.

~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://psidev.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~
From: Trish W. <wh...@pc...> - 2006-06-27 14:28:49
Is the PSI-MS working group having a call today, and what is the call-in number?

Thanks,
Trish

> PSI-MS working group:
>
> Following ASMS, and the push to complete the mzData NBT paper, we have not
> returned to our regularly scheduled teleconferences. There are several
> items to discuss and given the summer vacation season, I would like to
> propose the following:
>
> 27 June 2006 - 1600 BST, PSI-MS Teleconference
>
> (11 July will be the PSI-PI Teleconference according to Angel)
>
> 18 July 2006 - 1600 BST, PSI-MS Teleconference
>
> With additional calls if needed?
>
> In addition to the mzData/mzXML merger discussion, are there other topics
> for a call next week?
>
> Thanks,
> Randy
From: Randy J. <rkj...@in...> - 2006-06-20 14:30:31
PSI-MS working group:

Following ASMS, and the push to complete the mzData NBT paper, we have not returned to our regularly scheduled teleconferences. There are several items to discuss and, given the summer vacation season, I would like to propose the following:

27 June 2006 - 1600 BST, PSI-MS Teleconference

(11 July will be the PSI-PI Teleconference according to Angel)

18 July 2006 - 1600 BST, PSI-MS Teleconference

With additional calls if needed?

In addition to the mzData/mzXML merger discussion, are there other topics for a call next week?

Thanks,
Randy