From: Chris T. <ch...@eb...> - 2004-08-19 13:54:14
Hi Per.

> as the project manager of Proteios (http://www.proteios.org/) - a
> bioinformatics software project in proteomics, I strongly greet the
> standardization work for mzData. We are currently working on integrating
> mzData in our (Proteios) data model.

Bravo! Please continue to forward your collective experiences to the list (or at least to me). Jari Hakkinen and Frank Potthast gave an excellent presentation of Proteios at our Nice meeting last April -- I'm enormously glad that you and your team are still actively participating in this project.

> Has anyone done this before - actually used mzData in a software application
> (apart from validation and the like with standard XML tools)? Putting mzData
> into practical use should be very much in the interest of PSI. Unfortunately
> there are, in my opinion, a few things which cause (unnecessary?)
> difficulties in an implementation.

Some converters were written (by Kai) to go from .pkl, .dta and I think maybe some others (.mgf maybe) to mzData. They are out of date now as the format has evolved, but you can still get the source if you dig on the CVS a bit:

http://cvs.sourceforge.net/viewcvs.py/psidev/psi/psi-ms/xml/mzData/converter/

There is also a guy at Leeds Uni doing an implementation [web submission -> XML-import -> DB -> XSLT rendering to HTML]. That is the extent of software implementations of the format to date afaik.

> First I would suggest not using common programming language keywords (e.g.
> "float") as element/attribute names. This is a possible source of confusion
> and makes simple mapping onto source code data structures more difficult.

Hmm. Fair point (a reserved/key words thing yeah?). Something to consider for a future revision perhaps.

> Secondly, I wonder whether there is any pressing need for keeping two separate
> arrays in cases where the apparent meaning is rather one single array of
> pairs (e.g. "intenArray", "mzArray").

Base64-encoded arrays of IEEE-754 floats (endian and precision given) were chosen to capture the data to allow (1) some compression and, importantly, (2) maximal portability for the (compressed) data captured in the format without having to parse it all out locally. Moving to two-membered arrays would (I think) mess up this portability because complex arrays are likely to be handled differently on different systems. Also, were someone interested solely in m/z values, they would have a simpler job to grab them in isolation without decompressing and then parsing them out. Hopefully, as far as implementing the XML goes, there will be little difference between two single-membered arrays and one two-membered one.

> The schema does not enforce the same
> length on these two arrays.

Pardon my ignorance (which is significant) but is that possible in XML (although see below)? Or did you mean in the guidance notes perhaps? I appreciate that if we had a single array of pairs this wouldn't be an issue...

> I also wonder what the purpose of the attribute
> length is. Can't it be removed, since the length is implicitly given by the
> number of subelements? Consider the following excerpt from a valid(!) mzData
> XML-file:
>
> <mzArray length="15">
>   <float>100</float>
>   <float>500</float>
> </mzArray>
> <intenArray length="7">
>   <float>10</float>
>   <float>100</float>
>   <float>20</float>
> </intenArray>

Well there's valid and there's valid... :) We only rely to a minimal extent on XML to do any constraining -- we see that other tools will have to do a significant amount of validation anyway (CV is a biggie) and so went for simple things like putting in lengths/counts as attributes on some parents to allow some integrity checking; i.e. if I say there are three analyser stages in my mass spec and you find two, something is clearly wrong. (Incidentally, these sorts of attributes could support the functionality to check equality of array length as discussed above, but of course only _stated_ array length, not actual.)

Again, do please let us/me know how the implementation goes -- this information is absolutely invaluable to us. And of course feel free to reply to these attempts to answer the points you raise.

Cheers, Chris.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Chris Taylor (ch...@eb...)
HUPO PSI: GPS -- psidev.sf.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
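
P.S. For anyone implementing a reader, here is a rough Python sketch of the two ideas discussed above: decoding a Base64-encoded array of IEEE-754 floats when the byte order and precision are known, and comparing stated lengths against what is actually there. It is only an illustration under assumptions. The function names and parameters (encode_floats, decode_floats, check_arrays, precision, little_endian) are made up for this example and are not taken from the mzData schema, and the sample values simply mirror Per's excerpt.

```python
import base64
import struct


def _format(count, precision, little_endian):
    """Build a struct format string for `count` IEEE-754 floats."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    byte_order = "<" if little_endian else ">"
    type_code = "f" if precision == 32 else "d"
    return "%s%d%s" % (byte_order, count, type_code)


def encode_floats(values, precision=32, little_endian=True):
    """Pack numbers as IEEE-754 floats and Base64-encode the bytes."""
    raw = struct.pack(_format(len(values), precision, little_endian), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_floats(text, precision=32, little_endian=True):
    """Reverse of encode_floats: Base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    width = precision // 8
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    if len(raw) % width:
        raise ValueError("byte count is not a multiple of the float width")
    return list(struct.unpack(_format(len(raw) // width, precision, little_endian), raw))


def check_arrays(mz_text, inten_text, stated_mz, stated_inten, **encoding):
    """Return a list of integrity problems (empty if none were found).

    Checks each stated length against the decoded length, and the two
    decoded lengths against each other. This is validation done by a
    tool, not something the XML schema itself enforces.
    """
    mz = decode_floats(mz_text, **encoding)
    inten = decode_floats(inten_text, **encoding)
    problems = []
    if len(mz) != stated_mz:
        problems.append("m/z: stated %d, found %d" % (stated_mz, len(mz)))
    if len(inten) != stated_inten:
        problems.append("intensity: stated %d, found %d" % (stated_inten, len(inten)))
    if len(mz) != len(inten):
        problems.append("m/z and intensity differ: %d vs %d" % (len(mz), len(inten)))
    return problems


if __name__ == "__main__":
    # Same numbers as the excerpt: 2 m/z values stated as 15, 3 intensities stated as 7.
    mz = encode_floats([100.0, 500.0])
    inten = encode_floats([10.0, 100.0, 20.0])
    for problem in check_arrays(mz, inten, stated_mz=15, stated_inten=7):
        print(problem)
```

Note that the m/z values can be decoded on their own, without touching the intensities, which is the advantage of keeping the two arrays separate.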