Re: [Psidev-ms-dev] mzData issues

PSI-MS Developers,

Just a quick note regarding the mzData 1.1 prototype. It now looks like
the smart way to go is to provide an mzData 1.1 for review in January
(as agreed in Geneva) with the goal of completing the release at the
Spring 2006 meeting. My preference is for mzData 1.1 to be a maintenance
release adding new 'optional' elements, which means anyone generating
1.05 files can continue to do so, and anyone who wants to use the new
features of 1.1 will not break any current parsers. The main reason for
the proposed schedule is that we are making plans to merge mzData with
mzXML, and I don't see how this could be done easily before January
while allowing sufficient review time before a meeting in the Spring.
What does seem possible is to keep mzData as stable as possible while
addressing the main complaints regarding redundancy of instrument
parameters, by grouping 'scans' into 'experiments' and providing a
mechanism for giving all of the instrument parameters for a 'group' of
scans. We could have the merged mzData/mzXML in draft mode by the
Spring meeting (if we get help) and finalized by the Fall meeting, which
could mean adoption in the Fall or, if more refinement is needed, by the
end of 2006.

Keeping mzData stable should allow developers to add tools without the
worry you are expressing - that the thing will move too fast and prevent
stable development. This is countered by people who say 'why should it
take so long? fix it and be done with it...' We could use more feedback
on this.

I have heard that the XSLT-based translator is slow on large files. This
is because there is a script which splits the mzXML joint mz/inten data
vector into the separated vectors used in mzData. This could be done
very quickly in C/C++ or in Java. As far as I am aware, no one has
written such a program yet, but given the example of the ProteomeSystems
converter, it would not take long to write for anyone with a serious
number of mzXML files to convert.
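
As a rough illustration of the split described above, here is a minimal
sketch in Python (rather than C/C++ or Java). It assumes the mzXML peaks
string is base64-encoded, 32-bit, network-byte-order floats stored as
interleaved m/z, intensity pairs, and that the mzData side wants two
separate base64 arrays. The function name split_mzxml_peaks and the
choice of little-endian output are illustrative only, not part of
either format's tooling.

```python
import base64
import struct


def split_mzxml_peaks(b64_peaks, out_endian="little"):
    """Split an interleaved mzXML peak string into two base64 arrays.

    Assumes 32-bit big-endian floats laid out as m/z, intensity pairs.
    Returns (mz_base64, intensity_base64, number_of_peaks).
    """
    raw = base64.b64decode(b64_peaks)
    if len(raw) % 8 != 0:
        raise ValueError("peak data is not a whole number of (m/z, intensity) pairs")

    count = len(raw) // 4
    values = struct.unpack(">%df" % count, raw)

    # Even positions hold m/z values, odd positions hold intensities.
    mz = values[0::2]
    inten = values[1::2]

    prefix = "<" if out_endian == "little" else ">"
    fmt = "%s%df" % (prefix, len(mz))
    mz_b64 = base64.b64encode(struct.pack(fmt, *mz)).decode("ascii")
    inten_b64 = base64.b64encode(struct.pack(fmt, *inten)).decode("ascii")
    return mz_b64, inten_b64, len(mz)
```

Because the work is a single unpack and two packs per scan, the cost
should be dominated by file I/O rather than by the split itself.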

I have also not heard of any attempt to resurrect the original Java GUI
application Kai Runte wrote while at the EBI which could perform the
type of ASCII->mzData conversion you mentioned. This code base is in the
CVS tree and could be picked up by a Java-capable volunteer and made
compatible with 1.05 (and beyond). Anyone wishing to work on this please
contact me.
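
In the meantime, the core of an ASCII->mzData conversion is small. The
sketch below is not the EBI application, just a hypothetical helper
(peak_list_to_base64) showing one way to turn "m/z [tab] intensity"
lines into the two base64 strings an mzData spectrum needs. It assumes
32-bit little-endian floats; the precision, byte order and length you
declare on the mzData data elements would have to match whatever you
choose here, and the surrounding XML is left out.

```python
import base64
import struct


def peak_list_to_base64(text, endian="little"):
    """Encode a plain-text peak list as two base64 float arrays.

    Each non-empty line is expected to hold an m/z value and an
    intensity separated by whitespace (for example a tab).
    Returns (mz_base64, intensity_base64, number_of_peaks).
    """
    mz_values = []
    intensities = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        mz_values.append(float(fields[0]))
        intensities.append(float(fields[1]))

    prefix = "<" if endian == "little" else ">"
    fmt = "%s%df" % (prefix, len(mz_values))
    mz_b64 = base64.b64encode(struct.pack(fmt, *mz_values)).decode("ascii")
    inten_b64 = base64.b64encode(struct.pack(fmt, *intensities)).decode("ascii")
    return mz_b64, inten_b64, len(mz_values)
```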

Finally, there are reports of a number of viewers - several groups seem
to be working on them. We have our own (which I'll share if you want),
so I don't have experience with anyone else's. This is how I check the
base64 strings. There are other base64/IEEE-float analysis tools which
you could use if you want to strip out the string for testing, but I
just use an mzData parsing viewer. It would be helpful if we organized
the toolset for mzData and gave pointers to things like viewers, parsers
and converters. If everyone who has a working mzData tool will drop a
line to this mailing list, I will make sure the tools section gets
updated.
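
For anyone without a viewer who wants to strip out a base64 string and
check it by hand, a short sketch along these lines should do. The
function name decode_array is hypothetical; the precision and endian
arguments are assumed to be copied from the attributes on the mzData
data element the string came from.

```python
import base64
import struct


def decode_array(b64_text, precision=32, endian="little"):
    """Decode a base64 string of IEEE floats into a list of numbers."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    raw = base64.b64decode(b64_text)
    size = precision // 8
    if len(raw) % size != 0:
        raise ValueError("decoded data is not a whole number of values")

    code = "f" if precision == 32 else "d"
    prefix = "<" if endian == "little" else ">"
    count = len(raw) // size
    return list(struct.unpack("%s%d%s" % (prefix, count, code), raw))
```

Comparing the decoded list against the original peak list (and its
length against the declared array length) is a quick sanity check on a
converter's output.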

Randy Julian

Andy Jones wrote:

> Hi,
>
> I'm trying to convert data from various different instruments into
> mzData and I have a few questions.
>
>    1. One of the instruments produces plain text output for the peak
>       list (peak [tab] intensity). Does anyone have a script or some
>       code for turning this into the mzData base 64 binary? Otherwise,
>       any advice for how best to do this would be welcome.
>    2. What is the current time scale for the updated version of mzData
>       that is being discussed? If I produce parsers that convert to
>       mzData v 1.05, will I need to re-write them fairly soon for the
>       next version?
>    3. What are the current plans with respect to future versions of
>       mzXML and mzData? I have tried out the ProteomeSystems
>       converter. It runs very slowly over large files but it does
>       produce valid mzData files, although I don't know how to check
>       if the peak list conversion is correct. Is this a viable way of
>       producing mzData if I can get mzXML files first? Does anyone
>       have any experience of using the converter in practice?
>
> Any advice would be most appreciated, cheers,
>
> Andy
>