Re: [Psidev-ms-dev] Comments on dataXML0.9

Hi, a few comments after a quick glance:

1) Attribute limit (this was already a comment a few months ago)?

<sourceFile id="1" fileName="ICAT_test4.RAW" filePath="file://F:/for
Jim" fileType="RAW 2.0"

We should not put a file name or any other "rich" info into an attribute, or
we will have to encode/decode it every time. Think of all the strange
characters that can show up here (most people, by far, do not restrict
themselves to [\w+] in their file names... unfortunately).
An element value (possibly with CDATA) would be more than welcome.

<fileName><![CDATA[....]]></fileName>

The same problem may occur with things like

<cvParam cvLabel="TMO" accession="TMO:1000001" name="Filter" value="+ c
d Full ms2  445.35@cid35.00 [ 110.00-905.00]"/>

no?
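
To make the encoding concern concrete, here is a minimal Python sketch that writes the same awkward file name once as an attribute and once as a CDATA element value. The helper names (as_attribute, as_element) and the sample file name are made up for illustration; only sourceFile and fileName come from the example above, and this is not meant as the dataXML way of doing it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def as_attribute(file_name):
    # quoteattr escapes &, < and > and picks a quote character that is safe
    # for the value, so the stored text differs from the raw file name.
    return "<sourceFile fileName=%s/>" % quoteattr(file_name)


def as_element(file_name):
    # Inside CDATA the text is stored verbatim; the one forbidden sequence
    # is the CDATA terminator itself.
    if "]]>" in file_name:
        raise ValueError("']]>' cannot appear inside a CDATA section")
    return ("<sourceFile><fileName><![CDATA[%s]]></fileName></sourceFile>"
            % file_name)


# Hypothetical awkward name, not taken from any real file.
name = 'F:/for Jim & "co"/<test>.RAW'
attr_xml = as_attribute(name)
elem_xml = as_element(name)
print(attr_xml)
print(elem_xml)
```

The attribute form only survives because every writer remembers to escape it, while the CDATA form keeps the name readable as typed. A conforming parser gives back the same string in both cases.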

2) Multiple charges
I don't see any case of that in the example file, but how will we handle
multiple charges for
* the parent peak?
* more difficult (as the info is stored in an array): fragmentation peaks?

And I'm sure there is something for multiple moz for the precursor

best regards
Alex

Fredrik Levander wrote:
> Hello,
>
> After a quick inspection of dataXML0.9, the first impression is that it 
> looks very promising and that all people working with it have done some 
> great work in the fusing of the mz's.
>
> I've got some small specific comments which I guess you've already 
> discussed, but anyway:
>
> 1) The attribute "count" which can be found at some places 
> (softwareList, spectrumList etc) could be more of an obstacle than of 
> help. There is no easy way to validate that this number corresponds with 
> the actual number of list elements in the XML world. Such a validation 
> would require specific validators for each programming language. In 
> cases where the count attribute is not equal to the number of elements 
> in the list, there could be different parsing results depending on whether 
> the implementation is using the 'count' or standard parsing, the latter 
> ignoring the count attribute. Actually, the example file:
> http://db.systemsbiology.net/projects/PSI/dataXML/tiny1.dataXML0.9.xml
> is an example where the softwareList count="2", but the actual number of 
> elements is 3.
> I would suggest that either the 'count's are omitted, or PSIDEV should 
> at some time provide validators which verify that list lengths equal 
> the count attribute. Another option is that the attribute is documented 
> only to be used for visual inspection of files, and that the actual 
> number of list elements can differ. Are any of the current 
> mzData-parsers using the 'count's anyway?
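
As an illustration of the kind of validator mentioned in point 1, a minimal Python sketch follows. It assumes that every direct child of a list element is a list item and that the file uses no XML namespaces. The sample document and the element names dataXML, software and spectrum are invented for the example; only softwareList, spectrumList and count come from the text above.

```python
import xml.etree.ElementTree as ET


def find_count_mismatches(xml_text):
    """Return (tag, declared, actual) for each element whose 'count'
    attribute disagrees with its number of direct child elements."""
    mismatches = []
    for elem in ET.fromstring(xml_text).iter():
        declared = elem.get("count")
        if declared is None:
            continue  # element carries no count attribute
        actual = len(elem)  # number of direct children
        if not declared.isdigit() or int(declared) != actual:
            mismatches.append((elem.tag, declared, actual))
    return mismatches


# Invented sample mirroring the reported problem: count="2", three items.
SAMPLE = """<dataXML>
  <softwareList count="2">
    <software id="a"/>
    <software id="b"/>
    <software id="c"/>
  </softwareList>
  <spectrumList count="1">
    <spectrum id="s1"/>
  </spectrumList>
</dataXML>"""

print(find_count_mismatches(SAMPLE))
```

A check like this is short, but it has to be written again for every language and toolkit, which is the maintenance cost described above.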
>
> 2) The indexing extension of dataXML.  It is evident that such an 
> indexing is useful for fast file access, and it should definitely be 
> part of the standard. However, if I understand the schema correctly (no 
> sample file yet), an indexed file would mean that the dataXML is 
> encapsulated within the <indexedDataXML>, with the indexing information at 
> the end of the file. Why not use a separate file for the indexing, which 
> references the dataXML file as a URI?  I think that would make for 
> faster data access with the indexes in the beginning of the file, even 
> if the 'indexOffset' should allow for quick access to the index. A small 
> consideration is that the offset / indexes would differ depending on whether 
> the file is opened in binary or text mode, at least for large files on a 
> Windows system. I know that it is working for RAP and mzXML, but for new 
> implementations which use other libraries and file readers there may be 
> problems. Anyway, it should be made clear that indexes (offsets) are 
> for binary file reading (or text if that is the case).
> The fileCheckSum would also become clearer with two separate files. It 
> is quite complex to have the fileCheckSum of the file contained within 
> the file itself, since the checksum is affected by the writing of the 
> actual checksum ... If the file checksum is contained in a separate 
> file it is clear that the checksum is the checksum of the actual 
> dataXML, excluding the index file.
> On the other hand it could be more handy to have just one file to work 
> with, but is it really such an advantage? In cases where the index file 
> is lost it would be easy to generate a new index file with a specific 
> application for generation of dataXML index files anyway.
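
The two observations in point 2 (offsets depend on how line endings are read, and an external checksum covers only the data file) can be sketched in a few lines of Python. The function build_index, its return layout and the sample bytes are assumptions made for the illustration; this is not the indexedDataXML format.

```python
import hashlib
import re


def build_index(data, tag=b"spectrum"):
    """Return (SHA-1 hex digest of data, byte offsets of each <tag ...>)."""
    # Require whitespace, '>' or '/' after the tag name so that
    # <spectrumList> is not mistaken for a <spectrum> element.
    pattern = re.compile(b"<" + re.escape(tag) + rb"[\s>/]")
    offsets = [m.start() for m in pattern.finditer(data)]
    return hashlib.sha1(data).hexdigest(), offsets


# The same invented document with Unix and with Windows line endings.
unix = (b'<spectrumList>\n<spectrum id="1"/>\n'
        b'<spectrum id="2"/>\n</spectrumList>\n')
dos = unix.replace(b"\n", b"\r\n")

unix_sum, unix_offsets = build_index(unix)
dos_sum, dos_offsets = build_index(dos)
print(unix_offsets, dos_offsets)
print(unix_sum == dos_sum)
```

Because the digest and the offsets are computed from the data bytes alone and kept outside the data file, writing them down cannot change the checksum. The differing offsets show why the index should state that it refers to byte positions read in binary mode.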
>
> Regards
>
> Fredrik Levander

-- 
Alexandre Masselot, phD
Senior bioinformatician
www.genebio.com
voice: +41 22 702 99 00