From: Alexandre M. <ol...@ge...> - 2007-02-07 10:13:42
Hi,

A few comments after a quick glance:

1) Attribute limit (this was already raised a few months ago)

<sourceFile id="1" fileName="ICAT_test4.RAW" filePath="file://F:/for Jim" fileType="RAW 2.0"

We should not put a file name or any other "rich" info into an
attribute, or we will have to encode/decode it every time. Think of all
the strange characters that can show up here (by far not everyone
restricts their file names to [\w+], unfortunately...).
A tag value (so possibly with CDATA) would be more than welcome:

<fileName><![CDATA[....]]></fileName>

The same problem may occur with things like

<cvParam cvLabel="TMO" accession="TMO:1000001" name="Filter" value="+ c d Full ms2 445.35@cid35.00 [ 110.00-905.00]"/>

no?

2) Multiple charges

I don't see any example of that in the example file, but how will we
handle multiple charges for
* the parent peak?
* more difficult (as the info is stored in an array): fragmentation
  peaks?

And I'm sure there is something to consider for multiple m/z (moz)
values for the precursor.

Best regards,
Alex

Fredrik Levander wrote:
> Hello,
>
> After a quick inspection of dataXML0.9, the first impression is that it
> looks very promising and that all people working with it have done some
> great work in the fusing of the mz's.
>
> I've got some small specific comments which I guess you've already
> discussed, but anyway:
>
> 1) The attribute "count" which can be found at some places
> (softwareList, spectrumList etc) could be more of an obstacle than of
> help. There is no easy way to validate that this number corresponds with
> the actual number of list elements in the XML world. Such a validation
> would require specific validators for each programming language. In
> cases where the count attribute is not equal to the number of elements
> in the list, there could be different parsing results depending on
> whether the implementation is using the 'count' or standard parsing,
> the latter ignoring the count attribute.
> Actually, the example file:
> http://db.systemsbiology.net/projects/PSI/dataXML/tiny1.dataXML0.9.xml
> is an example where the softwareList count="2", but the actual number of
> elements is 3.
> I would suggest that either the 'count's are omitted, or PSIDEV should
> at some time provide validators which verify that list lengths are equal
> to the count attribute. Another option is that the attribute is
> documented only to be used for visual inspection of files, and that the
> actual number of list elements can differ. Are any of the current
> mzData-parsers using the 'count's anyway?
>
> 2) The indexing extension of dataXML. It is evident that such an
> indexing is useful for fast file access, and it should definitely be
> part of the standard. However, if I understand the schema correctly (no
> sample file yet), an indexed file would mean that the dataXML is
> encapsulated within the <indexedDataXML>, with the indexing information
> at the end of the file. Why not use a separate file for the indexing,
> which references the dataXML file as a URI? I think that would make for
> faster data access with the indexes in the beginning of the file, even
> if the 'indexOffset' should allow for quick access to the index. A small
> consideration is that the offset / indexes would differ depending on
> whether the file is opened in binary or text mode, at least for large
> files on a Windows system. I know that it is working for RAP and mzXML,
> but for new implementations which use other libraries and file readers
> there may be problems. Anyway, it should be made clear that indexes
> (offsets) are for binary file reading (or text if that is the case).
> The fileCheckSum would also become clearer with two separate files. It
> is quite complex to have the fileCheckSum of the file contained within
> the file itself, since the checksum is affected by the writing of the
> actual checksum ...
> If the file checksum is contained in a separate
> file it is clear that the checksum is the checksum of the actual
> dataXML, excluding the index file.
> On the other hand it could be more handy to have just one file to work
> with, but is it really such an advantage? In cases where the index file
> is lost it would be easy to generate a new index file with a specific
> application for generation of dataXML index files anyway.
>
> Regards
>
> Fredrik Levander
>
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>

--
Alexandre Masselot, PhD
Senior bioinformatician
www.genebio.com
voice: +41 22 702 99 00
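
For reference, the 'count' check discussed in the quoted message can be sketched in a few lines. This is only an illustration, assuming Python's standard xml.etree.ElementTree; the XML fragment, the element ids and the helper name count_mismatches are made up for the example and are not taken from the dataXML schema or the tiny1 example file. It also assumes that every direct child of a list element is a list item, and it ignores namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the case described in the thread:
# softwareList declares count="2" but actually holds 3 elements.
SAMPLE = """
<dataXML>
  <softwareList count="2">
    <software id="a"/>
    <software id="b"/>
    <software id="c"/>
  </softwareList>
  <spectrumList count="1">
    <spectrum id="s1"/>
  </spectrumList>
</dataXML>
"""


def count_mismatches(xml_text):
    """Return (tag, declared, actual) for every element whose 'count'
    attribute differs from its number of direct child elements."""
    root = ET.fromstring(xml_text.strip())
    mismatches = []
    for elem in root.iter():
        declared = elem.get("count")
        if declared is None:
            continue  # element carries no count attribute
        actual = len(elem)  # number of direct child elements
        if int(declared) != actual:
            mismatches.append((elem.tag, int(declared), actual))
    return mismatches


if __name__ == "__main__":
    for tag, declared, actual in count_mismatches(SAMPLE):
        print(f"{tag}: count={declared}, actual children={actual}")
```

A check like this has to be written separately for each toolkit, which is the point made in the quoted message: a generic XML Schema (1.0) validator will not compare an attribute value against the number of child elements.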