Re: [Psidev-ms-dev] X-IMail-SPAM-URL-DBL Comments on dataXML0.9


Dear all,
Eric Deutsch is compiling all feedback on dataXML (whether sent 
directly to him or to the list). All issues that can be solved before 
Lyon are welcome to be tracked. All others will be addressed in Lyon.
Do not hesitate to send your comments ASAP, so that we can remove all 
blocking issues before April 23.

We'll give a status update at the next phone conference, Feb 27th, 8am PST.

Regards,
Pierre-Alain

Kent Laursen wrote:
> Hi Alexandre,
>
> I can quickly answer question 1.  Your preferred form of fileName has
> been requested by multiple parties, and the unofficial consensus is
> that it should be of the form
>
> <fileName><![CDATA[....]]></fileName>
>
> The issue of whether the value of a controlled vocabulary term should
> be in the element content as opposed to an attribute has also been
> discussed, but without apparent consensus.
>
> We should build a concise list that we can work from at the conference.
>
> Regards,
>
> Kent
>  
>
>   
>> -----Original Message-----
>> From: psi...@li... 
>> [mailto:psi...@li...] On 
>> Behalf Of Alexandre Masselot
>> Sent: Wednesday, February 07, 2007 5:11 AM
>> To: psi...@li...
>> Subject: Re: [Psidev-ms-dev] X-IMail-SPAM-URL-DBL Comments on 
>> dataXML0.9
>>
>>
>> Hi, a few comments after a quick glance:
>>
>> 1) Attribute limit (this was already a comment a few months ago)?
>>
>> <sourceFile id="1" fileName="ICAT_test4.RAW" 
>> filePath="file://F:/for Jim" fileType="RAW 2.0"
>>
>> We should not put the file name or any other "rich" info into an 
>> attribute, or we will have to encode/decode it every time. Think 
>> of all the strange characters that can appear here (not everyone, 
>> by far, restricts file names to [\w+], unfortunately). A tag value 
>> (so possibly with CDATA) would be more than welcome.
>>
>> <fileName><![CDATA[....]]></fileName>
>>
>>
>> The same problem may occur with things like
>>
>> <cvParam cvLabel="TMO" accession="TMO:1000001" name="Filter" 
>> value="+ c d Full ms2  445.35@cid35.00 [ 110.00-905.00]"/>
>>
>> no?
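
A minimal Python sketch of the encode/decode concern raised above. The awkward file name is invented for illustration, and the element layout only mirrors the fragments quoted in this thread, not the actual dataXML schema.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical awkward file name, chosen only to exercise escaping.
file_name = 'ICAT <test> & "4".RAW'

# As an attribute: the writer must escape <, & and the quote character.
as_attribute = "<sourceFile fileName=%s/>" % quoteattr(file_name)

# As element content: only <, > and & are escaped.
as_element = "<fileName>%s</fileName>" % escape(file_name)

# As a CDATA section: nothing is escaped, but the value must not contain "]]>".
as_cdata = "<fileName><![CDATA[%s]]></fileName>" % file_name

# A conforming parser hands back the same string in all three cases.
from_attribute = ET.fromstring(as_attribute).get("fileName")
from_element = ET.fromstring(as_element).text
from_cdata = ET.fromstring(as_cdata).text
```

All three forms round-trip as long as the writer escapes correctly. One difference that does remain is that XML parsers normalize literal tabs and line breaks inside attribute values to spaces, whereas element content preserves them.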
>>
>>
>> 2) Multiple charges
>> I don't see any case of that in the example, but how will 
>> we handle multiple charges for
>> * the parent peak?
>> * more difficult (as the info is stored in an array): 
>> fragmentation peaks?
>>
>> And I'm sure there is a similar issue with multiple m/z values for
>> the precursor.
>>
>> best regards
>> Alex
>>
>>
>> Fredrik Levander wrote:
>>     
>>> Hello,
>>>
>>> After a quick inspection of dataXML0.9, the first impression is that
>>> it looks very promising and that all people working with it have done
>>> some great work in the fusing of the mz's.
>>>
>>> I've got some small specific comments which I guess you've already
>>> discussed, but anyway:
>>>
>>> 1) The attribute "count", which can be found in some places
>>> (softwareList, spectrumList etc.), could be more of an obstacle than a
>>> help. There is no easy way to validate that this number corresponds
>>> to the actual number of list elements in the XML world. Such a
>>> validation would require specific validators for each programming
>>> language. In cases where the count attribute is not equal to the
>>> number of elements in the list, there could be different parsing
>>> results depending on whether the implementation uses the 'count' or
>>> standard parsing, the latter ignoring the count attribute. Actually,
>>> the example file
>>> http://db.systemsbiology.net/projects/PSI/dataXML/tiny1.dataXML0.9.xml
>>> is an example where the softwareList has count="2", but the actual
>>> number of elements is 3.
>>> I would suggest that either the 'count's are omitted, or PSIDEV should
>>> at some time provide validators which verify that list lengths are
>>> equal to the count attribute. Another option is that the attribute is
>>> documented as being only for visual inspection of files, and that
>>> the actual number of list elements can differ. Are any of the current
>>> mzData parsers using the 'count's anyway?
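
A minimal sketch of the kind of count validator suggested above, in Python. It assumes that every direct child of an element carrying a count attribute is a list item and that the file uses no XML namespaces. The sample fragment and its child element names are invented to mimic the reported mismatch and are not copied from tiny1.dataXML0.9.xml.

```python
import xml.etree.ElementTree as ET


def find_count_mismatches(xml_text):
    """Return (tag, declared, actual) for each element whose count is wrong."""
    mismatches = []
    for element in ET.fromstring(xml_text).iter():
        declared = element.get("count")
        if declared is None:
            continue  # this element does not declare a count
        actual = len(list(element))  # number of direct child elements
        if int(declared) != actual:
            mismatches.append((element.tag, int(declared), actual))
    return mismatches


# Hypothetical fragment: count says 2, but three items are present.
sample = """<dataXML>
  <softwareList count="2">
    <software id="a"/><software id="b"/><software id="c"/>
  </softwareList>
</dataXML>"""

mismatches = find_count_mismatches(sample)
```

Here `mismatches` holds one entry, `("softwareList", 2, 3)`. A check like this sits outside schema validation, which is the extra per-language tooling the comment above is worried about.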
>>>
>>> 2) The indexing extension of dataXML.  It is evident that such
>>> indexing is useful for fast file access, and it should definitely be
>>> part of the standard. However, if I understand the schema correctly
>>> (no sample file yet), an indexed file would mean that the dataXML is
>>> encapsulated within the <indexedDataXML>, with the indexing
>>> information at the end of the file. Why not use a separate file for
>>> the indexing, which references the dataXML file as a URI?  I think
>>> that would make for faster data access, with the indexes at the
>>> beginning of the file, even if the 'indexOffset' should allow for
>>> quick access to the index. A small consideration is that the offsets /
>>> indexes would differ depending on whether the file is opened in binary
>>> or text mode, at least for large files on a Windows system. I know
>>> that it is working for RAP and mzXML, but for new implementations
>>> which use other libraries and file readers there may be problems.
>>> Anyway, it should be made clear that indexes (offsets) are for binary
>>> file reading (or text if that is the case). The fileCheckSum would
>>> also become clearer with two separate files. It is quite complex to
>>> have the fileCheckSum of the file contained within the file itself,
>>> since the checksum is affected by the writing of the actual
>>> checksum ... If the file checksum is contained in a separate file it
>>> is clear that the checksum is the checksum of the actual dataXML,
>>> excluding the index file.
>>> On the other hand it could be more handy to have just one file to work
>>> with, but is it really such an advantage? In cases where the index
>>> file is lost it would be easy to generate a new index file with a
>>> specific application for generation of dataXML index files anyway.
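
A rough Python sketch of the separate-index idea discussed above, under several assumptions: spectra are `<spectrum id="...">` elements, the checksum is SHA-1, and the data is held in memory rather than read from disk. None of these details is confirmed by the dataXML schema. Text-mode reading is only simulated here by translating Windows line endings.

```python
import hashlib
import re


def build_index(data):
    """Map each spectrum id to the byte offset of its opening tag."""
    pattern = re.compile(rb'<spectrum\b[^>]*\bid="([^"]*)"')
    return {m.group(1).decode("utf-8"): m.start()
            for m in pattern.finditer(data)}


# Hypothetical two-spectrum file with Windows (CR LF) line endings.
data = (b'<dataXML>\r\n'
        b'<spectrum id="S1"/>\r\n'
        b'<spectrum id="S2"/>\r\n'
        b'</dataXML>\r\n')

# Both values would live in the separate index file, so writing them
# never alters the bytes that the checksum covers.
index = build_index(data)                  # offsets counted in bytes
checksum = hashlib.sha1(data).hexdigest()  # assumed algorithm

# Simulated text-mode view: each CR LF pair collapses to a single LF,
# so the same tag is found at a smaller offset.
text_view = data.decode("utf-8").replace("\r\n", "\n")
text_offset = text_view.index('<spectrum id="S2"')
```

In this toy file the second spectrum starts at byte offset 32 but at character offset 30 after line-ending translation, which is why the specification would need to state that offsets refer to binary reads. With real files the same function could be applied to the result of `open(path, "rb").read()`.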
>>>
>>> Regards
>>>
>>> Fredrik Levander
>>>
>>>
>>>
>>> -------------------------------------------------------------------------
>>> Using Tomcat but need to do more? Need to support web services, security?
>>> Get stuff done quickly with pre-integrated technology to make your job easier.
>>> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>>> _______________________________________________
>>> Psidev-ms-dev mailing list
>>> Psi...@li...
>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>>>
>> --
>> Alexandre Masselot, phD
>> Senior bioinformatician
>> www.genebio.com
>> voice: +41 22 702 99 00
>>
>

-- 
 Dr. Pierre-Alain Binz
 Swiss Institute of Bioinformatics
 Proteome Informatics Group
 1, Rue Michel Servet
 CH-1211 Geneve 4
 Switzerland
 - - - - - - - - - - - - - - - - -
 Tel: +41-22-379 50 50
 Fax: +41-22-379 58 58
 Pie...@is...
 http://www.expasy.org/people/Pierre-Alain.Binz.html