From: Kent L. <knl...@in...> - 2007-02-07 21:04:02
|
Hi Alexandre,

I can quickly answer question 1. Your preferred form of fileName has
been requested by multiple parties, and the unofficial consensus is that
it should be of the form:

<fileName><![CDATA[....]]></fileName>

The issue of whether the value of a controlled vocabulary term should be
in the element content, as opposed to an attribute, has also been
discussed, but without apparent consensus.

We should build a concise list that we can work from at the conference.

Regards,
Kent

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Alexandre Masselot
> Sent: Wednesday, February 07, 2007 5:11 AM
> To: psi...@li...
> Subject: Re: [Psidev-ms-dev] X-IMail-SPAM-URL-DBL Comments on dataXML0.9
>
> Hi, a few comments after a quick glance:
>
> 1) Attribute limit (this was already a comment a few months ago)?
>
> <sourceFile id="1" fileName="ICAT_test4.RAW"
> filePath="file://F:/for Jim" fileType="RAW 2.0"
>
> We should not put a file name or any other "rich" info into an
> attribute, or we will have to encode/decode it every time. Think of
> all the strange characters that can appear here (by far not everyone
> restricts their file names to [\w+], unfortunately). A tag value (so
> possibly with CDATA) would be more than welcome.
>
> <fileName><![CDATA[....]]></fileName>
>
> The same problem may occur with things like
>
> <cvParam cvLabel="TMO" accession="TMO:1000001" name="Filter"
> value="+ c d Full ms2 445.35@cid35.00 [ 110.00-905.00]"/>
>
> no?
>
> 2) Multiple charge
> I don't see any example of that in the example file, but how will we
> handle multiple charges for
> * the parent peak?
> * more difficult (as the info is stored in an array): fragmentation
>   peaks?
>
> And I'm sure there is something for multiple moz for the precursor.
>
> Best regards,
> Alex
>
> Fredrik Levander wrote:
> > Hello,
> >
> > After a quick inspection of dataXML0.9, the first impression is that
> > it looks very promising and that all people working with it have
> > done some great work in the fusing of the mz's.
> >
> > I've got some small specific comments which I guess you've already
> > discussed, but anyway:
> >
> > 1) The attribute "count", which can be found in some places
> > (softwareList, spectrumList etc.), could be more of an obstacle than
> > a help. There is no easy way to validate that this number
> > corresponds to the actual number of list elements in the XML world.
> > Such a validation would require specific validators for each
> > programming language. In cases where the count attribute is not
> > equal to the number of elements in the list, there could be
> > different parsing results depending on whether the implementation
> > uses the 'count' or standard parsing, the latter ignoring the count
> > attribute. Actually, the example file:
> > http://db.systemsbiology.net/projects/PSI/dataXML/tiny1.dataXML0.9.xml
> > is an example where the softwareList has count="2", but the actual
> > number of elements is 3.
> > I would suggest that either the 'count's are omitted, or PSIDEV
> > should at some time provide validators which verify that list
> > lengths are equal to the count attribute. Another option is that the
> > attribute is documented as being only for visual inspection of
> > files, and that the actual number of list elements can differ. Are
> > any of the current mzData parsers using the 'count's anyway?
> >
> > 2) The indexing extension of dataXML. It is evident that such an
> > indexing is useful for fast file access, and it should definitely be
> > part of the standard.
> > However, if I understand the schema correctly (no sample file yet),
> > an indexed file would mean that the dataXML is encapsulated within
> > the <indexedDataXML>, with the indexing information at the end of
> > the file. Why not use a separate file for the indexing, which
> > references the dataXML file as a URI? I think that would make for
> > faster data access, with the indexes at the beginning of the file,
> > even if the 'indexOffset' should allow for quick access to the
> > index. A small consideration is that the offsets / indexes would
> > differ depending on whether the file is opened in binary or text
> > mode, at least for large files on a Windows system. I know that it
> > is working for RAP and mzXML, but for new implementations which use
> > other libraries and file readers there may be problems. Anyway, it
> > should be made clear that indexes (offsets) are for binary file
> > reading (or text, if that is the case).
> > The fileCheckSum would also become clearer with two separate files.
> > It is quite complex to have the fileCheckSum of the file contained
> > within the file itself, since the checksum is affected by the
> > writing of the actual checksum ... If the file checksum is contained
> > in a separate file, it is clear that the checksum is the checksum of
> > the actual dataXML, excluding the index file.
> > On the other hand, it could be handier to have just one file to work
> > with, but is it really such an advantage? In cases where the index
> > file is lost, it would be easy to generate a new index file with a
> > specific application for generation of dataXML index files anyway.
> >
> > Regards
> >
> > Fredrik Levander
> >
> > -------------------------------------------------------------------------
> > Using Tomcat but need to do more? Need to support web services,
> > security?
> > Get stuff done quickly with pre-integrated technology to make your
> > job easier.
> > Download IBM WebSphere Application Server v.1.0.1 based on Apache
> > Geronimo
> > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> > _______________________________________________
> > Psidev-ms-dev mailing list
> > Psi...@li...
> > https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>
> --
> Alexandre Masselot, PhD
> Senior bioinformatician
> www.genebio.com
> voice: +41 22 702 99 00
|
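The 'count' validation discussed in the quoted message (point 1 of Fredrik's comments) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a PSI-provided validator: it assumes that 'count' is meant to equal the number of direct child elements, the function name find_count_mismatches is made up for this example, and the child element names in the sample document (software, spectrum) are hypothetical placeholders rather than names taken from the dataXML schema.

```python
import xml.etree.ElementTree as ET


def find_count_mismatches(xml_text):
    """Return (tag, declared, actual) for every element whose 'count'
    attribute differs from its number of direct child elements.

    Assumption: 'count' is meant to equal the number of direct children.
    A non-integer 'count' value raises ValueError.
    """
    root = ET.fromstring(xml_text)
    mismatches = []
    for elem in root.iter():
        declared = elem.get("count")
        if declared is None:
            continue  # element carries no count attribute
        actual = len(elem)  # len() of an Element is its direct child count
        if int(declared) != actual:
            mismatches.append((elem.tag, int(declared), actual))
    return mismatches


# Hypothetical sample: element names below the lists are placeholders.
SAMPLE = """
<dataXML>
  <softwareList count="2">
    <software id="a"/>
    <software id="b"/>
    <software id="c"/>
  </softwareList>
  <spectrumList count="1">
    <spectrum id="s1"/>
  </spectrumList>
</dataXML>
"""

if __name__ == "__main__":
    for tag, declared, actual in find_count_mismatches(SAMPLE):
        print(f"{tag}: count={declared}, but {actual} child elements")
```

Run on the sample, this reports the softwareList mismatch (2 declared, 3 present) and accepts the spectrumList. In a namespaced document the reported tags would include the namespace URI in ElementTree's {uri}name form.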