From: frank g. <Fra...@nc...> - 2007-10-16 09:05:36
Hi,

I have been following this discussion, and there seems to be some confusion about the CV, its use, and its development.

The OBO language allows you to record "words", or strings. It does not allow you to model what the words represent, such as restrictions, cardinality, or datatypes for values (such as int, double, and the XML datatypes). This is a limitation of the chosen language.

The PSI have developed "Guidelines for the development of Controlled Vocabularies", a final document that describes the recommendations and best practice for designing CVs for the PSI. It covers several issues that have been raised on this list, such as what the is_a and part_of relationships mean semantically. It also covers how to normalise the natural language definitions for each RA, the maintenance procedures, obsoleting terms, and the process for term addition.

The final document can be found at the following URL:

http://psidev.info/index.php?q=node/258

I hope these comments and the information contained in the document are helpful in the development of the MS CV.

Cheers,
Frank

On 10/15/07, Fredrik Levander <Fre...@im...> wrote:
>
> Hi,
>
> My comments on mzML0.99.0, after reading (most of) the posts on the
> mailing list and trying to convert a peak list into the format, are as
> follows:
>
> The standard is composed of a schema with little control and a lot of
> cvParams that are controlled by a separate file. Updates to the CV do
> not require schema updates, and the CV rules file should also be stable.
> For the validation of files it would, as pointed out by several people,
> be straightforward to automatically generate an XSD which reflects the
> current CV. Otherwise the semantic Java validator also does the job (and
> also has other benefits when it comes to large files). For us it doesn't
> matter which method is used, but the real issue is how to handle
> versions of the CV.
> As long as nothing is deleted from the CV, everything
> should be fine from an implementation point of view though.
>
> A major problem would be if something is added to the CV which breaks
> current parsers. A new compression type could be added to the CV without
> notice, and if someone is using that compression type they're producing
> standard compliant files, but parsers that are supposed to be standard
> compliant would not be able to parse the file correctly. So, there are a
> few places where I think the allowed values should be set under enum
> constraints in the main standard schema, so that a new schema version is
> enforced if these fields are changed. I have the feeling that the CV
> version will not be as controlled as the schema version. Fields that I
> propose should be enums are (this is maybe one step back again...):
>
> In binaryDataArray:
>
> compressionType (no compression/zlib compression)
> valueType (32-bit float, 64-bit float, 16-bit integer, 32-bit integer or
> 64-bit integer)
>
> In spectrum:
>
> spectrumType (centroid, profile).
>
> These parameters could be attributes or cvParams (but under schema
> control) if CV accession numbers are important.
>
>
> Other comments:
>
> There is also an acquisitionList spectrumType attribute which probably
> could be removed since we have spectrumDescription -
> spectrumRepresentation (spectrumType). The only use would be if the
> acquisitions were in profile mode but the peak picking algorithm that
> worked on the spectra turned them into a centroid peak list and one
> would like to specify this (?).
>
> If the spectrum is a combination of multiple scans (as specified using
> acquisitionList) one would normally not use the 'scan' element. The
> question is then how to give the retention time? We did not succeed in
> doing this in a valid way, see
>
> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/FF_070504_MSMS_5B.mzML
>
> for a simple (but invalid) way of doing it.
> More correct would be to put
> the cvParam under the acquisition with the retention time, but this is
> not allowed either.
>
> Why not allow softwareParam to be userParam or cvParam, or must all
> software that works on mzML be in the CV?
>
> How about having precursor m/z, intensity and charge state as
> non-required attributes to ionSelection? These fields are really used in
> every file.
>
> Final comment is though that all these things are really minor, and that
> getting the standard released is what matters!
>
> Regards
>
> Fredrik
>
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>

--
Frank Gibson
Research Associate
Room 2.19, Devonshire Building
School of Computing Science,
University of Newcastle upon Tyne,
Newcastle upon Tyne,
NE1 7RU
United Kingdom
Telephone: +44-191-246-4933
Fax: +44-191-246-4905
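
As a rough illustration of the enum proposal quoted above, here is a minimal Python sketch of a reader that treats compressionType, valueType and spectrumType as closed sets. The class and function names (CompressionType, ValueType, SpectrumType, parse_enum, decompress) are hypothetical and not part of mzML or any PSI tool, and the value strings are copied from the list in the quoted message rather than from the CV. The only point is that a value outside the known set is rejected explicitly instead of being misread.

```python
import zlib
from enum import Enum


# Closed value sets, taken from the lists in the quoted message.
# The names below are illustrative assumptions, not mzML or CV identifiers.
class CompressionType(Enum):
    NONE = "no compression"
    ZLIB = "zlib compression"


class ValueType(Enum):
    FLOAT32 = "32-bit float"
    FLOAT64 = "64-bit float"
    INT16 = "16-bit integer"
    INT32 = "32-bit integer"
    INT64 = "64-bit integer"


class SpectrumType(Enum):
    CENTROID = "centroid"
    PROFILE = "profile"


def parse_enum(enum_cls, raw):
    """Return the member of enum_cls whose value equals raw.

    Raises ValueError listing the allowed values if raw is unknown.
    """
    try:
        return enum_cls(raw)
    except ValueError:
        allowed = ", ".join(member.value for member in enum_cls)
        raise ValueError(
            f"{raw!r} is not a valid {enum_cls.__name__}; allowed: {allowed}"
        ) from None


def decompress(payload, compression_name):
    """Decompress raw bytes according to a declared compression type."""
    compression = parse_enum(CompressionType, compression_name)
    if compression is CompressionType.ZLIB:
        return zlib.decompress(payload)
    return payload  # CompressionType.NONE: bytes are used as they are
```

With this layout, a file that declares a compression type added later (say "bzip2 compression", a made-up value) fails in parse_enum with a clear message. A schema-level enum would give the same behaviour at validation time, which is the effect the proposal is after.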