From: Marc S. <stu...@gm...> - 2009-08-18 16:25:43
|
Hi all,

I updated our semantic validator and found that two example files are not entirely valid:

1) plgs_example.mzML

Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.
Error: Binary data array of type 'MS:1000516 ! charge array' cannot have the value type 'MS:1000521 ! 32-bit float'.

2) tiny.pwiz.1.1.mzML

Error: CV term must have a unit: MS:1000042 - intensity
Error: Name of CV term not correct: 'MS:1000042 - intensity' should be 'peak intensity'

Best,
Marc |
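For readers unfamiliar with these checks, here is a minimal Python sketch of the three rule types that produce the errors above: a term that requires a unit, a term whose name in the file differs from the current name for its accession, and an array type that forbids a value type. The rule tables (`CURRENT_NAMES`, `TERMS_REQUIRING_UNIT`, `FORBIDDEN_VALUE_TYPES`) and the `CvParam` class are hypothetical; a real validator would derive its rules from the OBO and mapping files, and these tables hold only the terms named in the report. Because the tables encode a single (the newest) CV revision, a file written against an older revision trips every check.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical rule tables, filled only with the terms from the report above.
# A real validator would build them from psi-ms.obo and the mapping file.
CURRENT_NAMES = {
    "MS:1000042": "peak intensity",
    "MS:1000516": "charge array",
    "MS:1000521": "32-bit float",
}
TERMS_REQUIRING_UNIT = {"MS:1000042"}
# Value types that the newest CV disallows for a given array type.
FORBIDDEN_VALUE_TYPES = {"MS:1000516": {"MS:1000521"}}


@dataclass
class CvParam:
    accession: str
    name: str  # the name as written in the file
    unit_accession: Optional[str] = None


def check_cv_param(p: CvParam) -> list[str]:
    """Return unit and name errors for a single cvParam."""
    errors = []
    if p.accession in TERMS_REQUIRING_UNIT and p.unit_accession is None:
        errors.append(f"CV term must have a unit: {p.accession} - {p.name}")
    expected = CURRENT_NAMES.get(p.accession)
    if expected is not None and p.name != expected:
        errors.append(
            f"Name of CV term not correct: '{p.accession} - {p.name}' "
            f"should be '{expected}'"
        )
    return errors


def check_binary_array(array_type: str, value_type: str) -> list[str]:
    """Return an error if the array type forbids the declared value type."""
    if value_type in FORBIDDEN_VALUE_TYPES.get(array_type, set()):
        return [
            f"Binary data array of type '{array_type} ! {CURRENT_NAMES[array_type]}' "
            f"cannot have the value type '{value_type} ! {CURRENT_NAMES[value_type]}'."
        ]
    return []
```

With these tables, `check_cv_param(CvParam("MS:1000042", "intensity"))` yields both the missing-unit and the wrong-name error seen in the report.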
From: Matthew C. <mat...@va...> - 2009-08-18 17:09:14
|
Hi Marc,

I'm afraid you're jumping the gun a bit here. The documents you mention used older versions of the CV. They appear to be:

http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo?revision=1.84&view=markup
for PLGS and
http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo?revision=1.75&view=markup
for tiny.pwiz

This version difference explains the validation problems you're seeing. In those revisions the integer value types were obsolete and the intensity term had not yet been renamed. It's up for debate whether these examples should be updated to use the newest CV, but the files are valid the way they are now.

It's good to catch this, though, because it's something a validator will have to deal with. You can't validate a file that uses an old CV against the newest CV and expect it to be error-free. At the same time, I expect you use some compile-time magic for the CV like we do in pwiz, so supporting older versions of the CV could be troublesome. I'm not sure how to fix the problem, but I can diagnose it. :)

On another note, it was a pain in the ass to link up the CV version to the CVS version. We should definitely, definitely, definitely put RCS keywords in the OBO and mapping files so that we can match up the CVS/SVN revision to the OBO/mapping file itself. This will add a line like:

remark: $Id: psi-ms.obo 1.42 2009-08-14 22:12:04Z chambm $

to the OBO file and

<!-- $Id: ms-mapping.xml 142 2009-08-14 22:12:04Z chambm $ -->

to the mapping file.

Are there any objections to the RCS keywords?

-Matt |
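Once such a keyword is in place, a tool can read the revision straight out of the OBO or mapping file. A minimal Python sketch, assuming the expanded keyword has exactly the layout of the example lines above (file, revision, date, time, author); the exact expansion differs between version-control systems, so the pattern is an assumption that would need adjusting. `IdKeyword` and `find_id_keyword` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class IdKeyword(NamedTuple):
    filename: str
    revision: str
    date: str
    time: str
    author: str


# Assumed layout, copied from the example lines:
#   $Id: psi-ms.obo 1.42 2009-08-14 22:12:04Z chambm $
ID_PATTERN = re.compile(r"\$Id: (\S+) (\S+) (\d{4}-\d{2}-\d{2}) (\S+) (\S+) \$")


def find_id_keyword(text: str) -> Optional[IdKeyword]:
    """Return the first expanded $Id$ keyword in text, or None if there is none."""
    m = ID_PATTERN.search(text)
    return IdKeyword(*m.groups()) if m else None
```

The same function works on the OBO `remark:` line and on the XML comment in the mapping file, since it only searches for the keyword itself. An unexpanded `$Id$` returns `None`.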
From: Marc S. <st...@in...> - 2009-08-18 20:26:40
|
Hi all,

> I'm afraid you're jumping the gun a bit here. The documents you mention
> used older versions of the CV. It seems to be:
> http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo?revision=1.84&view=markup
> for PLGS and
> http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo?revision=1.75&view=markup
> for tiny.pwiz
>
> This version difference explains the validation problems you're seeing.

I see the problem, but I'm not happy with it :(

> The integer value types were then obsoleted and the intensity term
> hadn't been renamed. It's up for debate whether these examples should be
> updated to use the newest CV, but the files are valid the way they are
> now.

If I remember the discussion in Turku correctly, we agreed on the following: all changes must be backward-compatible within one version number; otherwise the (second/third) version number has to be incremented.

> It's good to catch this though because it's something a validator
> will have to deal with. You can't validate a file using an old CV
> against the newest CV and expect it to be error-free.

Allowing all CV versions will lead to hundreds of dialects. In contrast to pwiz, OpenMS does not annotate its classes with arbitrary CV terms. We have modelled the functionality of mzData, mzXML and mzML with classes and fixed member variables, so there is no way for us to support all CV versions. The pwiz approach supports all CV versions, but the problem arises later: programs using CV terms can never be sure which CV terms to expect in a given class. My conclusion is that both pwiz and OpenMS have really big problems supporting all CV versions.

> At the same time,
> I expect you use some compile-time magic for the CV like we do in pwiz
> so supporting older versions of the CV could be troublesome. I'm not
> sure how to fix the problem, but I can diagnose it. :)

Right! We cannot ship all intermediate CV versions with OpenMS.
Downloading the matching CV version each time is not really an option either: there are always proxy issues, or you might have no internet connection.

> On another note, it was a pain in the ass to link up the CV version to
> the CVS version. We should definitely, definitely, definitely put RCS
> keywords in the OBO and mapping files so that we can match up the
> CVS/SVN revision to the OBO/mapping file itself. This will add a line like:
> remark: $Id: psi-ms.obo 1.42 2009-08-14 22:12:04Z chambm $
> to the OBO file and
> <-- $Id: ms-mapping.xml 142 2009-08-14 22:12:04Z chambm $ -->
> to the mapping file.
>
> Are there any objections to the RCS keywords?

Nope.

-Marc |
From: Fredrik L. <Fre...@im...> - 2009-08-27 07:56:23
|
plgs_example has now been updated to use the latest CV. This also means that its charge arrays are now encoded as integers. Interestingly, the PLGS source file has some non-integer values for peak charge states (like 1.2043), so the conversion now introduces some data loss. Even if non-integer charge states don't make much sense, they could be a way of flagging that a peak is a mixture of two peaks with different charge states. So the float representation we had before may be better in the end anyway...

Fredrik |
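The data loss is easy to reproduce. mzML stores binary arrays as base64-encoded little-endian values (optionally zlib-compressed, which is left out here). The Python sketch below encodes the same charge list once as 32-bit floats and once as 32-bit integers. It assumes the converter rounds to the nearest integer; the rule the PLGS conversion actually uses isn't stated in the thread, and truncation would give the same result for 1.2043. The peak list and the helper names `encode_array`/`decode_array` are hypothetical.

```python
import base64
import struct


def encode_array(values, type_code):
    """Pack values little-endian (as mzML binary arrays are) and base64-encode them."""
    raw = struct.pack(f"<{len(values)}{type_code}", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, type_code):
    """Inverse of encode_array."""
    raw = base64.b64decode(text)
    count = len(raw) // struct.calcsize(f"<{type_code}")
    return list(struct.unpack(f"<{count}{type_code}", raw))


# Charge states from a hypothetical peak list, including a non-integer one.
charges = [1.0, 2.0, 1.2043]

# 32-bit float: the fractional charge survives (to single precision).
as_float = decode_array(encode_array(charges, "f"), "f")

# 32-bit integer: values must be rounded before packing, so 1.2043 becomes 1.
as_int = decode_array(encode_array([round(c) for c in charges], "i"), "i")

print(as_float)  # approximately [1.0, 2.0, 1.2043]
print(as_int)    # [1, 2, 1]
```

Once the values are stored as integers, the mixed-charge hint Fredrik describes cannot be recovered from the file.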
From: Matthew C. <mat...@va...> - 2009-08-18 20:48:20
|
As I recall, the versioning discussion about the CV was about when to increment the various fields and what it would mean when they increment. It had nothing to do with the schema or mapping file versions. Obsoleting a term in the CV increments the minor version IIRC, but that doesn't mean that files using older CVs, and thus legitimately using those terms, should break.

ProteoWizard deals with obsolete terms by marking them as such in the giant CVID enum we generate with each new CV:
http://proteowizard.svn.sourceforge.net/viewvc/proteowizard/trunk/pwiz/pwiz/data/msdata/cv.hpp

For dealing with term name changes (which are subminor version increments) when reading files, we ignore the term name and just assign the most current name to each term's accession number. With this design, we can approximate support for older CV versions. We used to leave obsolete terms out of our cv.hpp, and I changed that precisely for this reason. :)

You did an object mapping from the CV/schema to C++ classes? So you have classes with lots of strongly typed fields that will very often not have any legitimate value? Wasn't that one of the reasons we went with the CV approach in the schema instead of a traditional attribute approach?

I can't see any way to support CV terms newer than the CV your code is compiled against other than downloading the newer CV, either for a new compile and release, or on the fly when you encounter an unknown accession number. With pwiz we currently rely on the former approach: files with newer CVs will probably break older pwiz versions. But I'm open to implementing on-the-fly CV downloading when it seems necessary.

You don't need online access to support older CVs, though. You can either archive all the CVs and ship them in a zip or something, or you can use a design like ours and make the current CV "all-inclusive" so that it supports all previous CVs, including obsolete terms.

-Matt |
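A rough Python analogue of the "all-inclusive" approach Matt describes: every term from every CV release stays in one table, obsolete terms are flagged rather than dropped, and lookups go by accession only, so the name written in the file is replaced by the current one. This is not the pwiz code (which is a generated C++ enum in cv.hpp); `Term`, `ALL_TERMS` and `resolve` are hypothetical, and `MS:9999901` is a made-up accession standing in for any obsoleted term.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    accession: str
    name: str               # most current name for this accession
    obsolete: bool = False  # kept in the table instead of being dropped


# Hypothetical "all-inclusive" table. MS:9999901 is invented for illustration.
ALL_TERMS = {
    t.accession: t
    for t in [
        Term("MS:1000042", "peak intensity"),
        Term("MS:1000516", "charge array"),
        Term("MS:9999901", "example obsolete term", obsolete=True),
    ]
}


def resolve(accession: str, name_in_file: str) -> tuple[Optional[Term], list[str]]:
    """Look a term up by accession only; the name in the file is never trusted."""
    term = ALL_TERMS.get(accession)
    if term is None:
        # Most likely written with a newer CV than the one compiled in.
        return None, [f"unknown accession {accession}"]
    notes = []
    if name_in_file != term.name:
        notes.append(f"name '{name_in_file}' replaced by current name '{term.name}'")
    if term.obsolete:
        notes.append(f"{accession} is obsolete but still accepted")
    return term, notes
```

For example, `resolve("MS:1000042", "intensity")` returns the term named "peak intensity" plus an informational note instead of an error. Only an accession missing from the table, which points to a newer CV, is left unresolved.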
From: Marc S. <st...@in...> - 2009-08-18 21:19:37
|
Hi,

> As I recall, the versioning discussion with the CV was about when to
> increment the various fields and what it would mean when they increment.
> It had nothing to do with the schema or mapping file versions.
> Obsoleting a term in the CV increments the minor version IIRC, but that
> doesn't mean that files using older CVs and thus legitimately using
> those terms should break. ProteoWizard deals with obsolete terms by
> marking them as such in the giant CVID enum we generate with each new CV:
> http://proteowizard.svn.sourceforge.net/viewvc/proteowizard/trunk/pwiz/pwiz/data/msdata/cv.hpp
> For dealing with term name changes (which are subminor version
> increments) when reading files, we ignore the term name and just assign
> the most current name to each term's accession number.

Name changes do not matter in our approach. The accession is mapped to a member, so that is not a problem.

> With this design,
> we can approximate supporting older CV versions. We used to leave out
> obsolete terms in our cv.hpp and I changed that precisely for this
> reason. :)
>
> You did an object mapping from the CV/schema to C++ classes? So you have
> classes with lots of strongly typed fields that will very often not have
> any legitimate value? Wasn't that one of the reasons we went with the CV
> approach in the schema instead of a traditional attribute approach?

OpenMS is older than mzML. Most of the metadata classes were written before mzData 1.0. We already had tons of code that uses all the metadata, so starting all over was not an option :) I merged the models of mzData, mzXML and mzML by hand. Missing metadata has so far been only a minor problem...

> I can't see an option about using newer CV terms than the one your code
> is compiled for than to download the newer CV either for a new compile
> and release, or on the fly when you encounter an unknown accession
> number. With pwiz we currently rely on the former approach: files with
> newer CVs will probably break older pwiz versions.

Right now we also update the CV file only with a new release. We'll see how that works out...

-Marc |
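Both projects fix the CV at build time, so a validator could at least detect which CV version a file declares and decide how strictly to judge it. mzML records this in the `version` attribute of the `cv` elements inside `cvList`. The Python sketch below reads that attribute and compares it with the CV bundled in the tool. The policy labels and the idea of downgrading certain errors to warnings are hypothetical, not something either project is stated to do, and the sketch assumes purely numeric dotted versions (anything else makes `parse_version` raise `ValueError`).

```python
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"


def file_cv_version(mzml_text, cv_id="MS"):
    """Return the version declared for cv_id in the file's cvList, or None."""
    root = ET.fromstring(mzml_text)
    for cv in root.iter(MZML_NS + "cv"):
        if cv.get("id") == cv_id:
            return cv.get("version")
    return None


def parse_version(version):
    """Turn '2.0.2' into (2, 0, 2); non-numeric parts raise ValueError."""
    return tuple(int(part) for part in version.split("."))


def policy(file_version, bundled_version):
    """Return a hypothetical validation policy label for the two versions."""
    f, b = parse_version(file_version), parse_version(bundled_version)
    if f == b:
        return "strict"
    if f < b:
        # Older file: a caller might report obsolete and renamed terms as warnings.
        return "tolerate-older"
    # Newer file: a caller might report unknown accessions as warnings.
    return "tolerate-newer"
```

For instance, a file declaring CV version 2.0.2 checked by a tool built against a later CV would get the "tolerate-older" label, which is roughly the situation of the two example files discussed at the start of the thread.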