From: Tabb, D. P. <dt...@su...> - 2018-04-14 10:22:22
Hi, all!

Wout, thank you for your reply on the shortcomings of my "v0.8" qcML file to represent IDPicker IDFree. I have now produced a v0.9 revision that incorporates QuaMeter IDFree accessions from the 0.0.10 CV (I'm not sure why I couldn't find those before). Two metrics were missing from the CV (MS count and MS/MS count!), and I decided on a different reporting strategy for known and unknown precursor charge states than the CV had anticipated. I have used unitAccession attributes to provide units for single-value metrics. You can find the new demonstrator file here: https://github.com/HUPO-PSI/qcML-development/blob/master/20180403-1091_Pool_start_v0.9.qc.xml

Thanks!
Dave

________________________________
From: David Tabb <dt...@su...>
Sent: Thursday, April 12, 2018 7:03:37 AM
To: psi...@li...
Subject: Re: [Psidev-qc-dev] Doing my homework for qcML

Well done, Wout! I like that you were able to show instrument readings this way; I remember your working on capturing these readings in the iMonDB database, so it makes sense that we'd want to represent them in qcML.

As we move to the outlier.qcml, I see that you're using RawFile to point to a source qcML document (I had earlier used RawFile to point to an mzML rather than a RAW). We may want to work out just what RawFile is supposed to represent, or create other terms to represent later products in the pipeline. As Wout notes, though, the RawFile section allows us to specify the file type.

The "Meta-analysis settings" set thresholds for variability; as I understand it, though, these thresholds would be applied to individual metrics rather than to a dimensionality-reduced set (e.g. via PCA). When we dip down to the comment below the metaDataParameters section, though, it appears that Wout is giving weights to show how to combine scores, perhaps in a linear combination of metrics to optimize the amount of variability explained? A PCA would really only be feasible for a defined set of metrics from a bunch of input files.
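To make the PCA idea concrete, here is a minimal Python sketch, assuming each qcML file has already been reduced to a fixed-length list of numeric metrics in the same order. The function names, the per-metric z-scoring, the power-iteration PCA and the example numbers are all illustrative assumptions, not part of qcML or of any tool in this thread. It fits the transform on a reference set of files, projects any file (including one not used in the fit) into PC space, and scores it by its mean distance to the reference files; what counts as "abnormally large" is left open.

```python
import math


def fit_pca(rows, k, iterations=500):
    """Fit a PCA transform on a reference set of per-file metric vectors.

    Each metric is z-scored first so that counts (thousands) and fractions
    (0..1) contribute on a comparable scale. Components are found by power
    iteration with deflation, which is adequate for a handful of metrics.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(d)]
    # Sample standard deviation per metric; a constant metric gets 1.0 to avoid dividing by zero.
    std = [math.sqrt(sum((r[i] - mean[i]) ** 2 for r in rows) / (n - 1)) or 1.0
           for i in range(d)]
    z = [[(r[i] - mean[i]) / std[i] for i in range(d)] for r in rows]
    # Covariance of the z-scored data (already centred, so no mean subtraction needed).
    cov = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero starting vector
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so that the next pass converges to the next-largest component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, std, components


def transform(model, row):
    """Project one file's metrics into PC space; the file need not be part of the fit."""
    mean, std, components = model
    z = [(x - m) / s for x, m, s in zip(row, mean, std)]
    return [sum(c * zi for c, zi in zip(comp, z)) for comp in components]


def mean_pc_distance(model, reference_rows, row):
    """Average Euclidean distance in PC space from `row` to each reference file."""
    point = transform(model, row)
    distances = [math.dist(point, transform(model, r)) for r in reference_rows]
    return sum(distances) / len(distances)


# Made-up metric vectors (MS1 count, MS2 count, XIC-WideFrac) for illustration only.
reference = [
    [7800, 33400, 0.21],
    [7900, 33600, 0.20],
    [7750, 33300, 0.22],
    [7850, 33550, 0.205],
]
model = fit_pca(reference, k=2)
typical = mean_pc_distance(model, reference, [7820, 33450, 0.21])
unusual = mean_pc_distance(model, reference, [4000, 12000, 0.45])
print(f"typical: {typical:.2f}  unusual: {unusual:.2f}")
```

The transform only produces coordinates; the outlier decision comes from comparing distances, so a threshold (or a distribution of reference distances) would still have to be chosen and reported separately.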
This same transform, however, might then be applied to new qcML files that were not part of the set used to conduct the PCA. The transform itself does not tell us whether a qcML is an outlier. Instead, it accepts the quality metrics as input and outputs the coordinates of this qcML in the transformed space; we can then compute distances in PC space between pairs of qcML files, and an abnormally large distance in PC space implies an outlier.

Wout, thank you for trying the multi-file analysis for us! Yes, I think this highlights some areas where we have been vague to date.

Merci,
Dave

On 4/11/2018 10:48 PM, Bittremieux Wout wrote:
Dear colleagues,

I have also prepared two handcrafted example files:
- one from the iMonDB containing instrument parameters as opposed to ID-free/ID-based spectral metrics
- one from a meta-analysis to detect low-quality experiments

The files include some annotations about things to discuss. As you can see, especially for the meta-analysis I'm currently not sure how to correctly store this information in a qcML file.

Best,
Wout

> On 10 Apr 2018, at 10:15, Bittremieux Wout <wou...@ua...> wrote:
>
> Hi Dave,
>
> Sorry for the delayed answer to your questions. As far as I'm aware (please someone correct me if I'm wrong):
>
> - OBO and OWL are two alternative file formats to specify controlled vocabularies and ontologies. We use the OBO format for our CV. This is indeed a simpler format than OWL and can be viewed relatively easily in a simple text editor. Alternatively, Martin has previously recommended OBO-Edit to visualize the relationships between the various terms.
>
> - In our previous discussions we have indeed said that in principle every tool gets its own range of CV accessions. This would enable a new tool to easily start producing compliant qcML files without having to check dependencies on other tools.
> It might make sense to reuse some trivial definitions, though, for metrics that don't involve any computations, such as the number of MS/MS scans. On the other hand, how will downstream tools handle conflicting metrics coming from different tools? Then again, in that case perhaps the tool authors, rather than us, should be the ones responsible for worrying about this.
> In any case, it would be useful to document explicitly how tools can get CV accessions.
>
> - To report the unit for a single QC metric, you can use the unitAccession/unitName/unitCvRef attributes on any XML element derived from the CVParamType, which qualityParameter elements are. The CVParamType is specified in our XML schema.
>
> - This XML schema can also be used for simple syntactic validation and some semantic validation. Because we have a mix of XML and JSON, unfortunately some semantic validation will also have to be coded explicitly outside the schema.
> Any decent XML editor should have built-in functionality to validate XML files against a schema; otherwise, there are various command-line tools and linters you can use. However, the previously linked v0.0.10 XML schema on GitHub does not seem to be fully up to date at the moment, and I think the most recent XML schema is available in Mathias' ongoing pull request.
>
> - As to metric numbering, I don't think we have a process for that yet. I guess for now it's just first come, first served.
>
> Best,
> Wout
>
>> On 09 Apr 2018, at 23:56, David Tabb <dt...@su...> wrote:
>>
>> Hi, all.
>>
>> I have not yet received an answer to the three questions below. I have,
>> however, uploaded the draft to qcdev, where you can find it at this URL:
>> https://github.com/HUPO-PSI/qcML-development/blob/master/20180403-1091_Pool_start_v0.8.qc.xml
>>
>> Merci!
>> Dave
>>
>> On 4/5/2018 11:29 AM, David Tabb wrote:
>>> Hi, all.
>>>
>>> At long last, I have completed my "homework" for Heidelberg!
>>> I have created a draft XML to represent the qcML output for computing
>>> QuaMeter IDFree metrics for a single input mzML file (see text inline
>>> below). I would greatly appreciate answers to the following questions to
>>> complete this project:
>>>
>>> 1) If I am reporting only a single value for a metric (such as
>>> "XIC-WideFrac"), how do I report the unit for the metric?
>>>
>>> 2) Do we have an easy way to validate a draft file like this, at least
>>> to determine that I have matching end tags for each one I open? I've
>>> been using Emacs, which tries to help, but a dedicated XML editor might
>>> be preferable.
>>>
>>> 3) Who officially designates what each metric will be numbered in the CV?
>>>
>>> Thanks,
>>> Dave
>>> --------------------------------------------------------
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <qcML xmlns="http://www.prime-xs.eu/ms/qcml"
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>     xsi:schemaLocation="http://www.prime-xs.eu/ms/qcml
>>>     file:/home/walzer/psi/qcML-development/schema/v0_0_10/qcML_0_0_10.xsd"
>>>     version="0.0.10">
>>>   <runQuality ID="ID001">
>>>     <metaDataParameters ID="fileprovenance001" cvRef="?"
>>>         accession="?" name="?" description="do we need cv for toplevel"
>>>         value="all cv attributes are optional anyway">
>>>       <InputFiles>
>>>         <RawFile
>>>             location="C:\Research\20171124-Lizex-Chia\1091_Pool_start.mzML"
>>>             id="ID001" name="1091_Pool_start.mzML">
>>>           <FileFormat>
>>>             <cvParam cvRef="PSI-MS" accession="MS:1000584"
>>>                 name="mzML format"/>
>>>           </FileFormat>
>>>           <!-- In the following line, I computed the md5sum for the
>>>           peak-picked mzML, not the RAW! -->
>>>           <cvParam cvRef="PSI-MS" accession="MS:1000568" name="MD5"
>>>               value="b583f6d2a91b4749d5a75885330f6e5d" />
>>>           <cvParam cvRef="PSI-MS" accession="MS:1000747"
>>>               name="completion time" value="2017-12-08T15:38:57Z" />
>>>         </RawFile>
>>>       </InputFiles>
>>>     </metaDataParameters>
>>>     <!-- Question to consider: how should I link a metric below to the
>>>     concept of "liquid chromatography" or "electrospray ionization"? Is
>>>     this appropriate? -->
>>>     <!-- Units I employ below: -->
>>>     <!-- UO:0000191 "fraction" -->
>>>     <!-- UO:0000010 "second" -->
>>>     <!-- UO:0010006 "ratio" -->
>>>     <!-- UO:0000189 "count" -->
>>>     <!-- UO:0000106 "hertz" -->
>>>     <!-- When a line gives a single metric, where do I indicate the
>>>     unit type? UO:0000191 "fraction" -->
>>>     <qualityParameter ID="XIC-WideFrac" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Fraction of precursor ions
>>>         accounting for the top half of all peak widths" value="0.206807"/>
>>>     <qualityParameter ID="XIC-FWHM" cvRef="PSI-QC-CV" accession="QC:"
>>>         name="QuaMeter IDFree Metric- Distribution of peak widths for the wide
>>>         XICs">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="3">{'UO:0000010':[12.5377,14.2244,16.9234]}</content>
>>>     </qualityParameter>
>>>
>>>     <!-- UO:0010006 is "ratio" rather than "log ratio" -->
>>>     <qualityParameter ID="XIC-Height" cvRef="PSI-QC-CV" accession="QC:"
>>>         name="QuaMeter IDFree Metric- Distribution of peak log ratio heights for
>>>         the wide XICs">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="3">{'UO:0010006':[0.776393,0.93114,6.6283]}</content>
>>>     </qualityParameter>
>>>
>>>     <!-- In the following, where do I indicate the unit type?
>>>     UO:0000010 "second" -->
>>>     <qualityParameter ID="RT-Duration" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Highest scan time observed
>>>         minus the lowest scan time observed" value="4920.17"/>
>>>     <qualityParameter ID="RT-TIC" cvRef="PSI-QC-CV" accession="QC:"
>>>         name="QuaMeter IDFree Metric- Distribution of TIC accumulation as
>>>         fraction of RT-Duration">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="4">{'UO:0000191':[0.301236,0.13286,0.174576,0.391328]}</content>
>>>     </qualityParameter>
>>>
>>>     <qualityParameter ID="RT-MS" cvRef="PSI-QC-CV" accession="QC:"
>>>         name="QuaMeter IDFree Metric- Distribution of MS1 acquisition as
>>>         fraction of RT-Duration">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="4">{'UO:0000191':[0.217794,0.272976,0.275845,0.233385]}</content>
>>>     </qualityParameter>
>>>
>>>     <qualityParameter ID="RT-MSMS" cvRef="PSI-QC-CV" accession="QC:"
>>>         name="QuaMeter IDFree Metric- Distribution of MS2 acquisition as
>>>         fraction of RT-Duration">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="4">{'UO:0000191':[0.268157,0.233516,0.236373,0.261954]}</content>
>>>     </qualityParameter>
>>>
>>>     <qualityParameter ID="MS1-TIC-Change" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Distribution of log ratios
>>>         of MS1 scan-to-scan TIC changes">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="3">{'UO:0010006':[0.870774,0.900585,4.66521]}</content>
>>>     </qualityParameter>
>>>
>>>     <qualityParameter ID="MS1-TIC" cvRef="PSI-QC-CV" accession="QC:"
>>>         name="QuaMeter IDFree Metric- Log ratios of MS1 scan TICs">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="3">{'UO:0010006':[0.568866,0.815636,1.18124]}</content>
>>>     </qualityParameter>
>>>
>>>     <!-- In the following, where do I indicate the unit type?
>>>     UO:0000189 "count" -->
>>>     <qualityParameter ID="MS1-Count" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Number of MS1 scans
>>>         acquired" value="7832"/>
>>>     <!-- In the following, where do I indicate the unit type?
>>>     UO:0000106 "hertz" -->
>>>     <qualityParameter ID="MS1-Freq-Max" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Maximum frequency for MS1
>>>         scan acquisition" value="2.41814"/>
>>>     <qualityParameter ID="MS1-Density" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Distribution of peak
>>>         counts for MS1 scans">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="3">{'UO:0000189':[693,1205,1424]}</content>
>>>     </qualityParameter>
>>>     <!-- In the following, where do I indicate the unit type?
>>>     UO:0000189 "count" -->
>>>     <qualityParameter ID="MS2-Count" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Number of MS2 scans
>>>         acquired" value="33495"/>
>>>     <!-- In the following, where do I indicate the unit type?
>>>     UO:0000106 "hertz" -->
>>>     <qualityParameter ID="MS2-Freq-Max" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Maximum frequency for MS2
>>>         scan acquisition" value="7.33107"/>
>>>     <qualityParameter ID="MS2-Density" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Distribution of peak
>>>         counts for MS2 scans">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="3">{'UO:0000189':[27,44,70]}</content>
>>>     </qualityParameter>
>>>     <!-- How do we specify that the values of a vector sum to 1? How
>>>     about this case, where two vectors together sum to 1? -->
>>>     <qualityParameter ID="MS2-PrecZ-Known" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Fraction of known
>>>         precursor charges for +1, +2, ..., n, more than n">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000008"
>>>           value="4">{'UO:0000191':[0,0.15047,0.0687565,0.00877743,0.000507538,0.000268697]}</content>
>>>     </qualityParameter>
>>>
>>>     <!-- For the final metric, I used a reporting type where the number
>>>     of elements in the vector is fixed: QC:3000007 -->
>>>     <qualityParameter ID="MS2-PrecZ-Unknown" cvRef="PSI-QC-CV"
>>>         accession="QC:" name="QuaMeter IDFree Metric- Fraction of unknown
>>>         precursor charges for +1, more than +1">
>>>       <content cvRef="PSI-QC-CV" accession="QC:3000007"
>>>           value="2">{'UO:0000191':[0.54235,0.22887]}</content>
>>>     </qualityParameter>
>>>   </runQuality>
>>>   <cvList>
>>>     <cv fullName="The HUPO-PSI QC WG metrics ontology in obo
>>>         format" uri="http://www.github.com/HUPO-PSI/.../" ID="PSI-QC-CV"/>
>>>   </cvList>
>>> </qcML>
>>> --------------------------------------------------------
>>>
>>> On 4/5/2018 9:47 AM, David Tabb wrote:
>>>> Hi, all.
>>>>
>>>> In creating my hand-crafted example of qcML from QuaMeter IDFree, I've
>>>> installed Protege, an ontology viewer. It's useful for perusing the
>>>> HUPO-PSI MS ontology (http://purl.obolibrary.org/obo/ms/4.1.2/ms.owl).
>>>>
>>>> I'm a bit confused, though, about how to review the qcML ontology. At
>>>> present, I can find an OBO to download
>>>> (https://github.com/HUPO-PSI/qcML-development/raw/master/cv/v0_0_10/qc-cv.obo),
>>>> but I do not see an OWL equivalent. Helpfully, I can simply review the
>>>> OBO in a text editor.
>>>>
>>>> In particular, I am trying to determine which QC accessions each of my
>>>> IDFree metrics will represent. Does each metric generator get a
>>>> particular series of numbers that relate to that software's outputs?
>>>> Can I reuse a metric accession from other software if my tool generates
>>>> the same values (such as the number of MS/MS scans)?
>>>>
>>>> Jinmeng Jia, will you be able to share your paragraph that meets the
>>>> draft MIAPE QC standard with Weimin Zhu for him to present at
>>>> Heidelberg? I was sorry to hear you won't be able to attend yourself.
>>>>
>>>> Thanks,
>>>> Dave
>>>>
>>>> The integrity and confidentiality of this email is governed by these
>>>> terms. Disclaimer<http://www.sun.ac.za/emaildisclaimer>
>>>> Die integriteit en vertroulikheid van hierdie e-pos word deur die
>>>> volgende bepalings gereël.
>>>> Vrywaringsklousule<http://www.sun.ac.za/emaildisclaimer>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>> _______________________________________________
>>>> Psidev-qc-dev mailing list
>>>> Psi...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/psidev-qc-dev
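The comments in the 5 April draft ask where the unit of a single value goes and how to express that a vector, or a pair of vectors, sums to 1. Below is a minimal Python sketch of checks a reader of such a file could run, using a trimmed excerpt whose numbers are copied from the draft. Several things here are assumptions rather than anything the CV defines: that the `value` attribute on a QC:3000008/QC:3000007 `content` element is the number of vector elements (as it is for the other metrics in the draft), that `unitCvRef="UO"` is how the unit ontology would be referenced, and the function names themselves. The unitAccession/unitName/unitCvRef attribute names come from Wout's reply. The payload is parsed with `ast.literal_eval` because the draft uses single-quoted keys, which strict JSON parsers reject.

```python
import ast
import math
import xml.etree.ElementTree as ET

QCML_NS = {"q": "http://www.prime-xs.eu/ms/qcml"}

# Trimmed excerpt in the style of the draft; numbers copied from it, unit attributes assumed.
SNIPPET = """<qcML xmlns="http://www.prime-xs.eu/ms/qcml">
  <runQuality ID="ID001">
    <qualityParameter ID="MS2-Count" cvRef="PSI-QC-CV" accession="QC:"
        name="Number of MS2 scans acquired" value="33495"
        unitCvRef="UO" unitAccession="UO:0000189" unitName="count"/>
    <qualityParameter ID="MS2-PrecZ-Known" cvRef="PSI-QC-CV" accession="QC:">
      <content cvRef="PSI-QC-CV" accession="QC:3000008"
          value="4">{'UO:0000191':[0,0.15047,0.0687565,0.00877743,0.000507538,0.000268697]}</content>
    </qualityParameter>
    <qualityParameter ID="MS2-PrecZ-Unknown" cvRef="PSI-QC-CV" accession="QC:">
      <content cvRef="PSI-QC-CV" accession="QC:3000007"
          value="2">{'UO:0000191':[0.54235,0.22887]}</content>
    </qualityParameter>
  </runQuality>
</qcML>"""


def vector_contents(xml_text):
    """Yield (parameter ID, unit, values, declared value) for each <content> element."""
    root = ET.fromstring(xml_text)
    for param in root.iterfind(".//q:qualityParameter", QCML_NS):
        for content in param.iterfind("q:content", QCML_NS):
            # Single-quoted keys make this a Python literal, not strict JSON.
            payload = ast.literal_eval(content.text.strip())
            (unit, values), = payload.items()  # exactly one unit -> list pair expected
            yield param.get("ID"), unit, values, int(content.get("value"))


def single_value_units(xml_text):
    """Map each single-value qualityParameter ID to (value, unitAccession)."""
    root = ET.fromstring(xml_text)
    return {p.get("ID"): (p.get("value"), p.get("unitAccession"))
            for p in root.iterfind(".//q:qualityParameter", QCML_NS)
            if p.get("value") is not None}


fractions = {}
for pid, unit, values, declared in vector_contents(SNIPPET):
    if len(values) != declared:
        print(f"{pid}: value={declared} but {len(values)} elements listed")
    fractions[pid] = sum(values)

combined = fractions["MS2-PrecZ-Known"] + fractions["MS2-PrecZ-Unknown"]
print("known + unknown charge fractions sum to 1:", math.isclose(combined, 1.0, abs_tol=1e-4))
print(single_value_units(SNIPPET))
```

On this excerpt the count check flags MS2-PrecZ-Known (six numbers against `value="4"`), and the known and unknown precursor-charge fractions together come to 1 within the tolerance, which is the relationship the draft's comment describes. A "sums to 1" constraint like this is the kind of semantic rule that would have to be coded outside the XML schema.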
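The second question in the 5 April message asks for an easy way to confirm that every opened tag has a matching end tag. A minimal sketch using only Python's standard-library `xml.etree.ElementTree`, which raises `ParseError` carrying a `(line, column)` position when a file is not well-formed; the function name is illustrative. Note that this checks well-formedness only, not validity against the qcML XSD.

```python
import io
import xml.etree.ElementTree as ET


def check_well_formed(source):
    """Parse `source` (a path or file object).

    Returns None if it is well-formed XML, otherwise a tuple of
    (line, column, message) describing where the parser gave up.
    """
    try:
        ET.parse(source)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# <runQuality> is opened but never closed, so the parser stops at line 3.
broken = io.StringIO("<qcML>\n  <runQuality>\n</qcML>\n")
print(check_well_formed(broken))
print(check_well_formed(io.StringIO("<qcML><runQuality/></qcML>")))
```

For a file on disk, pass its path instead, e.g. `check_well_formed("20180403-1091_Pool_start_v0.9.qc.xml")` (file name taken from the thread; adjust the path). Validating against the schema is a separate step, for example with libxml2's `xmllint --noout --schema <schema.xsd> <file.xml>`, keeping in mind the remark above that the v0.0.10 XSD on GitHub may not be the latest.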