From: Johannes J. <dr....@go...> - 2008-06-23 11:31:09
|
Hi there, I'm trying to implement mzML 1.0.0 and I just stumbled across a problem: I don't understand how the supDataArrays of mzData are represented in mzML. Obviously there are two obligatory binary arrays for m/z and intensity (which can be identified by the corresponding CV terms for m/z and intensity, respectively). But I don't know how to identify the other arrays that might follow. Is there a CV term for the name of such a supplementary binary array? I couldn't find anything. How does this work? It would be very nice if you could help me with this!! Thanks in advance! Best regards, Johannes Junker |
From: Matthew C. <mat...@va...> - 2008-06-23 15:32:05
|
Hi Johannes, If the supplemental array is defined in the CV (right now it's just m/z, time, and intensity), you can use a CV term. Otherwise, you'll have to use a userParam. If it's a kind of array you think should be in the CV, you can make that suggestion as well. I know we should probably put transient array in there at least (time array can be used for time domain data). For something like charge states, that's probably going to stay a userParam AFAIK. -Matt |
From: Matthew C. <mat...@va...> - 2008-06-23 15:37:13
|
Nevermind, there already are terms for charge and signal to noise arrays. :) -Matt |
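For anyone implementing a reader, the rule described above (use the CV term when one exists, otherwise fall back to a userParam) could be sketched roughly as follows. This is only an illustration in Python, not something posted in the thread: the function name `identify_array` and the userParam name `example custom array` are made up, the accessions are the ones I believe the PSI-MS CV uses and should be checked against the CV release you target, and the XML namespace that real mzML files carry is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Array-type accessions as I understand the PSI-MS CV (verify before relying on them).
KNOWN_ARRAY_TERMS = {
    "MS:1000514": "m/z array",
    "MS:1000515": "intensity array",
    "MS:1000516": "charge array",
    "MS:1000517": "signal to noise array",
}

def identify_array(binary_data_array):
    """Return (source, label) describing what a binaryDataArray holds."""
    for cv in binary_data_array.findall("cvParam"):
        label = KNOWN_ARRAY_TERMS.get(cv.get("accession"))
        if label is not None:
            return ("cv", label)
    # No recognised CV term: fall back to the first userParam, if any.
    user = binary_data_array.find("userParam")
    if user is not None:
        return ("user", user.get("name"))
    return ("unknown", None)

# Namespace-free toy input; "example custom array" is a hypothetical name.
SNIPPET = """
<binaryDataArrayList>
  <binaryDataArray>
    <cvParam cvRef="MS" accession="MS:1000517" name="signal to noise array" value=""/>
    <binary></binary>
  </binaryDataArray>
  <binaryDataArray>
    <userParam name="example custom array" value=""/>
    <binary></binary>
  </binaryDataArray>
</binaryDataArrayList>
"""

root = ET.fromstring(SNIPPET)
results = [identify_array(a) for a in root.findall("binaryDataArray")]
print(results)
```

Matching on the accession rather than on the `name` attribute keeps the reader independent of how a writer spells the term.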
From: Marc S. <st...@in...> - 2008-06-24 06:13:34
|
Hi all,

I think there should be a term 'named custom array' (we can discuss the name) that contains the name of the array in the 'value' attribute. Putting the name in the userParam is too unstructured in my opinion. There won't be two tools that store the name in the same way...

<binaryDataArray encodedLength="12">
  <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
  <cvParam cvRef="MS" accession="MS:1000576" name="no compression" value=""/>
  <cvParam cvRef="MS" accession="MS:????????" name="named custom array" value="full width at half max"/>
  <binary>AAAAAAAANEAAAAAAAA</binary>
</binaryDataArray>

What do you think?

Best, Marc |
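To make the proposal concrete, here is a rough Python sketch of how a reader might pull the name and the values out of such an element. It is an assumption-laden illustration, not an agreed encoding: the accession `MS:XXXXXXX` is a placeholder for the term the post leaves as `MS:????????`, the helper names are invented, and the payload is assumed to be uncompressed little-endian 64-bit floats.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# Placeholder accession: the proposal leaves the real one open ("MS:????????").
PROPOSED_NAMED_ARRAY = "MS:XXXXXXX"

def encode_doubles(values):
    """Pack floats as little-endian 64-bit values and base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")

def read_named_array(elem):
    """Return (name, values) for a binaryDataArray using the proposed term."""
    name = None
    for cv in elem.findall("cvParam"):
        if cv.get("accession") == PROPOSED_NAMED_ARRAY:
            name = cv.get("value")
    raw = base64.b64decode(elem.findtext("binary"))
    values = list(struct.unpack("<%dd" % (len(raw) // 8), raw))
    return name, values

# Build a small element shaped like the one in the proposal.
payload = encode_doubles([20.0])
xml_text = (
    '<binaryDataArray encodedLength="%d">'
    '<cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>'
    '<cvParam cvRef="MS" accession="MS:1000576" name="no compression" value=""/>'
    '<cvParam cvRef="MS" accession="%s" name="named custom array"'
    ' value="full width at half max"/>'
    '<binary>%s</binary>'
    '</binaryDataArray>'
) % (len(payload), PROPOSED_NAMED_ARRAY, payload)

name, values = read_named_array(ET.fromstring(xml_text))
print(name, values)
```

The reader only needs one accession to find the name, which is the structural advantage the proposal claims over an arbitrary userParam.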
From: Eric D. <ede...@sy...> - 2008-06-24 15:48:15
|
Hi Marc, I think we would be better off creating CV terms for all the kinds of arrays people want to encode. So I'd much rather get a request saying that someone's software wants to write out "full width at half maximum", then create a term, furnish an accession number, and thereby publicly let all writer and reader authors know that this is a legal entity that could occur. No schema change is necessary.

I find this preferable to having a vague slot that could be filled with

  full width at half maximum
  full width at half max
  FWHM

in an uncontrolled and variable way.

This is our general aim for mzML. We would like to steer away from custom ways of encoding data as much as possible.

Does that seem reasonable?

Would you like "full width at half maximum" to be added to the CV?

Thanks, Eric |
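The concern about an uncontrolled slot can be illustrated with a few lines of Python. This is a toy sketch only: `MS:XXXXXXX` is a made-up stand-in for whatever accession a future "full width at half maximum" term would receive, and the two helper functions are hypothetical.

```python
# Spellings taken from the message above; a free-text slot allows all of them.
free_text_names = ["full width at half maximum", "full width at half max", "FWHM"]

# Hypothetical accession standing in for a future CV term (not a real ID).
FWHM_ACCESSION = "MS:XXXXXXX"

def is_fwhm_by_name(name):
    """Naive reader: only recognises one exact spelling."""
    return name == "full width at half maximum"

def is_fwhm_by_accession(accession):
    """CV-based reader: one comparison covers every writer."""
    return accession == FWHM_ACCESSION

# Three writers using free text versus three writers using the same accession.
name_hits = [is_fwhm_by_name(n) for n in free_text_names]
accession_hits = [is_fwhm_by_accession(FWHM_ACCESSION) for _ in free_text_names]
print(name_hits)
print(accession_hits)
```

A reader could of course normalise the spellings it knows about, but every new writer can introduce another variant, which is the open-endedness the message argues against.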
From: Marc S. <st...@in...> - 2008-06-25 07:16:29
|
Hi Eric,

I agree that less custom encoding is desirable to make files exchangeable between different tools. However, in this case I would go both ways, depending on the significance of the term.

(1) I use these arrays to store debug information of an algorithm and add quite a few arrays, depending on the charge states I look at:

  pattern_score_1
  pattern_score_2
  pattern_score_... (one per charge)
  intensity_score
  local_maximum
  trace_score

Putting the name in the userParam is not a good idea because it makes these arrays unusable for other tools - too custom in my opinion. Adding a CV term for each such debug variable would, however, be too much. So I think the intermediate way is just right: terms which are not generally usable should be put into a 'named custom array'. This would correspond to an optional XML attribute 'name' for the 'binaryDataArray' tag. But we have to state clearly in the documentation that for more general terms, a CV entry should be added.

(2) After peak picking we store much more information than the position and intensity. The arrays there are:

  SignalToNoise
  fwhm
  leftWidth
  rightWidth
  maximumIntensity
  peakShape
  rValue

'SignalToNoise' is already a CV term. 'fwhm' would be a good candidate for a CV term as well. The rest is more algorithm-dependent and not a general concept, which is why we could simply store them in a 'named custom array'.

What do you think?

Best, Marc |
From: Matt C. <mat...@va...> - 2008-06-25 12:50:27
|
Hi Marc, I can't think of any notable semantic difference between userParams and cvParams with uncontrolled string values. In both cases, you would have to deal with an extra term that only your algorithm and its downstream users know about. In both cases, the extra algorithm-specific arrays are unusable for other tools (except ones you make of course or that are made specifically to work with it). In both cases, the uncontrolled string value cannot be relied on except in very controlled circumstances. However, if your peak picking algorithm is versioned, it's exactly the kind of thing we want in the CV. We want a term to briefly describe the algorithm (which would go in dataProcessing) and also terms to describe the parameters that a user can set. At the same time, CV terms for your custom extra arrays could be added as well. -Matt |
From: Marc S. <st...@in...> - 2008-06-25 14:15:42
|
Hi Matt,

> I can't think of any notable semantic difference between userParams and
> cvParams with uncontrolled string values. In both cases, you would have
> to deal with an extra term that only your algorithm and its downstream
> users know about. In both cases, the extra algorithm-specific arrays are
> unusable for other tools (except ones you make of course or that are
> made specifically to work with it). In both cases, the uncontrolled
> string value cannot be relied on except in very controlled circumstances.

That's not entirely true in my opinion. We have a small command line tool that can display statistics about these arrays. Without a defined way to give the array a name, the statistics would look like this:

array 0:
  min: 234
  max: 435435
  avg: 4545
array 1:
  min: 234
  max: 435435
  avg: 4545

With a defined way of naming arrays, it would look like this:

array 'some description of the array content 1':
  min: 234
  max: 435435
  avg: 4545
array 'some description of the array content 2':
  min: 234
  max: 435435
  avg: 4545

At least to me the second alternative looks much better. Of course we can store the name in the userParam 'name'. But other tools would store it in the userParam 'Name' or 'custom_name' or 'custom name' ... I really think there should be a controlled way to give an array a user-defined name. There was a way in mzData (optional XML attribute 'name') and it's a step back not to have one in mzML.

> However, if your peak picking algorithm is versioned, it's exactly the
> kind of thing we want in the CV. We want a term to briefly describe the
> algorithm (which would go in dataProcessing) and also terms to describe
> the parameters that a user can set. At the same time, CV terms for your
> custom extra arrays could be added as well.

We'll compile a list of the TOPP tools with short descriptions and post it on this mailing list. The parameters will most likely not be included, as the current count of parameters for all tools is 538 and they might change from release to release.

Best, Marc |
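The kind of statistics output described above can be sketched in a few lines of Python. This is not the actual command line tool from the message; the function `summarize` and the sample data are invented for illustration, and the sketch simply falls back to the array index when no name is available.

```python
def summarize(arrays):
    """Format min/max/avg per array; `arrays` is a list of (name, values)."""
    lines = []
    for index, (name, values) in enumerate(arrays):
        # Use the quoted name when present, otherwise the positional index.
        label = "'%s'" % name if name else str(index)
        lines.append("array %s:" % label)
        lines.append("  min: %g" % min(values))
        lines.append("  max: %g" % max(values))
        lines.append("  avg: %g" % (sum(values) / len(values)))
    return "\n".join(lines)

# Hypothetical sample data: one named array and one without a name.
arrays = [
    ("fwhm", [0.5, 1.0, 1.5]),
    (None, [2.0, 4.0]),
]
report = summarize(arrays)
print(report)
```

The unnamed array can only be labelled by position, which is exactly the loss of meaning the message objects to.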
From: Matthew C. <mat...@va...> - 2008-06-25 15:03:32
|
Hi Marc,

I admit I was hasty to say there was no difference. As you point out, the CV way makes it a "categorized" comment whereas the userParam is totally uncategorized. I still think a special term for such a comment is counter-productive for encouraging inter-compatible software.

Instead, a small modification to your software would allow you to enumerate all the userParams and cvParams in the array and output them in name-value pairs. So you'd have:

array 0:
  name="array name" type="array type" units="signal-to-noise ratio"
  min: 234
  max: 435435
  avg: 4545

Putting in "categorized" comment terms is a slippery slope IMO. It would lead to other similar terms in other places and ultimately we'd be like mzData with little or no control over the values of string variables, doom and gloom notwithstanding. ;) At least it could lead to implementations of mzML that are widely incompatible.

As for 528 parameters across all your tools, how many of those are just for your data processing algorithms (that take raw data mzML as input and could write processed mzML as output, as opposed to a search engine which would write pepXML or analysisXML)? For dataProcessing, I at least would prefer algorithm-centric terms instead of tool-centric terms since many tools could implement the same algorithm and, more importantly, many tools will run multiple algorithms. So we would have a term for boxcar smoothing, Savitzky-Golay smoothing, etc. Some algorithms might be coupled tightly with proprietary tools (e.g. Mascot Distiller or Protein Pilot's peak picking) but we can still call it something like the "Protein Pilot Peak Picker". :)

-Matt |
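The suggested modification, listing every cvParam and userParam of an array as name-value pairs, might look roughly like this in Python. It is a sketch under assumptions: the function name `describe_params` is invented, the sample element is a namespace-free toy, and the userParam called `name` with the value `trace_score` is only one of the many spellings a writer could choose.

```python
import xml.etree.ElementTree as ET

def describe_params(binary_data_array):
    """Return 'name="value"' pairs for every cvParam and userParam child."""
    pairs = []
    for child in binary_data_array:
        if child.tag in ("cvParam", "userParam"):
            pairs.append('%s="%s"' % (child.get("name"), child.get("value", "")))
    return " ".join(pairs)

# Toy element; the userParam naming is hypothetical.
SNIPPET = """
<binaryDataArray>
  <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
  <userParam name="name" value="trace_score"/>
  <binary></binary>
</binaryDataArray>
"""

description = describe_params(ET.fromstring(SNIPPET))
print("array 0:")
print("  " + description)
```

This prints everything attached to the array without the tool needing to know which parameter carries the name, at the cost of a longer header line per array.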
From: Marc S. <st...@in...> - 2008-06-26 06:45:39
|
Hi Matt,

> I admit I was hasty to say there was no difference. As you point out,
> the CV way makes it a "categorized" comment whereas the userParam is
> totally uncategorized. I still think a special term for such a comment
> is counter-productive for encouraging inter-compatible software.
>
> Instead, a small modification to your software would allow you to
> enumerate all the userParams and cvParams in the array and output them
> in name-value pairs. So you'd have:
> array 0:
>   name="array name" type="array type" units="signal-to-noise ratio"
>   min: 234
>   max: 435435
>   avg: 4545
>
> Putting in "categorized" comment terms is a slippery slope IMO. It would
> lead to other similar terms in other places and ultimately we'd be like
> mzData with little or no control over the values of string variables,
> doom and gloom notwithstanding. ;) At least it could lead to
> implementations of mzML that are widely incompatible.

No offense, but your suggestion is only a workaround and ignores the real problem. What if there are 10, 20 or 50 userParams? We also have a GUI for the statistics output. There, we simply do not have the space to display all userParams.

I still think there should be a designated place for a user-definable name or identifier. Otherwise, the meaning of all arrays that have no explicit CV name is lost (@Eric - this is what I meant when I said that the userParam is unusable for other tools). I really do not care how this is implemented, but it should exist. Putting vital information like this into a userParam is the best way to produce non-inter-compatible files IMHO.

> As for 528 parameters across all your tools, how many of those are just
> for your data processing algorithms (that take raw data mzML as input
> and could write processed mzML as output, as opposed to a search engine
> which would write pepXML or analysisXML)? For dataProcessing, I at least
> would prefer algorithm-centric terms instead of tool-centric terms since
> many tools could implement the same algorithm and more importantly, many
> tools will run multiple algorithms. So we would have a term for boxcar
> smoothing, Savitzky-Golay smoothing, etc. Some algorithms might be
> coupled tightly with proprietary tools (e.g. Mascot Distiller or Protein
> Pilot's peak picking) but we can still call it something like the
> "Protein Pilot Peak Picker". :)

All of our tools only perform one algorithm. The parameters are used to fine-tune the behavior of the algorithm. They are quite implementation-specific and therefore sometimes change. We can see about that after we've compiled the list of tools and short descriptions.

Best, Marc |
From: Pierre-Alain B. <pie...@is...> - 2008-06-27 08:24:41
|
Hi, in general terms, userParams are sets of params that are difficult to align in a common CV and that might be tool-specific. In order to comply with MIAPE (and, more generally, with the idea that the provided information should be sufficient to understand, in technical terms, how the data were obtained), the tools themselves know which parameters are relevant and important to provide. It is therefore the responsibility of the tool provider to write an mzML document with the appropriate set of params (both cvParams and userParams) in order to sufficiently annotate the supported data. I understand that it might be difficult to constrain the use of userParams, but tools could generate a document that defines their own params for third-party tools that need these terms. Best, Pierre-Alain |
From: Eric D. <ede...@sy...> - 2008-06-26 03:17:11
|
Hi Marc, you make some good points, but it goes against our general design plan. We will have to discuss this at the next telecon, but I definitely want to hear from the other designers. One question. You say below: > Putting the name in the userParam is not a good idea because it makes > these arrays unusable for other tools - too custom in my opinion. Why is a userParam unusable for other tools? Any tool can use a userParam or a cvParam as it sees fit. It's just that cvParams are officially sanctioned concepts and the userParam is whatever you want it to be. We'll discuss at the next call. Thanks, Eric |
From: Marc S. <st...@in...> - 2008-07-04 09:13:17
|
Hi all,

attached is a list of TOPP tools that will be able to process mzML (as soon as we have a stable implementation). I guess a "TOPP software" subsection in the "software" section would be best. Thanks in advance to whoever adds our tools to the CV.

Best, Marc

TOPP software -- TOPP (The OpenMS proteomics pipeline) software for mass spectrometry.
|- BaselineFilter -- Removes the baseline from profile spectra using a top-hat filter.
|- DBExporter -- Exports data from an OpenMS database to a file.
|- DBImporter -- Imports data to an OpenMS database.
|- FileConverter -- Converts between different MS file formats.
|- FileFilter -- Extracts or manipulates portions of data from peak, feature or consensus feature files.
|- FileMerger -- Merges several MS files into one file.
|- InternalCalibration -- Applies an internal calibration.
|- MapAligner -- Corrects retention time distortions between maps.
|- MapNormalizer -- Normalizes peak intensities in an MS run.
|- NoiseFilter -- Removes noise from profile spectra by using different smoothing techniques.
|- PeakPicker -- Finds mass spectrometric peaks in profile mass spectra.
|- Resampler -- Transforms an LC/MS map into a resampled map or a png image.
|- SpectraFilter -- Applies a filter to peak spectra.
|- TOFCalibration -- Applies time of flight calibration. |
From: sneumann <sne...@ip...> - 2008-06-25 10:24:52
|
Hi!

Am I reading this right that the FeatureFinder will soon output mzML in addition to FeatureXML?

Regards,
Steffen

On Wed, 2008-06-25 at 09:16 +0200, Marc Sturm wrote:
> Hi Eric,
>
> I agree that less custom encoding is desirable to make files exchangeable
> between different tools. However, in this case I would go both ways,
> depending on the significance of the term.
>
> (1)
> I use these arrays to store debug information of an algorithm and add
> quite a few arrays, depending on the charge states I look at:
>
> pattern_score_1
> pattern_score_2
> pattern_score_... (one per charge)
> intensity_score
> local_maximum
> trace_score
>
> Putting the name in the userParam is not a good idea because it makes
> these arrays unusable for other tools - too custom in my opinion.
> Adding a CV term for each such debug variable would, however, be too much.
> So I think the intermediate way is just right:
> Terms which are not generally usable should be put in a 'named custom
> array'. This would correspond to an optional XML attribute 'name' for the
> 'binaryDataArray' tag.
> But we have to state clearly in the documentation that for more general
> terms, a CV entry should be added.
>
> (2)
> After peak picking we store much more information than the position and
> intensity. The arrays there are:
>
> SignalToNoise
> fwhm
> leftWidth
> rightWidth
> maximumIntensity
> peakShape
> rValue
>
> 'SignalToNoise' is already a CV term. 'fwhm' would be a good candidate
> for a CV term as well.
> The rest is more algorithm-dependent and not a general concept, which is why
> we could simply store them in a 'named custom array'.
>
> What do you think?
>
> Best,
> Marc
>
> > Hi Marc, I think we would be better off creating CV terms for all the
> > kinds of arrays people want to encode. So I'd much rather get a request
> > that someone's software wants to write out "full width at half maximum"
> > and create a term, furnish an accession number, and thereby publicly let
> > all writer and reader authors know that this is a legal entity that
> > could occur. No schema change is necessary.
> >
> > I find this preferable to having a vague slot that could be filled with
> >
> > full width at half maximum
> > full width at half max
> > FWHM
> >
> > in an uncontrolled and variable way.
> >
> > This is our general aim for mzML. We would like to steer away from
> > custom ways of encoding data as much as possible.
> >
> > Does that seem reasonable?
> >
> > Would you like "full width at half maximum" to be added to the CV?
> >
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
IPB Halle AG Massenspektrometrie & Bioinformatik
Dr. Steffen Neumann http://www.IPB-Halle.DE
Weinberg 3 http://msbi.bic-gh.de
06120 Halle Tel. +49 (0) 345 5582 - 1470
+49 (0) 345 5582 - 0
sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409
|
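The distinction discussed in this thread, between arrays identified by a CV term and arrays that fall back to a userParam, can be sketched roughly as follows. This is a minimal Python sketch and not code from any of the posters: the sample fragment is simplified (no XML namespace, empty binary payload), and the accession numbers and the set of recognized term names in `KNOWN_ARRAY_TERMS` are assumptions that should be checked against the current PSI-MS CV.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only.
# Accession numbers are assumptions; verify them against the PSI-MS CV.
SAMPLE = """
<binaryDataArrayList count="3">
  <binaryDataArray encodedLength="0">
    <cvParam cvRef="MS" accession="MS:1000514" name="m/z array"/>
    <binary></binary>
  </binaryDataArray>
  <binaryDataArray encodedLength="0">
    <cvParam cvRef="MS" accession="MS:1000515" name="intensity array"/>
    <binary></binary>
  </binaryDataArray>
  <binaryDataArray encodedLength="0">
    <userParam name="pattern_score_1"/>
    <binary></binary>
  </binaryDataArray>
</binaryDataArrayList>
"""

# Hypothetical list of CV term names that describe the kind of array.
KNOWN_ARRAY_TERMS = {
    "m/z array",
    "intensity array",
    "time array",
    "charge array",
    "signal to noise array",
}


def classify_arrays(xml_text):
    """Return one (source, name) pair per binaryDataArray element.

    source is "cv" if a recognized CV term names the array, "user" if
    only a userParam names it, and "unknown" otherwise.
    """
    root = ET.fromstring(xml_text)
    result = []
    for array in root.iter("binaryDataArray"):
        # Other cvParams (precision, compression) are ignored by the filter.
        cv_names = [
            p.get("name")
            for p in array.findall("cvParam")
            if p.get("name") in KNOWN_ARRAY_TERMS
        ]
        if cv_names:
            result.append(("cv", cv_names[0]))
            continue
        user = array.find("userParam")
        if user is not None:
            result.append(("user", user.get("name")))
        else:
            result.append(("unknown", None))
    return result


if __name__ == "__main__":
    for source, name in classify_arrays(SAMPLE):
        print(source, name)
```

A reader written this way can only display or skip the "user" arrays, since their names carry no agreed meaning. That is the interoperability concern raised above in favour of adding CV terms for generally useful arrays such as FWHM.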
From: Slotta, D. (NIH/NLM/N. [E] <sl...@nc...> - 2008-06-25 13:44:40
|
A translation for those of you lucky enough not to have an office mate whose native language is German:

"Hey, if I am reading this right, the FeatureFinder will soon spit out mzML as well as FeatureXML? Greetings, Steffen"

Douglas

> -----Original Message-----
> From: sneumann [mailto:sne...@ip...]
> Sent: Wednesday, June 25, 2008 6:23 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] mzML's binary arrays
>
> Moin!
>
> Sehe ich das richtig, dass der FeatureFinder demnächst neben FeatureXML
> auch mzML ausspuckt?
>
> Gruss,
> Steffen
|
From: Angel P. <an...@ma...> - 2008-07-07 12:53:30
|
awesome. Is the parser pwiz based or your own code?

-angel

On Fri, Jul 4, 2008 at 5:13 AM, Marc Sturm <st...@in...> wrote:
> Hi all,
>
> attached is a list of TOPP tools that will be able to process mzML (as soon
> as we have a stable implementation).
> I guess a "TOPP software" subsection in the "software" section would be
> best.
>
> Thanks in advance to whoever adds our tools to the CV.
>
> Best,
> Marc
>
> TOPP software -- TOPP (The OpenMS proteomics pipeline) software for mass
> spectrometry.
> |- BaselineFilter -- Removes the baseline from profile spectra using a
> top-hat filter.
> |- DBExporter -- Exports data from an OpenMS database to a file.
> |- DBImporter -- Imports data to an OpenMS database.
> |- FileConverter -- Converts between different MS file formats.
> |- FileFilter -- Extracts or manipulates portions of data from peak,
> feature or consensus feature files.
> |- FileMerger -- Merges several MS files into one file.
> |- InternalCalibration -- Applies an internal calibration.
> |- MapAligner -- Corrects retention time distortions between maps.
> |- MapNormalizer -- Normalizes peak intensities in an MS run.
> |- NoiseFilter -- Removes noise from profile spectra by using different
> smoothing techniques.
> |- PeakPicker -- Finds mass spectrometric peaks in profile mass spectra.
> |- Resampler -- Transforms an LC/MS map into a resampled map or a png
> image.
> |- SpectraFilter -- Applies a filter to peak spectra.
> |- TOFCalibration -- Applies time of flight calibration.
|
From: Marc S. <st...@in...> - 2008-07-07 13:34:22
|
Hi Angel,

we will have our own parser for two reasons:
1) to avoid the overhead of converting the pwiz data structures to our data structures
2) to avoid another dependency (we already have a bunch)

Best,
Marc

Angel Pizarro wrote:
> Is the parser pwiz based or your own code? -angel
|