[Psidev-pi-dev] Issue 42 in psi-pi: Issues with the CV


Comment #42 on issue 42 by matthew....@vanderbilt.edu: Issues with the CV
http://code.google.com/p/psi-pi/issues/detail?id=42

Hi David, sorry for the long reply to follow...

RE: spectrumID reference
I'm aware of the decision to use mzML's id as the spectrumID, but I'm bringing
the point back up because the issue of non-mzML inputs was not discussed at the
time (AFAIK). I do not see the justification for using the id instead of the
nativeID when the latter must always exist for any input format whereas the
former only makes sense from an actual mzML file.

RE: MGF ids
Having CV terms for various format attributes is not a terrible thing, but I
worry because the scope is potentially much bigger than MGF->DAT->analysisXML.
All of the non-mzML input formats that could potentially be used to generate an
intermediate search result format and then converted to analysisXML will more
often than not have this problem. Trying to account for the various
transformations of the identifiers that could happen from this translation
seems like a lost cause to me. The exception would be very specific pipelines
where the inputs and outputs are tightly controlled, and in those cases,
userParams seem more appropriate than cvParams. Even in the case of
MGF->DAT->analysisXML, some of your MGF inputs may be completely lacking in
title, rt, and scan attributes, because they're all optional, so without an
index it's all screwed! :(
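
To make the point about optional attributes concrete, here is a rough Python
sketch (mine, not anything from the spec or from Mascot). It assumes the usual
MGF layout of BEGIN IONS / END IONS blocks with KEY=VALUE header lines; the
function name index_mgf and the dict layout are made up for illustration. The
only field it can always fill in is the position of the block in the file.

```python
def index_mgf(text):
    """Return one entry per BEGIN IONS/END IONS block, in file order.

    TITLE, RTINSECONDS and SCANS are optional in MGF, so each may be None.
    The 0-based "index" is the only locator that is always available.
    """
    entries = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "BEGIN IONS":
            current = {"index": len(entries), "title": None,
                       "rt": None, "scans": None}
        elif line == "END IONS" and current is not None:
            entries.append(current)
            current = None
        elif current is not None and "=" in line:
            # Header lines look like KEY=VALUE; peak lines contain no "=".
            key, value = line.split("=", 1)
            key = key.strip().upper()
            if key == "TITLE":
                current["title"] = value
            elif key == "RTINSECONDS":
                current["rt"] = value
            elif key == "SCANS":
                current["scans"] = value
    return entries


# Made-up sample: the second spectrum has no title, RT, or scan number.
SAMPLE_MGF = """BEGIN IONS
TITLE=first spectrum
PEPMASS=500.25
100.0 10
END IONS
BEGIN IONS
PEPMASS=612.30
200.0 5
END IONS
"""

entries = index_mgf(SAMPLE_MGF)
```

For the second block in the sample, title, rt, and scans all come back as None,
and the index is the only thing left that points back to the spectrum.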

Just think of the combinations:
modern vendor formats: Thermo RAW, Waters RAW, WIFF, YEP, BAF, FID, MassHunter, Shimadzu
open formats: mzML, mzXML, mzData, MGF, DTA, MS[12], PKL
search result formats: pepXML, SQT, OUT, SRF, DAT, X! Tandem

As I understand it, your specific use case is: take existing DAT files that
were searched from MGFs with (unique?) title/RT/scan attributes and convert to
analysisXML in a way that a generic reader can directly go back to the MGF data.

The generic version of that use case is: take existing search results in any
format that were searched from any spectra format and convert to analysisXML in
a way that a generic reader can directly go back to the data in the input
spectra format.
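
As a rough illustration of what a generic reader would need, here is a Python
sketch under my own assumptions: SpectrumRef, Reader, and resolve are
hypothetical names, not part of analysisXML or any existing library. The
reference carries only the source, its format, and a 0-based index, and each
format supplies a reader that returns its spectra in file order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SpectrumRef:
    source: str  # path or name of the input spectra file
    fmt: str     # format label, e.g. "MGF", "DTA", "mzXML"
    index: int   # 0-based position of the spectrum within the source


# Hypothetical reader signature: given a source, return its spectra in order.
Reader = Callable[[str], List[str]]


def resolve(ref: SpectrumRef, readers: Dict[str, Reader]) -> str:
    """Look up the spectrum a reference points to, using only its index."""
    if ref.fmt not in readers:
        raise ValueError(f"no reader registered for format {ref.fmt!r}")
    spectra = readers[ref.fmt](ref.source)
    if not 0 <= ref.index < len(spectra):
        raise IndexError(f"index {ref.index} out of range for {ref.source!r}")
    return spectra[ref.index]


# Toy reader standing in for a real MGF parser (illustration only).
readers = {"MGF": lambda source: ["spectrum A", "spectrum B"]}
found = resolve(SpectrumRef("run1.mgf", "MGF", 1), readers)
```

Nothing in the lookup depends on title, RT, or scan attributes, which is the
property the generic use case needs.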

Supporting the specific use case and not the generic one makes me cringe a bit,
which is why I chimed in on the issue. Can't users just re-search their data
and output directly to analysisXML with the index attribute intact? :P
