From: Paul R. <rud...@ni...> - 2009-07-01 15:04:52
Matt and other Fearless PSI volunteers -

If you think you may be able to squeeze spectral libraries into an evolving format, I'd like to help out. Here are a few items that might be useful:

1) Spectral libraries are typically one of two types: 'replicate' or 'consensus.' The latter indicates that multiple spectra identifying the same peptide ion have been used to generate the final spectrum. This also means that the stats in the annotation may include additional variance values.

2) The first section of metadata is the evidence supporting that spectrum. This includes things like search engine scores, sample sources and quality metrics -- e.g., fraction of unexplained abundance, similarity of replicates, etc. Many of these are internal to us and not used by all library generators; however, some might well be required. Here's that section from the first spectrum in the human library (ion trap):

Name: AAAAAAAAAAAAAAAGAGAGAK/1
MW: 1596.846
Comment: Spec=Consensus Pep=Tryptic Fullname=R.AAAAAAAAAAAAAAAGAGAGAK.Q/1 Mods=0 Parent=1596.846 Inst=it Mz_diff=-0.066 Mz_exact=1596.8457 Mz_av=1597.771 Protein="IPI00220844.1|SWISS-PROT:P55011-3|ENSEMBL:ENSP00000340878 Tax_Id=9606 Splice Isoform 2 of Solute carrier family 12 member 2" Pseq=36 Organism="human" Se=3^X2:ex=2.25065e-009/2.249e-009,td=6.9122e+010/6.888e+010,sd=0/0,hs=59.9/4,bs=1.3e-012,b2=4.5e-009,bd=1.38e+011^O2:ex=0.00144675/0.001383,td=402850/3.852e+005,pr=4.235e-009/4.035e-009,bs=6.35e-005,b2=0.00283,bd=788000^P2:sc=27.5/0.8,dc=18/0.4,ps=2.015/0.125,bs=0 Sample=1/human_ncrr_hprob_cam,2,2 Nreps=2/2 Missing=0.2229/0.0094 Parent_med=1596.78/0.16 Max2med_orig=29.6/5.3 Dotfull=0.728/0.000 Dot_cons=0.870/0.005 Unassign_all=0.171 Unassigned=0.000 Dotbest=0.88 Flags=0,0,0 Naa=22 DUScorr=0.85/3.3/15 Dottheory=0.83 Pfin=1.5e+013 Probcorr=1 Tfratio=1.6e+008 Pfract=0
Num peaks: 109

3) The business end - the spectrum - is a base-peak-normalized, annotated peak list:

452.9 847 "? 2/2 14.3"
462.8 535 "b7-35/-0.46 2/2 5.5"
480.3 620 "b7-18/0.04 2/2 10.4"
498.2 2165 "b7/-0.06 2/2 9.8"
524.0 1397 "? 2/2 8.8"
527.0 381 "Int/AAAAGAGA/0.2 2/2 2.1"
531.1 1140 "y7/-0.19 2/2 7.9"
537.2 675 "? 2/2 8.9"
551.2 910 "b8-18/-0.10 2/2 4.7"
569.1 4712 "b8/-0.20 2/2 25.4"
581.2 289 "? 2/2 3.1"
593.7 249 "? 2/1 2.6"
595.1 1182 "? 2/2 7.3"
602.2 2209 "y8/-0.13 2/2 21.5"
608.6 335 "? 2/2 3.9"
622.2 1625 "b9-18/-0.13 2/2 6.4"
626.2 767 "Int/AAAAAAAAG/0.1 2/2 1.8"
[..]

This is MSP format (also used for small molecules), and it is simple ASCII. Additional "formats" are usually born for the purposes of searching (e.g., indexing). MSP is easy to work with but it is not a standard -- a common, cross-domain representation would be better :)

Paul

Matthew Chambers wrote:
> I brought this issue up in the call because I think that the "evidence"
> tag in traML is too weak to go in a standard - I was proposing that a
> better way to do it might be to refer to an mzIdentML file for a more
> complete context of the evidence.
>
> I know very little about metabolomics, but the spectral libraries we're
> talking about are indeed proteomics-oriented. But I don't think you need
> to shut up - indeed, I think this may be another case where we can and
> should standardize within and even across domains because the
> annotations of a peptide's spectrum could be even more rich and detailed
> than that of a smaller molecule. In the spectral library domain just for
> peptides, there are at least 5 common formats already
> (http://peptide.nist.gov): MSP/ASCII, a multi-file NIST binary format,
> SpectraST, BiblioSpec, and X!Hunter. I'm sure you have a few more in the
> metabolomics domain. A lack of standards makes life so much more
> interesting, don't you think? ;)
>
> -Matt
>
>
> Steffen Neumann wrote:
>
>> On Tue, 2009-06-30 at 08:32 -0700, Eric Deutsch wrote:
>>
>>
>>> +Can mzIdentML encode a spectral library?
>>>
>>>
>> I am unsure whether this was discussed during the conference call,
>> or is left as an open point to the list.
>>
>> Anyway, I am inclined to say "no" about this,
>> for two reasons:
>>
>> 1) I don't know enough about analys^H^H^H^H^mzIdentML,
>> because my very brief looks made it look like proteomics-only.
>> (Or are you actually referring to a proteomics-spectral library?!
>> In that case I'll shut up and you can skip the rest of this mail.)
>>
>> 2) A spectral library will (at least in the future)
>> contain sets of spectra (different eV, MS1-MS^n, ...)
>> and associated annotations, which might be as complicated
>> as a molecule and its fragmentation breakdown products.
>> This requires a rich set of links between individual peaks
>> and their (molecular) annotation.
>>
>> So for small molecules (read: metabolomics stuff) we have started
>> to create mzAnnotate under the umbrella of the Metabolomics Standards
>> Initiative (MSI). http://msi-workgroups.sourceforge.net/exchange-format/
>>
>> We have drafted some use cases, shown on
>> http://sourceforge.net/apps/mediawiki/metware/index.php?title=MzAnnotate
>> and prepared a converter for both the spectral library www.massbank.jp
>> and our own MassFrontier clone MetFrag, and will present these
>> on the MSI mailing list soon.
>>
>> Yours,
>> Steffen
>>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>
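Since the thread turns on what an MSP entry actually carries, here is a minimal Python sketch of reading one entry of the kind Paul quotes. It assumes one peak per line in the form m/z, intensity, "annotation", and a single-line Comment: made of space-separated key=value pairs whose values may be double-quoted. The names parse_msp, parse_comment, parse_peak and MspEntry are hypothetical illustrations, not part of any NIST tool or library. The peak annotation and the Comment values are kept as raw strings, because the post does not define their internal fields (e.g. "2/2 14.3").

```python
import re
from dataclasses import dataclass, field

# Hypothetical helper: matches key=value tokens in an MSP "Comment:" line.
# A value is either a double-quoted string (which may contain spaces) or a
# run of non-space characters (so "Se=3^X2:ex=..." stays one value).
_COMMENT_TOKEN = re.compile(r'(\w+)=("[^"]*"|\S*)')


@dataclass
class MspEntry:
    name: str
    headers: dict = field(default_factory=dict)  # e.g. "MW" -> "1596.846"
    comment: dict = field(default_factory=dict)  # parsed Comment key=value pairs
    peaks: list = field(default_factory=list)    # (mz, intensity, annotation)


def parse_comment(comment):
    """Split a Comment line into a dict, removing surrounding double quotes."""
    fields = {}
    for key, value in _COMMENT_TOKEN.findall(comment):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def parse_peak(line):
    """Parse 'mz intensity "annotation"'; the annotation is optional."""
    parts = line.split(None, 2)
    mz, intensity = float(parts[0]), float(parts[1])
    annotation = parts[2].strip().strip('"') if len(parts) == 3 else ""
    return mz, intensity, annotation


def parse_msp(text):
    """Yield MspEntry objects from MSP text.

    Assumes well-formed entries with one peak per line; a peak block shorter
    than its "Num peaks" count will raise an exception.
    """
    entry, remaining = None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if remaining > 0:  # inside the peak list of the current entry
            entry.peaks.append(parse_peak(line))
            remaining -= 1
            if remaining == 0:
                yield entry
                entry = None
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" header line
        key, value = key.strip(), value.strip()
        if key == "Name":
            entry = MspEntry(name=value)
        elif entry is None:
            continue  # header seen before any Name: line
        elif key == "Comment":
            entry.comment = parse_comment(value)
        elif key == "Num peaks":
            remaining = int(value)
            if remaining == 0:
                yield entry
                entry = None
        else:
            entry.headers[key] = value
```

Keeping Comment values as strings is a deliberate choice in this sketch: tokens such as Nreps=2/2 or Se=3^X2:... pack several numbers into one value, and how to split them is specific to whoever generated the library.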