From: John S. G. <joh...@eb...> - 2010-03-09 16:03:50
Robert Winkler wrote:
> I am working on protein derivatization chemistry and their subsequent LC-MS/MS characterization, and would like to integrate those methods in existing proteomics workflows, i.e. PRIDE (Protein MODs) XML, PRIDE converter and OMSSA.
>
> Clearly, for the informatics processing we need the molecular weight; which is no problem for the mono-isotopic weight, but less trivial for the average mass.
>
> These are the values as calculated by the Molecular Weight Calculator:
>
> C14H13N3O2S (Dabsyl Modification)
> Monoisotopic: 287.0728438
> Average: 287.33804
>
> C22H16N2O7S2 (Uniblue A Modification)
> Monoisotopic: 484.0398906
> Average: 484.50372
>
> However, different other programs give slightly different results for the average mass, e.g. for Uniblue A:
>
> Chemsketch (ACDlabs): av. 484.5016
> Online Calculator (Webqc.org): av. 484.5048
>
> 1) So, which average weights should we take? For practical applications it might not matter, but anyway the information should be consistent.
>
> 2) Would it be possible to define the chemical modifications as chemical formula, e.g. delta C14H13N3O2S ? So, changed IUPAC values could be incorporated easier.
>
> 3) Would it be possible for PRIDE to define the chemical modification only, and to give the amino acid as an option?
>

The modifications in PSI-MOD are based on the formulas; the masses are recalculated for every release using the latest atomic weight data, assembled in the file AtomTabl.XML, which is available with the PSI-MOD files at http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/mod/data/

The most recent IUPAC report on average atomic weights is the 2007 report. The three sources for atomic mass, isotope mass and isotope ratios are cited in the header of AtomTabl.XML, and the URLs for those sources are being added to the next version. In most software that calculates average molecular weights, the atomic weights are "hard-wired" and you have no idea what "era" the masses belong to.
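
To illustrate why the choice of atomic weight table matters, here is a minimal Python sketch that recalculates an average mass from a chemical formula and a lookup table. It is not the PSI-MOD code. The function names are hypothetical, the parser handles only flat formulas without brackets, and the atomic weights are illustrative values typed in by hand; in practice they would be read from a dated source such as AtomTabl.XML. The second table differs only in an assumed alternative sulfur value, to show how a change in one element propagates.

```python
import re
from collections import Counter

# Illustrative average atomic weights (assumed values for this sketch;
# a real workflow would load them from a dated, cited table).
AVERAGE_WEIGHTS = {
    "C": 12.0107,
    "H": 1.00794,
    "N": 14.0067,
    "O": 15.9994,
    "S": 32.065,
}

# Same table with an assumed alternative sulfur value that differs
# in the third decimal place.
ALTERNATIVE_WEIGHTS = dict(AVERAGE_WEIGHTS, S=32.066)


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C14H13N3O2S' (no brackets)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def average_mass(formula, weights):
    """Sum atomic weight times atom count over all elements in the formula."""
    return sum(weights[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    for name, formula in [("Dabsyl", "C14H13N3O2S"),
                          ("Uniblue A", "C22H16N2O7S2")]:
        a = average_mass(formula, AVERAGE_WEIGHTS)
        b = average_mass(formula, ALTERNATIVE_WEIGHTS)
        # Report to two decimal places; also show the spread between tables.
        print(f"{name}: {a:.2f} (difference between tables: {abs(a - b):.4f})")
```

Because the mass is derived from the formula each time, swapping in a newer atomic weight table updates every modification consistently, which is the reason for storing the formula rather than a fixed number.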

The element sulfur, present in both of the reagents you mentioned, is particularly bad in this respect, since its average mass is defined to only three decimal places and has changed several times in the last twenty years. Several of the elements that are encountered in protein modifications (chlorine, selenium, and molybdenum in particular) vary so much in their natural isotopic abundance that the IUPAC can define their average mass with a precision of only two decimal places. Keeping more than two decimal places of precision for average masses is computationally meaningless.

It is also important to note that in PSI-MOD a mass correction is made in the monoisotopic masses for intrinsic charge, a correction that is usually neglected, and this results in some differences from other sources for the monoisotopic masses.

In PSI-MOD, many, but not all, of the modifications are defined for each source amino acid, with parent nodes for the modification regardless of amino acid identity. It should be possible to use those parent nodes in your PRIDE submissions. If you notice an apparently missing node in the ontology for a protein modification, submit a request, providing, if possible, a literature citation for the chemical modification.

--
John S. Garavelli
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambs CB10 1SD
joh...@eb...
(044/0)-1223-492529