Re: [Psidev-mod-vocab] Average Molecular Weight of Chemical Modifications in Proteomics Workflows


Robert Winkler wrote:
> I am working on protein derivatization chemistry and the subsequent LC-MS/MS characterization of the derivatives, and would like to integrate these methods into existing proteomics workflows, i.e. PRIDE (Protein MODs) XML, PRIDE converter and OMSSA.
>  
> Clearly, informatics processing requires the molecular weight. This is no problem for the monoisotopic mass, but less trivial for the average mass.
>
> These are the values as calculated by the Molecular Weight Calculator:
>
> C14H13N3O2S (Dabsyl Modification)
> Monoisotopic: 287.0728438
> Average: 287.33804
>
> C22H16N2O7S2 (Uniblue A Modification)
> Monoisotopic: 484.0398906
> Average: 484.50372
>
> However, other programs give slightly different results for the average mass, e.g. for Uniblue A:
>
> Chemsketch (ACDlabs): av. 484.5016
> Online Calculator (Webqc.org): av. 484.5048
>
> 1) So which average weights should we use? For practical applications it might not matter, but the information should at least be consistent.
>
> 2) Would it be possible to define the chemical modifications by chemical formula, e.g. delta C14H13N3O2S? That way, revised IUPAC values could be incorporated more easily.
>
> 3) Would it be possible for PRIDE to define only the chemical modification, and to give the amino acid as an option?
>

The modifications in PSI-MOD are based on their formulas; the masses are
recalculated for every release using the latest atomic weight data,
assembled in the file AtomTabl.XML, which is available with the PSI-MOD
files at
http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/mod/data/
The most recent IUPAC report on standard (average) atomic weights is
from 2007.  The three sources for atomic masses, isotope masses and
isotope ratios are cited in the header of AtomTabl.XML, and the URLs for
those sources are being added to the next version.
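To illustrate what "based on the formulas" means in practice, here is a
minimal Python sketch that derives both masses from a formula.  It is not
PSI-MOD code: the small element table is hard-coded for illustration
(IUPAC 2007 standard atomic weights for the average masses, masses of the
most abundant isotopes for the monoisotopic masses), whereas PSI-MOD takes
its values from AtomTabl.XML, whose layout is not reproduced here.  The
function names are hypothetical.  Negative counts are accepted so that
delta formulas can be parsed too.

```python
import re
from collections import Counter

# Illustrative element table, hard-coded here (PSI-MOD reads AtomTabl.XML instead).
# Average: IUPAC 2007 standard atomic weights.
AVERAGE = {"H": 1.00794, "C": 12.0107, "N": 14.0067, "O": 15.9994, "S": 32.065}
# Monoisotopic: mass of the most abundant isotope (1H, 12C, 14N, 16O, 32S).
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}

# One element symbol followed by an optional (possibly negative) atom count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(-?\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as 'C14H13N3O2S' or a delta such as 'C2H-1'."""
    counts: Counter = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unrecognised characters between tokens
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):  # trailing characters that are not element tokens
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return counts


def formula_mass(formula: str, table: dict) -> float:
    """Sum each element's atom count times its mass from the given table."""
    counts = parse_formula(formula)
    missing = sorted(set(counts) - set(table))
    if missing:
        raise KeyError(f"no mass for element(s): {', '.join(missing)}")
    return sum(n * table[element] for element, n in counts.items())


for name, formula in [("Dabsyl", "C14H13N3O2S"), ("Uniblue A", "C22H16N2O7S2")]:
    mono = formula_mass(formula, MONOISOTOPIC)
    avg = formula_mass(formula, AVERAGE)
    print(f"{name}: monoisotopic {mono:.5f}, average {avg:.4f}")
# Dabsyl: monoisotopic 287.07285, average 287.3369
# Uniblue A: monoisotopic 484.03989, average 484.5016
```

With these assumed 2007 values the sketch reproduces the ChemSketch
average for Uniblue A quoted above (484.5016); a program with an older,
hard-wired table would drift in the third or fourth decimal place.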
In most software that calculates average molecular weights, the atomic
weights are "hard-wired", and you have no idea what "era" they belong
to.  Sulfur, which occurs in both of the reagents you mentioned, is
particularly bad in this respect: its average mass is defined to only
three decimal places and has changed several times in the last twenty
years.  Several of the elements encountered in protein modifications
(chlorine, selenium, and molybdenum in particular) vary so much in their
natural isotopic abundance that the IUPAC can define their average
masses with a precision of only two decimal places.  Keeping more than
two decimal places of precision for average masses is therefore
computationally meaningless: the extra digits imply a precision that
the underlying atomic weights do not support.
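The size of this effect can be estimated directly.  The Python sketch
below assumes the IUPAC 2007 standard atomic weights together with their
stated uncertainties, and combines them as a conservative worst-case sum
(each uncertainty multiplied by the atom count, then added).  That
combination rule is an illustrative choice, not the method used by
PSI-MOD or by the IUPAC, and the function name is hypothetical.

```python
from math import fsum

# Assumed values: IUPAC 2007 standard atomic weights and their stated
# uncertainties, written as (weight, uncertainty).
ATOMIC_WEIGHTS_2007 = {
    "H": (1.00794, 0.00007),
    "C": (12.0107, 0.0008),
    "N": (14.0067, 0.0002),
    "O": (15.9994, 0.0003),
    "S": (32.065, 0.005),
}


def average_mass_with_bound(counts: dict) -> tuple:
    """Return (average mass, worst-case uncertainty) for element -> atom count."""
    mass = fsum(n * ATOMIC_WEIGHTS_2007[el][0] for el, n in counts.items())
    # Conservative rule: add the scaled uncertainties linearly, as if every
    # element deviated in the same direction at once.
    bound = fsum(abs(n) * ATOMIC_WEIGHTS_2007[el][1] for el, n in counts.items())
    return mass, bound


uniblue_a = {"C": 22, "H": 16, "N": 2, "O": 7, "S": 2}
mass, bound = average_mass_with_bound(uniblue_a)
print(f"Uniblue A average mass: {mass:.5f} +/- {bound:.3f}")
# Uniblue A average mass: 484.50164 +/- 0.031

# Average masses for Uniblue A reported by the programs in the question.
reported = {
    "Molecular Weight Calculator": 484.50372,
    "ChemSketch": 484.5016,
    "Webqc.org": 484.5048,
}
for program, value in reported.items():
    print(f"{program}: {value} differs by {value - mass:+.5f}, "
          f"within bound: {abs(value - mass) <= bound}")
print(f"Rounded to two decimal places: {mass:.2f}")
```

Under these assumptions the worst-case spread for Uniblue A is about
0.03, larger than the differences between the three programs, so all of
them agree once the result is rounded to 484.50.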

It is also important to note that PSI-MOD corrects the monoisotopic
masses for intrinsic charge, a correction that is usually neglected
elsewhere; this accounts for some of the differences between PSI-MOD
and other sources of monoisotopic masses.
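A minimal Python sketch of such a correction, assuming it amounts to
subtracting one electron mass per unit of positive intrinsic charge
(and adding one per unit of negative charge).  The element table, the
function name, and the example composition C9H19N2O with charge +1 (the
elemental composition of an N6,N6,N6-trimethyllysine residue, whose
quaternary ammonium group carries a permanent positive charge) are
illustrative assumptions, not values copied from a PSI-MOD entry.

```python
from math import fsum

# Assumed monoisotopic masses (most abundant isotope) and electron mass, in u.
MONOISOTOPIC = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON_MASS = 0.00054857990946


def monoisotopic_mass(counts: dict, charge: int = 0) -> float:
    """Monoisotopic mass for element -> atom count with an intrinsic charge.

    Summing neutral-atom masses counts every electron of the neutral atoms;
    a species with charge +z has z fewer electrons, so z electron masses are
    subtracted (a negative charge adds them back).
    """
    neutral = fsum(n * MONOISOTOPIC[el] for el, n in counts.items())
    return neutral - charge * ELECTRON_MASS


# Illustrative composition: C9H19N2O carrying a +1 intrinsic charge.
trimethyl_lys = {"C": 9, "H": 19, "N": 2, "O": 1}
uncorrected = monoisotopic_mass(trimethyl_lys)
corrected = monoisotopic_mass(trimethyl_lys, charge=1)
print(f"uncorrected {uncorrected:.6f}, corrected {corrected:.6f}, "
      f"difference {uncorrected - corrected:.6f}")
# uncorrected 171.149738, corrected 171.149190, difference 0.000549
```

The correction is about 0.00055 u per unit of charge: small, but
visible at the precision usually quoted for monoisotopic masses.  As a
sanity check, the same rule applied to a single hydrogen atom with
charge +1 gives the proton mass to about seven decimal places.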

In PSI-MOD, many, but not all, of the modifications are defined
separately for each source amino acid, with parent nodes that represent
the modification regardless of amino acid identity.  It should be
possible to use those parent nodes in your PRIDE submissions.  If you
notice an apparently missing node in the ontology for a protein
modification, submit a request, providing, if possible, a literature
citation for the chemical modification.
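PSI-MOD is distributed in OBO format, where a term's parents are listed
on is_a lines.  As a rough sketch of how a submission pipeline might
climb from an amino-acid-specific term to the more general parent node,
the Python code below parses a tiny OBO fragment and collects
ancestors.  The HYP: identifiers and term names are hypothetical
placeholders, not real PSI-MOD accessions, and the parser handles only
the id, name and is_a tags.

```python
# Hypothetical OBO fragment; HYP: identifiers are placeholders, not PSI-MOD accessions.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: HYP:0001
name: dabsylated residue

[Term]
id: HYP:0002
name: dabsylated L-lysine
is_a: HYP:0001 ! dabsylated residue

[Term]
id: HYP:0003
name: dabsylated L-tyrosine
is_a: HYP:0001 ! dabsylated residue
"""


def parse_obo_terms(text: str) -> dict:
    """Collect id, name and is_a parents from the [Term] stanzas of OBO text."""
    terms = {}
    stanza = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are kept.
            stanza = {"name": "", "is_a": []} if line == "[Term]" else None
            continue
        if stanza is None or ":" not in line:
            continue  # header lines, blank lines, other stanza types
        tag, value = line.split(":", 1)
        value = value.split(" !", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = stanza
        elif tag == "name":
            stanza["name"] = value
        elif tag == "is_a":
            stanza["is_a"].append(value)
    return terms


def ancestors(terms: dict, term_id: str) -> list:
    """All terms reachable from term_id through is_a links, nearest first."""
    found, queue = [], list(terms[term_id]["is_a"])
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(terms.get(parent, {}).get("is_a", []))
    return found


terms = parse_obo_terms(SAMPLE_OBO)
for term_id in ("HYP:0002", "HYP:0003"):
    parents = [terms[p]["name"] for p in ancestors(terms, term_id)]
    print(terms[term_id]["name"], "->", parents)
# dabsylated L-lysine -> ['dabsylated residue']
# dabsylated L-tyrosine -> ['dabsylated residue']
```

Both residue-specific terms lead to the same parent, which is the node
that a submission could cite when the amino acid is given separately.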

-- 
                         John S. Garavelli
                         EMBL Outstation, European
                            Bioinformatics Institute
                         Wellcome Trust Genome Campus
                         Hinxton, Cambs  CB10 1SD
                         joh...@eb...
                         (044/0)-1223-492529