Re: [Open-ms-general] AASequence::parseString_: get n-term modifications based on their mass

Hi Petra,
I implemented support for mass tags at the N-terminus in my branch
https://github.com/timosachsenberg/OpenMS/tree/feature/AASequence_refactoring
which also contains some additional refactoring of the AASequence class.
I will try to get it merged into the main development branch soon so that it
is more easily accessible to other developers.
Cheers,
Timo

2014-04-08 17:57 GMT+02:00 Petra Gutenbrunner <pg...@sa...>:

> Hello OpenMS Team,
>
> I'm working with TOPPAS version 1.12.0 and have a question about
> AASequence.
>
> Is it possible to get the N-terminal modification based on its mass
> using AASequence::parseString_ with a peptide string like
> "[+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK"?
> It works well for a peptide string like
> "ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER", but not for a peptide that
> includes an N-terminal modification.
>
> I need this for the MSGF+ Adapter (downloaded from
> https://github.com/mwalzer/OpenMS/tree/MSGF+-Adapter), which converts a
> .tsv file to an .idXML file.
>
> Here is a brief description of the code:
> The MSGF+ Adapter runs java -cp MSGFPlus.jar
> edu.ucsd.msjava.ui.MzIDToTsv to convert the .mzid output to .tsv.
> After that, the .tsv file is converted to .idXML.
> This works fine unless N-terminal modifications are used as search
> parameters; in that case the conversion to .idXML fails.
>
> The reason for that is:
>
> TSV-File:
> #SpecFile         SpecID     ScanNum  FragMethod  Precursor  IsotopeError  PrecursorError(ppm)  Charge  Peptide                                  Protein                 DeNovoScore  MSGFScore  SpecEValue     EValue
> E3_CID1.mgf.mzML  scan=7271  7271     CID         885.1037   0             0.0689582            3       K.HSQVFSTAEDNQSAVTIHVLQGER.K             sp|P0A6Y8|DNAK_ECOLI    196          187        3.0058742E-27  9.914368E-21
> E3_CID1.mgf.mzML  scan=7271  7271     CID         885.1037   0             0.0689582            3       K.HSQVFSTAEDNQSAVTIHVLQGER.K             tr|U6N2B8|U6N2B8_ECOLI  196          187        3.0058742E-27  9.914368E-21
> E3_CID1.mgf.mzML  scan=355   355      CID         795.2796   0             -16.960752           3       K.ESN+0.984Q+0.984RWCSDGFEFCCDNGER.L     tr|C4ZVW6|C4ZVW6_ECOBW  128          -37        0.034291964    109461.836
> E3_CID1.mgf.mzML  scan=1436  1436     CID         1054.35    0             12.8515005           3       -.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T  sp|P75961|YCFZ_ECOLI    148          -27        0.082810305    281889.78
> E3_CID1.mgf.mzML  scan=1436  1436     CID         1054.35    0             12.8515005           3       -.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T  tr|U6N512|U6N512_ECOLI  148          -27        0.082810305    281889.78
>
> Code example:
>
> for (CsvFile::Iterator it = tsvfile.begin() + 1; it != tsvfile.end(); ++it)
> {
>   vector<String> elements;
>   it->split("\t", elements);
>   ...
>   // the sequence must be cut and modified before it can be parsed
>   sequence = AASequence(modifySequence(changeKomma(cutSequence(elements[8]))));
>   ...
>   PeptideHit p_hit(score, rank, charge, sequence);
>   precursor_mz = elements[4].toDouble();
>   peptide_hits[scanNumber].insertHit(p_hit);
>   peptide_hits[scanNumber].setMetaValue("MZ", precursor_mz);
>   ...
> }
>
> The Peptide column of the .tsv file must be converted to an AASequence
> object in order to create the PeptideHit object.
> But before an AASequence object can be created, the peptide value must be
> converted to the right format; otherwise it cannot be parsed by
> AASequence::parseString_:
>
> K.ESN+0.984Q+0.984RWCSDGFEFCCDNGER.L    ->
>  ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER    (non n-term)
> -.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T ->
>  [+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK   n-term
>
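
For reference, the conversion shown above can be sketched with plain string handling. This is only a minimal sketch, not the adapter's actual cutSequence/modifySequence code: toBracketNotation is a hypothetical helper, and it assumes single-character flanking markers ("K." / ".L") and a decimal point rather than a comma in the mass.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OpenMS or the MSGF+ Adapter):
// strips the flanking residues from an MS-GF+ peptide string and wraps
// every signed mass delta in square brackets.
std::string toBracketNotation(const std::string& tsv_peptide)
{
  std::string pep = tsv_peptide;
  // Remove the "X." prefix and ".X" suffix if both are present.
  if (pep.size() >= 4 && pep[1] == '.' && pep[pep.size() - 2] == '.')
  {
    pep = pep.substr(2, pep.size() - 4);
  }

  std::string out;
  std::size_t i = 0;
  while (i < pep.size())
  {
    if (pep[i] == '+' || pep[i] == '-')
    {
      // Consume the sign, the digits and the decimal point as one mass tag.
      std::size_t j = i + 1;
      while (j < pep.size() &&
             (std::isdigit(static_cast<unsigned char>(pep[j])) || pep[j] == '.'))
      {
        ++j;
      }
      out += '[' + pep.substr(i, j - i) + ']';
      i = j;
    }
    else
    {
      out += pep[i];
      ++i;
    }
  }
  return out;
}
```

With the two rows above, toBracketNotation("-.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T") gives "[+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK", so the N-terminal tag ends up in front of the first residue. Whether the parser then accepts it is a separate matter.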
> After executing sequence = AASequence(...); the sequence should look like:
> ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER    ->
>  ESN(Deamidated)Q(Deamidated)RWCSDGFEFCCDNGER
> [+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK   ->
>  (Acetyl)MKKFIILLSLLILLPLTAASKPLIPIMK
>
> I get the expected output for the first row of the example above, because
> the modification is in the middle of the peptide. It works because of
> AASequence::parseString_ at line 999, which looks up the modification based
> on the one-letter code and the delta mass:
> if (tag.hasPrefix("+") || tag.hasPrefix("-"))
> {
>   // delta mass
>   double delta_mass = tag.toDouble();
>   const Residue* result = NULL;
>
>   if (tag.hasSubstring("."))
>   {
>     // we have a float, look for an exact match
>     const ResidueModification* mod =
>       ModificationsDB::getInstance()->getBestModificationsByDiffMonoMass(
>         res_ptr->getOneLetterCode(), delta_mass, 1.0);
>     if (mod != NULL)
>     {
>       result = ResidueDB::getInstance()->getModifiedResidue(res_ptr, mod->getId());
>     }
>     ...
>   }
>   ...
> }
>
> But for an N-terminal modification, I get the error: "Error: Unexpected
> internal error (the element '+42.011' could not be found)".
>
> This error comes from AASequence::parseString_ at line 838: it checks
> whether the first part of the peptide is an N-terminal modification, but it
> only works with modification names and not with masses.
>
> // OLD: if (!split.empty() && !split[0].empty() && split[0][0] == '(')
> // NEW: also accept '[', otherwise N-terminal tags of the form [+42.011]
> // are ignored
> if (!split.empty() && !split[0].empty() &&
>     (split[0][0] == '(' || split[0][0] == '['))
> {
>   String mod = split[0]; // mod = '[+42.011]'
>   mod.trim();
>   mod.erase(mod.begin());
>   mod.erase(mod.end() - 1); // mod = '+42.011'
>   // the error is caused by the following lookup
>   n_term_mod_ = &ModificationsDB::getInstance()->getTerminalModification(
>     mod, ResidueModification::N_TERM);
>
>   split.erase(split.begin());
> }
>
> getTerminalModification calls the method searchTerminalModifications(mods,
> mod_name, term_spec), which contains the code
> if (!modification_names_.has(name))
> {
>   throw Exception::ElementNotFound(__FILE__, __LINE__, __PRETTY_FUNCTION__, name);
> }
> Since no modification with the name "+42.011" exists, this throws the
> error.
>
> If there is no other solution, I would suggest the following:
> - add a new overloaded constructor for AASequence that takes a list of
> possible modifications as a parameter.
> - extend the N-terminal modification check with a test for whether the tag
> is a number and, if so, call a new method that returns the modification
> that fits best, based on the list of possible modifications and the given
> mass.
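
The second suggestion could look roughly like the sketch below. It is deliberately independent of OpenMS and does not show the code in any branch: TermMod, isMassTag and bestNTermModByMass are hypothetical names, the tolerance is an arbitrary choice, and a tag is treated as a mass simply because it starts with '+' or '-'.

```cpp
#include <cmath>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for a terminal modification entry; OpenMS itself
// keeps these as ResidueModification objects in ModificationsDB.
struct TermMod
{
  std::string name;       // e.g. "Acetyl"
  double diff_mono_mass;  // monoisotopic mass delta in Da
};

// True if the tag looks like a signed mass delta ("+42.011") rather than
// a modification name ("Acetyl"). Assumption: names never start with +/-.
bool isMassTag(const std::string& tag)
{
  return !tag.empty() && (tag[0] == '+' || tag[0] == '-');
}

// Returns the candidate whose mass delta is closest to the tag's value,
// or nullptr if the tag is not a mass or nothing lies within the tolerance.
const TermMod* bestNTermModByMass(const std::string& tag,
                                  const std::vector<TermMod>& candidates,
                                  double tolerance)
{
  if (!isMassTag(tag))
  {
    return nullptr;
  }
  const double delta = std::strtod(tag.c_str(), nullptr);
  const TermMod* best = nullptr;
  double best_error = tolerance;
  for (const TermMod& mod : candidates)
  {
    const double error = std::fabs(mod.diff_mono_mass - delta);
    if (error <= best_error)
    {
      best = &mod;
      best_error = error;
    }
  }
  return best;
}
```

Given the candidates Acetyl (42.010565 Da) and Carbamyl (43.005814 Da) and a tolerance of 0.01 Da, the tag "+42.011" resolves to Acetyl, while a name such as "Acetyl" returns nullptr and would fall through to the existing name-based lookup. Restricting the candidates to the modifications used in the search keeps the match unambiguous when several modifications have similar masses.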
>
> Thanks for your support and kind regards,
> Petra