From: Timo S. <sac...@gm...> - 2014-04-09 09:40:08
Hi Petra,

I implemented the mass tag at the N-terminus in my branch
https://github.com/timosachsenberg/OpenMS/tree/feature/AASequence_refactoring
which contains some additional refactoring of the AASequence class. I will
try to get it merged into the main development branch soon so that it is
more easily accessible to other developers.

Cheers,
Timo

2014-04-08 17:57 GMT+02:00 Petra Gutenbrunner <pg...@sa...>:
> Hello OpenMS Team,
>
> I'm working with TOPPAS version 1.12.0 and have a question about
> AASequence.
>
> Is there a way to get the N-terminal modification based on its mass
> using AASequence::parseString_ with a peptide string like
> "[+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK"?
> It works well for a peptide string like
> "ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER", but not for a peptide that
> includes an N-terminal modification.
>
> I need that function for the MSGF+ Adapter (downloaded from
> https://github.com/mwalzer/OpenMS/tree/MSGF+-Adapter), which converts a
> .tsv file to an .idXML file.
>
> Here is a brief description of the code:
> The MSGF+ Adapter uses the command java -cp MSGFPlus.jar
> edu.ucsd.msjava.ui.MzIDToTsv to convert the .mzid output to .tsv.
> After that, this file is converted to .idXML.
> This works fine, except when N-terminal modifications are used as a search
> parameter; then the conversion to .idXML fails.
>
> The reason for that is:
>
> TSV file:
> #SpecFile SpecID ScanNum FragMethod Precursor IsotopeError PrecursorError(ppm) Charge Peptide Protein DeNovoScore MSGFScore SpecEValue EValue
> E3_CID1.mgf.mzML scan=7271 7271 CID 885.1037 0 0.0689582 3 K.HSQVFSTAEDNQSAVTIHVLQGER.K sp|P0A6Y8|DNAK_ECOLI 196 187 3.0058742E-27 9.914368E-21
> E3_CID1.mgf.mzML scan=7271 7271 CID 885.1037 0 0.0689582 3 K.HSQVFSTAEDNQSAVTIHVLQGER.K tr|U6N2B8|U6N2B8_ECOLI 196 187 3.0058742E-27 9.914368E-21
> E3_CID1.mgf.mzML scan=355 355 CID 795.2796 0 -16.960752 3 K.ESN+0.984Q+0.984RWCSDGFEFCCDNGER.L tr|C4ZVW6|C4ZVW6_ECOBW 128 -37 0.034291964 109461.836
> E3_CID1.mgf.mzML scan=1436 1436 CID 1054.35 0 12.8515005 3 -.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T sp|P75961|YCFZ_ECOLI 148 -27 0.082810305 281889.78
> E3_CID1.mgf.mzML scan=1436 1436 CID 1054.35 0 12.8515005 3 -.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T tr|U6N512|U6N512_ECOLI 148 -27 0.082810305 281889.78
>
> Code example:
>
>   for (CsvFile::Iterator it = tsvfile.begin() + 1; it != tsvfile.end(); ++it)
>   {
>     vector<String> elements;
>     it->split("\t", elements);
>     ...
>     // the sequence must be cut and modified first
>     sequence = AASequence(modifySequence(changeKomma(cutSequence(elements[8]))));
>     ...
>     PeptideHit p_hit(score, rank, charge, sequence);
>     precursor_mz = elements[4].toDouble();
>     peptide_hits[scanNumber].insertHit(p_hit);
>     peptide_hits[scanNumber].setMetaValue("MZ", precursor_mz);
>     ...
>   }
>
> The Peptide column from the .tsv should be converted into an AASequence
> object, which is needed to create the PeptideHit object.
> But before an AASequence object can be created, the peptide value must be
> converted to the right format; otherwise the peptide cannot be parsed
> by AASequence::parseString_:
>
> K.ESN+0.984Q+0.984RWCSDGFEFCCDNGER.L -> ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER (not N-terminal)
> -.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T -> [+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK (N-terminal)
>
> After executing sequence = AASequence(...); the sequence should look like:
> ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER -> ESN(Deamidated)Q(Deamidated)RWCSDGFEFCCDNGER
> [+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK -> (Acetyl)MKKFIILLSLLILLPLTAASKPLIPIMK
>
> I get the expected output for the first row of the example above, because
> the modification is in the middle of the peptide. This works because of
> AASequence::parseString_ in line 999: it returns the modification based on
> the one-letter code and the delta mass.
>
>   if (tag.hasPrefix("+") || tag.hasPrefix("-"))
>   {
>     // delta mass
>     double delta_mass = tag.toDouble();
>     const Residue* result = NULL;
>
>     if (tag.hasSubstring("."))
>     {
>       // we have a float, look for an exact match
>       const ResidueModification* mod =
>         ModificationsDB::getInstance()->getBestModificationsByDiffMonoMass(res_ptr->getOneLetterCode(), delta_mass, 1.0);
>       if (mod != NULL)
>       {
>         result = ResidueDB::getInstance()->getModifiedResidue(res_ptr, mod->getId());
>       }
>       ...
>     }
>     ...
>   }
>
> But if it is an N-terminal modification, I get the error: "Error: Unexpected
> internal error (the element '+42.011' could not be found)".
>
> This error comes from AASequence::parseString_ in line 838: it checks
> whether the first part of the peptide is an N-terminal modification, but it
> only works with modification names, not with masses.
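For reference, the conversion described above (dropping the flanking residues and wrapping each signed delta mass in square brackets) could be sketched as a standalone function. This is a minimal sketch, not the adapter's code: cutSequence, changeKomma and modifySequence are not shown in the thread, and msgfToBracketNotation is a hypothetical name. It assumes that each flanking residue is a single character (an amino acid or '-') separated by '.', and that every modification is written as a plain signed decimal mass.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of OpenMS or the MSGF+ Adapter.
// Converts an MSGF+ TSV peptide such as "K.ESN+0.984Q+0.984RWCSDGFEFCCDNGER.L"
// into the bracketed notation "ESN[+0.984]Q[+0.984]RWCSDGFEFCCDNGER".
// Assumptions: single-character flanking residues (amino acid or '-') separated
// by '.', and every modification is a signed decimal delta mass.
std::string msgfToBracketNotation(const std::string& peptide)
{
  // Strip "<prev>." and ".<next>" by position. Delta masses contain '.' as well,
  // so splitting on every '.' would break them apart.
  std::string core = peptide;
  if (peptide.size() >= 4 && peptide[1] == '.' && peptide[peptide.size() - 2] == '.')
  {
    core = peptide.substr(2, peptide.size() - 4);
  }

  std::string result;
  std::size_t i = 0;
  while (i < core.size())
  {
    if (core[i] == '+' || core[i] == '-')
    {
      // Collect the sign plus the following digits and decimal point.
      std::size_t end = i + 1;
      while (end < core.size() &&
             (std::isdigit(static_cast<unsigned char>(core[end])) || core[end] == '.'))
      {
        ++end;
      }
      result += "[" + core.substr(i, end - i) + "]";
      i = end;
    }
    else
    {
      result += core[i];  // plain residue letter
      ++i;
    }
  }
  return result;
}
```

Applied to the TSV rows above, msgfToBracketNotation("-.+42.011MKKFIILLSLLILLPLTAASKPLIPIMK.T") yields "[+42.011]MKKFIILLSLLILLPLTAASKPLIPIMK". This is the string whose leading tag AASequence::parseString_ then has to resolve at the N-terminus.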
>
>   //if (!split.empty() && !split[0].empty() && split[0][0] == '(')
>   if (!split.empty() && !split[0].empty() && (split[0][0] == '(' || split[0][0] == '[')) // NEW: otherwise N-terminal modifications in the format [+42.011] are ignored
>   {
>     String mod = split[0]; // mod = '[+42.011]'
>     mod.trim();
>     mod.erase(mod.begin());
>     mod.erase(mod.end() - 1); // mod = '+42.011'
>     n_term_mod_ = &ModificationsDB::getInstance()->getTerminalModification(mod, ResidueModification::N_TERM); // error caused by this line
>
>     split.erase(split.begin());
>   }
>
> getTerminalModification calls searchTerminalModifications(mods, mod_name,
> term_spec), which contains the code
>
>   if (!modification_names_.has(name))
>   {
>     throw Exception::ElementNotFound(__FILE__, __LINE__, __PRETTY_FUNCTION__, name);
>   }
>
> Since no modification named "+42.011" exists, this throws the error.
>
> If there is no other solution, I would suggest the following:
> - add an overloaded AASequence constructor that takes a list of possible
>   modifications as a parameter;
> - extend the check for an N-terminal modification with a check whether the
>   tag is a number and, if so, call a new method that returns the
>   modification that fits best, based on the list of possible modifications
>   and the given mass.
>
> Thanks for your support and kind regards,
> Petra
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
> _______________________________________________
> Open-ms-general mailing list
> Ope...@li...
> https://lists.sourceforge.net/lists/listinfo/open-ms-general
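Petra's second suggestion was to resolve a numeric N-terminal tag to the best-fitting modification from a list of candidates. That idea might look roughly like the sketch below. It is not Timo's implementation in the feature/AASequence_refactoring branch and does not query ModificationsDB. CandidateMod, bestModificationByMass and resolveNTermTag are hypothetical names. The tolerance is left as a parameter; the parseString_ excerpt quoted in Petra's message passes 1.0 to getBestModificationsByDiffMonoMass.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical candidate modification (not an OpenMS class): a name and its
// monoisotopic mass difference in Da.
struct CandidateMod
{
  std::string name;
  double mono_mass_delta;
};

// Returns the candidate whose mass difference is closest to delta_mass,
// or nullptr if none lies within tolerance (Da).
const CandidateMod* bestModificationByMass(const std::vector<CandidateMod>& candidates,
                                           double delta_mass, double tolerance)
{
  assert(tolerance >= 0.0);
  const CandidateMod* best = nullptr;
  double best_error = tolerance;
  for (const CandidateMod& mod : candidates)
  {
    const double error = std::fabs(mod.mono_mass_delta - delta_mass);
    if (error <= best_error)
    {
      best = &mod;
      best_error = error;
    }
  }
  return best;
}

// Resolves an N-terminal tag such as "[+42.011]" or "(Acetyl)".
// A signed number is treated as a delta mass and mapped to the best-fitting
// candidate (empty string if nothing fits); anything else is taken as a name.
std::string resolveNTermTag(const std::string& tag,
                            const std::vector<CandidateMod>& candidates,
                            double tolerance)
{
  assert(tag.size() >= 2);
  const std::string inner = tag.substr(1, tag.size() - 2);  // drop the enclosing brackets
  if (!inner.empty() && (inner[0] == '+' || inner[0] == '-'))
  {
    const CandidateMod* mod = bestModificationByMass(candidates, std::stod(inner), tolerance);
    return mod != nullptr ? mod->name : std::string();
  }
  return inner;
}
```

Consider an illustrative candidate list {"Acetyl", 42.010565}, {"Deamidated", 0.984016} (monoisotopic Unimod values) with a tolerance of 0.01 Da. With those inputs, resolveNTermTag("[+42.011]", ...) returns "Acetyl", and "(Acetyl)" resolves to the name Acetyl directly. Restricting the candidates to the modifications actually used in the search, as Petra proposed, keeps a mass-only lookup from picking an unrelated modification.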