From: Nico P. <npf...@in...> - 2009-01-22 10:16:30
Hello Mathias,

thanks for your interest in this work.

> I'm having trouble understanding how to use the OpenMS tools for
> retention time prediction.
>
> Let's picture I have 2 LCMS repeat experiments, same organism, with both
> MS and MS/MS info in the experiments.
> My aim is to compare the predicted RT's with the RT corresponding to the
> MS/MS peptides.
>
> After submitting the experiment files (mzXML) to Mascot, I get the
> corresponding IdXML files.
> I use the first file as input-file to generate the SVM model, and use
> the second IdXML file to predict the peptide-RT's against.
> I use IDFilter to obtain an IdXML file that will be used as training set.
> I'm using the OLIGO kernel type (maybe it is better to use the POLY
> kernel?).

This is exactly how it should work, and you should use OLIGO. If you expect a shift in retention time between the two experiments, you should apply the MapAligner to your data before submitting it to Mascot.

> How do I know what parameters are suited to IDFilter and RTModel to get
> a good training set? Is it right that the OpenMS TOPP tools for
> retention time prediction give NET values?

If you want to use NET values, you have to specify the total time of the gradient via the parameter total_gradient_time in RTModel and RTPredict. We have extended the functionality here so that you no longer need normalized elution times (in the latest development version and in the upcoming release): if you leave total_gradient_time at its value of 1, the predicted retention times will be in the range of your measured retention times.

For getting a high-confidence training set, you can use the -pep_fraction parameter if Mascot is your ID engine. If this parameter is set to 1, the peptide identifications are filtered according to Mascot's significance threshold score: all identifications with a smaller score are filtered out.
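A minimal Python sketch of the pep_fraction rule as described here: a hit is kept when its Mascot score is at least pep_fraction times Mascot's significance threshold. The PeptideHit record, its field names and the scores are hypothetical stand-ins for what is stored in the IdXML file; this is not the OpenMS API.

```python
from dataclasses import dataclass


@dataclass
class PeptideHit:
    # Hypothetical record; real values would be read from the IdXML file.
    sequence: str
    score: float                   # Mascot ion score of the hit
    significance_threshold: float  # Mascot significance threshold for this spectrum


def filter_by_pep_fraction(hits, pep_fraction):
    """Keep hits whose score is at least pep_fraction * significance threshold."""
    return [h for h in hits if h.score >= pep_fraction * h.significance_threshold]


hits = [
    PeptideHit("PEPTIDEK", 45.0, 40.0),  # above the threshold
    PeptideHit("LLSAEER", 34.0, 40.0),   # 85% of the threshold
    PeptideHit("GGTVLK", 20.0, 40.0),    # 50% of the threshold
]

print([h.sequence for h in filter_by_pep_fraction(hits, 1.0)])  # ['PEPTIDEK']
print([h.sequence for h in filter_by_pep_fraction(hits, 0.8)])  # ['PEPTIDEK', 'LLSAEER']
```

With a fraction of 1 only the hit above Mascot's threshold survives, which is the strict setting suited to a training set; lowering the fraction admits hits just below the threshold.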
If you want to allow peptide identifications whose score is at least 80% of the significance threshold, you can set the parameter to 0.8, and so on (see Fig. 6 of Pfeifer et al. 2007 for an application of this). For high-confidence training sets you should set pep_fraction to 1.

Another general option is the FalseDiscoveryRate tool. To use it, you need two IdXML files for your training set: the one you already have, and a second one obtained by searching with the same parameters against a decoy database. Then you can use FalseDiscoveryRate to estimate false discovery rates or q values:

FalseDiscoveryRate -fwd_in identifications_to_standard_database.IdXML -rev_in identifications_to_decoy_database.IdXML -out identifications_with_significance_measure.IdXML -peptides_only -q_values

In the output file, all scores are replaced by the q values/FDRs, and the original Mascot score is stored as MetaInfo. This means that you can then filter directly on q values/FDRs using IDFilter with the pep_score option. Since IdXML stores whether a higher or a lower score is better, running IDFilter with, e.g., -pep_score 0.01 will filter out all identifications with a q value/FDR higher than 0.01 (1%). I would prefer q values because they are directly suitable as a filter threshold. For a high-confidence training set, you should not set pep_score higher than 0.10.

How big was your training set? Depending on the quality of your RTs, you may have too little training data.

> I also tried plotting MS/MS RT
> against the predicted RT of the peptides of its own LCMS experiment. I
> get a similar fuzzy cloud with even points far away from the diagonal.
>
> Can you explain me what I am doing wrong?

I suppose some parameters in your RTModel.ini file do not fit your data well. If you send me your RTModel ini file, I can have a look at it.
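To make the q-value filtering mentioned earlier in this message more concrete, here is a Python sketch of one common target-decoy estimate. It is an illustration of the general idea, not the FalseDiscoveryRate implementation, whose exact estimator may differ. The FDR at a score cutoff is estimated as the number of decoy hits divided by the number of target hits at or above the cutoff, and a hit's q value is the smallest FDR among all cutoffs that still accept it. The scores are invented, and the 0.01 cutoff mirrors -pep_score 0.01.

```python
from bisect import bisect_left


def target_decoy_q_values(target_scores, decoy_scores):
    """Return one q value per target score (higher score = better hit).

    FDR at cutoff c is estimated as (#decoy hits >= c) / (#target hits >= c).
    The q value of a hit with score s is the smallest FDR over all cutoffs
    c <= s, i.e. over all cutoffs that still accept the hit.
    """
    targets = sorted(target_scores)
    decoys = sorted(decoy_scores)

    def fdr(cutoff):
        n_target = len(targets) - bisect_left(targets, cutoff)  # targets >= cutoff
        n_decoy = len(decoys) - bisect_left(decoys, cutoff)     # decoys >= cutoff
        return min(1.0, n_decoy / n_target)

    # Walk the distinct target scores from lowest to highest, keeping a running
    # minimum of the FDR: q(s) = min FDR over cutoffs c <= s.
    q_at = {}
    running_min = float("inf")
    for cutoff in sorted(set(targets)):
        running_min = min(running_min, fdr(cutoff))
        q_at[cutoff] = running_min
    return [q_at[s] for s in target_scores]


# Hypothetical Mascot scores from the forward and the decoy search.
targets = [95.0, 90.0, 85.0, 70.0, 60.0, 40.0, 30.0]
decoys = [65.0, 35.0, 25.0]

q_values = target_decoy_q_values(targets, decoys)
kept = [s for s, q in zip(targets, q_values) if q <= 0.01]  # like -pep_score 0.01
print(kept)  # [95.0, 90.0, 85.0, 70.0]
```

Here the hits scoring 60 and 40 get a q value of 1/6 because of the decoy at 65, so they are rejected at 1% and also at 0.10, but would pass a looser cutoff such as 0.2.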
The C values probed in the cross-validation should be scaled to the range of your RTs (0.001, 0.01, 0.1, 1 if you use NET, and 1, 10, 100, 1000 if you use total_gradient_time=1). Sigma should be probed between 2 and 12, and nu between 0.3 and 0.7. If this does not help, you could also send me your IdXML file and I can have a look at it.

Best regards,
Nico