The tide-search command fails to parse the attached MGF file, regardless of whether I use the pwiz or the mstoolkit parser. With mstoolkit, it reports that it cannot identify the file format; with pwiz, it rejects the line specifying the charge state.
Here is a description of the MGF format:
http://www.matrixscience.com/help/data_file_help.html
I can't see any problem with the file I'm using.
bash-4.1$ rm -fr crux-output; /net/noble/vol1/home/noble/proj/crux/branches/crux-sfx-topn/Debug/src/crux tide-search small.mgf ~peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta
INFO: Creating index from '/net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta'
INFO: Running tide-index...
INFO: Writing results to output directory 'crux-output/tide-search.tempindex'.
INFO: Reading /net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta and computing unmodified peptides...
INFO: Writing decoy fasta...
INFO: Reading proteins
INFO: Precomputing theoretical spectra...
INFO: Beginning tide-search.
WARNING: The output directory 'crux-output' already exists.
Existing files will not be overwritten.
INFO: CPU: pyrrolysine.gs.washington.edu
INFO: Wed Jun 22 12:09:42 PDT 2016
INFO: Running tide-search...
INFO: Reading index crux-output/tide-search.tempindex
INFO: Converting small.mgf to spectrumrecords format
ERROR: [SpectrumList_MGF::parseSpectrum] Error parsing line at offset 122: CHARGE=3+
FATAL: Error converting small.mgf to spectrumrecords format
bash-4.1$ rm -fr crux-output ; /net/noble/vol1/home/noble/proj/crux/branches/crux-sfx-topn/Debug/src/crux tide-search --spectrum-parser mstoolkit small.mgf ~peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta
INFO: Creating index from '/net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta'
INFO: Running tide-index...
INFO: Writing results to output directory 'crux-output/tide-search.tempindex'.
INFO: Reading /net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta and computing unmodified peptides...
INFO: Writing decoy fasta...
INFO: Reading proteins
INFO: Precomputing theoretical spectra...
INFO: Beginning tide-search.
WARNING: The output directory 'crux-output' already exists.
Existing files will not be overwritten.
INFO: CPU: pyrrolysine.gs.washington.edu
INFO: Wed Jun 22 12:10:10 PDT 2016
INFO: Running tide-search...
INFO: Reading index crux-output/tide-search.tempindex
INFO: Converting small.mgf to spectrumrecords format
Unknown file format
FATAL: MSToolkit: Error reading spectra file: /net/noble/vol4/noble/user/noble/proj/crux-projects/2016gaussian/results/bill/2016-06-22segfault/small.mgf
Important update: with pwiz, the problem only shows up when I compile in Debug mode. If I build in Release mode and use the pwiz parser, the search completes successfully.
bash-4.1$ rm -fr crux-output ; /net/noble/vol1/home/noble/proj/crux/branches/crux-sfx-topn/Release/src/crux tide-search small.mgf ~peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta
INFO: Creating index from '/net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta'
INFO: Running tide-index...
INFO: Writing results to output directory 'crux-output/tide-search.tempindex'.
INFO: Reading /net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta and computing unmodified peptides...
INFO: Writing decoy fasta...
INFO: Reading proteins
INFO: Precomputing theoretical spectra...
INFO: Beginning tide-search.
WARNING: The output directory 'crux-output' already exists.
Existing files will not be overwritten.
INFO: CPU: pyrrolysine.gs.washington.edu
INFO: Wed Jun 22 12:15:18 PDT 2016
INFO: Running tide-search...
INFO: Reading index crux-output/tide-search.tempindex
INFO: Converting small.mgf to spectrumrecords format
INFO: Parser could not determine scan numbers for this file, using ordinal numbers as scan numbers.
INFO: Reading spectra file crux-output/small.mgf.spectrumrecords.tmp
INFO: Sorting spectra
INFO: Running search
INFO: Time per spectrum-charge combination: 0.001960 s.
INFO: Average number of candidates per spectrum-charge combination: 0.965116
INFO: Elapsed time: 0.414 s
INFO: Finished crux tide-search.
INFO: Return Code:0
bash-4.1$
It looks like another boost::lexical_cast issue: as far as I can tell, ProteoWizard tries to convert the charge but doesn't handle the '+'. I'll fix this in ProteoWizard and check whether that resolves the problem.
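For context on the diagnosis above: boost::lexical_cast<int> requires the whole input string to be a valid int, so a value like "3+" makes it throw boost::bad_lexical_cast instead of returning 3. The following is a minimal sketch of more tolerant charge parsing, assuming the Matrix Science convention that the sign trails the digits and that several charges may be listed with "and" (e.g. CHARGE=2+ and 3+). The function name parseMgfCharges is hypothetical; this is not ProteoWizard's actual code or fix.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parse the value of an MGF "CHARGE=" line, such as
// "3+", "2-", "3", or "2+ and 3+", into signed integer charges.
std::vector<int> parseMgfCharges(const std::string& value) {
    std::vector<int> charges;
    std::size_t i = 0;
    while (i < value.size()) {
        // Skip separators such as spaces, commas, and the word "and".
        if (!std::isdigit(static_cast<unsigned char>(value[i]))) {
            ++i;
            continue;
        }
        // Accumulate the digits of one charge value.
        int charge = 0;
        while (i < value.size() &&
               std::isdigit(static_cast<unsigned char>(value[i]))) {
            charge = charge * 10 + (value[i] - '0');
            ++i;
        }
        // An optional trailing sign follows the digits; '+' or no sign
        // means a positive charge, '-' means a negative one.
        if (i < value.size() && value[i] == '-') {
            charge = -charge;
            ++i;
        } else if (i < value.size() && value[i] == '+') {
            ++i;
        }
        charges.push_back(charge);
    }
    if (charges.empty()) {
        throw std::invalid_argument("no charge found in: " + value);
    }
    return charges;
}
```

Scanning digits and then consuming an optional sign keeps "3+" and "3" equivalent, which is the behavior a strict whole-string conversion cannot provide.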
Actually, I made a fix for this, but when I went to test it, I first ran without the fix and got no error with either MSToolkit or ProteoWizard.
Crud, you are right -- I just verified this. Apparently, the behavior only appears in the crux-sfx-topn branch. Sean, how feasible would it be to merge from the trunk into your branch, now that the mods handling is all fixed?
Bill
On Fri, Jun 24, 2016 at 11:02 AM, Kaipo kaipot@users.sf.net wrote:
Related Issues: #400
It wasn't easy, but the merge is done. The xlink code will need to be debugged, especially with the inclusion of mods.
Sean
On Fri, Jun 24, 2016 at 1:11 PM, William S Noble wsnoble@users.sf.net wrote: