
#400 tide-search fails to parse MGF file (in search-for-xlinks branch)

Milestone: post v2.0
Status: open
Owner: Kaipo
Labels: None
Updated: 2016-09-07
Created: 2016-06-22
Private: No

The tide-search command fails to parse the attached MGF file, irrespective of whether I use the pwiz or mstoolkit parser. With pwiz, it complains about the line specifying the charge state (CHARGE=3+); with mstoolkit, it reports an unknown file format.

Here is a description of MGF:

http://www.matrixscience.com/help/data_file_help.html

I can't see any problem with the file I'm using.
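
For readers who want to rule out a malformed file before suspecting the parser, here is a minimal sketch of a structural check in Python. It assumes only the basic layout described at the Matrix Science link above: BEGIN IONS / END IONS blocks containing KEY=value headers and whitespace-separated peak lines, with comment lines starting with one of # ; ! /. The function name check_mgf and the charge pattern are illustrative assumptions and are not part of Crux, ProteoWizard, or MSToolkit.

```python
import re

# Charge values such as "3+", "2-", or "2+ and 3+" (assumed pattern, based on
# the Matrix Science format description).
CHARGE_RE = re.compile(r"^\d+[+-]?(\s*(,|and)\s*\d+[+-]?)*$")


def check_mgf(lines):
    """Return a list of (line_number, message) problems found in MGF text."""
    problems = []
    in_block = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line[0] in "#;!/":
            continue  # blank line or comment (assumed comment markers)
        if line == "BEGIN IONS":
            if in_block:
                problems.append((number, "nested BEGIN IONS"))
            in_block = True
        elif line == "END IONS":
            if not in_block:
                problems.append((number, "END IONS without BEGIN IONS"))
            in_block = False
        elif "=" in line:
            key, value = line.split("=", 1)
            if key == "CHARGE" and not CHARGE_RE.match(value.strip()):
                problems.append((number, "unexpected CHARGE value: " + value))
        elif in_block:
            # Inside a block, any other line should be a peak: m/z [intensity ...]
            try:
                [float(field) for field in line.split()]
            except ValueError:
                problems.append((number, "unparseable peak line: " + line))
    if in_block:
        problems.append((len(lines), "missing END IONS"))
    return problems
```

Calling check_mgf on the lines of small.mgf and getting an empty list back would suggest the file is structurally sound under these assumptions, which points at the parsers rather than the data.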

bash-4.1$ rm -fr crux-output; /net/noble/vol1/home/noble/proj/crux/branches/crux-sfx-topn/Debug/src/crux tide-search small.mgf ~peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta
INFO: Creating index from '/net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta'
INFO: Running tide-index...
INFO: Writing results to output directory 'crux-output/tide-search.tempindex'.
INFO: Reading /net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta and computing unmodified peptides...
INFO: Writing decoy fasta...
INFO: Reading proteins
INFO: Precomputing theoretical spectra...
INFO: Beginning tide-search.
WARNING: The output directory 'crux-output' already exists.
Existing files will not be overwritten.
INFO: CPU: pyrrolysine.gs.washington.edu
INFO: Wed Jun 22 12:09:42 PDT 2016
INFO: Running tide-search...
INFO: Reading index crux-output/tide-search.tempindex
INFO: Converting small.mgf to spectrumrecords format
ERROR: [SpectrumList_MGF::parseSpectrum] Error parsing line at offset 122: CHARGE=3+

FATAL: Error converting small.mgf to spectrumrecords format
bash-4.1$ rm -fr crux-output ; /net/noble/vol1/home/noble/proj/crux/branches/crux-sfx-topn/Debug/src/crux tide-search --spectrum-parser mstoolkit small.mgf ~peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta
INFO: Creating index from '/net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta'
INFO: Running tide-index...
INFO: Writing results to output directory 'crux-output/tide-search.tempindex'.
INFO: Reading /net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta and computing unmodified peptides...
INFO: Writing decoy fasta...
INFO: Reading proteins
INFO: Precomputing theoretical spectra...
INFO: Beginning tide-search.
WARNING: The output directory 'crux-output' already exists.
Existing files will not be overwritten.
INFO: CPU: pyrrolysine.gs.washington.edu
INFO: Wed Jun 22 12:10:10 PDT 2016
INFO: Running tide-search...
INFO: Reading index crux-output/tide-search.tempindex
INFO: Converting small.mgf to spectrumrecords format
Unknown file format
FATAL: MSToolkit: Error reading spectra file: /net/noble/vol4/noble/user/noble/proj/crux-projects/2016gaussian/results/bill/2016-06-22segfault/small.mgf

1 Attachment: small.mgf (274.9 kB; application/octet-stream)


Discussion

  • William S Noble

    William S Noble - 2016-06-22

    Important update: for pwiz, the problem only shows up if I compile in Debug mode. If I use Release mode and the pwiz parser, the search completes successfully.

    bash-4.1$ rm -fr crux-output ; /net/noble/vol1/home/noble/proj/crux/branches/crux-sfx-topn/Release/src/crux tide-search small.mgf ~peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta
    INFO: Creating index from '/net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta'
    INFO: Running tide-index...
    INFO: Writing results to output directory 'crux-output/tide-search.tempindex'.
    INFO: Reading /net/noble/vol1/home/peaklist/proj/data/crosslink/ribosome/ecoli_K12_ribo_02.fasta and computing unmodified peptides...
    INFO: Writing decoy fasta...
    INFO: Reading proteins
    INFO: Precomputing theoretical spectra...
    INFO: Beginning tide-search.
    WARNING: The output directory 'crux-output' already exists.
    Existing files will not be overwritten.
    INFO: CPU: pyrrolysine.gs.washington.edu
    INFO: Wed Jun 22 12:15:18 PDT 2016
    INFO: Running tide-search...
    INFO: Reading index crux-output/tide-search.tempindex
    INFO: Converting small.mgf to spectrumrecords format
    INFO: Parser could not determine scan numbers for this file, using ordinal numbers as scan numbers.
    INFO: Reading spectra file crux-output/small.mgf.spectrumrecords.tmp
    INFO: Sorting spectra
    INFO: Running search
    INFO: Time per spectrum-charge combination: 0.001960 s.
    INFO: Average number of candidates per spectrum-charge combination: 0.965116
    INFO: Elapsed time: 0.414 s
    INFO: Finished crux tide-search.
    INFO: Return Code:0
    bash-4.1$

     
  • Kaipo

    Kaipo - 2016-06-22

    It looks like another boost::lexical_cast issue: ProteoWizard tries to convert the charge but, as far as I can tell, doesn't handle the '+'. I'll fix this in ProteoWizard and check whether that resolves the issue.
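
To illustrate the kind of handling described above, the sketch below parses a charge string that carries a sign. It is written in Python and is not ProteoWizard's actual code. A strict whole-string integer conversion rejects "3+" (Python's int("3+") raises ValueError, much as boost::lexical_cast<int>("3+") throws bad_lexical_cast), so the sign is split off before converting. The function name parse_charge is hypothetical.

```python
def parse_charge(text):
    """Parse an MGF-style charge such as '3+', '2-', '+3', or '3' into a signed int."""
    value = text.strip()
    sign = 1
    if value and value[-1] in "+-":
        # Trailing sign, as in CHARGE=3+
        sign = -1 if value[-1] == "-" else 1
        value = value[:-1]
    elif value and value[0] in "+-":
        # Leading sign, as in +3
        sign = -1 if value[0] == "-" else 1
        value = value[1:]
    if not value.isdigit():
        # Anything left over that is not purely digits is rejected explicitly
        raise ValueError("not a valid charge: %r" % text)
    return sign * int(value)
```

This sketch handles a single charge only; a value listing several charges (for example "2+ and 3+") would need to be split before each part is passed to parse_charge.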

     
  • Kaipo

    Kaipo - 2016-06-24

    Actually, I made a fix for this and went to test it, but when I ran it without the fix I didn't get any error with either mstoolkit or proteowizard.

     
    • William S Noble

      William S Noble - 2016-06-24

      Crud, you are right -- I just verified this. Apparently, the behavior only
      appears in the crux-sfx-topn branch. Sean, what is the feasibility of
      merging from the trunk into your branch, now that the mods handling is all
      fixed?

      Bill


      • sjoemac

        sjoemac - 2016-06-25

        It wasn't easy, but the merge is done. The xlink code will need to be
        debugged, especially with the inclusion of mods.

        Sean


         
  • William S Noble

    William S Noble - 2016-09-07
    • labels: High priority -->
    • summary: tide-search fails to parse MGF file --> tide-search fails to parse MGF file (in search-for-xlinks branch)
     
