
#224 strange behavior of Crux Percolator in Windows

Percolator
closed
None
2015-05-21
2014-12-11
No

After more careful testing of Crux Percolator in Windows, I see the following, some of which is different from what I asserted at the Crux developers meeting today.

Background: I am trying to use Percolator to process results from searches on crosslinked peptides. The representation of a crosslinked peptide will necessarily be more complicated than a string from the standard 20-letter amino acid alphabet. Furthermore, at this stage, I am not generating standard Crux tab-delimited output. Instead I am post-processing my search results to generate a feature file which I want to be equivalent to the new tab-delimited .pin format.
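For readers unfamiliar with the .pin layout, here is a minimal Python sketch of writing such a feature file. It assumes the standard tab-delimited Percolator column order (SpecId, Label, ScanNr, the feature columns, Peptide, Proteins). The helper name `write_pin`, the feature names, their values, and the protein ID are hypothetical. The SpecId follows the `target_fileidx_scan_charge_rank` layout that is quoted in a later comment in this thread, and the peptide string is Jeff's crosslink example.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence, TextIO


def write_pin(out: TextIO, feature_names: Sequence[str],
              psms: Iterable[Mapping]) -> None:
    """Write PSMs as a tab-delimited PIN file (hypothetical helper).

    Assumed column order: SpecId, Label, ScanNr, <features...>, Peptide,
    Proteins. Label is 1 for targets and -1 for decoys; additional protein
    IDs go in extra trailing tab-separated columns.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SpecId", "Label", "ScanNr", *feature_names,
                     "Peptide", "Proteins"])
    for psm in psms:
        writer.writerow([psm["spec_id"], psm["label"], psm["scan"],
                         *(psm["features"][name] for name in feature_names),
                         psm["peptide"], *psm["proteins"]])


# Hypothetical example: one target PSM whose peptide string carries
# crosslink notation rather than a plain amino acid sequence.
buf = io.StringIO()
write_pin(buf, ["score", "deltaScore"], [{
    "spec_id": "target_0_12536_3_1",
    "label": 1,
    "scan": 12536,
    "features": {"score": 4.2, "deltaScore": 0.8},
    "peptide": "-.KVKRNSTPPLSLFGQLLWR3-7TPEEIRKTFNIK_40444.-.",
    "proteins": ["prot1"],
}])
print(buf.getvalue(), end="")
```

Using `csv.writer` with a tab delimiter keeps each field intact without any hand-rolled escaping. Nothing in the writer restricts the Peptide column to standard residues; how Crux then interprets that column is the subject of this issue.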

When I put a .txt suffix on my feature file, Crux Percolator complains at the console with many iterations of

ERROR: No sequence found...

and terminates without doing anything useful. When I put a .pin suffix on my feature file, Crux Percolator seems to run normally, as judged by what appears on the console, and produces somewhat useful output files. However, many instances of messages like

ERROR: The modification symbol '2' is not valid.
WARNING: There is an unidentifiable modification in sequence <mgkdnkehkesk*1-28*geaiavaiaqmstvdlascdhgvvasvkrcimerdlypr> at position 14.

subsequently appear on the console, I assume as a by-product of processing for the output files. The output files have several columns which are all zeros, with names like 'charge' and 'spectrum precursor m/z'; it appears Percolator attempted to calculate these without the requisite information being available. Also, the peptide sequences in the output files have all non-standard characters stripped out.

So, in summary, one can bypass sqt2pin/make-pin style pre-processing in Crux Percolator by naming the input with suffix .pin. However, there is still some post-Percolator processing built into Crux which assumes the data came from an MS/MS experiment, and tries to conjure certain output fields accordingly. Stand-alone Percolator does not have this last behavior.


Discussion

  • William S Noble

    William S Noble - 2014-12-18
    • labels: --> High priority
    • assigned_to: Kaipo
     
  • Kaipo

    Kaipo - 2015-01-16

    Hi Jeff,
    The post-processing is a necessary step to output the non-standard Percolator outputs (e.g. mzid, pepxml), since Percolator's internal objects must be converted to Crux objects before they can be written.
    If you want the native Percolator output, you can use
    --original-output T.

     
    • William S Noble

      William S Noble - 2015-01-16

      I think the idea in Percolator is supposed to be that you can either
      provide a PIN file, in which case, as Jeff says, Percolator will assume
      that the data came from an MS/MS experiment, or you can provide a
      tab-delimited text file using the "--feature-in-file" option, in which case
      Percolator will just do the machine learning part but makes no assumptions
      about the meanings of the various input features.

      Jeff, why don't you want to use the feature-in-file option and a
      tab-delimited file as input?

      Bill

      On Thu, Jan 15, 2015 at 4:16 PM, Kaipo kaipot@users.sf.net wrote:

      Status: open
      Milestone: Percolator
      Labels: High priority
      Created: Thu Dec 11, 2014 09:42 PM UTC by Jeff Howbert
      Last Updated: Thu Dec 18, 2014 07:29 PM UTC
      Owner: Kaipo

  • William S Noble

    William S Noble - 2015-03-06
    • assigned_to: Kaipo --> Jeff Howbert
     
  • Jeff Howbert

    Jeff Howbert - 2015-03-23

    I reinvestigated this Issue using a Linux binary (not Windows) built from trunk on 3/11/15, with a different .pin file than previously (attached). The behaviors previously reported are still observed, with some variations.

    1) Changing the extension from .pin to .txt causes Percolator to complain at
    the console with many iterations of

    ERROR: No sequence found...

    and terminate without doing anything useful.

    2) With this .pin file, the SVM training and PSM-level analysis seem to work properly, as judged by the .log and percolator.XXX.txt.psms files. However, the subsequent peptide-level analysis fails, with this message:

    FATAL: PSMID should be (((target|decoy)_fileidx)|filestem)_scan_charge_rank, but was 121212_F2-ReACT-PA-BDP-XL-4hr-1.txt_12536

    It appears Crux Percolator is looking at the Scan_Id field for encoded information on scan, charge, and rank, and not finding it. Previously, my .pin file had ScanIds constructed to hold this information, so the peptide-level analysis succeeded, although several fields in the output were filled with zeros.
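To make the failure concrete, the Python sketch below checks an ID against an approximation of the layout named in the FATAL message: an arbitrary prefix followed by three underscore-separated integers (scan, charge, rank). This regular expression is an assumption reconstructed from the error text, not Crux's actual parser, and the names `parse_psmid`, `make_psmid`, and `PsmId` are hypothetical. The ID from the error message fails because it ends in only one integer field.

```python
import re
from typing import NamedTuple, Optional

# Approximation (assumed) of the layout in the FATAL message:
#   (((target|decoy)_fileidx)|filestem)_scan_charge_rank
# i.e. any prefix, then three underscore-separated integers.
PSMID_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<scan>\d+)_(?P<charge>\d+)_(?P<rank>\d+)$")


class PsmId(NamedTuple):
    prefix: str
    scan: int
    charge: int
    rank: int


def parse_psmid(psm_id: str) -> Optional[PsmId]:
    """Return the decoded fields, or None if scan/charge/rank are missing."""
    match = PSMID_PATTERN.match(psm_id)
    if match is None:
        return None
    return PsmId(match["prefix"], int(match["scan"]),
                 int(match["charge"]), int(match["rank"]))


def make_psmid(prefix: str, scan: int, charge: int, rank: int) -> str:
    """Build an ID in the same layout, e.g. for a hand-made .pin file."""
    return f"{prefix}_{scan}_{charge}_{rank}"


# The ID from the error message: only one trailing integer, so no match.
print(parse_psmid("121212_F2-ReACT-PA-BDP-XL-4hr-1.txt_12536"))  # None
# A hypothetical ID built with scan/charge/rank encoded round-trips cleanly.
print(parse_psmid(make_psmid("target_0", 12536, 3, 1)))
# PsmId(prefix='target_0', scan=12536, charge=3, rank=1)
```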

    For the record:

    • I tried setting --feature-in-file T on the command line, as suggested by Bill. It was rejected as an invalid parameter (see Issue #221).
    • I tried setting --original-output T on the command line, as suggested by Kaipo. It did not change the failure modes in any way.
    • I am using stand-alone Percolator on a regular basis with this and other .pin files, and do not see any of these problems.
     
    • Kaipo

      Kaipo - 2015-04-01

      Jeff, could you give this patch a try?

       

      Last edit: Kaipo 2015-04-01
  • Jeff Howbert

    Jeff Howbert - 2015-05-20

    I applied Kaipo's updated version of the patch (from 2015-05-13) to a fresh copy of the trunk checked out on 2015-05-19. The patched code was compiled and tested on a Linux machine.

    I ran tests on a small pin file, named as either test.pin or test.txt. Peptide strings in the pin file contain non-standard characters that capture crosslink information, e.g. -.KVKRNSTPPLSLFGQLLWR3-7TPEEIRKTFNIK_40444.-.

    Running Percolator on these two files, with or without --feature-in-file T, gave these results.

    test.pin (--feature-in-file F)

    Percolator runs and produces more or less useful output. Before the patch, the console was filled with many repetitions of messages like:

    ERROR: The modification symbol '2' is not valid.
    WARNING: There is an unidentifiable modification in sequence <mgkdnkehkesk*1-28*geaiavaiaqmstvdlascdhgvvasvkrcimerdlypr> at position 14.

    After the patch, these no longer appear.

    Otherwise, the behavior is the same as before the patch. In particular, the output files have several columns which are all zeros, with names like 'charge' and 'spectrum precursor m/z'; it appears Percolator attempted to calculate these without the requisite information being available. Also, the peptide sequences in the output files have all non-standard characters stripped out.

    test.txt (--feature-in-file F)

    Behavior unchanged by patch. Console has many repetitions of message:

    ERROR: No sequence found...

    and Percolator terminates without doing anything useful.

    test.pin (--feature-in-file T)

    Results identical to test.pin (--feature-in-file F).

    test.txt (--feature-in-file T)

    Results identical to test.pin (--feature-in-file F).

    Summary: --feature-in-file parameter is now recognized (i.e. not rejected as invalid), but setting it to T doesn't cause Percolator to treat its input as a generic, non-proteomic feature file. Changing the file suffix does not help.

     

    Last edit: Jeff Howbert 2015-05-20
  • Kaipo

    Kaipo - 2015-05-20

    Hi Jeff, Percolator should be treating the input as a generic feature file with feature-in-file=T. Can you try to turn on original-output=T and see if that works?

     
  • Jeff Howbert

    Jeff Howbert - 2015-05-21

    Hi Kaipo,

    When I set --original-output=T, it gets rid of all the nonsense columns in the output and suppresses the deletion of non-standard characters from my peptide strings. In other words, I get the same output as from stand-alone Percolator, just as you predicted.

    Additionally setting --feature-in-file=T does not change the behavior on my test.pin file. However, it does allow my test.txt file to be recognized as valid Percolator input; with this flag on it gets processed exactly like the test.pin file.
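The combination that resolved the issue can be scripted. The Python sketch below only assembles the `crux percolator` argument list with the two options discussed in this thread, passing T/F values as in the comments above. It assumes a Crux build that includes the committed fix and is on `PATH` if the command is actually run; the function name `percolator_command` is hypothetical.

```python
import shlex
from typing import List


def percolator_command(input_path: str, generic_features: bool = True,
                       native_output: bool = True) -> List[str]:
    """Assemble a `crux percolator` command line (hypothetical helper).

    generic_features -> --feature-in-file T: treat the input as a generic
                        tab-delimited feature file (lets a .txt suffix work).
    native_output    -> --original-output T: write stand-alone-style
                        Percolator output without Crux post-processing.
    """
    cmd = ["crux", "percolator"]
    if generic_features:
        cmd += ["--feature-in-file", "T"]
    if native_output:
        cmd += ["--original-output", "T"]
    cmd.append(input_path)
    return cmd


if __name__ == "__main__":
    # Print the shell form; pass the list to subprocess.run(cmd, check=True)
    # to execute it against a local Crux installation.
    print(shlex.join(percolator_command("test.txt")))
```

Returning a list rather than a single string avoids shell quoting problems if file names contain spaces.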

    I think you can apply the percolator_fixes patch to trunk and close this issue, along with Issue #221.

    Thanks,

    Jeff

     
  • Kaipo

    Kaipo - 2015-05-21

    Thanks Jeff, it's committed.

     
  • Kaipo

    Kaipo - 2015-05-21
    • labels: High priority -->
    • status: open --> closed
     
