Crux-Toolkit / Issues / #224 strange behavior of Crux Percolator in Windows

William S Noble - 2014-12-18

labels: --> High priority

assigned_to: Kaipo

Kaipo - 2015-01-16

Hi Jeff,
The post-processing is a necessary step to output the non-standard Percolator outputs (e.g. mzid, pepxml), since Percolator's internal objects must be converted to Crux objects before they can be written.
If you want the native Percolator output, you can use
--original-output T.

- William S Noble - 2015-01-16
  
  I think the idea in Percolator is supposed to be that you can either
  provide a PIN file, in which case, as Jeff says, Percolator will assume
  that the data came from an MS/MS experiment, or you can provide a
  tab-delimited text file using the "--feature-in-file" option, in which case
  Percolator will just do the machine learning part but makes no assumptions
  about the meanings of the various input features.
  
  Jeff, why don't you want to use the feature-in-file option and a
  tab-delimited file as input?
  
  Bill
  
  On Thu, Jan 15, 2015 at 4:16 PM, Kaipo kaipot@users.sf.net wrote:
  
  Hi Jeff,
  The post-processing is a necessary step to output the non-standard
  Percolator outputs (e.g. mzid, pepxml), since Percolator's internal objects
  must be converted to Crux objects before they can be written.
  If you want the native Percolator output, you can use
  --original-output T.
  
  [issues:#224] http://sourceforge.net/p/cruxtoolkit/issues/224 strange
  behavior of Crux Percolator in Windows *
  
  Status: open
  Milestone: Percolator
  Labels: High priority
  Created: Thu Dec 11, 2014 09:42 PM UTC by Jeff Howbert
  Last Updated: Thu Dec 18, 2014 07:29 PM UTC
  Owner: Kaipo
  
  After more careful testing of Crux Percolator in Windows, I see the
  following, some of which is different from what I asserted at the Crux
  developers meeting today.
  
  Background: I am trying to use Percolator to process results from searches
  on crosslinked peptides. The representation of a crosslinked peptide will
  necessarily be more complicated than a string from the standard 20-letter
  amino acid alphabet. Furthermore, at this stage, I am not generating
  standard Crux tab-delimited output. Instead I am post-processing my search
  results to generate a feature file which I want to be equivalent to the new
  tab-delimited .pin format.
  
  When I put a .txt suffix on my feature file, Crux Percolator complains at
  the console with many iterations of
  
  ERROR: No sequence found...
  
  and terminates without doing anything useful. When I put a .pin suffix on
  my feature file, Crux Percolator seems to run normally, as judged by what
  appears on the console, and produces somewhat useful output files. However,
  many instances of messages like
  
  ERROR: The modification symbol '2' is not valid.
  WARNING: There is an unidentifiable modification in sequence
  <mgkdnkehkesk*1-28*geaiavaiaqmstvdlascdhgvvasvkrcimerdlypr> at position 14.
  
  subsequently appear on the console, I assume as a by-product of processing
  for the output files. The output files have several columns which are all
  zeros, with names like 'charge' and 'spectrum precursor m/z'; it appears
  Percolator attempted to calculate these without the requisite information
  being available. Also, the peptide sequences in the output files have all
  non-standard characters stripped out.
  
  So, in summary, one can bypass sqt2pin/make-pin style pre-processing in
  Crux Percolator by naming the input with suffix .pin. However there is
  still some post-Percolator processing built into Crux which assumes the
  data came from an MS/MS experiment, and tries to conjure certain output
  fields accordingly. Stand-alone Percolator does not have this last behavior.
  

William S Noble - 2015-03-06

assigned_to: Kaipo --> Jeff Howbert

Jeff Howbert - 2015-03-23

I reinvestigated this Issue using a Linux binary (not Windows) built from trunk on 3/11/15, with a different .pin file than previously (attached). The behaviors previously reported are still observed, with some variations.

1) Changing the extension from .pin to .txt causes Percolator to complain at
the console with many iterations of

ERROR: No sequence found...

and terminate without doing anything useful.

2) With this .pin file, the SVM training and PSM-level analysis seem to work properly, as judged by the .log and percolator.XXX.txt.psms files. However, the subsequent peptide-level analysis fails, with this message:

FATAL: PSMID should be (((target|decoy)_fileidx)|filestem)_scan_charge_rank, but was 121212_F2-ReACT-PA-BDP-XL-4hr-1.txt_12536

It appears Crux Percolator is looking at the ScanId field for encoded information on scan, charge, and rank, and not finding it. Previously, my .pin file had ScanIds constructed to hold this information, so the peptide-level analysis succeeded, although several fields created in the output were filled with zeros.
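
To make the expected ID layout concrete, here is a minimal Python sketch that checks whether an ID ends in _scan_charge_rank. It is based only on the format quoted in the FATAL message above; the regular expression and the function name parse_psm_id are my own assumptions, not Crux's actual parsing code.

```python
import re

# Assumed pattern, derived only from the format quoted in the FATAL message:
#   (((target|decoy)_fileidx)|filestem)_scan_charge_rank
# This is an illustration, not the parser used inside Crux.
PSMID_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<scan>\d+)_(?P<charge>\d+)_(?P<rank>\d+)$"
)


def parse_psm_id(psm_id):
    """Return scan, charge, and rank if the ID ends in _scan_charge_rank, else None."""
    match = PSMID_PATTERN.match(psm_id)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        "scan": int(match.group("scan")),
        "charge": int(match.group("charge")),
        "rank": int(match.group("rank")),
    }


# The ID from the error message has only one trailing numeric field, so it fails.
rejected = parse_psm_id("121212_F2-ReACT-PA-BDP-XL-4hr-1.txt_12536")

# A hypothetical ID that carries scan, charge, and rank passes.
accepted = parse_psm_id("target_0_12536_2_1")
```

Under these assumptions, the ID in the error message is rejected because only a scan number follows the file name, while an ID such as target_0_12536_2_1 (a made-up example) is accepted.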

For the record:

I tried setting --feature-in-file T on the command line, as suggested by Bill. It was rejected as an invalid parameter (see Issue #221).

I tried setting --original-output T on the command line, as suggested by Kaipo. It did not change the failure modes in any way.

I am using stand-alone Percolator on a regular basis with this and other .pin files, and do not see any of these problems.

pin_150323a_00.pin
- Kaipo - 2015-04-01
  
  Jeff, could you give this patch a try?
  
  Last edit: Kaipo 2015-04-01
  
  percolator_fixes.diff
  

Jeff Howbert - 2015-05-20

I applied Kaipo's updated version of the patch (from 2015-05-13) to a fresh copy of the trunk checked out on 2015-05-19. The patched code was compiled and tested on a Linux machine.

I ran tests on a small pin file, named as either test.pin or test.txt. Peptide strings in the pin file contain non-standard characters that capture crosslink information, e.g. -.KVKRNSTPPLSLFGQLLWR3-7TPEEIRKTFNIK_40444.-.

Running Percolator on these two files, with or without --feature-in-file T, gave these results.

test.pin (--feature-in-file F)

Percolator runs and produces more or less useful output. Before the patch, the console was filled with many repetitions of messages like:

ERROR: The modification symbol '2' is not valid.
WARNING: There is an unidentifiable modification in sequence <mgkdnkehkesk*1-28*geaiavaiaqmstvdlascdhgvvasvkrcimerdlypr> at position 14.

After the patch, these no longer appear.

Otherwise, the behavior is the same as before the patch. In particular, the output files have several columns which are all zeros, with names like 'charge' and 'spectrum precursor m/z'; it appears Percolator attempted to calculate these without the requisite information being available. Also, the peptide sequences in the output files have all non-standard characters stripped out.
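
As an illustration of why the stripping matters, the following Python sketch removes every non-letter character from the example peptide string given earlier. It assumes that "stripped" means keeping only alphabetic characters; the exact rule applied by Crux is not stated in this thread, and strip_non_letters is a hypothetical helper.

```python
def strip_non_letters(peptide):
    """Keep only alphabetic characters.

    This is an assumed stand-in for the stripping observed in the output
    files, not the actual Crux implementation.
    """
    return "".join(ch for ch in peptide if ch.isalpha())


# Example crosslinked peptide string from the pin file described above.
crosslinked = "-.KVKRNSTPPLSLFGQLLWR3-7TPEEIRKTFNIK_40444.-"
stripped = strip_non_letters(crosslinked)
print(stripped)
```

Under that assumption, the two linked peptides are fused into one plain sequence and the crosslink annotation (here "3-7" and the trailing "_40444") can no longer be recovered from the output.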

test.txt (--feature-in-file F)

Behavior unchanged by patch. Console has many repetitions of message:

ERROR: No sequence found...

and Percolator terminates without doing anything useful.

test.pin --feature-in-file T

Results identical to test.pin (--feature-in-file F).

test.txt --feature-in-file T

Results identical to test.pin (--feature-in-file F).

Summary: --feature-in-file parameter is now recognized (i.e. not rejected as invalid), but setting it to T doesn't cause Percolator to treat its input as a generic, non-proteomic feature file. Changing the file suffix does not help.

Last edit: Jeff Howbert 2015-05-20


Kaipo - 2015-05-20

Hi Jeff, Percolator should be treating the input as a generic feature file with feature-in-file=T. Can you try to turn on original-output=T and see if that works?


Jeff Howbert - 2015-05-21

Hi Kaipo,

When I set --original-output=T, it gets rid of all the nonsense columns in the output and suppresses the deletion of non-standard characters from my peptide strings. In other words, I get the same output as from stand-alone Percolator, just as you predicted.

Additionally setting --feature-in-file=T does not change the behavior on my test.pin file. However, it does allow my test.txt file to be recognized as valid Percolator input; with this flag on it gets processed exactly like the test.pin file.
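
For anyone scripting this, a minimal Python sketch of the flag combination that worked is shown here. The two flags and their T/F values come from this thread; the "crux percolator" invocation, the argument order, and the helper name build_percolator_command are assumptions, so check them against your own Crux build.

```python
def build_percolator_command(input_path, feature_in_file=True,
                             original_output=True, crux_binary="crux"):
    """Assemble an argument list for a Crux Percolator run.

    Assumptions: the binary is called "crux", the subcommand is "percolator",
    and options precede the input file. Only the two flags are taken from
    the discussion above.
    """
    def as_flag(value):
        return "T" if value else "F"

    return [
        crux_binary,
        "percolator",
        "--feature-in-file", as_flag(feature_in_file),
        "--original-output", as_flag(original_output),
        input_path,
    ]


command = build_percolator_command("test.txt")
print(" ".join(command))

# To actually run it (requires Crux to be installed and on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the input path contains spaces.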

I think you can apply the percolator_fixes patch to trunk and close this issue, along with Issue #221.

Thanks,

Jeff


Kaipo - 2015-05-21

Thanks Jeff, it's committed.


Kaipo - 2015-05-21

labels: High priority -->

status: open --> closed