Name | Modified | Size | Downloads / Week |
---|---|---|---|
ProteinLasso | 2011-10-30 | | |
Readme.txt | 2011-10-30 | 2.9 kB | |
Totals: 2 Items | | 2.9 kB | 0 |
ProteinLasso: In this paper, we formulate the protein inference problem as a constrained Lasso regression problem and then solve it with a fast pathwise coordinate descent algorithm. The new inference algorithm, ProteinLasso, uses an ensemble learning strategy to address the sparsity parameter selection problem in the Lasso model. ProteinLasso is implemented in Java and can run on any Java Virtual Machine (JVM) regardless of computer architecture.

How to try the program:

1. Install and run Eclipse on your computer.
   Important prerequisite: you must first install the Java Development Kit (JDK). Eclipse requires a JRE or JDK to start because Eclipse is itself a sophisticated Java application with millions of lines of Java code. The JDK and Eclipse are both free and are available at: http://www.eclipse.org/downloads/.
   During startup, Eclipse asks you to specify the location of the workspace, where you will store the source code.

2. Download the source code from this website to your computer.
   Click on the link Program - PseudoRandom.java to download this Java file into the folder you chose as the workspace.

3. Execute ProteinLasso with Eclipse.
   You can use the Import wizard to import the ProteinLasso project into the workspace.
   From the main menu bar, select "File > Import". The Import wizard opens.
   Select "General > Existing Projects into Workspace" and click "Next".
   Choose "Select root directory" and click the associated "Browse" button to locate the directory where you put the ProteinLasso project.
   Under "Projects", select the ProteinLasso project to import.
   Click "Finish" to start the import.
   Select Coordinate.java and click the Run button (the white arrow in a green circle) to start the Java program.
   The input data is in the folder "real_data". To run your own data, change the file names in the main function.

Input and output files:

1. Input files
   peptideFile   input file with a list of sequences and confidence scores for candidate peptide identifications.
   detectfile    input file with the detectability of all tryptic peptides from all candidate proteins. Peptide detectabilities can be obtained from http://darwin.informatics.indiana.edu/applications/PeptideDetectabilityPredictor.

2. Output files
   resultFile    output file with the full inference result, including the probabilities for the proteins.

3. Formats of input and output files
   3.1. Accepted format for the peptide identification file, pospepfile (tab delimited):
        pospep1 peptide_probability1
        pospep2 peptide_probability2
        ...
        Note: peptide_probability refers to the peptide-spectrum matching score. It can be obtained from software such as PeptideProphet.
   3.2. Format of detfile (tab delimited):
        All_peptide candidate_protein detectability
   3.3. Format of the output resultFile (tab delimited):
        All_protein protein probability
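
Before pointing the main function at your own data, it can help to check that the two input files match the tab-delimited formats described above. The following is a minimal sketch of such a check and is not part of the ProteinLasso source. The class name TabDelimitedInputs, its method names, and the Detectability holder are hypothetical. It assumes one record per line, two fields per line in the peptide file (peptide, peptide_probability) and three fields per line in the detectability file (peptide, candidate protein, detectability).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of ProteinLasso) that validates the tab-delimited inputs. */
final class TabDelimitedInputs {

    /** One assumed row of the detectability file: peptide, candidate protein, detectability. */
    static final class Detectability {
        final String peptide;
        final String protein;
        final double value;

        Detectability(String peptide, String protein, double value) {
            this.peptide = peptide;
            this.protein = protein;
            this.value = value;
        }
    }

    /** Parses "peptide<TAB>peptide_probability" lines, keeping the file order. */
    static Map<String, Double> parsePeptideScores(List<String> lines) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected 2 tab-separated fields: " + line);
            }
            // Double.parseDouble throws NumberFormatException on a malformed score.
            scores.put(fields[0].trim(), Double.parseDouble(fields[1].trim()));
        }
        return scores;
    }

    /** Parses "peptide<TAB>candidate_protein<TAB>detectability" lines (assumed layout). */
    static List<Detectability> parseDetectabilities(List<String> lines) {
        List<Detectability> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
            }
            rows.add(new Detectability(fields[0].trim(), fields[1].trim(),
                    Double.parseDouble(fields[2].trim())));
        }
        return rows;
    }
}
```

To check real files, the lines can be read with java.nio.file.Files.readAllLines and passed to these methods. A malformed line then fails with an exception that quotes the offending text, rather than surfacing later inside the inference run.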