#MutationFinder Copyright (c) 2007 Regents of the University of Colorado
#ResidueFinder Modifications of MutationFinder Copyright (c) 2015 Board of Trustees, University of Illinois
# License type is MIT (see http://opensource.org/licenses/MIT or MUTATIONFINDER_HOME/doc/license.txt or ResidueFinder_home/doc/license for RF.txt)

The original MutationFinder (MF) program is located at http://mutationfinder.sourceforge.net/

Instructions for working with residue_finder.py
---
MODIFICATIONS FOR USE AS ResidueFinder (RF) by Ton E. Becker
* Everything is the same as in MF, except that RF is used only with Python, the output includes an additional file of unique mentions in numerical order, and
one of the extra regex files must be renamed to regex.txt, depending on what qualities you desire.

* Unpack the main zip file.

READ ME FOR RF 

1. Install python 2.7 (untested in later versions). 

2. Put the main folder in your desired location. All files are used and created in this folder (every file the program uses must be here for the program to find it).

3. a) Choose your desired regex file and rename it to regex.txt, OR b) copy the desired regex file and rename the copy to regex.txt.

4. Create an input file in MF format, as described under INPUT FILE FORMAT below: PMID, a tab character, then the entire article text on one line, with a new line for each PMID article.

5. Use your operating system (OS) command line interface to change to your desired folder.

6. Enter: python ./residue_finder.py INPUT_FILE_NAME.txt (the leading . or ./ may or may not be required on your OS).

7. Wait for the output; run time depends on file size, processor speed, and regex choice.

8. When the run is finished: if you used 3a), rename regex.txt back to its original name; if you used 3b), delete regex.txt before running with a different regex.

9. Output files are INPUT_FILE_NAME.txt.mf and INPUT_FILE_NAME.txt.mf.rf.out.

   The .mf entries are all the mentions of a residue in the input documents, in MutationFinder format; for example, "D83X" should be read as residue D at position 83.

   The .out entries list each residue that is mentioned at least once in a paper, one entry per residue, in ResidueFinder format; for example, "D83" should be read as residue D at position 83.

   Note also that all output from ResidueFinder uses the single-letter designation for the amino acid, even if the original text uses the three-letter designation. For example, if ResidueFinder finds "Asp83" in the text, it will return "D83X" in the .mf output and "D83" in the .out output.

Example: output files where the input is two papers, PMID 19917730 and PMID 12124848

.mf  output is:

19917730   T449X V438X V438X D83X D83X D83X

12124848   V23X V23X A20X

.out output is:

19917730   D83  V438 T449

12124848   A20  V23

10. If you repeat a run with the same input file name and wish to keep the old output, you must rename the old output files first.
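
The relationship between the two output files described in step 9 can be sketched in a few lines of Python. This is only an illustration of the documented behaviour (unique residues, ordered by position number), not the code used inside residue_finder.py; the function name mf_line_to_residues is hypothetical, and the sketch assumes the document identifier and mentions are separated by whitespace.

```python
import re

def mf_line_to_residues(line):
    """Collapse one .mf-style line into its unique residues, ordered by position.

    Hypothetical helper for illustration; assumes mentions look like "D83X"
    (single-letter residue, position number, trailing X).
    """
    fields = line.split()
    doc_id, mentions = fields[0], fields[1:]
    residues = set()
    for mention in mentions:
        match = re.match(r"^([A-Z])(\d+)X$", mention)
        if match:
            # Store (position, letter) so that sorting orders by position first.
            residues.add((int(match.group(2)), match.group(1)))
    return doc_id, ["%s%d" % (letter, pos) for pos, letter in sorted(residues)]

example = "19917730\tT449X V438X V438X D83X D83X D83X"
doc_id, unique_residues = mf_line_to_residues(example)
```

Applied to the first example line above, unique_residues holds D83, V438 and T449 in that order, matching the .out example.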
 



----
Below is the original MutationFinder readme
----
INSTALLATION NOTES
----
Download MutationFinder from http://mutationfinder.sourceforge.net. Unpack
the project with the command:

tar -xvzf MutationFinder<version_number>.tar.gz

You will now have a new directory called MutationFinder in your current
working directory.

After downloading and unpacking the system, if you plan to use the
mutation_finder.py script from any location outside of the install
directory, it is necessary to update the mutation_finder_home variable
in mutation_finder.py. Change the value of this variable to the full
path where your mutation_finder.py file lives.

For example, change:
mutation_finder_home = './'

to:
mutation_finder_home = '/path/to/MutationFinder'

____
RUNNING MutationFinder
____
If you have a file formatted as described in (Caporaso et al., 2007),
you can apply MutationFinder with the following steps:

> cd MutationFinder
> ./mutation_finder.py /path/to/your/input/file

A new file will be created in the current working directory called 
 input_filename.mf 

A non-default output directory can be specified with the -o flag. Run:

> ./mutation_finder.py -h

for more information on parameters which can be passed to mutation_finder.py

____ 
INPUT FILE FORMAT
____
The input files to be processed by MutationFinder should contain one 'document'
per line. Each line should be tab-delimited and contain two fields: a document 
identifier and the document text. See the devo_set.txt and test_set.txt files in
the MutationFinder/corpora directory for examples.
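
As a minimal sketch of producing a file in this format, the following Python collapses each document's text onto a single line and joins it to its identifier with a tab. The names format_input_line and write_input_file are hypothetical helpers and are not part of the package; the sketch assumes it is acceptable to replace tabs and line breaks inside the text with single spaces.

```python
import io

def format_input_line(doc_id, text):
    """Return one input line: identifier, a tab, then the text on a single line.

    Any run of whitespace in the text (including tabs and newlines) becomes
    one space, so the only tab on the line is the field separator.
    """
    one_line = " ".join(text.split())
    return "%s\t%s" % (doc_id, one_line)

def write_input_file(path, documents):
    """Write (identifier, text) pairs to path, one document per line."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for doc_id, text in documents:
            handle.write(u"%s\n" % format_input_line(doc_id, text))
```

For example, write_input_file("my_input.txt", [("19917730", article_text)]) would create a one-document input file, where my_input.txt and article_text are placeholders.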

----
USAGE EXAMPLES
----
Examples for using the code (tested on MacOS X and Linux -- Windows tests 
 to follow). These assume that your current working directory is the 
 directory where the code has been unpacked.

# Apply MutationFinder to the test set discussed in (Caporaso et al., 2007);
# the results will be written to test_set.txt.mf
> ./mutation_finder.py test_set.txt
# Compare the output of MutationFinder to the gold standard data and 
# print the results
> ./performance.py test_set.txt.mf test_gold_std.txt 
# Run the unit tests for the mutation_finder.py script
> ./test_mutation_finder.py -v
# Run the unit tests for the performance.py script
> ./test_performance.py -v

Additional information on using these scripts is available by passing '-h'
to either script via the command line:

> ./mutation_finder.py -h
> ./performance.py -h

---
NOTES ON THE INCLUDED FILES
---

MutationFinder/
 |
 |- mutation_finder.py: the mutation finder script -- this is the system
 |   presented in (Caporaso et al., 2007) and can be applied to any
 |   text conforming to the format discussed in that paper. (For examples
 |   of the input format, see devo_set.txt and test_set.txt.) The '-b'
 |   option allows the user to apply the baseline system rather than 
 |   MutationFinder to the input texts. 
 |- test_mutation_finder.py: tests of the mutation finder script
 |- regex.txt: the collection of regular expressions used in MutationFinder. This
 |   file is read in by MutationFinder. Lines beginning with '#' are comments. 
 |   These are perl-style regular expressions, but make use of named capturing 
 |   groups. Lines ending with the text:
 |       [CASE_SENSITIVE]
 |   will yield case sensitive regular expressions. In the default regex.txt
 |   file, there is only one case sensitive regular expression on the first
 |   line. Refer to this as an example for how to define a case sensitive
 |   regular expression. 
 |
 |   If you are unfamiliar with the idea of named capturing groups, a regular
 |   expression feature introduced in Python, there is an introductory
 |   discussion of it here:
 |       http://www.regular-expressions.info/named.html
 |   In case that page disappears, you should be able to get information
 |   by googling for 'named group regular expression'.
 |- performance.py: the performance judgment script -- this compares the 
 |   output of the extraction system to the gold standard answers
 |   and provides data on the three performance metrics discussed in
 |   (Caporaso et al., 2007): Extracted Mentions, Normalized Mutations,
 |   and Document Retrieval 
 |- test_performance.py: unit tests of the performance judgment script
 |
 |- doc/ : documentation and licensing information
    |- README.txt: this file, contains general information about the package,
    |   and discussion and usage examples for python implementation of 
    |   MutationFinder
    |- README_java.txt: discussion and usage examples for java implementation
    |   of MutationFinder
    |- README_perl.txt: discussion and usage examples for perl implementation
    |   of MutationFinder
    |- license.txt: the license agreement associated with the MutationFinder 
    |   release and all associated corpora files, unit tests, and scripts.
 |
 |- corpora/ : test and development corpora and gold standards
    |- devo_set.txt: the development set texts, one abstract per line with 
    |   identifiers set as the PubMed identifiers of the source articles
    |- devo_gold_std.txt: the gold standard 'answers' -- these are the human-
    |   annotated mutations identified in the development set texts
    |- test_set.txt: the test set texts, one abstract per line with 
    |   identifiers set as the PubMed identifiers of the source articles
    |- test_gold_std.txt: the gold standard 'answers' -- these are the human-
    |   annotated mutations identified in the test set texts
 |- java/ source code for java implementation -- see 
 |    MutationFinder/doc/README_java.txt
 |- perl/ source code for perl implementation -- see 
 |    MutationFinder/doc/README_perl.txt
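
To make the regex.txt conventions above concrete, here is a small Python sketch of reading such lines: comment lines are skipped, a line ending in [CASE_SENSITIVE] is compiled as case sensitive, and every other line is compiled ignoring case. This is an assumption-based illustration, not the loader in mutation_finder.py; the function name compile_regex_lines, the sample pattern, and the group names in it are hypothetical and not taken from the shipped regex.txt.

```python
import re

CASE_SENSITIVE_TAG = "[CASE_SENSITIVE]"

def compile_regex_lines(lines):
    """Compile regex.txt-style lines, honouring comments and the case tag."""
    compiled = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.endswith(CASE_SENSITIVE_TAG):
            # Drop the tag and keep the pattern case sensitive.
            compiled.append(re.compile(line[:-len(CASE_SENSITIVE_TAG)]))
        else:
            compiled.append(re.compile(line, re.IGNORECASE))
    return compiled

# Hypothetical sample lines for illustration only.
sample_lines = [
    "# residue letter, position, residue letter",
    r"(?P<wt_res>[A-Z])(?P<pos>[1-9][0-9]*)(?P<mut_res>[A-Z])[CASE_SENSITIVE]",
]
patterns = compile_regex_lines(sample_lines)
match = patterns[0].search("The D83A variant was inactive.")
# Named groups are read back by name rather than by index.
found = (match.group("wt_res"), match.group("pos"), match.group("mut_res"))
```

Reading groups by name keeps the extraction code independent of where each group appears in a given pattern, which is useful when many differently shaped patterns feed the same extraction step.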



Please direct any questions to the author: gregcaporaso@gmail.com


----
Citing MutationFinder
----
Please cite MutationFinder with the following reference:
MutationFinder: A high-performance system for extracting point mutation 
mentions from text;  J. Gregory Caporaso, William A. Baumgartner Jr., David 
A. Randolph, K. Bretonnel Cohen, and Lawrence Hunter; Bioinformatics, 2007; 
doi: 10.1093/bioinformatics/btm235;

The article is publicly available from the Bioinformatics Journal's website - 
search under the doi cited above. (Once it goes to press we'll provide a direct
url for accessing the article. It is currently available via Advance Access,
and I expect the URL may change.) The article is an Open Access publication.

Source: README.txt, updated 2021-01-27