PerfectCrossMatch Code
Brought to you by: anate25, marijaharamija
File | Date | Author | Commit |
---|---|---|---|
data | 2011-03-29 | marijaharamija | [r24] eplets |
Jmol.jar | 2011-03-29 | anate25 | [r18] add jmol.py |
Patient.py | 2011-03-29 | anate25 | [r17] add Patient.py |
PerfectMatch.sh | 2011-03-29 | anate25 | [r19] add match.py |
README.txt | 2011-03-29 | anate25 | [r19] add match.py |
StructureCompare.py | 2011-03-29 | anate25 | [r19] add match.py |
alignment.py | 2011-03-29 | anate25 | [r15] add alignment.py |
eplets.py | 2011-03-29 | anate25 | [r16] add eplets.py |
jmol.py | 2011-03-29 | anate25 | [r18] add jmol.py |
jmol_script | 2011-03-29 | anate25 | [r18] add jmol.py |
jmol_script_basic | 2011-03-29 | anate25 | [r18] add jmol.py |
main.py | 2011-03-29 | anate25 | [r19] add match.py |
match.py | 2011-03-29 | anate25 | [r19] add match.py |
recipientsDB.py | 2011-03-29 | anate25 | [r17] add Patient.py |
sequences.py | 2011-03-29 | anate25 | [r19] add match.py |
PerfectCrossMatch

Final project for Python course
Master in Bioinformatics for Health Sciences
https://sourceforge.net/projects/perfectmatch/

README file (v 1.0)
March 2011

Contents:
1.) Introduction to PerfectMatch
2.) Requirements
3.) How to run the program
4.) Files and directories details
5.) Program input and output
6.) Parameters that can be changed
7.) Methods used


1. Introduction:

PerfectMatch is a program that determines recipients' HLA compatibility with
a given donor, which is essential in organ transplantation and platelet
transfusion. The program finds the best possible recipients for a given donor,
based on structural or sequence criteria.


2. Requirements:

To run PerfectMatch, the following programs/packages have to be installed:
- python 2.7 or higher (available at http://www.python.org/getit/)
- clustalw 2.0.10 or higher (available at
  http://www.clustal.org/download/2.0.10/)
- java 1.6 (SE 6) or higher (available at
  http://www.java.com/en/download/index.jsp)
- biopython 1.56 (available at http://www.biopython.org/wiki/Download or by
  command 'sudo apt-get install python-biopython')


3. How to run the program:

Before running PerfectMatch, make sure all paths are correct. The file
PerfectMatch.sh sets three environment variables:

CLUSTAL_EXE - path where clustalw is installed
JAVA_EXE    - path where java is installed
JMOL_EXE    - path to Jmol.jar, which is located inside the perfectmatch
              directory and run from there. It should not be removed unless
              you want to use a jmol already installed on your computer. In
              that case, change this path to the one corresponding to your
              jmol installation.

To run the program, run the executable file PerfectMatch.sh
(from terminal: './PerfectMatch.sh').


4. Files and directories details:

List of modules:
----------------

main.py             - Lets the user choose the kind of input and output they
                      would like to get.

Patient.py          - Consists of classes Donor and Recipient, both inherited
                      from class Patient.
                      Each Patient has an ID and a phenotype, which is a list
                      of 6 HLA alleles (loci A, B and C).
                      Each Donor has two lists of ranked recipients - one
                      ranked by sequence alignment criteria and the other
                      ranked by structural criteria.
                      Each Recipient has their rank in the 2 lists described
                      for the Donor class, the number of alleles shared with
                      the donor, the alignment score, and the number and names
                      of mismatches (total and those on the protein's surface).

recipientsDB.py     - Consists of one function that builds the database of
                      recipients. The default implementation generates 100
                      recipients at random, but can easily be replaced with
                      one that reads them from a file.

match.py            - Consists of classes MatchStructure and MatchSequence.
                      Each class has its own implementation of the function
                      match, which calculates a ranked list of recipients for
                      a given donor and recipients DB.

StructureCompare.py - Consists of all the functions used to calculate the
                      recipients' ranked list by the structural criteria.

alignment.py        - Consists of all the functions used to calculate the
                      recipients' ranked list by the sequence alignment
                      criteria.

eplets.py           - Reads in the file of eplets and saves, for each allele,
                      its list of eplets in a dictionary.

jmol.py             - Generates a dynamic jmol script based on run-time
                      parameters and executes Jmol.

Variable Parameters:
--------------------

main.py:
surface_positions - Defines the positions of the amino acids that are the most
                    crucial for HLA compatibility. It is a list of tuples,
                    where each tuple consists of the start and end positions
                    of a certain epitope region. This list can easily be
                    modified or extended to include other regions as well.

recipientsDB.py:
DBsize            - The number of recipients in the randomly generated
                    database.

Data:
-----

All data required for running the program is located under the ./data
directory.

eplets.txt   - Contains all the information about HLA alleles and their eplet
               content. It is read and processed by eplets.py.

A_prot.fasta - These files contain the fasta protein sequences for HLA A, B
B_prot.fasta   and C alleles. They are processed by an external script,
C_prot.fasta   sequences.py, that generates one fasta file - seqs.fasts.

seqs.fasts   - Contains all the protein sequences, in accordance with their
               annotation in eplets.txt.

3PWJ.pdb     - PDB file of Human Class I MHC HLA-A2 in complex with a peptide,
               used as a template to display the mismatches.


5. Program input and output:

The user can choose from the following possibilities:

INPUT
- input the donor's phenotype or use one created randomly
- input the number of top-ranked recipients to display

OUTPUT
- get a ranked list of recipients created by structural criteria
- get a ranked list of recipients created by sequence criteria
- get the ranking of a certain patient (by both criteria)
- get the ranking of a certain patient and a jmol structural view of their
  mismatches


7. Methods used:

In organ transplantation, it is well known that recipients with zero antigen
mismatches (all alleles identical) have the highest success. But even in cases
with mismatches, many transplantations go well. We present two methods that
compare the donor's and the recipient's alleles and assess their HLA
compatibility based on either the structure or the sequence of the proteins
encoded by the HLA alleles.

Structural criteria:

The ranked list of recipients is created using the following criteria, in
order:

1. Number of identical alleles - the best case scenario is that the recipient
   has as many alleles identical to the donor's as possible. This means the
   recipient is less likely to produce antibodies and reject the transplant.

2. Number of surface mismatches - given the structure of the protein encoded
   by HLA and its function (to present a foreign pathogen to a T cell), we can
   assess which parts of the structure are more important for its function.
   The parts on the surface around the peptide binding site are the ones
   recognized by T cells (they are called epitopes). We defined the alpha
   chain helix area as surface "black listed" mismatches. In the ranking,
   after the number of identical alleles, a lower number of surface mismatches
   puts a recipient higher on the list.

3. Total number of mismatches - an overall measure of the difference between
   the donor's and the recipient's alleles.

Sequence criteria:

Unlike the structural criteria, here we compare the sequence similarity
between the donor's and the recipient's alleles. The ranked list of recipients
is created using the following criteria, in order:

1. Number of identical alleles

2. Score - pairwise alignment scores are taken from the clustal multiple
   sequence alignment, for each of the donor's alleles against all of the
   recipient's alleles. For each donor allele, the best of these scores is
   kept (because, to be resistant to the donor's antigens, it is sufficient
   that the recipient has high similarity with one of the donor's alleles).
   The values for all donor alleles are then averaged into one score.


Download the source code with the Apache Subversion command:
svn co https://perfectmatch.svn.sourceforge.net/svnroot/perfectmatch perfectmatch

Contact us at:
anate25@gmail.com
marija.haramija@gmail.com
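
Illustrative sketch of the ranking order described under "Methods used".
This is not the code from match.py, StructureCompare.py or alignment.py. The
Recipient record, its field names, the function names and the sample allele
names are all hypothetical, and the sketch assumes that a higher alignment
score means a better match.

```python
from collections import namedtuple

# Hypothetical record; the field names are illustrative and are not taken
# from Patient.py.
Recipient = namedtuple(
    "Recipient",
    ["id", "shared_alleles", "surface_mismatches", "total_mismatches", "score"],
)


def rank_structural(recipients):
    """Structural criteria: more identical alleles first, then fewer
    surface mismatches, then fewer total mismatches."""
    return sorted(
        recipients,
        key=lambda r: (-r.shared_alleles, r.surface_mismatches, r.total_mismatches),
    )


def sequence_score(pairwise):
    """Sequence score: for each donor allele keep the best score against
    any recipient allele, then average over the donor alleles.

    `pairwise` maps a donor allele name to the list of its alignment
    scores against every recipient allele (assumed: higher is better).
    """
    best = [max(scores) for scores in pairwise.values()]
    return sum(best) / float(len(best))


def rank_sequence(recipients):
    """Sequence criteria: more identical alleles first, then higher score."""
    return sorted(recipients, key=lambda r: (-r.shared_alleles, -r.score))


# Made-up sample data, for illustration only.
recipients = [
    Recipient("R1", 2, 3, 10, 80.0),
    Recipient("R2", 4, 5, 12, 70.0),
    Recipient("R3", 4, 1, 15, 65.0),
    Recipient("R4", 2, 3, 8, 90.0),
]

structural_order = [r.id for r in rank_structural(recipients)]
sequence_order = [r.id for r in rank_sequence(recipients)]
example_score = sequence_score({"A*02:01": [90.0, 75.0], "B*07:02": [60.0, 80.0]})
```

Sorting on a tuple key applies the criteria in order: a later criterion is
only consulted when all earlier ones are tied.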