KIR Genotype program - Browse Files at SourceForge.net


Name	Modified	Size
KGP_v1.0.zip	2011-11-24	405.1 kB
README.txt	2011-04-17	6.5 kB
Totals: 2 Items		411.6 kB

KGP: A Program for Determining KIR Genotypic Diversity within a Population and Comparing With Global Populations
Komal Manpreet Singh, Manpreet Singh

1. KGP - KIR Genotype Program
KGP reveals the KIR genotypic diversity within a dataset using binary-coded KIR genotype patterns, generated from the presence (1) or absence (0) of 16 KIR genes in a diploid genome. The program parses the data so that individuals or populations sharing the same genotype pattern are grouped together. It reports the exact number of distinct genotypes, thereby quantifying the KIR genotypic diversity in large datasets of individuals or populations. It also maps KIR genotype frequencies across populations for comparison. The package contains two separate programs:
1) KGP for Individual Population: enumerates the distinct KIR genotype patterns within a given population.
2) KGP for Population Comparison: compares KIR genotypes across populations by grouping the populations that share the same genotype pattern and mapping the pattern frequencies across those populations.

2. Files Included in the Package
A) KGP for Individual Population
parsegenotypedata.pl
raw_data_genotype.txt (Input File)
output_analysis_genotype.xls (Output File)

B) KGP for Population Comparison
parseGenotypeDataAndFrequencies.pl
raw_data_genotype.txt (Input File 1)
raw_data_frequency.txt (Input File 2)
output_analysis_genotype.xls (Output File)

3. How to Run KGP
KGP runs on Windows, Linux, and UNIX systems, provided a Perl interpreter is installed.
If you need to install a Perl interpreter, see http://www.perl.org/get.html for how to obtain and use one on your system.
Once a Perl interpreter is installed, run KGP by executing the script for the analysis you need:
A. KGP for Individual Population: parsegenotypedata.pl
B. KGP for Population Comparison: parseGenotypeDataAndFrequencies.pl

4. How to Prepare Input Files
Convert the presence and absence of each gene into the binary codes 1 (presence) and 0 (absence) for each individual or population. The program can analyze genotype patterns containing any number of genes, in any order; for this reason, the input file has no header row.
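The binary coding described above can be sketched in Python as follows. This is only an illustration and is not part of the KGP package: the helper name encode_genotype, the shortened five-gene panel, and the sample ID are all assumptions made for the example.

```python
# Hypothetical helper, not part of KGP: build one tab-delimited input row.
# GENE_ORDER is an assumed, shortened panel used only for illustration; KGP
# accepts any number of genes in any order.
GENE_ORDER = ["KIR2DL1", "KIR2DL2", "KIR2DL3", "KIR3DL1", "KIR3DS1"]


def encode_genotype(sample_id, genes_present, gene_order=GENE_ORDER):
    """Return the ID followed by 1 (present) or 0 (absent) for each gene."""
    present = set(genes_present)
    unknown = present - set(gene_order)
    if unknown:
        raise ValueError(f"genes not in the panel: {sorted(unknown)}")
    codes = ["1" if gene in present else "0" for gene in gene_order]
    return "\t".join([sample_id] + codes)


# Example with a made-up individual carrying three of the five genes.
row = encode_genotype("IND001", ["KIR2DL1", "KIR2DL3", "KIR3DL1"])
print(row)
```

Keeping the gene order fixed across all rows is what makes two patterns comparable, since the file itself carries no header to say which column is which gene.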
A. KGP for Individual Population: The program needs only one input file, a tab-delimited text file placed in the same folder as the .pl file. The simplest approach is to copy and paste the raw data (individual IDs and genotype patterns) into the provided input file, raw_data_genotype.txt. Avoid special characters (comma, hyphen, semicolon, underscore, space, etc.) in the IDs.
(The first column contains the IDs; the following columns contain the genotype pattern, with 1 and 0 designating the presence and absence of each KIR gene, respectively.)
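Before running the program, it may help to check that an input file follows the layout described above. The Python sketch below is one possible pre-check, not something KGP provides: the function name parse_genotype_rows is hypothetical, and two of its rules are assumptions (IDs restricted to letters and digits, and every row having the same number of gene columns).

```python
import re

# Assumption: "no special characters" is read as letters and digits only.
ID_PATTERN = re.compile(r"^[A-Za-z0-9]+$")


def parse_genotype_rows(lines):
    """Parse headerless, tab-delimited rows into (ID, codes) pairs."""
    rows = []
    expected_len = None
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        sample_id, *codes = line.split("\t")
        if not ID_PATTERN.match(sample_id):
            raise ValueError(f"line {line_no}: invalid ID {sample_id!r}")
        if not codes or any(code not in ("0", "1") for code in codes):
            raise ValueError(f"line {line_no}: pattern must contain only 0 and 1")
        if expected_len is None:
            expected_len = len(codes)
        elif len(codes) != expected_len:
            # Assumption: all rows describe the same set of genes.
            raise ValueError(f"line {line_no}: expected {expected_len} genes")
        rows.append((sample_id, tuple(codes)))
    return rows


# Usage with in-memory lines; a real file could be passed via open(...).
sample = ["IND1\t1\t0\t1\n", "\n", "IND2\t1\t1\t1\n"]
print(parse_genotype_rows(sample))
```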
B. KGP for Population Comparison: This program uses two input files, both tab-delimited text files placed in the same folder as the .pl file. It compares genotypic diversity by mapping genotype patterns and their frequencies across populations. Each population must therefore be abbreviated, and each of its genotype patterns must be labeled with the population ID followed by a consecutive number. The population ID and the number must be separated by four colons. For example, South African San has 13 genotypes, so its data are labeled SAS::::1, SAS::::2, SAS::::3, ..., SAS::::13.
raw_data_genotype.txt (Input File 1): The format is the same as the input file for the first program, except that each row is one of the distinct genotype patterns found in a population.
(The first column contains the population ID and the genotype number, separated by four colons. The following columns contain the genotype pattern, with 1 and 0 designating the presence and absence of each KIR gene, respectively.)
raw_data_frequency.txt (Input File 2): This file contains the frequency of each genotype pattern in a population. The order of the population IDs should match the order in the genotype input file (Input File 1).
(The first column contains the population ID and the genotype number, separated by four colons. The following column contains the frequency of that genotype pattern in the population.)
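The labeling rule for population comparison (abbreviation, four colons, consecutive number) can be checked with a short Python sketch. The function names split_population_id and check_consecutive are hypothetical, and reading "numbered consecutively" as "starts at 1 and increases by 1 within each population" is an assumption.

```python
def split_population_id(label):
    """Split a label such as 'SAS::::3' into ('SAS', 3)."""
    population, separator, number = label.partition("::::")
    if not separator or not population or not number.isdigit():
        raise ValueError(f"expected POP::::N, got {label!r}")
    return population, int(number)


def check_consecutive(labels):
    """Verify each population is numbered 1, 2, 3, ... and count its genotypes."""
    next_number = {}
    for label in labels:
        population, number = split_population_id(label)
        expected = next_number.get(population, 1)
        if number != expected:
            raise ValueError(f"{label!r}: expected number {expected}")
        next_number[population] = expected + 1
    return {pop: nxt - 1 for pop, nxt in next_number.items()}


# The South African San example from the text: 13 genotypes, SAS::::1 to SAS::::13.
labels = [f"SAS::::{i}" for i in range(1, 14)]
print(check_consecutive(labels))
```

Running the same check on the first column of both input files, and then comparing the two label lists for equality, is one way to confirm that their order matches.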

5. The Formats and Interpretations of Output Files
A. KGP for Individual Population: The program generates one output file, output_analysis_genotype.xls. The first column lists each distinct genotype pattern, enclosed within four hyphens on each side. The second column, headed "count of IDs", gives the number of individuals sharing that genotype pattern. The last column, headed "IDs matching the pattern", lists the IDs of the individuals sharing that pattern.
(output_analysis_genotype.xls: The first column contains the distinct genotype patterns parsed by the program. The following columns contain the number of individuals sharing each genotype, followed by their IDs.)
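The grouping behind this output can be illustrated with a minimal Python sketch. It is not the Perl code shipped with KGP; the function names, the first column heading, the concatenation of the 0/1 codes into one string, and the comma-separated ID list are assumptions chosen to resemble the description above.

```python
from collections import defaultdict


def group_by_pattern(rows):
    """Map each distinct pattern to the list of IDs that carry it."""
    groups = defaultdict(list)
    for sample_id, pattern in rows:
        groups[pattern].append(sample_id)
    return dict(groups)


def format_report(groups):
    """Build tab-delimited report lines: pattern, count of IDs, matching IDs."""
    header = ["Genotype pattern", "Count of IDs", "IDs matching the pattern"]
    lines = ["\t".join(header)]
    for pattern, ids in groups.items():
        label = "----" + pattern + "----"  # four hyphens on each side
        lines.append("\t".join([label, str(len(ids)), ", ".join(ids)]))
    return lines


# Made-up data: three individuals, two of whom share a pattern.
rows = [("IND1", "1011"), ("IND2", "1101"), ("IND3", "1011")]
report = format_report(group_by_pattern(rows))
print("\n".join(report))
print("distinct genotypes:", len(report) - 1)
```

The number of report rows below the header is the number of distinct genotypes in the dataset.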
B. KGP for Population Comparison: The program generates a single output file, output_analysis_genotype.xls. The first column lists each distinct genotype pattern, enclosed within four hyphens on each side. The second column, headed "count of IDs", gives the number of populations sharing that genotype pattern. The third column lists the matching genotype IDs as given in the input files. From the fourth column onward, the file gives the frequency of each genotype in each population. A blank cell in these columns indicates that the genotype is absent from that population.
(output_analysis_genotype.xls: The first three columns resemble those of the previous output file. From the fourth column onward, the file shows the frequency distribution of each genotype across populations.)
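The cross-population layout can be sketched in Python as below. This is an illustration under stated assumptions rather than the program's actual logic: the function name map_frequencies is hypothetical, the population abbreviation NAM and all frequency values are made up, and the comma-separated ID cell is a guess at the third column's format.

```python
def map_frequencies(genotypes, frequencies):
    """Build one row per distinct pattern with a frequency cell per population.

    genotypes:   {label: pattern}, e.g. {"SAS::::1": "1011"}
    frequencies: {label: frequency} using the same labels
    """
    populations = []  # column order follows first appearance in the input
    table = {}        # pattern -> {population: (label, frequency)}
    for label, pattern in genotypes.items():
        population = label.split("::::")[0]
        if population not in populations:
            populations.append(population)
        table.setdefault(pattern, {})[population] = (label, frequencies[label])

    rows = []
    for pattern, by_pop in table.items():
        ids = ", ".join(label for label, _ in by_pop.values())
        # A blank cell means the genotype was not found in that population.
        cells = [str(by_pop[p][1]) if p in by_pop else "" for p in populations]
        rows.append(["----" + pattern + "----", str(len(by_pop)), ids] + cells)
    return populations, rows


# Made-up example: two populations, one shared pattern.
genotypes = {"SAS::::1": "1011", "SAS::::2": "1101", "NAM::::1": "1011"}
frequencies = {"SAS::::1": 45.0, "SAS::::2": 12.5, "NAM::::1": 30.0}
populations, rows = map_frequencies(genotypes, frequencies)
print("\t".join(["Pattern", "Count of IDs", "IDs"] + populations))
for row in rows:
    print("\t".join(row))
```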

6. Contact Information
This program and related materials can be downloaded from the following website:
http://kgp.sf.net

Bug reports and comments should be addressed to:
Manpreet Singh
manpsing@gmail.com

Dr. Komal Manpreet Singh
Kalmanovitz Liver Immunology Laboratory
California Pacific Medical Center
2200 Webster Street, San Francisco, CA 94115
singhkm@cpmcri.org

7. Program History
The program was released on 16 April 2011.

8. Reference
Singh et al. (2011). KIR Genotypic Diversity Can Track Ancestries in Heterogeneous Populations: A Cautionary Note for Disease Association Studies. Submitted.

Source: README.txt, updated 2011-04-17
