liuyang wang

What's CPAG?

CPAG (Cross-Phenotype Analysis of GWAS) can estimate disease and trait similarity, identify informative disease clusters, and carry out pathway enrichment analysis. It also provides visualization of these
results in the form of hierarchical clustering trees, heatmaps, and networks.

Download CPAG

CPAG can be downloaded from here. Both Windows and Mac versions are provided; the current version is 0.1.

Getting started

Notes

*New: CPAGv0.2 for Mac OS has been updated to provide lists of pleiotropic genes and SNPs, and was compiled with Python 3.3 (~100-fold improvement in performance over v0.1). Similar improvements are coming soon for Windows.

1. On Windows, run CPAG from the command prompt. To open it, either 1) double-click windows_prompt.bat in this folder or 2) click Start -> Search "Command Prompt".

2. On Mac, CPAGv0.1_mac was compiled on Mac OS X Mavericks (10.9).

3. CPAG.exe or CPAGv0.1_mac must be placed in the same folder as cpaglib. On Mac, replace "CPAG.exe" with "./CPAGv0.1_mac" in the following examples and run them in the Terminal.

4. Type "CPAG.exe -h" for help, or check running examples using

    CPAG.exe --show-example

Quick examples

1. Run a user-defined SNPlist against the NHGRI GWAS Catalog (Sep 4, 2013)

Note: User-defined queries run much faster (likely < 10 min) because they compute pairwise similarity only between the user-defined trait and each trait in the catalog.

    CPAG.exe --cpag-diy --diy-gwasfile test/example.txt

2. Run a user-defined SNPlist against the Autoimmune group of the NHGRI GWAS Catalog (Sep 4, 2013)

    CPAG.exe --cpag-diy --diy-gwasfile test/example.txt --disgrp Autoimmune

3. Run a user-defined SNPlist against the NHGRI GWAS Catalog (Sep 4, 2013), including only SNPs in the user-defined SNPlist with p-values < 0.00001

    CPAG.exe --cpag-diy --diy-gwasfile test/example.txt --diy-gwas-pcut 0.00001

4. Run a user-defined SNPlist against the Autoimmune group of the NHGRI GWAS Catalog (Sep 4, 2013), including only SNPs in the user-defined SNPlist with p-values < 0.00001

    CPAG.exe --cpag-diy --diy-gwasfile test/example.txt --diy-gwas-pcut 0.00001 --disgrp Autoimmune

5. Plot Network degree

Note: This function uses the first two columns of a "mainout" file to plot the network along with a distribution of the number of associated traits for each trait within the network.

    CPAG.exe --plot-network-degree test/example.networkdegree.infile

6. Run the NHGRI GWAS Catalog only (Sep 4, 2013), including only SNPs with p-values < 0.0000001

    CPAG.exe --cpag-all --cpag-gfile GWASCatalog2013Sep4_p1e-7.txt --cpag-gfile-pcut 0.0000001

This may take several hours, depending on the power of your computer, because it computes all pairwise similarities. The output will be the same as Supplemental Figure 1 and Supplemental Table 2 (Wang et al.), but the analysis can also be run with different p-value thresholds and newer versions of the NHGRI GWAS Catalog.
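For scripted runs, the commands above can be assembled programmatically. The following is a minimal sketch, not part of CPAG itself: the helper names `cpag_command` and `diy_args` are hypothetical, and only the flags shown in the examples above are used. The p-value cutoff is passed as a string so that it reaches CPAG exactly as written in the examples.

```python
import platform


def cpag_command(args, system=None):
    """Return the argument list for one CPAG run.

    Uses "CPAG.exe" on Windows and "./CPAGv0.1_mac" otherwise, following
    the Notes section. `system` defaults to the current platform.
    """
    system = system or platform.system()
    exe = "CPAG.exe" if system == "Windows" else "./CPAGv0.1_mac"
    return [exe] + list(args)


def diy_args(gwasfile, pcut=None, disgrp=None):
    """Build the flags for a user-defined SNPlist query (examples 1 to 4)."""
    args = ["--cpag-diy", "--diy-gwasfile", gwasfile]
    if pcut is not None:
        # Passed through verbatim, e.g. "0.00001".
        args += ["--diy-gwas-pcut", str(pcut)]
    if disgrp is not None:
        args += ["--disgrp", disgrp]
    return args


if __name__ == "__main__":
    cmd = cpag_command(diy_args("test/example.txt", pcut="0.00001",
                                disgrp="Autoimmune"))
    # Print the command; to execute it, pass `cmd` to subprocess.run
    # from the folder that contains the CPAG executable and cpaglib.
    print(" ".join(cmd))
```

The resulting list can be handed to Python's `subprocess.run(cmd, check=True)` when the working directory is the folder holding the executable and cpaglib.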

Input files

An example of a user-defined SNP file can be found at test/example.txt. It contains 3 columns: chromosome, SNP rsID, and p-value. The column delimiter is a tab, not a space.

The user-defined SNPlist format requires 3 columns. However, only the SNP rsIDs are strictly required by CPAG: column 1 can be filled with "dummy" values, and the p-values in column 3 are needed only to take advantage of CPAG's p-value filtering options.

Chr rsID P
1 rs1234 0.0000001
2 rs1534 0.0001
2 rs2145 0.0001
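Before running CPAG, it can help to check that a SNPlist really is tab-delimited with three columns. The sketch below is an illustration only and does not reproduce CPAG's own parser: the function name `read_snplist` is hypothetical, the file is assumed to start with a header row as in the example above, and rsIDs are assumed to look like "rs" followed by digits. P-values are parsed only when a cutoff is given, since they are otherwise optional.

```python
import csv
import io
import re

RSID = re.compile(r"^rs\d+$")  # assumed rsID pattern


def read_snplist(handle, pcut=None):
    """Read Chr/rsID/P rows from a tab-delimited SNPlist.

    Returns a list of (chr, rsid, p) string tuples. If `pcut` is given,
    only rows with p < pcut are kept, mirroring the meaning of a
    p-value cutoff. Raises ValueError on malformed input.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None or len(header) != 3:
        raise ValueError("expected a 3-column, tab-delimited header row")
    rows = []
    for lineno, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError("line %d: expected 3 tab-separated columns" % lineno)
        chrom, rsid, p = row
        if not RSID.match(rsid):
            raise ValueError("line %d: unexpected rsID %r" % (lineno, rsid))
        if pcut is None or float(p) < pcut:
            rows.append((chrom, rsid, p))
    return rows


if __name__ == "__main__":
    text = "Chr\trsID\tP\n1\trs1234\t0.0000001\n2\trs1534\t0.0001\n2\trs2145\t0.0001\n"
    for row in read_snplist(io.StringIO(text), pcut=0.00001):
        print("\t".join(row))
```

A file that uses spaces instead of tabs fails the header check, which is the most common formatting mistake this guards against.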

Output files

Output files are saved in a folder whose name starts with "CPAGresults".

Output for user-defined SNPlist

1. a "mainout" file is a tab-delimited file listing all traits with similarity to the user-defined trait. For each trait, the observed number of overlapping SNPs, the number of overlapping SNPs expected by chance, various p-values, and the Chao-Sorensen similarity index value are provided.

2. a "ChaoSorensenOneRowHeatmap_SortByPval" file is a PDF heatmap of color-coded similarity values sorted from low to high p-value. A red line demarcates the threshold for statistical significance (p<0.05 after Bonferroni correction).

3. a "merge.ChaoSorensen" file is a PDF that places the user-defined trait into the NHGRI GWAS Catalog hierarchical tree and heatmap. The user-defined trait is labeled as Mysnplist_Mysnplist.

4. a "mainout_Network" file is a PNG file with a network showing traits with any similarity to the user-defined trait, with connectivity based on the similarity of those traits.

5. a folder titled "gsea" which contains enrichment of pathways based on overlapping SNPs using the C2 MSigDB (GSEA Broad Institute) and IFN-targets from (Liu et al. 2012).
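The significance threshold marked by the red line in the sorted heatmap (item 2) can be illustrated with a short calculation. This is a sketch of a standard Bonferroni correction, assuming the number of tests equals the number of traits compared against the user-defined trait; the source does not state how CPAG counts tests, and the trait names and p-values below are made up for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff equivalent to p < alpha after Bonferroni."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_traits(pvalues, alpha=0.05):
    """Return trait names passing the cutoff, sorted from low to high p.

    `pvalues` maps trait name -> uncorrected p-value. The number of
    tests is assumed to be the number of traits in the mapping.
    """
    cutoff = bonferroni_threshold(len(pvalues), alpha)
    passing = [(p, name) for name, p in pvalues.items() if p < cutoff]
    return [name for p, name in sorted(passing)]


if __name__ == "__main__":
    # Hypothetical p-values, not taken from any CPAG output.
    example = {"TraitA": 1e-6, "TraitB": 0.03, "TraitC": 0.2}
    print(bonferroni_threshold(len(example)))
    print(significant_traits(example))
```

With three traits the per-test cutoff is 0.05 / 3, so a trait with an uncorrected p-value of 0.03 falls on the non-significant side of the line.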

Output for NHGRI GWAS Catalog

1. a "mainout" file is a tab-delimited file listing all traits with similarity to the user-defined trait. For each trait, the observed number of overlapping SNPs, the number of overlapping SNPs expected by chance, various p-values, and the Chao-Sorensen similarity index value are provided.

2. a "ChaoSorensen" file is a PDF that displays the entire NHGRI GWAS Catalog hierarchical tree and heatmap.

3. a folder titled "gsea" which contains enrichment of pathways based on overlapping SNPs using the C2 MSigDB (GSEA Broad Institute) and IFN-targets from (Liu et al. 2012).

This analysis has already been run on the Sep 4, 2013 NHGRI GWAS Catalog, and the results are included.

Future updates of the software will include automatic updating with the most current version of the NHGRI GWAS Catalog and the ability to query by gene lists.

