The Post-GWAS Explorer for Functional Indels and SNPs (PExFInS) was motivated by two observations: a high proportion of GWAS SNPs are cis-acting expression quantitative trait loci (cis-eQTLs), and indel cis-eQTLs remain underexplored in GWAS follow-up. We believe that integrating cis-eQTLs, especially indel cis-eQTLs, with candidate disease-associated variants from GWAS could facilitate the identification of causal genes or disease mechanisms. In addition, the biological information encoded in the human genome, such as regulatory features from the Ensembl regulatory database, is conducive to pinpointing the functional causal variant(s) underlying a disease association. We therefore built a pipeline, PExFInS, that integrates our lymphoblastoid cell line (LCL) cis-eQTL data with publicly available cis-eQTL datasets derived from lung tissues, human monocytes, dendritic cells, blood, and LCLs.
PExFInS searches for variants, including indels, that are in high linkage disequilibrium (LD) with user-queried SNPs, using the deep sequencing data from the 1000 Genomes (1KG) Project. It annotates these high-LD variants with ANNOVAR and maps them to Ensembl regulatory regions. Furthermore, PExFInS carries out cis-acting expression quantitative trait loci (cis-eQTL) analysis using a genome-wide expression dataset together with the corresponding dense next-generation sequencing genotypes of lymphoblastoid cell lines (LCLs) from 432 individuals across six population groups, as provided by the 1000 Genomes Project.
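To make the high-LD search concrete, the following Python sketch computes a genotype-based r² between a query SNP and candidate variants and keeps those above a threshold. It is an illustration only and is not taken from the PExFInS SAS macros: the function names, the 0/1/2 dosage coding, the 0.8 threshold, and the use of the squared Pearson correlation of dosages (an approximation to haplotype-based r² that does not require phased data) are all assumptions.

```python
from math import fsum


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    Each vector holds one dosage (0, 1, or 2 copies of the alternate
    allele) per individual. This approximates LD r^2 from unphased data.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("dosage vectors must have equal length >= 2")
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no LD information.
        return 0.0
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def high_ld_variants(query, candidates, threshold=0.8):
    """Return (variant_id, r2) pairs with r2 >= threshold (assumed cutoff)."""
    hits = []
    for variant_id, dosages in candidates.items():
        r2 = genotype_r2(query, dosages)
        if r2 >= threshold:
            hits.append((variant_id, r2))
    return hits


# Hypothetical toy data: six individuals, one query SNP, three candidates.
query_snp = [0, 1, 2, 1, 0, 2]
candidates = {
    "indel_A": [0, 1, 2, 1, 0, 2],   # identical genotypes
    "snp_B": [2, 1, 0, 1, 2, 0],     # same variant, opposite allele coding
    "snp_C": [0, 0, 1, 1, 2, 2],     # weakly correlated
}
hits = high_ld_variants(query_snp, candidates)
```

Because r² is insensitive to which allele is counted, `indel_A` and `snp_B` are both retained in this toy example, while `snp_C` falls below the cutoff.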
PExFInS is written in the SAS statistical language. Seven SAS macros perform LD analysis, eQTL analysis, and regulatory feature mapping. These macros can work alone or in combination to solve complicated tasks. Given input dbSNP rs ID(s) or a chromosome range, PExFInS outputs all high-LD variants together with their annotation and eQTL data. PExFInS can also provide the genotype data that Haploview requires to draw an LD plot for the input variants and the SNPs and indels in LD with them. For example, users interested in the LD pattern among their variants of interest and previously published GWAS SNPs can have PExFInS retrieve the genotypes of all these variants and then examine the LD pattern and haplotype blocks among them. Furthermore, PExFInS uses the annotation tool ANNOVAR to annotate the input SNPs and their high-LD variants with annotation databases from the UCSC Genome Browser, mapping these variants to RefSeq genes, conserved regions, transcription factor binding sites, and DNase I hypersensitive sites. Regulatory features from the Ensembl Regulatory Build can also be mapped to these variants and combined with the ANNOVAR annotations. Importantly, all of this functional information, especially the cis-eQTL information and Ensembl regulatory features, can be included in a custom track for visualization in the UCSC Genome Browser.
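As an illustration of the custom-track output, the Python sketch below writes variants in the four-column BED layout that the UCSC Genome Browser accepts for custom tracks. It is not the PExFInS implementation: the function name, the dictionary keys, and the track name and description are hypothetical, and the sketch assumes 1-based variant positions with a known reference allele.

```python
def ucsc_custom_track(variants, name="PExFInS_hits",
                      description="High-LD variants"):
    """Build UCSC custom-track text (track line plus BED4 records).

    `variants` is a list of dicts with hypothetical keys:
    chrom, pos (1-based), id, and ref (reference allele).
    """
    lines = [f'track name="{name}" description="{description}" visibility=pack']
    for v in variants:
        # BED uses a 0-based start and an exclusive end coordinate.
        start = v["pos"] - 1
        end = start + max(len(v["ref"]), 1)
        lines.append(f'{v["chrom"]}\t{start}\t{end}\t{v["id"]}')
    return "\n".join(lines) + "\n"


# Hypothetical example: one SNP and one 3-bp deletion.
example_variants = [
    {"chrom": "chr1", "pos": 100, "id": "rs1", "ref": "A"},
    {"chrom": "chr1", "pos": 200, "id": "indel1", "ref": "ATG"},
]
track_text = ucsc_custom_track(example_variants)
```

Additional attributes, such as the eQTL target gene or the overlapping regulatory feature, could be folded into the name column or carried in extra BED fields; which layout PExFInS uses is not specified here.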