IndelLdplot

chengzhongshan

Given the high proportion of cis-acting expression quantitative trait loci (cis-eQTLs) among GWAS SNPs and the underexplored role of indel cis-eQTLs in GWAS, integrating cis-eQTLs, especially indel cis-eQTLs, with candidate disease-associated variants from GWAS could facilitate the identification of causal genes or disease mechanisms. In addition, the biological information encoded in the human genome, such as regulatory features from the Ensembl regulatory database, is conducive to pinpointing the functional causal variant(s) behind a disease association. We therefore built a pipeline, IndelLDplot, to integrate our lymphoblastoid cell line (LCL) cis-eQTL data with publicly available cis-eQTL datasets derived from lung tissues, human monocytes, dendritic cells, blood, and LCLs.
IndelLDplot searches for variants, including indels, that are in high linkage disequilibrium (LD) with user-queried SNPs, using the deep sequencing data from the 1000 Genomes (1KG) Project. It then maps these high-LD variants to cis-eQTLs and known Ensembl regulatory features, which can reveal the potential functional relevance of the queried SNPs. This strategy is illustrated in Figure 4a. In that figure, high-LD variant 3 lies in an Ensembl regulatory region and is also an LCL cis-eQTL, so it can be prioritized as a potentially functional variant that tags the GWAS SNP of interest and can be carried forward for replication in another cohort. Additionally, the parent gene of variant 3 can be taken forward for functional validation using molecular or cellular biology techniques.
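The prioritization strategy described above can be sketched in a few lines of Python. This is only an illustration, not the IndelLDplot SAS macros: the function names, the toy genotypes, the variant IDs, the coordinates, and the r² threshold of 0.8 are all assumptions, and r² is approximated here as the squared Pearson correlation of unphased genotype dosages (0/1/2), which may differ from the LD measure the pipeline actually computes.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation of two genotype dosage vectors (0/1/2).
    Used here as a simple stand-in for LD r^2 on unphased genotypes."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: r^2 is undefined, treat as no LD
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def prioritize(query, candidates, eqtl_ids, reg_regions, min_r2=0.8):
    """Keep candidates in high LD with the query SNP and flag eQTL and
    regulatory-region support. candidates: id -> (chrom, pos, dosages)."""
    hits = []
    for vid, (chrom, pos, dosages) in candidates.items():
        r2 = r_squared(query, dosages)
        if r2 < min_r2:
            continue
        in_reg = any(c == chrom and s <= pos <= e for c, s, e in reg_regions)
        hits.append({"id": vid, "r2": r2,
                     "eqtl": vid in eqtl_ids, "regulatory": in_reg})
    # Variants with both lines of evidence come first, then higher r^2.
    hits.sort(key=lambda h: (h["eqtl"] and h["regulatory"], h["r2"]),
              reverse=True)
    return hits

# Hypothetical toy data: dosages for eight samples.
query_snp = [0, 1, 2, 1, 0, 2, 1, 0]
candidates = {
    "variant1": ("chr1", 1000, [0, 1, 2, 1, 0, 2, 1, 1]),
    "variant2": ("chr1", 5000, [2, 0, 1, 0, 2, 1, 1, 0]),
    "variant3": ("chr1", 2050, [0, 1, 2, 1, 0, 2, 1, 0]),
}
eqtl_ids = {"variant3"}                # hypothetical cis-eQTL hits
reg_regions = [("chr1", 2000, 2100)]   # hypothetical regulatory interval

hits = prioritize(query_snp, candidates, eqtl_ids, reg_regions)
for h in hits:
    print(h)
```

With this toy input, variant 2 falls below the LD threshold, and variant 3 is ranked ahead of variant 1 because it is both a cis-eQTL and inside a regulatory interval.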
IndelLDplot is written in the SAS statistical language (Figure 4b). A total of seven SAS macros were created to perform LD analysis, eQTL analysis, and regulatory feature mapping. These macros can work alone or in combination to solve complicated tasks. Given dbSNP rs ID(s) or a chromosome range as input, IndelLDplot outputs all high-LD variants together with their annotation and eQTL data. Additionally, IndelLDplot can provide the genotype data required by Haploview to draw an LD plot for the input variants and the SNPs and indels in LD with them. For example, if users are interested in the LD pattern among their variants of interest and previously published GWAS SNPs, IndelLDplot can retrieve the genotypes of all these variants and examine the LD pattern and haplotype blocks among them. Furthermore, IndelLDplot can use the annotation tool ANNOVAR [33] to annotate the input SNPs and their high-LD variants with annotation databases from the UCSC Genome Browser, mapping these variants to RefSeq genes, conserved regions, transcription factor binding sites, and DNase I hypersensitive sites. Regulatory features of these variants from the Ensembl Regulatory Build can also be mapped and combined with the ANNOVAR annotations. Importantly, all of this functional information, especially cis-eQTL information and Ensembl regulatory features, can be included in a custom track for visualization in the UCSC Genome Browser.
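As a rough illustration of the last step, the Python sketch below writes annotated variants as a UCSC custom track in BED9 format (0-based, half-open coordinates, with `itemRgb` colouring). It is not the output code of the SAS macros: the function name, the track name, the colour scheme, the score scaling, and the example variants are all hypothetical.

```python
def to_custom_track(variants, name="IndelLDplot_hits",
                    description="High-LD variants with eQTL/regulatory support"):
    """Render variants as UCSC custom-track text (track line plus BED9 rows).
    Each variant is a dict with 1-based 'pos', reference allele 'ref', and
    illustrative 'r2', 'eqtl', and 'regulatory' fields."""
    lines = [f'track name="{name}" description="{description}" '
             f'visibility=2 itemRgb="On"']
    for v in variants:
        start = v["pos"] - 1            # BED starts are 0-based
        end = start + len(v["ref"])     # half-open end spans the reference allele
        # Assumed colour scheme: red when both lines of evidence agree.
        rgb = "255,0,0" if v["eqtl"] and v["regulatory"] else "0,0,255"
        score = round(v["r2"] * 1000)   # BED scores range from 0 to 1000
        fields = [v["chrom"], start, end, v["id"], score, ".", start, end, rgb]
        lines.append("\t".join(str(f) for f in fields))
    return "\n".join(lines) + "\n"

# Hypothetical annotated variants: a 1-bp deletion (ref "AT") and a SNP.
variants = [
    {"chrom": "chr1", "pos": 2050, "ref": "AT", "id": "variant3",
     "r2": 1.0, "eqtl": True, "regulatory": True},
    {"chrom": "chr1", "pos": 1000, "ref": "G", "id": "variant1",
     "r2": 0.82, "eqtl": False, "regulatory": False},
]

track_text = to_custom_track(variants)
print(track_text, end="")
```

The resulting text can be pasted into the Genome Browser's custom track form or saved to a file and uploaded.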