GEAR: GEnetic Analysis Repository Wiki

Status: Planning

Brought to you by: gc5k, zzxiang1985

LambdaD

LambdaMeta
This function is designed to evaluate the quality of summary statistics. It can calculate LambdaMeta for a pair of summary statistics on genetic effects, and it can also calculate Fst between cohorts based on allele frequencies.
LambdaMeta is a statistic devised to measure heterogeneity/homogeneity between cohorts. Under the null hypothesis, LambdaMeta = 1 if the pair of cohorts are completely independent and there is no biological difference (such as different genetic architecture) or technical difference (such as different analysis protocols) between them. When there are overlapping samples, LambdaMeta < 1; when there are biological/technical differences, LambdaMeta > 1. In practice, LambdaMeta is often slightly greater than 1.
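
This page does not spell out the formula. As a minimal sketch, assume LambdaMeta is the median over shared SNPs of (b1 - b2)^2 / (se1^2 + se2^2), divided by the median of a chi-square distribution with 1 degree of freedom (about 0.4549). The function name `lambda_meta` is hypothetical, and this is an illustration of the idea rather than GEAR's own code:

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def lambda_meta(beta1, se1, beta2, se2):
    """Assumed form of LambdaMeta for two cohorts over the same SNPs.

    Each argument is a list with one entry per SNP, in the same order.
    """
    stats = [
        (b1 - b2) ** 2 / (s1 ** 2 + s2 ** 2)
        for b1, s1, b2, s2 in zip(beta1, se1, beta2, se2)
    ]
    return median(stats) / CHI2_1DF_MEDIAN
```

Under this assumed form, two cohorts with identical effect estimates (the extreme case of sample overlap) give a value of 0, while effect differences larger than their standard errors predict push the value above 1.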

Master command
lam

Each summary statistic file should have these columns: "SNP", "BETA" (or "OR" for case-control studies), "SE", "A1", "A2", "CHR", "BP", and "P". The keywords are case-insensitive, and the columns can appear in any order. "A1" is the reference allele, and "A2" is the other allele. Other columns, such as the reference allele frequency and the standard error of the allele frequency, can also be included.

SNP	CHR	BP	A1	A2	OR/Beta	SE	P	RAF	RAF_SE
snp1	1	100	G	T	1.05	0.03	0.03	0.10	0.35
snp2	2	200	T	A	0.95	0.033	0.12	0.3	0.03

The program automatically eliminates ambiguous loci, such as A/T and G/C loci. In the example, the second row, which has ambiguous alleles (T/A), will be eliminated.
Of note, all summary statistic files should use the same column names, but the column order can differ between files.
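
The ambiguity rule can be sketched as follows. This is an illustrative check, not GEAR's implementation; the helper name `is_ambiguous` is hypothetical, and the two rows are the SNPs from the example table above:

```python
# Complementary base pairs: a SNP whose two alleles are complements of
# each other (A/T or G/C) reads the same on both strands.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_ambiguous(a1, a2):
    """Return True when the allele pair is strand-ambiguous (A/T or G/C)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


rows = [("snp1", "G", "T"), ("snp2", "T", "A")]
kept = [snp for snp, a1, a2 in rows if not is_ambiguous(a1, a2)]
print(kept)  # ['snp1']
```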

Options

--meta-batch <arg>
Specify the batch file, in which each line contains the name of one meta file.
It looks like:
gwas1.txt
gwas2.txt
...

--qt-size <arg>
Specify the file in which each line contains the sample size for the file at the corresponding row in the --meta-batch file.
It looks like:
100
200
...

--cc-size <arg>
For case-control studies, specify the file in which each line has two elements: the number of cases and the number of controls for the corresponding file in --meta-batch.
It looks like:
200 300
1000 800
...
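
Because the sample-size file is matched to the batch file row by row, it can help to check that the two line up before running GEAR. A minimal sketch of such a pre-check; the function names are hypothetical and this is a user-side convenience, not part of GEAR:

```python
from pathlib import Path


def read_nonempty_lines(path):
    """Return the stripped, non-empty lines of a text file."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def pair_files_with_sizes(batch_path, size_path):
    """Pair each meta file with its sample size(s), row by row.

    Works for --qt-size files (one number per line) and for
    --cc-size files (cases and controls on each line).
    """
    files = read_nonempty_lines(batch_path)
    sizes = [tuple(int(x) for x in ln.split()) for ln in read_nonempty_lines(size_path)]
    if len(files) != len(sizes):
        raise ValueError(
            f"{len(files)} meta files but {len(sizes)} sample-size rows"
        )
    return list(zip(files, sizes))
```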

--me <arg>
Specify the number of markers to be sampled for calculating LambdaMeta or Fst (--fst). By default, 30000 markers are sampled.

--key <args>
Although the summary statistic files have all the required columns, the column names may differ. For example, "markerID" for "SNP", "effect" for "beta", "SE" for "SE", "Ref_Allele" for "A1", "Other_Allele" for "A2", "Chromosome" for "CHR", "POS" for "BP", "Pval" for "P", "RAF" for "RAF", and "freq_se" for "RAF_SE". Then this option should be used as
"--key markerID effect SE Ref_Allele Other_Allele Chromosome POS Pval"
Pval will be used to calculate the genomic inflation factor for each cohort.
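
The positional mapping implied by --key can be illustrated as below. This is a sketch that assumes the arguments are given in the order SNP, BETA, SE, A1, A2, CHR, BP, P (as in the example above) and that matching is case-insensitive; `build_key_map` and `rename_header` are hypothetical helper names, not GEAR functions:

```python
# Canonical column names, in the order the --key example lists them.
CANONICAL = ["SNP", "BETA", "SE", "A1", "A2", "CHR", "BP", "P"]


def build_key_map(key_args):
    """Map each user-supplied column name (lower-cased) to its canonical name."""
    if len(key_args) != len(CANONICAL):
        raise ValueError(f"expected {len(CANONICAL)} names, got {len(key_args)}")
    return {custom.lower(): canon for custom, canon in zip(key_args, CANONICAL)}


def rename_header(header, key_map):
    """Rename known columns; leave any other column (e.g. RAF) untouched."""
    return [key_map.get(col.lower(), col) for col in header]


key_map = build_key_map(
    ["markerID", "effect", "SE", "Ref_Allele", "Other_Allele", "Chromosome", "POS", "Pval"]
)
header = ["Chromosome", "markerID", "POS", "Ref_Allele", "Other_Allele", "effect", "SE", "Pval", "RAF"]
print(rename_header(header, key_map))
```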

--top <arg>
This option tells the program that only the top X files listed in --meta-batch will be compared to all files. For example, if there are 10 summary statistic files in --meta-batch and "--top 1" is used, it only calculates LambdaMeta (or Fst) between the first file and the other files.
In practice, to calculate Fst only between each cohort and the 1KG European samples, put the summary statistic file for 1KG as the first file in --meta-batch and use the "--top 1" option.
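
To make the effect of --top concrete, the sketch below enumerates which pairs of files would be compared. It assumes each unordered pair is evaluated once, which this page does not state explicitly; `cohort_pairs` is a hypothetical name and indices are 0-based positions in the --meta-batch file:

```python
def cohort_pairs(n_files, top=None):
    """List the (i, j) file pairs compared, optionally limited by --top.

    Without `top`, every pair is compared. With `top`, only pairs whose
    first member is among the first `top` files are kept.
    """
    limit = n_files if top is None else min(top, n_files)
    return [(i, j) for i in range(limit) for j in range(i + 1, n_files)]


print(len(cohort_pairs(10)))         # 45 pairs without --top
print(len(cohort_pairs(10, top=1)))  # 9 pairs: first file vs. the other nine
```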

--verbose
When this option is switched on, the detailed results for the SNPs used for calculating LambdaMeta will be saved in "*.lam.gz" for each pair of cohorts.

Examples

java -jar gear.jar lam --meta-batch metalist.txt --qt-size qt-sample-size.txt --key SNP BETA SE A1 A2 CHR BP P --out test
java -jar gear.jar lam --meta-batch metalist.txt --cc-size cc-sample-size.txt --key SNP BETA SE A1 A2 CHR BP P --me 50000 --out test

java -jar gear.jar lam --meta-batch metalist.txt --qt-size qt-sample-size.txt --key SNP BETA SE A1 A2 CHR BP P --top 1 --out test

java -jar gear.jar lam --meta-batch metalist.txt --qt-size qt-sample-size.txt --key SNP BETA SE A1 A2 CHR BP P --verbose --out test

In ".lmat", please find the LambdaMeta for each pair of cohorts.
In ".gc", please find the lambda_gc for each cohort, calculated using the reported p-values (first column) and the reported beta/se (second column). These two columns should be nearly identical. If they differ, it may indicate that some cohorts have uploaded summary results that have already been corrected for lambda_gc.
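
The comparison of the two lambda_gc columns can be reproduced by hand. The sketch below uses the usual definition of the genomic inflation factor (median chi-square statistic divided by about 0.4549, the median of a 1-df chi-square); whether GEAR computes it in exactly this way is an assumption, and the function names are hypothetical. For files reporting OR, the effect would presumably need to be converted to log(OR) first.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364
_STD_NORMAL = NormalDist()


def chi2_from_p(p):
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; the sign vanishes on squaring
    return z * z


def lambda_gc_from_p(pvalues):
    """lambda_gc based on the reported p-values."""
    return median([chi2_from_p(p) for p in pvalues]) / CHI2_1DF_MEDIAN


def lambda_gc_from_beta_se(betas, ses):
    """lambda_gc based on the reported effect sizes and standard errors."""
    return median([(b / s) ** 2 for b, s in zip(betas, ses)]) / CHI2_1DF_MEDIAN
```

If the p-values were derived from the same beta and se, the two functions return nearly the same value; a p-value column that has been corrected for lambda_gc would give a smaller result from `lambda_gc_from_p`.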

In the "lamB"/"lambF" file, using "ChiExp" and "ChiObs" as the x- and y-axis, respectively, a plot can be made for each pair of cohorts. The slope of the line should be very close to 1 if the quality of the summary statistics is fairly good.
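
Besides plotting, the slope can be checked numerically. A minimal sketch using an ordinary least-squares fit; it assumes the "ChiExp" and "ChiObs" columns have already been read into two lists (the exact file layout is not described here), and `fitted_slope` is a hypothetical name:

```python
def fitted_slope(chi_exp, chi_obs):
    """Least-squares slope of ChiObs (y) regressed on ChiExp (x)."""
    n = len(chi_exp)
    mean_x = sum(chi_exp) / n
    mean_y = sum(chi_obs) / n
    sxx = sum((x - mean_x) ** 2 for x in chi_exp)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chi_exp, chi_obs))
    return sxy / sxx
```

A slope clearly below 1 would be in line with overlapping samples, and a slope clearly above 1 with heterogeneity, mirroring the interpretation of LambdaMeta itself.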

[Figure: Overlapping samples deflate LambdaMeta]

[Figure: Heterogeneity inflates LambdaMeta]

[Figure: Fairly good]