GEAR: GEnetic Analysis Repository Wiki

GEnetic Analysis Repository

Status: Planning

Brought to you by: gc5k, zzxiang1985

GMeta

Generalized GWAS Meta-analysis

When the cohorts in a GWAS meta-analysis (GM) are correlated, for example because they share samples, a standard approach such as the commonly used inverse-variance method may need to be adjusted for the covariance between cohorts. The GWAS meta-analysis implemented here is a generalization of the one in Metal.
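
To make the idea of adjusting the inverse-variance method for covariance concrete, the Python sketch below combines the per-cohort effect sizes of a single SNP with weights derived from Omega = D R D, where D holds the standard errors and R is a between-cohort correlation matrix. This is the textbook generalized least-squares form, assumed here for illustration only and not taken from GEAR's source code; the function names solve and meta_effect are hypothetical. When no correlation matrix is given, it reduces to the ordinary inverse-variance estimate.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def meta_effect(betas, ses, corr=None):
    """Combine effects across cohorts (illustrative; not GEAR's code).

    corr is a between-cohort correlation matrix; None means the identity,
    which gives the ordinary inverse-variance estimate.
    """
    n = len(betas)
    if corr is None:
        corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Omega[i][j] = corr[i][j] * se_i * se_j
    omega = [[corr[i][j] * ses[i] * ses[j] for j in range(n)] for i in range(n)]
    weights = solve(omega, [1.0] * n)  # Omega^-1 * 1
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se

naive_beta, naive_se = meta_effect([0.10, 0.20], [0.05, 0.10])
adj_beta, adj_se = meta_effect([0.10, 0.20], [0.05, 0.10],
                               corr=[[1.0, 0.5], [0.5, 1.0]])
```

In this two-cohort example, a positive correlation of 0.5 makes the combined standard error larger than the unadjusted one, which is why ignoring sample overlap tends to overstate significance.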

Master command
gmeta/gm

--cm
Specify the correlation matrix file, which is one of the outputs of lam. If --cm is not used, the analysis is the same as in Metal.

--adj-overlap
Use only the elements greater than 0 in the matrix read in through --cm; elements less than 0 are set to zero. In other words, the adjustment is made only for overlapping samples.

--naive
When this option is turned on, no adjustment is made. It is equivalent to setting the correlation matrix to a diagonal matrix.
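
The effect of --adj-overlap and --naive on the correlation matrix can be pictured with the short Python sketch below. It follows only the descriptions above (negative elements set to zero; off-diagonal elements ignored) and is not GEAR's implementation; the function names are hypothetical.

```python
def adj_overlap(cm):
    """Keep positive elements; set elements that are not positive to zero."""
    return [[v if v > 0 else 0.0 for v in row] for row in cm]

def naive(cm):
    """Keep only the diagonal, so cohorts are treated as uncorrelated."""
    n = len(cm)
    return [[cm[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]

# A made-up 3-cohort correlation matrix for illustration.
cm = [
    [1.00, 0.20, -0.05],
    [0.20, 1.00, 0.10],
    [-0.05, 0.10, 1.00],
]
overlap_only = adj_overlap(cm)
diagonal_only = naive(cm)
```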

--gc
Apply genomic control to cohorts whose genomic inflation factor is greater than 1; cohorts whose factor is not greater than 1 are left unadjusted.
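
For readers unfamiliar with genomic control, the Python sketch below shows the standard calculation: the inflation factor is the median of the 1-df chi-square statistics divided by 0.4549364 (the median of a 1-df chi-square distribution), and standard errors are multiplied by the square root of that factor only when it exceeds 1. How GEAR computes the factor internally is not documented here, so treat this as an assumption; the function names are hypothetical.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def inflation_factor(betas, ses):
    """Genomic inflation factor of one cohort from its effects and SEs."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN

def gc_adjust(ses, lam):
    """Inflate standard errors by sqrt(lam) only when lam is greater than 1."""
    if lam <= 1.0:
        return list(ses)  # no adjustment for this cohort
    factor = math.sqrt(lam)
    return [s * factor for s in ses]

# Toy cohort with three SNPs (made-up numbers).
betas = [0.1, 0.2, 0.3]
ses = [0.1, 0.1, 0.1]
lam = inflation_factor(betas, ses)
adjusted = gc_adjust(ses, lam)
untouched = gc_adjust(ses, 0.9)
```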

--meta/--meta-gz
Specify the meta files (GWAS summary statistics) in text/gz format, one after another.

--cc
Specify the numbers of cases and controls for each meta file from a case-control study.

--qt
Specify the sample size for each meta file from a quantitative-trait study.

--full-snp-only
Write only the SNPs that are present in all cohorts to the results.
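
The filter described for --full-snp-only amounts to a set intersection over cohorts. A minimal Python sketch, with a hypothetical function name and made-up SNP lists:

```python
def snps_in_all_cohorts(cohorts):
    """Return the SNPs found in every cohort, in the order of the first cohort."""
    common = set(cohorts[0])
    for snps in cohorts[1:]:
        common &= set(snps)
    return [snp for snp in cohorts[0] if snp in common]

cohorts = [
    ["snp1", "snp2", "snp3"],
    ["snp2", "snp3", "snp4"],
    ["snp3", "snp2"],
]
shared = snps_in_all_cohorts(cohorts)
```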

--key
Specify the keywords (column names) for "SNP", "Beta" (for quantitative traits and case-control studies) or "OR" (for case-control studies only), "SE", "A1", "A2", "CHR", "BP", and "P". "CHR", "BP", and "P" are optional. The keywords are case-insensitive, and the columns may appear in any order. "A1" is the reference allele, and "A2" is the other allele. If the "A2" column is absent, the program will not eliminate ambiguous loci, such as A/T and G/C loci.

SNP	CHR	BP	A1	A2	OR/Beta	SE	P
snp1	1	100	G	T	1.05	0.03	0.03
snp2	2	200	T	A	0.95	0.033	0.12
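
The keyword rules above (case-insensitive names, columns in any order) can be illustrated with the Python sketch below, which locates each column by its keyword before reading the rows. The function name read_meta and the keys dictionary are hypothetical, and converting an odds ratio to the log scale is an assumption about how "OR" input would be put on the same scale as "Beta", not a documented GEAR behaviour.

```python
import math

def read_meta(lines, keys, effect_is_or=False):
    """Parse whitespace-delimited summary statistics using user-given keywords."""
    header = [name.lower() for name in lines[0].split()]
    # Map each logical field to its column position, ignoring case.
    col = {field: header.index(name.lower()) for field, name in keys.items()}
    records = []
    for line in lines[1:]:
        parts = line.split()
        effect = float(parts[col["effect"]])
        records.append({
            "snp": parts[col["snp"]],
            "a1": parts[col["a1"]].upper(),
            "a2": parts[col["a2"]].upper(),
            # Assumption: an odds ratio is analysed on the log scale.
            "beta": math.log(effect) if effect_is_or else effect,
            "se": float(parts[col["se"]]),
        })
    return records

# Columns deliberately shuffled and in mixed case.
lines = [
    "se or a2 A1 snp",
    "0.03 1.05 T G snp1",
    "0.033 0.95 A T snp2",
]
keys = {"snp": "SNP", "effect": "OR", "se": "SE", "a1": "a1", "a2": "A2"}
records = read_meta(lines, keys, effect_is_or=True)
```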

--keep-atgc/--atgc
By default, AT/GC loci will be eliminated. When --keep-atgc/--atgc is used, the AT/GC loci will be used in meta-analysis. Be very careful in using this option, because summary stats are often not aligned on the same reference allele across cohorts.
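
An AT/GC locus is one whose two alleles are complementary bases, so a strand flip cannot be told apart from an allele swap. A minimal Python check, with a hypothetical function name:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def is_ambiguous(a1, a2):
    """True when the two alleles are complementary bases (A/T or G/C)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()

# Alleles of the two SNPs in the example table of the --key section.
snp1_ambiguous = is_ambiguous("G", "T")
snp2_ambiguous = is_ambiguous("T", "A")
```

Under the default behaviour described above, snp2 (T/A) would be dropped and snp1 (G/T) would be kept.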

--meta-batch <arg>
Specify a batch file in which each line names one meta file.
It should look like:
gwas1.txt
gwas2.txt
...

Or the files can be in gz format
--meta-gz-batch <arg>
Specify a batch file in which each line names one meta file in gz format.
It should look like:
gwas1.gz
gwas2.gz
...

When the files are given in batch format, the sample sizes should be given with either --qt-size for quantitative traits or --cc-size for case-control studies.

--qt-size
Specify a file in which each line gives the sample size for the meta file on the corresponding line of the --meta-batch/--meta-gz-batch file.
For example:
100
200
...

--cc-size
Similar to --qt-size, but each line has two elements: the number of cases and the number of controls for the corresponding file.
For example:
200 300
1000 800
...
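
Because the batch file and the sample-size file are matched line by line, it can be worth checking that they agree before running an analysis. The Python sketch below pairs the two files and fails on a mismatch; it is a stand-alone helper with a hypothetical name, not part of GEAR.

```python
def pair_batch(meta_text, size_text, case_control=False):
    """Pair each meta file with the sample size given on the same line number."""
    files = [line.strip() for line in meta_text.splitlines() if line.strip()]
    rows = [line.split() for line in size_text.splitlines() if line.strip()]
    if len(files) != len(rows):
        raise ValueError("batch file and sample-size file differ in length")
    expected = 2 if case_control else 1  # cases and controls, or one sample size
    paired = []
    for name, row in zip(files, rows):
        if len(row) != expected:
            raise ValueError("unexpected number of values for " + name)
        paired.append((name, tuple(int(value) for value in row)))
    return paired

cc_pairs = pair_batch("gwas1.txt\ngwas2.txt\n", "200 300\n1000 800\n",
                      case_control=True)
qt_pairs = pair_batch("gwas1.txt\ngwas2.txt\n", "100\n200\n")
```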

--keep-cohort
Specify the file names of the cohorts to be included in the meta-analysis.

--remove-cohort
Specify the file names of the cohorts to be excluded from the meta-analysis.

Example

gear lam --meta m1.txt m2.txt --cc 100 130 150 200
gear lam --meta-gz m1.gz m2.gz --cc 100 130 150 200

Note: the numbers of cases and controls for m1 are 100 and 130, respectively; those for m2 are 150 and 200, respectively.

gear lam --meta m1.txt m2.txt --qt 100 200
gear lam --meta-gz m1.gz m2.gz --qt 100 200

Note: the sample sizes for m1 and m2 are 100 and 200, respectively.

Batch format
When there are many GWAS summary statistic files, it is much easier to write a batch file that lists the meta files.

--meta-batch <arg>
Specify a batch file in which each line names one meta file.
It should look like:
gwas1.txt
gwas2.txt
...

Or the files can be in gz format
--meta-gz-batch <arg>
Specify a batch file in which each line names one meta file in gz format.
It should look like:
gwas1.gz
gwas2.gz
...

When the files are given in batch format, the sample sizes should be given with either --qt-batch for quantitative traits or --cc-batch for case-control studies.

--qt-batch
Specify a file in which each line gives the sample size for the meta file on the corresponding line of the --meta-batch/--meta-gz-batch file.
For example:
100
200
...

--cc-batch
Similar to --qt-batch, but each line has two elements: the number of cases and the number of controls for the corresponding file.
For example:
200 300
1000 800
...

--verbose/--verbose-gz
When the --verbose/--verbose-gz option is switched on, the results for all selected SNPs are saved in text/gz format in ".lam". Otherwise, only 100 fingerprint SNPs, whose lambdaD values lie at the 1st, 2nd, 3rd, ..., 100th percentiles of the sequence, are saved in text format in ".lam".
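
One way to picture the 100 fingerprint values is to sort the per-SNP statistics and take the value at each whole percentile. The Python sketch below uses the nearest-rank definition of a percentile, which is an assumption; the source does not say which percentile rule lam uses, and the function name is hypothetical.

```python
def fingerprint(values, n_points=100):
    """Return the values at the 1st..n_points-th percentiles (nearest rank)."""
    ordered = sorted(values)
    n = len(ordered)
    picked = []
    for p in range(1, n_points + 1):
        rank = -(-p * n // 100)  # ceil(p * n / 100) using integer arithmetic
        picked.append(ordered[max(rank, 1) - 1])
    return picked

# Stand-in data: the integers 1..1000 in place of real lambdaD values.
points = fingerprint(range(1, 1001))
```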

Two result files are produced. One is ".lmat", which contains the covariance and the estimated overlapping samples; the other is ".lam", which contains the lambdaD for the SNPs.

Example

Meta-analysis without adjustment for correlation

gear gm --meta-batch metalist.txt --cc-size cc-sample-size.txt --out test

Meta-analysis without adjustment for correlation but with gc correction

gear gm --meta-batch metalist.txt --cc-size cc-sample-size.txt --gc --out test

Meta-analysis with adjustment for correlation

gear gm --meta-batch metalist.txt --cc-size cc-sample-size.txt --cm test.cm --out test

Meta-analysis with adjustment for correlation and gc correction

gear gm --meta-batch metalist.txt --cc-size cc-sample-size.txt --cm test.cm --gc --out test