Generate predicted eigenvectors
Subcommand:
propc
Projected PC generates principal components (PCs)/eigenvectors for a set of samples based on a reference population.
The procedure matches alleles between the score file and the genotype data automatically, which makes prediction easier and avoids logistical problems such as strand issues.
It should be noted that GEAR will leave out monomorphic loci if there are any.
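As a rough illustration of what a projected PC is, the sketch below computes each individual's coordinate on a PC as the sum, over SNPs, of the reference-allele count multiplied by the SNP's loading. This shows the general technique only and is not GEAR's implementation: whether GEAR centers or scales genotypes is not stated here, and the function name `project_pcs` and the toy genotypes are hypothetical.

```python
def project_pcs(genotypes, loadings):
    """Project individuals onto PCs using per-SNP loadings.

    genotypes: dict sample -> dict snp -> count of the reference allele (0, 1, 2)
    loadings:  dict snp -> list of per-PC loadings (as in the score file)
    """
    # Assumption: a locus is monomorphic if every sample has the same genotype;
    # such loci are left out, mirroring the note above.
    polymorphic = [
        snp for snp in loadings
        if len({g[snp] for g in genotypes.values() if snp in g}) > 1
    ]
    n_pcs = len(next(iter(loadings.values())))
    projected = {}
    for sample, geno in genotypes.items():
        coords = [0.0] * n_pcs
        for snp in polymorphic:
            if snp not in geno:  # missing genotype: skip this locus
                continue
            for k in range(n_pcs):
                coords[k] += geno[snp] * loadings[snp][k]
        projected[sample] = coords
    return projected


# Hypothetical toy data; SNPB is monomorphic here and is therefore dropped.
loadings = {"SNPA": [1.95, -0.5], "SNPB": [2.04, -0.7], "SNPC": [-0.98, 0.34]}
genotypes = {
    "ind1": {"SNPA": 2, "SNPB": 1, "SNPC": 0},
    "ind2": {"SNPA": 0, "SNPB": 1, "SNPC": 1},
}
print(project_pcs(genotypes, loadings))
```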
The format of the score pc loading file
SNP | RefAllele | pc1_score | pc2_score |
---|---|---|---|
SNPA | A | 1.95 | -0.5 |
SNPB | C | 2.04 | -0.7 |
SNPC | C | -0.98 | 0.34 |
SNPD | C | -0.24 | 3.1 |
By default, gear assumes that the score file contains a header line. If your pc score file doesn't contain a header line, you should switch on the --no-score-header option.
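The layout above can be read with a few lines of code. The following is a minimal sketch, assuming the file is whitespace-delimited (the actual delimiter is not stated here) and that every column after RefAllele is a PC loading. The helper `read_score_file` is hypothetical and is not part of GEAR.

```python
def read_score_file(lines, has_header=True):
    """Return {snp: (ref_allele, [loadings, ...])} from score-file lines."""
    rows = (line.split() for line in lines)
    rows = (fields for fields in rows if fields)  # drop blank lines
    if has_header:
        next(rows, None)  # skip the header line
    scores = {}
    for snp, ref, *values in rows:
        scores[snp] = (ref.upper(), [float(v) for v in values])
    return scores


# Toy input mirroring the table above (assumed whitespace-delimited).
demo = """SNP RefAllele pc1_score pc2_score
SNPA A 1.95 -0.5
SNPB C 2.04 -0.7
""".splitlines()

scores = read_score_file(demo)
print(scores["SNPA"])
```

For a file without a header line, passing `has_header=False` plays the role that --no-score-header plays on the command line.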
Options
--score
Specify the score file.
--batch
It is often better to generate projected PCs for the reference samples (such as HapMap) and the target samples together. This provides more information, especially for illustration, as demonstrated below.
batch.txt lists the root names of the genotype files. For example, for two files (dat1, dat2):
HM3_founders_noATGC_autosome_naive_imputed PUR_chr1_com
There can be more than two files. By default, only the consensus markers shared across those files are matched to the scores. To generate projected PCs using as many markers as possible, switch on --greedy. However, when the --greedy option is on, the projected PCs generated for different files may not lie in the same space.
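The difference between the default and --greedy can be sketched as set operations. This is one interpretation of the description above, not GEAR's code: by default every file uses the markers common to all files, while under --greedy each file uses all of its own markers that appear in the score file. The function `markers_per_file` and the marker names are hypothetical.

```python
def markers_per_file(markers_by_file, score_snps, greedy=False):
    """Return {file_root: set of markers used for projection}."""
    if not markers_by_file:
        return {}
    score_snps = set(score_snps)
    if greedy:
        # Each file keeps every marker it shares with the score file.
        return {name: set(m) & score_snps for name, m in markers_by_file.items()}
    # Default: only markers present in all files are matched to the scores.
    consensus = set.intersection(*(set(m) for m in markers_by_file.values()))
    return {name: consensus & score_snps for name in markers_by_file}


files = {"ref": ["s1", "s2", "s3"], "target": ["s2", "s3", "s4"]}
score = ["s1", "s2", "s4"]
print(markers_per_file(files, score))
print(markers_per_file(files, score, greedy=True))
```

Under this reading, the greedy marker sets differ between files, which is why the resulting PCs may not be directly comparable.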
--score-gz
Specify the score file that is in gz format.
--no-score-header
Use this option when the score file has no header line.
--extract-score
Only SNPs included in both --extract-score and --score/--score-gz will be used for generating profile scores.
--remove-score
SNPs included in --remove-score will not be used for generating profile scores.
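The effect of the two filters can be sketched as follows, assuming --extract-score keeps only the listed SNPs and --remove-score drops the listed SNPs. The function `filter_scores` is a hypothetical helper and is not part of GEAR.

```python
def filter_scores(scores, extract=None, remove=None):
    """Return the subset of `scores` (a dict keyed by SNP name) left after filtering."""
    keep = set(scores)
    if extract is not None:
        keep &= set(extract)  # only SNPs in both lists survive
    if remove is not None:
        keep -= set(remove)   # listed SNPs are dropped
    return {snp: value for snp, value in scores.items() if snp in keep}


scores = {"SNPA": 1.95, "SNPB": 2.04, "SNPC": -0.98}
print(filter_scores(scores, extract=["SNPA", "SNPB", "SNPX"]))
print(filter_scores(scores, remove=["SNPB"]))
```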
--keep-atgc
It will keep AT/GC loci in the risk profile. However, the user should make sure that the genotypes in both the reference panel and the target set are coded on the same reference allele/strand for each locus. By default, this option is off.
--auto-flip-off
When this option is on, a locus whose alleles are flipped in the testing set will not be matched.
As genotypes may be called on complementary strands across genotyping platforms, gear matches them by flipping SNPs automatically. For example, the named allele is "A" in the score file, but due to flipping the reported alleles are "T/C" in the validation set. Unless --auto-flip-off is switched on, gear flips "T/C" back to "A/G" and consequently matches the score to the validation set. Of course, gear presumes that the polymorphism is the same across the discovery and the validation sets.
There are four possible schemes for matching a SNP between the discovery and the validation sets; a locus that fits none of them is discarded.
Scheme | |
---|---|
The named score SNP matches the reference allele in the validation set | |
The named score SNP matches the alternative allele in the validation set | |
The named score SNP matches the flipped reference allele in the validation set | |
The named score SNP matches the flipped alternative allele in the validation set | |
Matches neither, then this locus will be discarded | |
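The matching schemes in the table can be sketched in a few lines. This is a minimal illustration of the logic as described, not GEAR's implementation; the function `match_score_allele` and its return labels are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def match_score_allele(score_allele, ref, alt, auto_flip=True):
    """Return the scheme under which score_allele matches a locus, or None."""
    a, ref, alt = score_allele.upper(), ref.upper(), alt.upper()
    if a == ref:
        return "ref"
    if a == alt:
        return "alt"
    if auto_flip:  # corresponds to --auto-flip-off NOT being set
        if a == COMPLEMENT.get(ref):
            return "flipped_ref"
        if a == COMPLEMENT.get(alt):
            return "flipped_alt"
    return None  # matches neither: the locus is discarded


# Score allele "A", validation alleles "T/C": matched only after flipping.
print(match_score_allele("A", "T", "C"))
print(match_score_allele("A", "T", "C", auto_flip=False))
```

For an A/T or C/G locus the direct and the flipped schemes cannot be told apart by alleles alone, which is why such loci need special care.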
Notes
AT/GC loci will be left out if --keep-atgc is not on. --keep-atgc should not be turned on unless the SNPs are known to be coded on the same strand for each locus in both the discovery and the validation panels.
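An AT/GC locus is one whose two alleles are complementary to each other, so flipping the strand yields the same pair of alleles and the strand cannot be inferred. A minimal check, with the hypothetical helper name `is_atgc_locus`:

```python
def is_atgc_locus(allele1, allele2):
    """True if the two alleles are A/T or C/G (strand-ambiguous)."""
    pair = {allele1.upper(), allele2.upper()}
    return pair == {"A", "T"} or pair == {"C", "G"}


print(is_atgc_locus("A", "T"))
print(is_atgc_locus("A", "G"))
```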
The examples below show how to generate projected PCs for the Puerto Rican cohort in the 1000 Genomes Project.
Example 1: generating projected PCs using the batch solution
java -Xmx15G -jar /path/gear.jar probatch --batch batch.txt --score score.txt --out pur
java -Xmx15G -jar /path/gear.jar probatch --batch batch.txt --score-gz score.txt.gz --out pur
Inside batch.txt is
HM3_founders_noATGC_autosome_naive_imputed PUR_chr1_com
[Figure: projected PCs for the Puerto Rican samples together with the HapMap reference samples.]
The HapMap reference genotype data and the eigenvector scores can be found HERE. The demo is also included.
In addition, the above procedure can also be carried out step by step if the user prefers.
~~~~~~~~~~~~~~~~~
java -Xmx15G -jar /path/gear.jar comsnp --bfiles PUR HapMap --out score
java -Xmx15G -jar /path/gear.jar propc --bfile PUR --extract-score score.comsnp --score-gz HM3_SNP.blup20.gz --out Target
java -Xmx15G -jar /path/gear.jar propc --bfile HapMap --extract-score score.comsnp --score-gz HM3_SNP.blup20.gz --out HapMap_Ref
~~~~~~~~~~~~~~~~~~