Recent changes to Meta-watchdog

Meta-watchdog modified by Guobo Chen

Guobo Chen — Mon, 13 Apr 2015 05:19:32 -0000

--- v8
+++ v9
@@ -63,3 +63,10 @@
 gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --reg 0.9
 ~~~~

+
+If --verbose is turned on, it prints the regression coefficients for all pairs, regardless of whether they pass the cutoff.
+
+~~~~
+gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --verbose
+~~~~
+
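
A minimal Python sketch of what --verbose changes, under assumptions the page does not state: for every pair of individuals (one from set 1, one from set 2), the K pseudo profile scores of the set-2 individual are regressed on those of the set-1 individual, the slope b is taken as the regression coefficient, and a pair is reported when b reaches the cutoff. The function names and the toy scores (K = 4) are hypothetical; this is not gear's implementation.

```python
from itertools import product


def slope(x, y):
    """Least-squares slope b of y on x (intercept included) over the K scores."""
    k = len(x)
    mx = sum(x) / k
    my = sum(y) / k
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    return sxy / sxx


def pairwise_report(set1, set2, cutoff=0.95, verbose=False):
    """Return (id1, id2, b) for pairs with b >= cutoff, or for every pair if verbose."""
    rows = []
    for (id1, s1), (id2, s2) in product(set1.items(), set2.items()):
        b = slope(s1, s2)
        if verbose or b >= cutoff:
            rows.append((id1, id2, b))
    return rows


# Toy data: K = 4 hypothetical pseudo profile scores per individual.
set1 = {"A1": [0.3, -1.2, 0.8, 0.1], "A2": [1.0, 0.2, -0.5, -0.7]}
set2 = {"B1": [0.3, -1.2, 0.8, 0.1], "B2": [-0.4, 0.9, 0.6, -1.1]}

print(pairwise_report(set1, set2))                      # only the duplicated pair A1-B1
print(len(pairwise_report(set1, set2, verbose=True)))   # all 2 x 2 = 4 pairs
```

Without --verbose only pairs at or above the cutoff are kept; with it, all n1*n2 pairs are listed, which is useful for inspecting the distribution of b.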

Meta-watchdog modified by Guobo Chen

Guobo Chen — Sun, 14 Sep 2014 23:57:10 -0000

--- v7
+++ v8
@@ -1,18 +1,9 @@
 ![Watchdog](https://sourceforge.net/p/gbchen/wiki/Meta-watchdog/attachment/watchdog.gif)

-**A procedure for detecting overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\].**
+**A procedure for detecting overlapping individuals between cohorts in meta-analysis using pseudo profile scores, without sharing genotype data \[___beta version___\].**

 **Step 1: determine the number of profile scores**
-The central idea in detecting overlapping individuals without sharing genotypes is to use profile scores.  The first step is to determine the number of profile scores required to detect overlapping individuals. Assuming there are n1 and n2 individuals in cohort 1 and cohort 2, there will be n=n1*n2 comparisons to detect overlapping individuals between the two cohorts.
-There are two ways to detect the overlapping individuals: 1) chi-square test; 2) linear regression.
-
-When using the chi-square method, 
-
-~~~~~~
-gear mwpower --chisq 0.5 --alpha 0.01 --test n --out mw
-~~~~~~
-
-It will calculate the number of profile scores needed to control the experiment-wise type I error rate at 0.01, given a cutoff of 0.5.
+The central idea in detecting overlapping individuals without sharing genotypes is to use pseudo profile scores (PPS). The first step is to determine the number of pseudo profile scores required to detect overlapping individuals. Assuming there are n1 and n2 individuals in cohort 1 and cohort 2, respectively, detecting overlapping individuals between the two cohorts requires n=n1*n2 comparisons.
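
To make the size of the multiple-testing problem concrete, the Python lines below compute n for the 1000-by-1000 example used in this step and, assuming a Bonferroni correction (the page does not say how mwpower controls the experiment-wide error rate), the significance level each single comparison would need for alpha = 0.01. The function names are hypothetical.

```python
def n_comparisons(n1: int, n2: int) -> int:
    """Every individual in cohort 1 is compared with every individual in cohort 2."""
    return n1 * n2


def per_comparison_alpha(alpha: float, n: int) -> float:
    """Bonferroni-adjusted level for each comparison (an assumption, not gear's formula)."""
    return alpha / n


n = n_comparisons(1000, 1000)
print(n)                               # 1000000
print(per_comparison_alpha(0.01, n))   # about 1e-08
```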

 When using the linear regression method,

@@ -20,15 +11,19 @@
 gear mwpower --reg 0.95 --alpha 0.01 --beta 0.05 --test n --out mw
 ~~~~~~

-It will calculate the number of profile scores needed to control the experiment-wise type I error rate at 0.01, given a cutoff of 0.95.  The genetic interpretation of --b is that b of 0.45 detects first-degree relatives and b of 0.95 detects duplicated individuals.
+It calculates the number of pseudo profile scores needed to control the experiment-wide type I error rate at 0.01 and the type II error rate at 0.05 (power = 1 - type II error rate), given a regression-coefficient cutoff of 0.95.

-For example, if there are 1000 individuals in each of cohort 1 and cohort 2, then n=1000000 and the required number of scores is K=41.57.  We round up to K=42 for the steps below.
+The genetic interpretation of --reg is that a cutoff of b=0.45 (or 0.4) detects first-degree relatives, and b=0.95 detects duplicated individuals across cohorts.
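
Why b near 0.45 points to first-degree relatives and b near 0.95 to duplicates can be illustrated with a small simulation. This is a sketch under stated assumptions, not gear's algorithm: genotypes are standardized with known reference allele frequencies, pseudo SNP effects are standard normal, each individual's K pseudo profile scores are the standardized genotypes multiplied by the effect matrix, and b is the slope from regressing one individual's K scores on another's. Under these assumptions a duplicate gives b of exactly 1, a parent-offspring pair gives b around 0.5, and an unrelated pair gives b around 0, with sampling noise that shrinks as K and the number of SNPs grow. All variable names and the frequency range are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2015)   # arbitrary seed, only for reproducibility
M, K = 420, 42                      # number of SNPs (10 x K) and of pseudo profile scores
p = rng.uniform(0.1, 0.5, size=M)   # hypothetical reference allele frequencies


def standardize(g):
    """Centre and scale reference-allele counts (0/1/2) using the frequencies p."""
    return (g - 2 * p) / np.sqrt(2 * p * (1 - p))


def slope(x, y):
    """Least-squares slope of y on x across the K scores (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


W = rng.standard_normal((M, K))     # pseudo SNP effects, shared by both cohorts

parent = rng.binomial(2, p)
# The child receives one allele from the parent and one from an unrelated mate.
child = rng.binomial(1, parent / 2) + rng.binomial(1, p)
unrelated = rng.binomial(2, p)

# Each individual's K pseudo profile scores.
s_parent, s_child, s_unrel = (standardize(g) @ W for g in (parent, child, unrelated))

b_dup = slope(s_parent, s_parent)    # same person in both cohorts
b_po = slope(s_parent, s_child)      # first-degree relatives
b_unrel = slope(s_parent, s_unrel)   # unrelated pair
print(f"duplicate b = {b_dup:.2f}, parent-offspring b = {b_po:.2f}, unrelated b = {b_unrel:.2f}")
```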

-The required number of scores will be saved in **mw.encode, which will be used in the next two steps**.
+For example, if there are 1000 individuals in each of cohort 1 and cohort 2, then n=1,000,000 and the required number of scores is K=41.57.  We round up to K=42 for the steps below.
+
+The required number of PPS is saved in **mw.encode, which is used in the next two steps**.
+
+It also gives an estimate of the number of SNPs suggested for generating the PPS.  According to our investigation, the number of SNPs should preferably be 5 to 10 times K.
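
Rounding K and applying the SNP rule of thumb is simple arithmetic. A hypothetical helper, using the K=41.57 example from this step, could look like the sketch below; it only restates the 5 to 10 times K guideline and is not how mwpower derives its own SNP estimate.

```python
import math


def pps_plan(k_estimate: float, low: int = 5, high: int = 10):
    """Round K up to a whole number of scores and return the suggested SNP range."""
    k = math.ceil(k_estimate)
    return k, (low * k, high * k)


k, (snp_min, snp_max) = pps_plan(41.57)
print(k, snp_min, snp_max)  # 42 210 420
```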

-**Step 2: generate pseudo SNP effects**
-At this stage, the random seed (ecode), the number of scores, and the reference allele file should be provided.
+**Step 2: generate PPS given consensus SNPs**
+At this stage, the mw.encode file generated in the previous step and the reference allele file should be provided.

 ~~~~~
 gear mwscore --bfile set1 --encode mw.encode --refallele refA.txt --out set1
@@ -38,14 +33,16 @@
 The reference allele file reads as below

 ~~~~~~~~
-rs1001 A
-rs2003 G
+rs1001 A 0.4
+rs2003 G 0.35
+...
 ~~~~~~~~
+The first column is the SNP name, the second column is the reference allele, and the third column is the reference allele frequency.  The reference allele frequency can be calculated from one of the cohorts.  If the third column is absent, the allele frequency will be calculated within each cohort.
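
A reader for this file format might look like the Python sketch below. It is illustrative only: the third input line (rs3005 with no frequency) is made up to show the optional column, and when the frequency is missing the caller would estimate it within each cohort, as described above.

```python
import io


def read_refallele(handle):
    """Parse 'SNP allele [frequency]' lines; frequency is None when the column is absent."""
    ref = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        snp, allele = fields[0], fields[1].upper()
        freq = float(fields[2]) if len(fields) > 2 else None
        ref[snp] = (allele, freq)
    return ref


example = io.StringIO("rs1001 A 0.4\nrs2003 G 0.35\nrs3005 T\n")
ref = read_refallele(example)
print(ref)  # {'rs1001': ('A', 0.4), 'rs2003': ('G', 0.35), 'rs3005': ('T', None)}
```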

 After this step, set1.profile and set2.profile will be generated.  The number of scores, which has already been determined in the first step, will be read from mw.encode.

 Notes:
-1) It is important for two cohorts to use the same encode.
+1) It is important for the cohorts being compared to use the same encode.
 2) It is better to eliminate ambiguous loci which have A/T pairs or G/C pairs.
 3) However, gear will automatically take care of the strand issue, such as A/G in set 1 but T/C in set 2.
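
The strand handling in note 3, and the reason for note 2, can be sketched as a function that converts an allele count into a count of the reference allele. This is an illustration, not gear's code: it assumes biallelic SNPs whose genotypes are stored as the number of copies (0, 1, 2) of one coded allele, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1: str, a2: str) -> bool:
    """A/T and G/C SNPs look the same on both strands, so they cannot be aligned."""
    return COMPLEMENT[a1] == a2


def align_dosage(dosage: int, counted: str, other: str, ref: str):
    """Return the count of the reference allele, or None if the SNP cannot be aligned.

    `counted` is the allele whose copies `dosage` counts; `other` is the second allele.
    """
    if is_ambiguous(counted, other):
        return None                      # drop strand-ambiguous loci (note 2)
    if counted == ref:
        return dosage                    # already counts the reference allele
    if other == ref:
        return 2 - dosage                # alleles swapped: count the other allele
    if COMPLEMENT[counted] == ref:
        return dosage                    # opposite strand, same allele
    if COMPLEMENT[other] == ref:
        return 2 - dosage                # opposite strand and swapped
    return None                          # alleles do not match the reference at all


# A/G coded in set 1 and T/C coded in set 2 describe the same SNP.
print(align_dosage(1, "G", "A", ref="A"))  # 1
print(align_dosage(2, "T", "C", ref="A"))  # 2
print(align_dosage(2, "C", "T", ref="A"))  # 0
print(align_dosage(1, "A", "T", ref="A"))  # None
```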

@@ -60,16 +57,9 @@

 In addition, the user can override the parameters stored in mw.encode.

+The following command uses 0.9, rather than the 0.95 set in the first step, as the cutoff for the regression test.
+
 ~~~~
-gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --chisq 1
+gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --reg 0.9
 ~~~~

-It will use 1 rather than 0.5 as the cutoff for the chi-square test.  Similarly, 
-
-~~~~
-gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --reg 0.95
-~~~~
-
-It will use the regression method instead.
-
-

Meta-watchdog modified by Guobo Chen

Guobo Chen — Thu, 06 Mar 2014 01:23:04 -0000

--- v6
+++ v7
@@ -9,7 +9,7 @@
 When using the chi-square method,

 ~~~~~~
-gear mwpower --chisq --alpha 0.01 --q 0.5 --test n --out mw
+gear mwpower --chisq 0.5 --alpha 0.01 --test n --out mw
 ~~~~~~

 It will calculate the number of profile scores needed to control the experiment-wise type I error rate at 0.01, given a cutoff of 0.5.
@@ -17,7 +17,7 @@
 When using the linear regression method,

 ~~~~~~
-gear mwpower --alpha 0.01 --beta --b 0.95 --test n --out mw
+gear mwpower --reg 0.95 --alpha 0.01 --beta 0.05 --test n --out mw
 ~~~~~~

 It will calculate the number of profile scores needed to control the experiment-wise type I error rate at 0.01, given a cutoff of 0.95.  The genetic interpretation of --b is that b of 0.45 detects first-degree relatives and b of 0.95 detects duplicated individuals.

Meta-watchdog modified by Guobo Chen

Guobo Chen — Wed, 05 Mar 2014 12:23:38 -0000

--- v5
+++ v6
@@ -3,55 +3,73 @@
 **A procedure for detecting overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\].**

 **Step 1: determine the number of profile scores**
-If there are n1 individuals in cohort 1 and n2 individuals in cohort 2, the total number of comparisons is n=n1*n2.  To control the experiment-wise type I error rate at alpha=0.01 and the type II error rate at beta=0.05 for a heritability of 0.95, use the command below:
+The central idea in detecting overlapping individuals without sharing genotypes is to use profile scores.  The first step is to determine the number of profile scores required to detect overlapping individuals. Assuming there are n1 and n2 individuals in cohort 1 and cohort 2, there will be n=n1*n2 comparisons to detect overlapping individuals between the two cohorts.
+There are two ways to detect the overlapping individuals: 1) chi-square test; 2) linear regression.
+
+When using the chi-square method, 

 ~~~~~~
-gear --dogpower --dogalpha 0.01 --dogbeta 0.05 --dogtest n --dogh2 0.95 --out Kpower
+gear mwpower --chisq --alpha 0.01 --q 0.5 --test n --out mw
 ~~~~~~

-The required number of scores will be saved in Kpower.watchdog.
+It will calculate the number of profile scores needed to control the experiment-wise type I error rate at 0.01, given a cutoff of 0.5.
+
+When using the linear regression method,
+
+~~~~~~
+gear mwpower --alpha 0.01 --beta --b 0.95 --test n --out mw
+~~~~~~
+
+It will calculate the number of profile scores needed to control the experiment-wise type I error rate at 0.01, given a cutoff of 0.95.  The genetic interpretation of --b is that b of 0.45 detects first-degree relatives and b of 0.95 detects duplicated individuals.

 For example, if there are 1000 individuals in each of cohort 1 and cohort 2, then n=1000000 and the required number of scores is K=41.57.  We round up to K=42 for the steps below.

+The required number of scores will be saved in **mw.encode, which will be used in the next two steps**.

 **Step 2: generate pseudo SNP effects**
 At this stage, the random seed (ecode), the number of scores, and the reference allele file should be provided.

 ~~~~~
-gear enigma --ecode 2013 --ecol 42 --refallele refA.txt --out score42
+gear mwscore --bfile set1 --encode mw.encode --refallele refA.txt --out set1
+gear mwscore --bfile set2 --encode mw.encode --refallele refA.txt --out set2
 ~~~~~

 The reference allele file reads as below
+
+~~~~~~~~
 rs1001 A
 rs2003 G
+~~~~~~~~

-After this step, the file score42.enigma will be generated.
+After this step, set1.profile and set2.profile will be generated.  The number of scores, which has already been determined in the first step, will be read from mw.encode.

 Notes:
-1) It is important for two cohorts to use the same ecode if pseudo values are generated separately in each data collecting center.  Here 2013 should be used if two cohorts generate their respective pseudo SNP effects locally.
+1) It is important for two cohorts to use the same encode.
 2) It is better to eliminate ambiguous loci which have A/T pairs or G/C pairs.
+3) However, gear will automatically take care of the strand issue, such as A/G in set 1 but T/C in set 2.

-
-**Step 3: generate profile scores for cohorts**
+**Step 3: detect overlapping individuals**

 ~~~~
-gear profile --bfile cohort1 --score score42.enigma --no-score-header --out cohort1
-gear profile --bfile cohort2 --score score42.enigma --no-score-header --out cohort2
+gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap
 ~~~~
-Notes:
-1) gear will automatically flip alleles at loci that are coded on different reference alleles, for example when the reference allele for rs1001 is A in cohort1 but C in cohort2, or when the reference allele is G in cohort1 but C in cohort2 because of a strand flip.
-2) plink has a routine for generating profile scores, but it may give different profile scores for an individual who is in both cohort 1 and cohort 2, because plink does not correct for flipped alleles.  Unless users are sure that the genotypes are coded on the same reference allele at each locus, plink profile scores should not be used for this task.
+
+The parameters encapsulated in mw.encode are used to detect overlapping individuals, who, if any, are written to overlap.mw.
+
+In addition, the user can override the parameters stored in mw.encode.
+
+~~~~
+gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --chisq 1
+~~~~
+
+It will use 1 rather than 0.5 as the cutoff for the chi-square test.  Similarly, 
+
+~~~~
+gear mw --set1 set1.profile --set2 set2.profile --encode mw.encode --out overlap --reg 0.95
+~~~~
+
+It will use the regression method instead.

-
-**Step 4: detect overlapping individuals**
-
-~~~~
-gear --watchdog --set1 cohort1.profile --set2 cohort2.profile --dogcutoff 0.95 --out overlap
-~~~~
-
-The --dogcutoff value is the same as the --dogh2, which is set in step 1.
-The individual pairs having similarity over 0.95 will be printed into overlap.watchdog 
-

Meta-watchdog modified by Guobo Chen

Guobo Chen — Fri, 22 Nov 2013 06:23:15 -0000

--- v4
+++ v5
@@ -1,6 +1,6 @@
 ![Watchdog](https://sourceforge.net/p/gbchen/wiki/Meta-watchdog/attachment/watchdog.gif)

-**A procedure for detecting overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\] https://sourceforge.net/p/gbchen/wiki/markdown_syntax_dialoga.**
+**A procedure for detecting overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\].**

 **Step 1: determine the number of profile scores**
 If there are n1 individuals in cohort 1 and n2 individuals in cohort 2, the total number of comparisons is n=n1*n2.  To control the experiment-wise type I error rate at alpha=0.01 and the type II error rate at beta=0.05 for a heritability of 0.95, use the command below:

Meta-watchdog modified by Guobo Chen

Guobo Chen — Fri, 22 Nov 2013 01:56:50 -0000

--- v3
+++ v4
@@ -1,37 +1,57 @@
 ![Watchdog](https://sourceforge.net/p/gbchen/wiki/Meta-watchdog/attachment/watchdog.gif)

-**A procedure for identifying overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\].**
+**A procedure for detecting overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\] https://sourceforge.net/p/gbchen/wiki/markdown_syntax_dialoga.**

 **Step 1: determine the number of profile scores**
 If there are n1 individuals in cohort 1 and n2 individuals in cohort 2, the total number of comparisons is n=n1*n2.  To control the experiment-wise type I error rate at alpha=0.01 and the type II error rate at beta=0.05 for a heritability of 0.95, use the command below:
-gear --dogpower --dogalpha 0.01 --dogbeta 0.05 --dogtest n --dogh2 0.95

-Let's say the number of required profile scores is 30.
+~~~~~~
+gear --dogpower --dogalpha 0.01 --dogbeta 0.05 --dogtest n --dogh2 0.95 --out Kpower
+~~~~~~
+
+The required number of scores will be saved in Kpower.watchdog.
+
+For example, if there are 1000 individuals in each of cohort 1 and cohort 2, then n=1000000 and the required number of scores is K=41.57.  We round up to K=42 for the steps below.
+

 **Step 2: generate pseudo SNP effects**
 At this stage, the random seed (ecode), the number of scores, and the reference allele file should be provided.
-gear enigma --ecode 2013 --ecol 30 --refallele refA.txt --out score
+
+~~~~~
+gear enigma --ecode 2013 --ecol 42 --refallele refA.txt --out score42
+~~~~~

 The reference allele file reads as below
 rs1001 A
 rs2003 G

-After this step, a file score.enigma will be generated.
+After this step, the file score42.enigma will be generated.

 Notes:
-1) It is important for two cohorts to use the same ecode.  Here 2013 should be used if two cohorts generate their respective pseudo SNP effects locally.
+1) It is important for two cohorts to use the same ecode if pseudo values are generated separately in each data collecting center.  Here 2013 should be used if two cohorts generate their respective pseudo SNP effects locally.
 2) It is better to eliminate ambiguous loci which have A/T pairs or G/C pairs.
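
Why both centers need the same ecode can be shown with any seeded random number generator: the pseudo SNP effects are pseudo-random, so only a shared seed (with the same SNP list in the same order) reproduces the same effect matrix at each site. The sketch below uses NumPy's generator as a stand-in; gear's own generator and effect distribution are not documented here, so these matrices would not match the ones gear produces. The function name is hypothetical.

```python
import numpy as np


def pseudo_effects(ecode: int, n_snps: int, n_scores: int) -> np.ndarray:
    """Draw an n_snps x n_scores matrix of pseudo SNP effects from a seeded generator."""
    rng = np.random.default_rng(ecode)
    return rng.standard_normal((n_snps, n_scores))


# Two data centers working independently with the same ecode...
center1 = pseudo_effects(2013, n_snps=420, n_scores=42)
center2 = pseudo_effects(2013, n_snps=420, n_scores=42)
# ...and a center that used a different seed by mistake.
center3 = pseudo_effects(2014, n_snps=420, n_scores=42)

print(np.array_equal(center1, center2))  # True
print(np.array_equal(center1, center3))  # False
```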

+
 **Step 3: generate profile scores for cohorts**
-gear profile --bfile cohort1 --score score.enigma --no-score-header --out cohort1
-gear profile --bfile cohort2 --score score.enigma --no-score-header --out cohort2
+
+~~~~
+gear profile --bfile cohort1 --score score42.enigma --no-score-header --out cohort1
+gear profile --bfile cohort2 --score score42.enigma --no-score-header --out cohort2
+~~~~
 Notes:
-gear will automatically flip alleles at loci that are coded on different reference alleles, for example when the reference allele for rs1001 is A in cohort1 but C in cohort2, or when the reference allele is G in cohort1 but C in cohort2 because of a strand flip.
+1) gear will automatically flip alleles at loci that are coded on different reference alleles, for example when the reference allele for rs1001 is A in cohort1 but C in cohort2, or when the reference allele is G in cohort1 but C in cohort2 because of a strand flip.
+2) plink has a routine for generating profile scores, but it may give different profile scores for an individual who is in both cohort 1 and cohort 2, because plink does not correct for flipped alleles.  Unless users are sure that the genotypes are coded on the same reference allele at each locus, plink profile scores should not be used for this task.
+
+

 **Step 4: detect overlapping individuals**
+
+~~~~
 gear --watchdog --set1 cohort1.profile --set2 cohort2.profile --dogcutoff 0.95 --out overlap
+~~~~
+
 The --dogcutoff value is the same as the --dogh2, which is set in step 1.
 The individual pairs having similarity over 0.95 will be printed into overlap.watchdog

Meta-watchdog modified by Guobo Chen

Guobo Chen — Wed, 20 Nov 2013 05:18:01 -0000

--- v2
+++ v3
@@ -1,4 +1,37 @@
-
 ![Watchdog](https://sourceforge.net/p/gbchen/wiki/Meta-watchdog/attachment/watchdog.gif)

-**A procedure for identifying overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared.**
+**A procedure for identifying overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared \[___beta version___\].**
+
+**Step 1: determine the number of profile scores**
+If there are n1 individuals in cohort 1 and n2 individuals in cohort 2, the total number of comparisons is n=n1*n2.  To control the experiment-wise type I error rate at alpha=0.01 and the type II error rate at beta=0.05 for a heritability of 0.95, use the command below:
+gear --dogpower --dogalpha 0.01 --dogbeta 0.05 --dogtest n --dogh2 0.95
+
+Let's say the number of required profile scores is 30.
+
+
+**Step 2: generate pseudo SNP effects**
+At this stage, the random seed (ecode), the number of scores, and the reference allele file should be provided.
+gear enigma --ecode 2013 --ecol 30 --refallele refA.txt --out score
+
+The reference allele file reads as below
+rs1001 A
+rs2003 G
+
+After this step, a file score.enigma will be generated.
+
+Notes:
+1) It is important for two cohorts to use the same ecode.  Here 2013 should be used if two cohorts generate their respective pseudo SNP effects locally.
+2) It is better to eliminate ambiguous loci which have A/T pairs or G/C pairs.
+
+
+**Step 3: generate profile scores for cohorts**
+gear profile --bfile cohort1 --score score.enigma --no-score-header --out cohort1
+gear profile --bfile cohort2 --score score.enigma --no-score-header --out cohort2
+Notes:
+gear will automatically flip alleles at loci that are coded on different reference alleles, for example when the reference allele for rs1001 is A in cohort1 but C in cohort2, or when the reference allele is G in cohort1 but C in cohort2 because of a strand flip.
+
+**Step 4: detect overlapping individuals**
+gear --watchdog --set1 cohort1.profile --set2 cohort2.profile --dogcutoff 0.95 --out overlap
+The --dogcutoff value is the same as the --dogh2, which is set in step 1.
+The individual pairs having similarity over 0.95 will be printed into overlap.watchdog 
+

Meta-watchdog modified by Guobo Chen

Guobo Chen — Sun, 17 Nov 2013 11:00:46 -0000

--- v1
+++ v2
@@ -1,2 +1,4 @@
+
+![Watchdog](https://sourceforge.net/p/gbchen/wiki/Meta-watchdog/attachment/watchdog.gif)
+
 **A procedure for identifying overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared.**
-

Meta-watchdog modified by Guobo Chen

Guobo Chen — Sun, 17 Nov 2013 10:52:58 -0000

A procedure for identifying overlapping individuals between cohorts in meta-analysis when the genotype data cannot be shared.