<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Home</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>Recent changes to Home</description><atom:link href="https://sourceforge.net/p/phasedel/wiki/Home/feed" rel="self"/><language>en</language><lastBuildDate>Tue, 22 Feb 2022 03:06:06 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/phasedel/wiki/Home/feed" rel="self" type="application/rss+xml"/><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v20
+++ v21
@@ -81,7 +81,7 @@
 =======

 The 'demo' folder includes input data files and the commands for running the five steps that produce the final output of annotated somatic deletion calls.
-Please download the following BAMs and locate them under the demo BAM folder (demo/input/bam/) before running. These are ~8.4 Gb BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
+Please download following BAMs and locate them under the demo BAM folder (demo/input/bam/) before running. These are ~8.4 Gb BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
 [Hunamp_bulk.chr17.bam](https://nas.aleelab.net/sharing/RqXZOwEAm)
 [Hunamp_bulk.chr17.bam.bai](https://nas.aleelab.net/sharing/6iY44d0Ss)
 [IL-11.chr17.bam](https://nas.aleelab.net/sharing/qLF2k6HSr)
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Tue, 22 Feb 2022 03:06:06 -0000</pubDate><guid>https://sourceforge.netf9690febe37708075cae8ea1c40d9a9580f8d908</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v19
+++ v20
@@ -81,7 +81,7 @@
 =======

 The 'demo' folder includes input data files and the commands for running the five steps that produce the final output of annotated somatic deletion calls.
-If the downloaded package does not include demo BAM files, please download the following BAMs and locate them under the demo BAM folder (demo/input/bam/). These are BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
+Please download the following BAMs and locate them under the demo BAM folder (demo/input/bam/) before running. These are ~8.4 Gb BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
 [Hunamp_bulk.chr17.bam](https://nas.aleelab.net/sharing/RqXZOwEAm)
 [Hunamp_bulk.chr17.bam.bai](https://nas.aleelab.net/sharing/6iY44d0Ss)
 [IL-11.chr17.bam](https://nas.aleelab.net/sharing/qLF2k6HSr)
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Tue, 22 Feb 2022 03:04:07 -0000</pubDate><guid>https://sourceforge.net62223fcedb0822701f54903d0223d717d1b05c50</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v18
+++ v19
@@ -81,7 +81,11 @@
 =======

 The 'demo' folder includes input data files and the commands for running the five steps that produce the final output of annotated somatic deletion calls.
-If the downloaded package does not include demo BAM files, please download the following BAMs and locate them under the demo folder (demo/bam/). These are BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
+If the downloaded package does not include demo BAM files, please download the following BAMs and locate them under the demo BAM folder (demo/input/bam/). These are BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
+[Hunamp_bulk.chr17.bam](https://nas.aleelab.net/sharing/RqXZOwEAm)
+[Hunamp_bulk.chr17.bam.bai](https://nas.aleelab.net/sharing/6iY44d0Ss)
+[IL-11.chr17.bam](https://nas.aleelab.net/sharing/qLF2k6HSr)
+[IL-11.chr17.bam.bai](https://nas.aleelab.net/sharing/sXyCYWbbE)

 Execute **command.sh** in each subdirectory under the demo folder, in order, to run all steps of PhaseDel sequentially. The following are the actual commands written in command.sh for each step of PhaseDel. Replace /path/to/... with the correct paths for the demo data before running. Each step runs in less than a minute on a single core, except steps 3 and 4, which take about an hour and 10 minutes, respectively.

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Tue, 22 Feb 2022 02:57:42 -0000</pubDate><guid>https://sourceforge.net14d23045dd835090f92866f9c0e5e4f676cda911</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v17
+++ v18
@@ -163,7 +163,7 @@

 Analysis modules and arguments
 =======
-##Analysis modules##
+###Analysis modules###
 PhaseDel has six modules that together complete the entire analysis. Use the -m option to select the analysis module. You need to run all the modules in the provided order to perform a complete analysis. Two modules—**GenerateHetSNPAndIndelList** and **MakeDuplicateList**—should be applied to a set of samples, and the other modules—**MergeDel**, **LinkageAnalysis**, **CallSomatic**, and **AnalyzeMechanism**—should be applied per cell.

 Modules                  | Description
@@ -176,7 +176,7 @@
 **AnalyzeMechanism**   | Analyze underlying mechanisms for selected somatic deletion candidates.

 &lt;br/&gt;
-##GenerateHetSNPAndIndelList##
+###GenerateHetSNPAndIndelList###
 The **GenerateHetSNPAndIndelList** module takes a genotyped GVCF file that includes multiple samples (single cells and/or matched bulk). It generates two types of output files: (1) a list of germline heterozygous common SNPs and (2) separate insertion and deletion call sets for each sample. Because GATK calls both somatic and germline variants from single-cell data, this module generates (1) only from matched bulk samples. To indicate which samples are matched bulk, the module accepts a file listing the SM tags of the bulk samples. Note that the variants in the input GVCF should be annotated with dbSNP (indicated by the ID column, e.g. rs75454623) so that common SNPs can be selected from the full call set.

 #####Mandatory arguments:#####
@@ -196,7 +196,7 @@
 3. **indel/ID.ins.call**: GATK insertion calls for each sample (required for **AnalyzeMechanism** module)
 &lt;br/&gt;

-##MergeDel##
+###MergeDel###
 The **MergeDel** module takes two deletion call files for a given sample, one from GATK and one from Delly, and merges them into an integrated deletion list that serves as the initial deletion call set for linkage analysis. For GATK, provide the deletion calls generated by the GenerateHetSNPAndIndelList module (indel/ID.del.call). For Delly, provide the VCF file produced by Delly deletion calling. The next step—**LinkageAnalysis**—will perform phasing analysis to discriminate genuine deletions from these merged calls.

 #####Mandatory arguments:#####
@@ -211,7 +211,7 @@
 1. **Merged deletion candidates (filename given by a user)**: Merged deletion candidate list (required for **LinkageAnalysis** module)
 &lt;br/&gt;

-##LinkageAnalysis##
+###LinkageAnalysis###
 **LinkageAnalysis** module takes an initial deletion candidate list and a bulk het. SNP list for a given cell, and performs phasing analysis to discriminate genuine deletions from whole-genome amplification artifacts. The output is the filtered list of true deletion candidates including both somatic and germline deletions. The next step—**CallSomatic**—will select high-confidence somatic candidates from this output.

 #####Mandatory arguments:#####
@@ -247,7 +247,7 @@
 3. **ID.filtered.phased.del.list**: Selected genuine deletion candidates including both somatic and germline deletions (required for **CallSomatic** module)
 &lt;br/&gt;

-##MakeDuplicateList (optional)##
+###MakeDuplicateList (optional)###
 The **MakeDuplicateList** module takes a set of phaseable deletion breakpoints (ID.phased.del.list files generated by the LinkageAnalysis module) from different individuals and selects deletion candidates observed in more than one individual with exactly the same breakpoints. These duplicated candidates are likely to be germline deletions or systematic artifacts, and will therefore be filtered out of the final somatic deletion candidates in the next step—**CallSomatic**. If all single-cell data are from the same individual, you don't need to run this module, because duplicated candidates may then be clonal mutations that should not be filtered out.

 #####Mandatory arguments:#####
@@ -260,7 +260,7 @@
 1. **Duplicated deletion candidates (filename given by a user)**: Possible germline/artifactual candidate list that should be filtered out for somatic deletion calling (used for **CallSomatic** module)
 &lt;br/&gt;

-##CallSomatic##
+###CallSomatic###
 The **CallSomatic** module takes the selected deletion candidate list generated by the **LinkageAnalysis** module, together with the BAM files for a given cell and its matched bulk tissue, and identifies high-confidence somatic deletions. It also estimates the FDR for the cell based on the estimated level of amplification bias, and calculates the genome-wide somatic deletion rate using a two-component model. The output files include the estimated deletion rate, fitted model graphs for the cell, and the final list of high-confidence somatic deletion candidates. The next step—**AnalyzeMechanism**—will annotate the predicted underlying mechanisms for the final deletion candidates.

 #####Mandatory arguments:#####
@@ -295,7 +295,7 @@
 2. **ID.selected.somatic.del.candidates.cc.controlled.txt**: A final candidate list for high-confidence somatic deletions (required for **AnalyzeMechanism** module)
 &lt;br/&gt;

-##AnalyzeMechanism##
+###AnalyzeMechanism###
 The **AnalyzeMechanism** module takes the final somatic deletion list generated by the **CallSomatic** module and predicts the underlying mechanism of deletion formation for each candidate. The prediction follows the criteria of a previous cancer study ([Yang et al.](https://shorturl.at/ntFI2)). The output is an annotated list of somatic deletions with their predicted underlying mechanisms.

 #####Mandatory arguments:#####
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Mon, 21 Feb 2022 09:44:42 -0000</pubDate><guid>https://sourceforge.netba9f736e942c62e0265ee3d4f1cc98c23691b7ec</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v16
+++ v17
@@ -12,13 +12,15 @@

 Prerequisite softwares
 =======
-**Java** 
+The following list includes tested versions in parentheses where applicable; later versions are likely to work as well. These instructions are designed for running PhaseDel on human sequencing data aligned to GRCh37 (hg19).
+
+**Java (version 1.8)**

 *  Java version 1.8 or higher
 *  HTSJDK library (included in the package)

-**R packages** 
- [Rscript](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/Rscript.html)
+**R packages (version 4.0.1)**
+[rstan](https://mc-stan.org/users/interfaces/rstan)
 [plyr](https://cran.r-project.org/web/packages/plyr/index.html)
 [tidyr](https://tidyr.tidyverse.org/)
 [dplyr](https://dplyr.tidyverse.org/)
@@ -41,7 +43,9 @@
 *  **lib/**
     : A folder containing other JAR libraries for PhaseDel
 *  **data/**
-    : A folder containing R codes and  data files for analysis and annotation
+    : A folder containing R codes and  data files for phasing analysis and annotation
+*  **demo/**
+    : A folder containing scripts and data files for running the PhaseDel demo
 &lt;br/&gt;

 Running PhaseDel
@@ -73,6 +77,89 @@

 &lt;br/&gt;

+Running the PhaseDel demo
+=======
+
+The 'demo' folder includes input data files and the commands for running the five steps that produce the final output of annotated somatic deletion calls.
+If the downloaded package does not include demo BAM files, please download the following BAMs and locate them under the demo folder (demo/bam/). These are BAMs for chromosome 17 of the public MDA data of single fibroblasts (Dong et al., Nat Methods 2017).
+
+Execute **command.sh** in each subdirectory under the demo folder, in order, to run all steps of PhaseDel sequentially. The following are the actual commands written in command.sh for each step of PhaseDel. Replace /path/to/... with the correct paths for the demo data before running. Each step runs in less than a minute on a single core, except steps 3 and 4, which take about an hour and 10 minutes, respectively.
+
+##### step1_GenerateHetSNPAndIndelList (&amp;lt;1 min) #####
+```
+java -jar /path/to/PhaseDel.jar -m GenerateHetSNPAndIndelList \
+   -v ../input/gatk/dong_et_al.NatMethod.2017.HC_G.g.chr17.vcf.gz \
+   -b ../input/gatk/bulk.sampleID \
+   -o ./
+```
+
+##### step2_MergeDel (&amp;lt;1 min) #####
+```
+# bulk
+java -jar /path/to/PhaseDel.jar -m MergeDel \
+   -v ../step1_GenerateHetSNPAndIndelList/hetSNP_bulk/Hunamp_bulk.bulk.het.snps.vcf \
+   -G ../step1_GenerateHetSNPAndIndelList/indel/Hunamp_bulk.del.call \
+   -D ../input/delly/Hunamp_bulk.delly_del.chr17.vcf.gz \
+   -d ./Hunamp_bulk.merged.del
+
+# single cell
+java -jar /path/to/PhaseDel.jar -m MergeDel \
+   -v ../step1_GenerateHetSNPAndIndelList/hetSNP_bulk/Hunamp_bulk.bulk.het.snps.vcf \
+   -G ../step1_GenerateHetSNPAndIndelList/indel/IL-11.del.call \
+   -D ../input/delly/IL-11.delly_del.chr17.vcf.gz \
+   -d ./IL-11.merged.del
+```
+
+##### step3_LinkageAnalysis (~1 hour) #####
+```
+REF=/path/to/hs37d5/genome.fa
+
+# bulk
+java -jar /path/to/PhaseDel.jar -m LinkageAnalysis \
+   -r $REF \
+   -b ../input/bam/Hunamp_bulk.chr17.bam \
+   -v ../step1_GenerateHetSNPAndIndelList/hetSNP_bulk/Hunamp_bulk.bulk.het.snps.vcf \
+   -d ../step2_MergeDel/Hunamp_bulk.merged.del \
+   -o ./
+
+# single cell
+java -jar /path/to/PhaseDel.jar -m LinkageAnalysis \
+   -r $REF \
+   -b ../input/bam/IL-11.chr17.bam \
+   -v ../step1_GenerateHetSNPAndIndelList/hetSNP_bulk/Hunamp_bulk.bulk.het.snps.vcf \
+   -d ../step2_MergeDel/IL-11.merged.del \
+   -o ./
+```
+
+##### step4_CallSomatic (~10 min) #####
+```
+REF=/path/to/hs37d5/genome.fa
+RSCRIPT=/path/to/Rscript
+
+java -jar /path/to/PhaseDel.jar -m CallSomatic \
+   -r $REF \
+   -v ../step1_GenerateHetSNPAndIndelList/hetSNP_bulk/Hunamp_bulk.bulk.het.snps.vcf \
+   -b ../input/bam/IL-11.chr17.bam \
+   -B ../input/bam/Hunamp_bulk.chr17.bam \
+   -p ../step3_LinkageAnalysis/IL-11.chr17.filtered.phased.del.list \
+   -P ../step3_LinkageAnalysis/Hunamp_bulk.chr17.filtered.phased.del.list \
+   -R $RSCRIPT \
+   -o ./
+```
+
+##### step5_AnalyzeMechanism (&amp;lt;1 min) #####
+```
+REF=/path/to/hs37d5/genome.fa
+
+java -jar /path/to/PhaseDel.jar -m AnalyzeMechanism \
+   -r $REF \
+   -d ../step4_CallSomatic/IL-11.chr17.selected.somatic.del.candidates.cc.controlled.txt \
+   -i ../step1_GenerateHetSNPAndIndelList/indel/IL-11.ins.call \
+   -o ./
+```
+
+The final output files are IL-11.chr17.cc.estimation.FDR.added.txt (step 4, estimated rates) and IL-11.chr17.selected.somatic.del.candidates.cc.controlled.mec.annotated.txt (step 5, final deletion candidates). See the next section for more details on configuration options and input parameters.
+&lt;br/&gt;

 Analysis modules and arguments
 =======
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Mon, 21 Feb 2022 09:32:18 -0000</pubDate><guid>https://sourceforge.netd3ce5fe680edf31d285973433b3c7d0b324092ed</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v15
+++ v16
@@ -87,8 +87,8 @@
 **MakeDuplicateList**   | Make a list of duplicated phased candidates between different individuals. Duplicates will be considered to be possible germline variants/systematic artifacts and filtered out from the somatic candidate list.
 **CallSomatic**   | Discriminate somatic deletions from the phased candidates and estimate genome-wide somatic deletion rate for a given cell.
 **AnalyzeMechanism**   | Analyze underlying mechanisms for selected somatic deletion candidates.
-&lt;br/&gt;
-
+
+&lt;br/&gt;
 ##GenerateHetSNPAndIndelList##
 The **GenerateHetSNPAndIndelList** module takes a genotyped GVCF file that includes multiple samples (single cells and/or matched bulk). It generates two types of output files: (1) a list of germline heterozygous common SNPs and (2) separate insertion and deletion call sets for each sample. Because GATK calls both somatic and germline variants from single-cell data, this module generates (1) only from matched bulk samples. To indicate which samples are matched bulk, the module accepts a file listing the SM tags of the bulk samples. Note that the variants in the input GVCF should be annotated with dbSNP (indicated by the ID column, e.g. rs75454623) so that common SNPs can be selected from the full call set.

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Sun, 07 Feb 2021 06:37:39 -0000</pubDate><guid>https://sourceforge.netcd6709378fb0d7d7c3271abb0b1ac76b2b1daa85</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v14
+++ v15
@@ -118,7 +118,7 @@
 **Germline het. SNP list**   | -v, --hetSNPList | Germline heterozygous SNP list for a given individual (generated from GenerateHetSNPAndIndelList module).
 **GATK-derived deletion call** | -G, --gatk_del | GATK-derived deletion calls for a given sample (generated by GenerateHetSNPAndIndelList module, indel/ID.del.call file)
 **Delly deletion call** | -D, --delly_del | Delly VCF file of deletion calls for a given sample (.vcf)
-**Merged deletion candidates**   | -d, --mergedDel | Merged deletion candidate list (Path for output file)
+**Output file for merged deletion candidates**   | -d, --mergedDel | Merged deletion candidate list (Path for output file)

 #####Output files:#####
 1. **Merged deletion candidates (filename given by a user)**: Merged deletion candidate list (required for **LinkageAnalysis** module)
@@ -166,18 +166,11 @@
 #####Mandatory arguments:#####
 Input                  | Option    | Description
 ---------------------- | --------- | -----------
-**A list of files for phaseable  deletion breakpoints** | -l, --phasedCandidateList | A tab-delimited file with the information of phaseable deletion list and its individual&lt;br/&gt;e.g. #phased_deletion_list individual_ID&lt;br/&gt;   1459_PFC_01.phased.del.list 1459&lt;br/&gt;   1459_PFC_02.phased.del.list&amp;amp;emsp1459&lt;br/&gt;   1278_PFC_01.phased.del.list        1278&lt;br/&gt;
-**Output directory**   | -o, --output_dir | Path for output directory. This module will generate two subdirectories (hetSNP_bulk and indel) for outputs (1) and (2), respectively.
-
-#####Optional arguments:#####
-Option  | Default Value                       | Description
-------- | ----------------------------------- | ------------------
--b, --bulk_samples |  | A file containing SM tag list for bulk samples (line-separated). SM tags should match to the sample ID in the GVCF file.
-
-#####Output files:#####
-1. **hetSNP_bulk/ID.bulk.het.snps.vcf**: Germline heterozygous common SNP list for a given bulk sample indicated by the provided SM tag (required for multiple follow-up modules)
-2. **indel/ID.del.call**: GATK deletion calls for each sample (required for **MergeDel** module)
-3. **indel/ID.ins.call**: GATK insertion calls for each sample (required for **AnalyzeMechanism** module)
+**A list of files for phaseable  deletion breakpoints** | -l, --phasedCandidateList | A tab-delimited file with the information of phaseable deletion list and its individual&lt;br/&gt; e.g. #phased_deletion_list individual_ID&lt;br/&gt;  1459_PFC_01.phased.del.list 1459&lt;br/&gt;  1459_PFC_02.phased.del.list 1459&lt;br/&gt;  1278_PFC_01.phased.del.list 1278&lt;br/&gt;
+**Output file for duplicated deletion list**   | -u, --duplicateList | Output file for a list of duplicated phased candidates between different individuals
+
+#####Output files:#####
+1. **Duplicated deletion candidates (filename given by a user)**: Possible germline/artifactual candidate list that should be filtered out for somatic deletion calling (used for **CallSomatic** module)
 &lt;br/&gt;

 ##CallSomatic##
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Sun, 07 Feb 2021 06:36:41 -0000</pubDate><guid>https://sourceforge.net5314f66c501e84e7c8c5caba20b108b43b3fc6a9</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v13
+++ v14
@@ -109,16 +109,31 @@
 3. **indel/ID.ins.call**: GATK insertion calls for each sample (required for **AnalyzeMechanism** module)
 &lt;br/&gt;

+##MergeDel##
+The **MergeDel** module takes two deletion call files for a given sample, one from GATK and one from Delly, and merges them into an integrated deletion list that serves as the initial deletion call set for linkage analysis. For GATK, provide the deletion calls generated by the GenerateHetSNPAndIndelList module (indel/ID.del.call). For Delly, provide the VCF file produced by Delly deletion calling. The next step—**LinkageAnalysis**—will perform phasing analysis to discriminate genuine deletions from these merged calls.
+
+#####Mandatory arguments:#####
+Input                  | Option    | Description
+---------------------- | --------- | -----------
+**Germline het. SNP list**   | -v, --hetSNPList | Germline heterozygous SNP list for a given individual (generated from GenerateHetSNPAndIndelList module).
+**GATK-derived deletion call** | -G, --gatk_del | GATK-derived deletion calls for a given sample (generated by GenerateHetSNPAndIndelList module, indel/ID.del.call file)
+**Delly deletion call** | -D, --delly_del | Delly VCF file of deletion calls for a given sample (.vcf)
+**Merged deletion candidates**   | -d, --mergedDel | Merged deletion candidate list (Path for output file)
+
+#####Output files:#####
+1. **Merged deletion candidates (filename given by a user)**: Merged deletion candidate list (required for **LinkageAnalysis** module)
+&lt;br/&gt;
+
 ##LinkageAnalysis##
-**LinkageAnalysis** module takes an initial deletion candidate list and a bulk het. SNP list for a given cell, and perform phasing analysis to discriminate genuine deletions from whole-genome amplification artifacts. The output is the filtered list of true deletion candidates including both somatic and germline deletions. The next step—**CallSomatic**—will select high-confidence somatic candidates from this output.
+**LinkageAnalysis** module takes an initial deletion candidate list and a bulk het. SNP list for a given cell, and performs phasing analysis to discriminate genuine deletions from whole-genome amplification artifacts. The output is the filtered list of true deletion candidates including both somatic and germline deletions. The next step—**CallSomatic**—will select high-confidence somatic candidates from this output.

 #####Mandatory arguments:#####
 Input                  | Option    | Description
 ---------------------- | --------- | -----------
 **Reference sequence** | -r, --reference | FASTA formatted reference sequence file. *The reference must be BWA indexed.*
 **BAM file** | -b, --scBam | A single-cell or a bulk BAM file to be analyzed using phasing. The BAM file must be (coordinate) sorted and indexed.
-**Germline het. SNP list**   | -v, --hetSNPList | Germline heterozygous SNP list for a given individual (generated from GenerateHetSNPList module).
-**Initial deletion candidates**   | -d, --mergedDel | Initial deletion candidate list for a given single-cell/bulk data (generated from MergeDelVCF module).
+**Germline het. SNP list**   | -v, --hetSNPList | Germline heterozygous SNP list for a given individual (generated from GenerateHetSNPAndIndelList module).
+**Initial deletion candidates**   | -d, --mergedDel | Initial deletion candidate list for a given single-cell/bulk data (generated from MergeDel module).
 **Output directory**   | -o, --output_dir | Path for output directory.

 #####Optional arguments:#####
@@ -145,6 +160,26 @@
 3. **ID.filtered.phased.del.list**: Selected genuine deletion candidates including both somatic and germline deletions (required for **CallSomatic** module)
 &lt;br/&gt;

+##MakeDuplicateList (optional)##
+The **MakeDuplicateList** module takes a set of phaseable deletion breakpoints (ID.phased.del.list files generated by the LinkageAnalysis module) from different individuals and selects deletion candidates observed in more than one individual with exactly the same breakpoints. These duplicated candidates are likely to be germline deletions or systematic artifacts, and will therefore be filtered out of the final somatic deletion candidates in the next step—**CallSomatic**. If all single-cell data are from the same individual, you don't need to run this module, because duplicated candidates may then be clonal mutations that should not be filtered out.
+
+#####Mandatory arguments:#####
+Input                  | Option    | Description
+---------------------- | --------- | -----------
+**A list of files for phaseable  deletion breakpoints** | -l, --phasedCandidateList | A tab-delimited file with the information of phaseable deletion list and its individual&lt;br/&gt;e.g. #phased_deletion_list individual_ID&lt;br/&gt;   1459_PFC_01.phased.del.list 1459&lt;br/&gt;   1459_PFC_02.phased.del.list&amp;amp;emsp1459&lt;br/&gt;   1278_PFC_01.phased.del.list        1278&lt;br/&gt;
+**Output directory**   | -o, --output_dir | Path for output directory. This module will generate two subdirectories (hetSNP_bulk and indel) for outputs (1) and (2), respectively.
+
+#####Optional arguments:#####
+Option  | Default Value                       | Description
+------- | ----------------------------------- | ------------------
+-b, --bulk_samples |  | A file containing SM tag list for bulk samples (line-separated). SM tags should match to the sample ID in the GVCF file.
+
+#####Output files:#####
+1. **hetSNP_bulk/ID.bulk.het.snps.vcf**: Germline heterozygous common SNP list for a given bulk sample indicated by the provided SM tag (required for multiple follow-up modules)
+2. **indel/ID.del.call**: GATK deletion calls for each sample (required for **MergeDel** module)
+3. **indel/ID.ins.call**: GATK insertion calls for each sample (required for **AnalyzeMechanism** module)
+&lt;br/&gt;
+
 ##CallSomatic##
 The **CallSomatic** module takes the selected deletion candidate list generated by the **LinkageAnalysis** module, together with the BAM files for a given cell and its matched bulk tissue, and identifies high-confidence somatic deletions. It also estimates the FDR for the cell based on the estimated level of amplification bias, and calculates the genome-wide somatic deletion rate using a two-component model. The output files include the estimated deletion rate, fitted model graphs for the cell, and the final list of high-confidence somatic deletion candidates. The next step—**AnalyzeMechanism**—will annotate the predicted underlying mechanisms for the final deletion candidates.

@@ -154,7 +189,7 @@
 **Reference sequence** | -r, --reference | FASTA formatted reference sequence file. *The reference must be BWA indexed.*
 **A single cell BAM file** | -b, --scBam | A single-cell WGS BAM file corresponding to the provided deletion candidates. The reduced BAM generated by the **LinkageAnalysis** module should be located in the same folder.
 **A matched bulk BAM file** | -B, --bulkBam | A matched bulk WGS BAM file. The reduced BAM generated by the **LinkageAnalysis** module should be located in the same folder.
-**Germline het. SNP list**   | -v, --hetSNPList | Germline heterozygous SNP list for a given individual (generated from **GenerateHetSNPList** module).
+**Germline het. SNP list**   | -v, --hetSNPList | Germline heterozygous SNP list for a given individual (generated from **GenerateHetSNPAndIndelList** module).
 **Phased deletion candidates from a single cell**   | -p, --phasedDelList_sc | Phased deletion list (ID.filtered.phased.del.list) from the given single cell (generated from **LinkageAnalysis** module).
 **Phased deletion candidates from a matched bulk**   | -P, --phasedDelList_bulk | Phased deletion list (ID.filtered.phased.del.list) from the matched bulk (generated from **LinkageAnalysis** module).
 **Rscript path**   | -R, --Rscript | An absolute path for the Rscript executable file (e.g. /usr/bin/Rscript).
@@ -188,7 +223,7 @@
 ---------------------- | --------- | -----------
 **Reference sequence** | -r, --reference | FASTA formatted reference sequence file. *The reference must be BWA indexed.*
 **A list of selected somatic deletion candidates** | -d, --somatic_deletions | A final list of selected high-confidence somatic deletions for a given cell (generated by the **CallSomatic** module; ID.selected.somatic.del.candidates.cc.controlled.txt)
-**A list of insertion candidates** | -i, --insertionList | An insertion candidate list for a given cell (generated from **MergeDelVCF** module)
+**A list of insertion candidates** | -i, --insertionList | An insertion candidate list for a given cell (generated from **GenerateHetSNPAndIndelList** module)
 **Output directory**   | -o, --output_dir | Path for output directory.

 #####Optional arguments:#####
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Sun, 07 Feb 2021 06:28:51 -0000</pubDate><guid>https://sourceforge.netd42299f13c14d133a36b469ddc1287b2240469ce</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v12
+++ v13
@@ -62,10 +62,10 @@

     Analysis modules:
     ---------------------------------------------------------------------------------------------
-        GenerateHetSNPList                          Generate heterozygous SNP list from GATK gvcf
-        MergeDelVCF                                 Merge multiple VCF files from initial deletion calling (e.g. GATK, Delly VCFs)
+        GenerateHetSNPAndIndelList                  Generate heterozygous SNP list and ins./del. sets from the genotyped GATK GVCF file
+        MergeDel                                    Merge deletion calls for initial candidate set (GATK and Delly calls)
         LinkageAnalysis                             Linkage analysis to discriminate genuine deletion calls (both germline and somatic)
-        MakeDuplicateList                           Make a list of duplicated phased candidates among different individuals (possible germline/artifact list)
+        MakeDuplicateList (optional)                Make a list of duplicated phased candidates between different individuals (possible germline/artifact list)
         CallSomatic                                 Call somatic deletion candidates from phased candidates and estimate genome-wide deletion rate
         AnalyzeMechanism                            Analyze underlying mechanisms for selected somatic deletion candidates

@@ -77,16 +77,37 @@
 Analysis modules and arguments
 =======
 ##Analysis modules##
-PhaseDel have six divided modules to complete the entire analysis. Use -m option to select the analysis module. You need to run all the modules in a provided order to perform a complete analysis.  Two modules—**GenerateHetSNPList** and **MakeDuplicateList**—should be applied for a set of samples, and other modules—**MergeDelVCF**, **LinkageAnalysis**, **CallSomatic**, and **AnalyzeMechanism**—should be applied per cell.
+PhaseDel has six separate modules that together complete the entire analysis. Use the -m option to select the analysis module. To perform a complete analysis, run the modules in the provided order (**MakeDuplicateList** is optional). Two modules, **GenerateHetSNPAndIndelList** and **MakeDuplicateList**, should be applied to a set of samples; the other modules (**MergeDel**, **LinkageAnalysis**, **CallSomatic**, and **AnalyzeMechanism**) should be applied per cell.

 Modules                  | Description
 ---------------------- | -----------
-**GenerateHetSNPList** | Generate heterozygous SNP list from GATK gvcf. Required for other modules.
-**MergeDelVCF** | Merge multiple VCF files from initial deletion calling (e.g. GATK, Delly VCFs). Provides initial candidate list for linkage analysis.
+**GenerateHetSNPAndIndelList** | Generate a heterozygous SNP list and insertion/deletion call sets from a genotyped GATK GVCF file. Required by other modules.
+**MergeDel** | Merge GATK and Delly deletion calls to make an initial deletion candidate set. The merged output is used for linkage analysis.
 **LinkageAnalysis**   | Linkage analysis to discriminate genuine deletion calls (both germline and somatic) from amplification artifacts.
-**MakeDuplicateList**   | Make a list of duplicated phased candidates among different individuals. Duplicates will be considered to be possible germline variants/systematic artifacts and filtered out from the somatic candidate list.
+**MakeDuplicateList**   | Make a list of duplicated phased candidates between different individuals. Duplicates will be considered to be possible germline variants/systematic artifacts and filtered out from the somatic candidate list.
 **CallSomatic**   | Discriminate somatic deletions from the phased candidates and estimate genome-wide somatic deletion rate for a given cell.
 **AnalyzeMechanism**   | Analyze underlying mechanisms for selected somatic deletion candidates.
+&lt;br/&gt;
+
+##GenerateHetSNPAndIndelList##
+The **GenerateHetSNPAndIndelList** module takes a genotyped GVCF file that includes multiple samples (single cells and/or matched bulk). It generates two types of output files: (1) a germline heterozygous common SNP list and (2) separate insertion and deletion call sets for each sample. Because GATK detects both somatic and germline mutations in single-cell data, output (1) is generated only from matched bulk samples. To indicate which samples are matched bulk, the module accepts a file listing the SM tags of the bulk samples. Note that the variants in the input GVCF should be annotated with dbSNP (IDs in the ID column, e.g. rs75454623) so that common SNPs can be selected from the full call set.
+
+#####Mandatory arguments:#####
+Input                  | Option    | Description
+---------------------- | --------- | -----------
+**Genotyped GATK GVCF** | -v, --gatk_vcf | *dbSNP-annotated* genotyped GVCF file that includes genotyping results for multiple samples (single cells and/or matched bulk)
+**Output directory**   | -o, --output_dir | Path for output directory. This module will generate two subdirectories (hetSNP_bulk and indel) for outputs (1) and (2), respectively.
+
+#####Optional arguments:#####
+Option  | Default Value                       | Description
+------- | ----------------------------------- | ------------------
+-b, --bulk_samples |  | A file listing the SM tags of bulk samples, one per line. SM tags should match the sample IDs in the GVCF file.
+
+#####Output files:#####
+1. **hetSNP_bulk/ID.bulk.het.snps.vcf**: Germline heterozygous common SNP list for a given bulk sample indicated by the provided SM tag (required for multiple follow-up modules)
+2. **indel/ID.del.call**: GATK deletion calls for each sample (required for **MergeDel** module)
+3. **indel/ID.ins.call**: GATK insertion calls for each sample (required for **AnalyzeMechanism** module)
+&lt;br/&gt;

 ##LinkageAnalysis##
 **LinkageAnalysis** module takes an initial deletion candidate list and a bulk het. SNP list for a given cell, and performs phasing analysis to discriminate genuine deletions from whole-genome amplification artifacts. The output is the filtered list of true deletion candidates including both somatic and germline deletions. The next step—**CallSomatic**—will select high-confidence somatic candidates from this output.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Sun, 07 Feb 2021 04:56:51 -0000</pubDate><guid>https://sourceforge.net84fc93acd7b5096562670d06365f2ca8640262f9</guid></item><item><title>Home modified by Junho Kim</title><link>https://sourceforge.net/p/phasedel/wiki/Home/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v11
+++ v12
@@ -81,12 +81,12 @@

 Modules                  | Description
 ---------------------- | -----------
-1. **GenerateHetSNPList** | Generate heterozygous SNP list from GATK gvcf. Required for other modules.
-2. **MergeDelVCF** | Merge multiple VCF files from initial deletion calling (e.g. GATK, Delly VCFs). Provides initial candidate list for linkage analysis.
-3. **LinkageAnalysis**   | Linkage analysis to discriminate genuine deletion calls (both germline and somatic) from amplification artifacts.
-4. **MakeDuplicateList**   | Make a list of duplicated phased candidates among different individuals. Duplicates will be considered to be possible germline vairants/systematic artifacts and filtered out from the somatic candidate list.
-5. **CallSomatic**   | Discriminate somatic deletions from the phased candidates and estimate genome-wide somatic deletion rate for a given cell.
-6. **AnalyzeMechanism**   | Analyze underlying mechanisms for selected somatic deletion candidates.
+**GenerateHetSNPList** | Generate heterozygous SNP list from GATK gvcf. Required for other modules.
+**MergeDelVCF** | Merge multiple VCF files from initial deletion calling (e.g. GATK, Delly VCFs). Provides initial candidate list for linkage analysis.
+**LinkageAnalysis**   | Linkage analysis to discriminate genuine deletion calls (both germline and somatic) from amplification artifacts.
+**MakeDuplicateList**   | Make a list of duplicated phased candidates among different individuals. Duplicates will be considered to be possible germline variants/systematic artifacts and filtered out from the somatic candidate list.
+**CallSomatic**   | Discriminate somatic deletions from the phased candidates and estimate genome-wide somatic deletion rate for a given cell.
+**AnalyzeMechanism**   | Analyze underlying mechanisms for selected somatic deletion candidates.

 ##LinkageAnalysis##
 **LinkageAnalysis** module takes an initial deletion candidate list and a bulk het. SNP list for a given cell, and performs phasing analysis to discriminate genuine deletions from whole-genome amplification artifacts. The output is the filtered list of true deletion candidates including both somatic and germline deletions. The next step—**CallSomatic**—will select high-confidence somatic candidates from this output.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Junho Kim</dc:creator><pubDate>Fri, 08 Jan 2021 23:40:23 -0000</pubDate><guid>https://sourceforge.net8732da1561dd6df2ce9272c42d43b411b02a333e</guid></item></channel></rss>