Written by: Bo Li 
Last modified: September 13th, 2014

The three data files in this folder contain the simulated data used as input to compare the performance of CHAT, PyClone and EXPANDS.

1. simData.rdata contains the TRUE attributes of the simulated sCNAs and SNVs, which were then used to simulate their OBSERVED genomic data.  It is a list containing two data frames.
	simData$sCNA contains the true attributes of 400 sCNAs: their start (st) and end (ed) positions, simulated sAGP values (true_sAGP), copy number of the minor allele (true_nb), and total copy number (true_nt).
	simData$SNV contains the true attributes of 1000 simulated somatic mutations, including their simulated read counts in a sequencing dataset.  The fields are: chromosome number (chr); genomic coordinate of the SNV (position); CCF value (true_CCF), assigned using the method described in Materials and Methods 7b; counts of reads carrying the normal and the mutant allele in the tumor sample (normal_count_cancer and mutant_count_cancer, respectively); counts of reads carrying the normal allele (normal_count_control) and the mutant allele (mutant_count_control, which is always 0) in the control sample; assigned lineage scenario (true_scenario); and the properties of the sCNA in which the somatic mutation occurred (true_nb, true_nt and true_sAGP).
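
As an illustration of how the simData$SNV fields relate to one another, here is a minimal Python sketch of one record. The field names are taken from the description above; the class name, the helper methods, the field types (including the encoding of true_scenario) and all example values are hypothetical and are not taken from the actual data.

```python
from dataclasses import dataclass


@dataclass
class SimulatedSNV:
    """One row of simData$SNV (field names from the README, types assumed)."""
    chr: str
    position: int
    true_CCF: float
    normal_count_cancer: int
    mutant_count_cancer: int
    normal_count_control: int
    mutant_count_control: int  # described as always 0
    true_scenario: str         # encoding of the lineage scenario is assumed
    true_nb: int
    true_nt: int
    true_sAGP: float

    def tumor_depth(self) -> int:
        """Total reads covering the SNV in the tumor sample."""
        return self.normal_count_cancer + self.mutant_count_cancer

    def tumor_vaf(self) -> float:
        """Observed fraction of tumor reads carrying the mutant allele."""
        return self.mutant_count_cancer / self.tumor_depth()


# Hypothetical values, for illustration only.
snv = SimulatedSNV(
    chr="1", position=123456, true_CCF=0.6,
    normal_count_cancer=70, mutant_count_cancer=30,
    normal_count_control=55, mutant_count_control=0,
    true_scenario="A", true_nb=1, true_nt=3, true_sAGP=0.8,
)
print(snv.tumor_depth(), snv.tumor_vaf())
```

The observed allele fraction computed here is only a summary of the read counts; it is not the CCF, which depends on the copy number state and sAGP of the underlying sCNA.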

2. sCNA.rdata is a matrix containing the simulated OBSERVED data for the 400 sCNAs, generated from their true attributes in simData as described above.  These data include the observed LRR and BAF values.  The number of markers for each sCNA is set to 500 for both LRR and BAF data; the marker counts are denoted num.LRR and num.BAF, respectively.  Gaussian white noise is added to the observed LRR and BAF values to mimic the noise in real data.  CHAT uses this file to estimate the nb, nt and sAGP values for each sCNA. 
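
The noise step described above can be sketched in Python as follows. This is not the author's simulation code: the function name, the noise standard deviation, the seed and the example expected value are all assumptions, and the way the expected LRR or BAF is derived from nb, nt and sAGP is deliberately left out.

```python
import random


def add_gaussian_noise(expected_value, num_markers=500, noise_sd=0.03, seed=0):
    """Return num_markers values scattered around expected_value.

    Each marker gets independent zero-mean Gaussian noise with standard
    deviation noise_sd (an assumed value, not taken from the README).
    """
    rng = random.Random(seed)
    return [expected_value + rng.gauss(0.0, noise_sd) for _ in range(num_markers)]


# Hypothetical expected BAF for one sCNA; 500 markers as stated in the README.
observed_baf = add_gaussian_noise(expected_value=0.33, seed=1)
observed_mean = sum(observed_baf) / len(observed_baf)
print(len(observed_baf), round(observed_mean, 3))
```

With 500 markers per sCNA, the mean of the noisy values stays close to the expected value, which is what allows segment-level estimates to be recovered from noisy marker-level data.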

3. tumor.vcf is a text file in the Variant Call Format containing the simulated OBSERVED read counts, generated from the true attributes described above.  The 1000 variants in this file correspond to the 1000 somatic mutations in simData.  Columns 1 and 2 are the chromosome and position of the SNV.  Columns 3-9 are placeholders and are not informative for this analysis.  Column 10 contains information for the normal sample, in the order genotype, read depth, normal allele count, and somatic allele count.  Column 11 contains the same information, in the same order, for the tumor sample.  These counts are the same as those in simData$SNV.
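
The column layout above can be read with a few lines of Python. The sketch below assumes tab-separated columns and colon-separated values within columns 10 and 11, as is conventional for VCF sample fields; the README does not state the separators, so check them against the file. The function names and the example line are hypothetical.

```python
def parse_sample(field, sep=":"):
    """Split one sample column into genotype, depth, normal and somatic counts.

    The colon separator is an assumption based on the usual VCF convention.
    """
    genotype, depth, normal, somatic = field.split(sep)[:4]
    return {
        "genotype": genotype,
        "depth": int(depth),
        "normal_count": int(normal),
        "mutant_count": int(somatic),
    }


def parse_vcf_line(line):
    """Extract chromosome, position and the two sample columns from one record."""
    cols = line.rstrip("\n").split("\t")
    return {
        "chr": cols[0],
        "position": int(cols[1]),
        "control": parse_sample(cols[9]),   # column 10: normal sample
        "cancer": parse_sample(cols[10]),   # column 11: tumor sample
    }


def read_vcf(path):
    """Parse every non-header, non-empty line of a VCF file."""
    with open(path) as handle:
        return [parse_vcf_line(line) for line in handle
                if line.strip() and not line.startswith("#")]


# Hypothetical record, for illustration only (columns 3-9 are placeholders).
example = "1\t123456\t.\t.\t.\t.\t.\t.\t.\t0/0:55:55:0\t0/1:100:70:30"
record = parse_vcf_line(example)
print(record["cancer"]["normal_count"], record["cancer"]["mutant_count"])
```

Calling read_vcf("tumor.vcf") should then return one dictionary per variant, whose counts can be compared with the corresponding fields of simData$SNV.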

The R code used to generate and analyze these simulated data is available from the author: libo@umich.edu.
Source: README.txt, updated 2014-09-17