Clonal Heterogeneity Analysis Tool / Wiki / CHAT

Jean-Sebastien Milanese - 2014-10-02

Hi again Bo,

For the getCCF function, I noticed that AD is set to 3 by default (with delimiter ';'). Given the VCF example I showed you in the sequencing discussion, AD should be the 4th field in my file. However, the output file is empty. There is no error in the terminal, so I assume the VCF is loaded, but not correctly.

Unlike yours, my delimiter is not ';' but ':'. In the sequencing discussion, you mentioned a VCF with delimiter ',' for AD (ParseVCF), so I'm getting a bit confused.

I would just like to know the actual delimiter that getCCF recognizes so I can convert my VCF accordingly for both functions (ParseVCF and getCCF) before running everything. I think it would be easier for me to convert to what getCCF recognizes and then change ParseVCF myself.

Thanks again!
Jean-Sebastien

- Bo Li - 2014-10-03
  
  Hi Jean-Sebastien,
  
  The VCF I am using has two types of delimiters and looks something like this:
  
  0/1:50,32:82:...
  
  The fields are GT:AD:DP...
  
  I guess you can convert your VCF file into this format and apply CHAT.
  
  It is probably harder for you to modify getSampleCCF or getCCF functions, since they are built in the package. But you can certainly modify ParseVCF.
  
  Thanks,
  Bo
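
A minimal base-R sketch of how a sample column in this layout splits apart, assuming the FORMAT order GT:AD:DP from the example above: ':' separates the FORMAT fields and ',' separates the reference and alternate read counts inside AD. The helper parse_ad and its ad_field argument are hypothetical and not part of CHAT; ad_field counts colon-separated fields from 1, which may not match how getCCF interprets its own AD argument.

```r
# Hypothetical helper (not part of CHAT): extract ref/alt read counts from
# VCF sample strings such as "0/1:50,32:82" (FORMAT GT:AD:DP).
# ":" separates FORMAT fields; "," separates the counts inside AD.
parse_ad <- function(sample_col, ad_field = 2) {
  fields <- strsplit(sample_col, ":", fixed = TRUE)
  ad <- vapply(fields, function(f) f[ad_field], character(1))
  counts <- strsplit(ad, ",", fixed = TRUE)
  data.frame(
    ref = as.integer(vapply(counts, `[`, character(1), 1)),
    alt = as.integer(vapply(counts, `[`, character(1), 2))
  )
}

parse_ad(c("0/1:50,32:82", "0/0:85,1:86"))
#   ref alt
# 1  50  32
# 2  85   1
```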
  

Mitsuko - 2014-12-23

I managed to run getSeg(), getAGP() and getsAGP() to obtain the sAGP data properly, e.g.,

new.dd.dat=get(load('sAGP.Rdata'))
head(new.dd.dat)
chr logR.mean bin BAF.mean bin
Sample1.T1 2 752566 12278932 0.032007233 1000 0.000000000 1000
Sample1.T1 2 12450829 24516880 0.020524412 1000 0.000000000 1000
Sample1.T1 2 24531634 41293664 0.006148282 1000 0.039042118 1000
Sample1.T1 2 41295542 56904162 -0.077622378 1000 0.007352231 1000
Sample1.T1 2 56904480 68334436 0.179746835 1000 0.084860368 1000
Sample1.T1 2 68335624 83172684 0.135804702 1000 0.000000000 1000
seg_purity nb nt
Sample1.T1 0.0000000 1 2
Sample1.T1 0.0000000 1 2
Sample1.T1 0.1192383 0 1
Sample1.T1 0.0000000 1 2
Sample1.T1 0.1170898 0 3
Sample1.T1 0.0000000 1 2

... but when I pass it to getCCF() along with the vcf files, the resulting output file is always empty. Is there any way to find out what's going wrong?

Last edit: Mitsuko 2014-12-23

- Bo Li - 2014-12-23
  
  The sAGP output looks OK to me. In the getCCF() step, the most typical problem is the format of the VCF files. My code assumes that each file contains a tumor/normal pair, with the tumor and normal samples in the 10th and 11th columns (or the other way round), and that Allele Depth (AD) is coded as the 3rd field (you can also specify a different number). Could you please check whether your VCF file has this format?
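
One way to run this check before calling getCCF() is a small base-R script that reads the VCF body, confirms every record has at least 11 tab-separated columns, and tabulates where AD sits in the FORMAT column. The function check_vcf is a hypothetical helper, not part of CHAT; it only inspects the layout and does not reproduce getCCF()'s own parsing.

```r
# Hypothetical pre-flight check (not part of CHAT) for a tumor/normal VCF:
# records need at least 11 tab-separated columns (9 fixed + 2 samples), and
# AD should sit at the same position of the FORMAT column in every record.
check_vcf <- function(path) {
  lines <- readLines(path)
  body <- lines[nzchar(lines) & !startsWith(lines, "#")]  # skip ## meta and #CHROM lines
  cols <- strsplit(body, "\t", fixed = TRUE)
  short <- lengths(cols) < 11
  if (any(short)) warning(sum(short), " record(s) have fewer than 11 columns")
  fmt <- vapply(cols, function(x) x[9], character(1))     # FORMAT column
  keys <- strsplit(fmt, ":", fixed = TRUE)
  ad_pos <- vapply(keys, function(k) match("AD", k), integer(1))
  table(AD_position = ad_pos, useNA = "ifany")            # NA = no AD in FORMAT
}
```

For example, check_vcf("tumor_normal.vcf") (a hypothetical file name) should report a single AD position; an NA entry means some records lack an AD field altogether.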
  
- Bo Li - 2014-12-23
  
  Hi Mitsuko,
  
  I wonder whether you still have this problem or have already solved it. I saw your
  most recent post, and it seems to me that you have solved this issue.
  
  Thanks,
  Bo
  
  On Tue, Dec 23, 2014 at 2:05 PM, Mitsuko mitsukok@users.sf.net wrote:
  
  When using getSeg(), I tend to get NaN in the mean values: for example,
  
  seg.dat=get(load('seg.Rdata'))
  head(seg.dat)
  chr logR.mean bin BAF.mean bin
  Sample001.T1 2 1017197 11754168 NaN 1000 0.000000000 1000
  Sample001.T1 2 11761252 22448217 NaN 1000 0.006551691 1000
  Sample001.T1 2 22450487 37096184 NaN 1000 0.000000000 1000
  Sample001.T1 2 37096594 50605337 NaN 1000 0.000000000 1000
  Sample001.T1 2 50613676 63529371 NaN 1000 0.000000000 1000
  Sample001.T1 2 63533935 77754853 NaN 1000 0.000000000 1000
  
  ... I assume this is due to some division by zero, but how can I avoid
  such errors?
  
  - Mitsuko - 2014-12-23
    
    Yes, I solved the problem on my own - I needed to specify data.type="log" when calling getSeg(). I should have read the help file more carefully.
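
A plausible reason the means become NaN, offered as an assumption rather than a description of getSeg()'s internals: if LRR values that are already log ratios are log-transformed again, every negative value becomes NaN in R, and a single NaN makes the segment mean NaN. The heuristic looks_log_scaled is hypothetical and not part of CHAT.

```r
# Assumption for illustration: LRR values that are already log2 ratios get
# log-transformed a second time. log2() of a negative number is NaN (with a
# warning), and mean() of a vector containing NaN is NaN.
lrr <- c(-0.03, 0.02, -0.08, 0.18)         # example log-ratio values
suppressWarnings(mean(log2(lrr)))          # NaN

# Hypothetical heuristic (not part of CHAT): raw copy-number ratios cannot be
# negative, so any negative value suggests the input is already log-scaled
# and data.type = 'log' is likely needed.
looks_log_scaled <- function(x) any(x < 0, na.rm = TRUE)
looks_log_scaled(lrr)                      # TRUE
```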
    
  - Mitsuko Korobkin - 2014-12-23
    
    Yes, I found out what was wrong myself before you replied. I forgot to specify data.type='log' when running getSeg() previously.
    
    I'll update on the other problem I'm having later.
    
    Thanks again,
    -Mitsuko
    

Mitsuko - 2014-12-23

I prepared my vcf files so that the normal and tumor samples appear in the 10th and 11th fields, respectively. The AD data in the sample (FORMAT) fields is in the 2nd position, so I set AD=2. (The vcf file attached to this message shows the first 10 lines of my input file.) I run getCCF() as follows.

getCCF("./VCF",'sAGP.Rdata', output="CCF.vcf", nt=FALSE, nc=10, tc=11, AD=2, filter=TRUE, TCGA=FALSE)

The program loads all the vcf files with no error messages, but the output file is empty. I will check whether there are common entries among those vcf files to see whether the problem is caused by the input files. Thank you anyway for your help.

Last edit: Mitsuko 2014-12-23

Sample1-T1_head.vcf

- Bo Li - 2014-12-23
  
  OK, I see the problem. This is a bit of a legacy issue: VCF calls from older variant callers, such as GATK (which is not a good example), assign a genotype to the tumor sample. I am not familiar with newer callers. But in your file, I don't see a GT field in the 11th column. getCCF aims to find true somatic mutations. If GT is not given, it assumes the first field is GT and uses the wrong information. If you are certain that all the mutations are somatic, you can add 0/1 at the start of the current tumor sample column. But you should also make sure that the GT in your normal column is 0/0 or 1/1. I see many calls that have 0/1 but quite small alternative read depths. I think that may be a technical problem you should pay attention to.
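
This suggestion can be sketched in base R on the sample strings alone, assuming the FORMAT column already lists GT first: prepend "0/1:" to tumor entries whose first field is not a genotype, and flag records whose normal genotype is not 0/0 or 1/1. The helpers first_field, is_gt, add_tumor_gt and normal_is_hom are hypothetical, not part of CHAT, and only handle diploid genotypes.

```r
# Hypothetical helpers (not part of CHAT). They work on the raw sample strings
# from columns 10/11 and assume FORMAT already starts with GT.
first_field <- function(col) sub(":.*$", "", col)        # text before the first ":"
is_gt <- function(x) grepl("^[0-9.]+[/|][0-9.]+$", x)    # e.g. "0/1", "1|1", "./."

# Prepend "0/1:" to tumor entries whose first field is not a genotype.
add_tumor_gt <- function(tumor_col) {
  missing_gt <- !is_gt(first_field(tumor_col))
  tumor_col[missing_gt] <- paste0("0/1:", tumor_col[missing_gt])
  tumor_col
}

# TRUE where the normal genotype is 0/0 or 1/1, as recommended above.
normal_is_hom <- function(normal_col) first_field(normal_col) %in% c("0/0", "1/1")

add_tumor_gt(c("24,29:53", "0/1:40,32:73"))
# "0/1:24,29:53" "0/1:40,32:73"
normal_is_hom(c("0/0:85,1:86", "0/1:111,3:114", "0:85,1"))
# TRUE FALSE FALSE
```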
  

Mitsuko - 2014-12-23

I'm glad you pointed out the problem, because there was in fact a bug in my script that prepared the input VCF files. In the example vcf file I sent you, the header was correctly positioned (10th=normal, 11th=tumor), but the data entries for the normal and tumor fields were swapped, which caused the error. Another problem was that the variant caller I used (MuTect) tends to set GT=0 for the normal sample, although the correct form seems to be GT=0/0:

http://gatkforums.broadinstitute.org/discussion/2485/vcf-output-question

After fixing these problems, I was able to obtain CCF output. Thanks again for your help!
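
Both fixes can be sketched in base R on a data frame of VCF body records, assuming the sample data were swapped between columns 10 and 11 as described and that the normal GT is written as a bare "0". The function fix_mutect_vcf is hypothetical and not part of CHAT.

```r
# Hypothetical fix-up (not part of CHAT). `vcf` is a data frame of VCF body
# records with FORMAT in column 9 and the two samples in columns 10 and 11.
fix_mutect_vcf <- function(vcf, swap = TRUE) {
  # Put the sample data back under the right header (10 = normal, 11 = tumor).
  if (swap) vcf[, c(10, 11)] <- vcf[, c(11, 10)]
  # MuTect writes the normal GT as a bare "0"; rewrite it as "0/0".
  vcf[[10]] <- sub("^0(:|$)", "0/0\\1", vcf[[10]])
  vcf
}
```

The records could be read with read.table(path, sep = "\t", comment.char = "#", quote = "", colClasses = "character"); since that drops the header lines, they would need to be written back in front of the fixed records.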


Mitsuko - 2014-12-24

I have one more question: I tend to get no segment information from chromosome 1 and cannot figure out why. The rest of the chromosomes look fine, e.g.,

data=get(load('seg.Rdata'))
head(data)
chr logR.mean bin BAF.mean bin
Sample1.T1 1 0 0 -0.03060 1000 0.008756967 1000
Sample1.T2 1 0 0 -0.00440 1000 0.020682437 1000
Sample1.T1 2 1017197 11754168 -0.06160 1000 0.000000000 1000
Sample1.T1 2 11761252 22448217 -0.05935 1000 0.006551691 1000
Sample1.T1 2 22450487 37096184 -0.06015 1000 0.000000000 1000
Sample1.T1 2 37096594 50605337 -0.05770 1000 0.000000000 1000

The input BAF and LRR files have numerous entries on chromosome 1, so I don't understand why getSeg() returns no segments for it. I have attached the R script I use to produce seg.Rdata to this post.

run_getSeg.R

- Bo Li - 2014-12-24
  
  I am not sure at this point what could be going wrong. May I take a look at your SNP data to diagnose it?
  
  - Mitsuko - 2014-12-24
    
    I think I found the cause of the problem: I removed the X and Y chromosome entries from my BAF/LRR input files and reran my script, and the output seg.Rdata now contains segments on chromosome 1.
    
    Thank you for the help anyway.
    
    - Bo Li - 2014-12-24
      
      You are right -- sex chromosomes are currently not included in my analysis. Glad you fixed it. Thanks for letting me know!
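
A base-R sketch of this pre-filter, assuming the BAF and LRR tables are data frames with the chromosome in a column named chr, written as "X"/"Y" with an optional "chr" prefix. The function drop_sex_chr is hypothetical and not part of CHAT; files that code the sex chromosomes numerically (for example as 23 and 24) would need those codes added.

```r
# Hypothetical pre-filter (not part of CHAT): remove X/Y markers from a BAF
# or LRR table before running getSeg(), since CHAT does not model them.
drop_sex_chr <- function(dat, chr_col = "chr") {
  chr <- sub("^chr", "", as.character(dat[[chr_col]]), ignore.case = TRUE)
  dat[!chr %in% c("X", "Y"), , drop = FALSE]
}

# Illustrative input only.
snp <- data.frame(chr = c("1", "X", "chrY", "2"),
                  pos = c(752566, 100, 200, 1017197),
                  stringsAsFactors = FALSE)
drop_sex_chr(snp)$chr   # "1" "2"
```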
      

Tong - 2015-01-05

Hi Bo,

I was able to run CHAT following the pipeline you provided.
However, I am having trouble understanding the output .vcf file.

For example, here are two lines from my output .vcf file.
In the second line, the SNP is assigned to A1, which corresponds to the 1st lineage scenario described in your paper. What I am not sure about is what the "NA" in the first line means. Thanks in advance!

10 134898263 . G T 200 PASS tumor_A001.T.S01;32;72;0;111;NA;NA;NA;NA;NA;NA GT:DP:AD:BQ:MQ:SB:FA 0/0:111,0:111 0/1:40,32:73
11 223764 . A G 200 PASS tumor_A001.T.S01;29;53;1;86;0.57;0.197;0.574;1;4;A1 GT:DP:AD:BQ:MQ:SB:FA 0/0:85,1:86 0/1:24,29:53

- Bo Li - 2015-01-05
  
  Hi Tong,
  
  The NAs in these fields indicate that CHAT cannot estimate this information for the corresponding somatic mutation due to unidentifiability issues for sAGP or CCF. Please see our paper for more details.
  
  Thanks,
  Bo
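
To separate mutations with estimates from NA ones in the output, the semicolon-delimited INFO field can be split in base R. This sketch assumes, as in the example above, that the last field carries the lineage label (e.g. A1) and that values CHAT could not estimate are written as the literal string NA; the meaning of the other fields is not assumed. The helper split_chat_info is hypothetical and not part of CHAT.

```r
# Hypothetical helper (not part of CHAT): split the ";"-delimited INFO field
# of getCCF output. Only two things are assumed: the last field is the lineage
# label, and values CHAT could not estimate are the literal string "NA".
split_chat_info <- function(info) {
  fields <- strsplit(info, ";", fixed = TRUE)
  data.frame(
    lineage = vapply(fields, function(f) f[length(f)], character(1)),
    has_na  = vapply(fields, function(f) any(f == "NA"), logical(1)),
    stringsAsFactors = FALSE
  )
}

info <- c("tumor_A001.T.S01;32;72;0;111;NA;NA;NA;NA;NA;NA",
          "tumor_A001.T.S01;29;53;1;86;0.57;0.197;0.574;1;4;A1")
res <- split_chat_info(info)
res[!res$has_na, ]   # keep only mutations with estimates (here the A1 record)
```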
  

ying ya - 2016-09-23

Hi Bo,

What might be the problem? When I use getSegChr(bb.chr, ll.chr, thr.hets=thr.hets, data.type='log') to calculate segments, I get this error:
Error in nls(y ~ 1/sqrt(2 * pi)/b * exp(-(x - a)^2/2/b^2), start = list(b = 0.06, :
number of iterations exceeded maximum of 50

The errors only occur on some chromosomes and some samples.

Thanks,
Ying

- Bo Li - 2016-09-23
  
  Hi Ying,
  
  You probably need to look at the distribution of BAF markers. There should
  be 4 peaks -- use thr.hets to remove the top and bottom peaks. If those two
  are not effectively removed, there will be problem.
  
  Best,
  Bo
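
A base-R sketch of this check on simulated BAF values (not real data): plot the histogram, then keep only the markers between the two homozygous peaks. For illustration only, it assumes thr.hets behaves like a symmetric cutoff that keeps thr < BAF < 1 - thr; check getSegChr()'s documentation for the actual semantics. The function keep_hets is hypothetical and not part of CHAT.

```r
# Hypothetical sketch (not part of CHAT). ASSUMPTION: thr.hets behaves like a
# symmetric cutoff, keeping markers with thr < BAF < 1 - thr.
keep_hets <- function(baf, thr = 0.1) {
  baf[!is.na(baf) & baf > thr & baf < 1 - thr]
}

# Simulated BAF (not real data): homozygous peaks near 0 and 1 plus two
# heterozygous peaks shifted by allelic imbalance.
set.seed(1)
baf <- c(rnorm(400, 0.02, 0.01), rnorm(300, 0.35, 0.04),
         rnorm(300, 0.65, 0.04), rnorm(400, 0.98, 0.01))
hist(baf, breaks = 100, main = "All markers")       # four peaks
het <- keep_hets(baf, thr = 0.1)
hist(het, breaks = 50, main = "After het filter")   # two peaks remain
length(het)                                         # 600
```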
  

Ron - 2018-11-02

Hi,
I am trying to get the purity file:

para <- getPara()
para$BAFfilter=1 # I think this needs to be 1 instead of 10
para$datafile <- 'Test.Rdata'
para$thr.penalty <- 300
para$savefile <- 'AGP_temp.txt'
getAGP(para=para)

I keep getting the following error:

Error in dat[delPos, 4] * (1/para$LRR_correction_del) :
non-numeric argument to binary operator

Thanks
Ron
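
In R, this error means one operand of `*` or `/` is not numeric; for example, "0.5" * 2 fails with exactly this message, which often happens when a column is read in as text. A base-R sketch that checks both operands of the failing expression, assuming dat is the table loaded from the data file and para is the parameter list; check_operands and the example inputs are hypothetical and not part of CHAT.

```r
# The failing line is dat[delPos, 4] * (1/para$LRR_correction_del); R raises
# "non-numeric argument to binary operator" if either operand is text.
# Hypothetical check (not part of CHAT) for both operands.
check_operands <- function(dat, para) {
  c(dat_col4           = is.numeric(dat[, 4]),
    LRR_correction_del = is.numeric(para$LRR_correction_del))
}

# Illustrative inputs: a 4th column stored as text triggers the error.
dat  <- data.frame(chr = 1, start = 100, end = 200, value = "-0.1",
                   stringsAsFactors = FALSE)
para <- list(LRR_correction_del = 0.8)
check_operands(dat, para)                         # dat_col4 is FALSE
try(dat[1, 4] * (1 / para$LRR_correction_del))    # reproduces the error
```

If a check returns FALSE, converting the offending value with as.numeric() (or as.numeric(as.character(x)) for a factor) and re-saving the data file may be worth trying before calling getAGP() again.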
