LoFreq / Discussion / General Discussion: Error running lofreq somatic and indel calling handler on normal tumor bam files

chris.cornor - 2015-01-14

Hi Andreas,

I am trying to use lofreq for somatic indel calling using the somatic and --call-indels handler for my normal and tumor bam files. I downloaded the version lofreq_star-2.1.0_linux-x86-64.tgz. I am using it on our institute cluster Linux version 3.2.0-4-amd64. I have just unpacked the software and then tried to run it. The command am using is below

/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq somatic --threads 8 -n /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam -t /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998.realigned.recal.bam -f /scratch/GT/vdas/test_exome/exome/hg19.fa --call-indels -l /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed -d /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf -o /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_HG_

N_S8980.realigned.recal.bam is my normal bam

T_S7998.realigned.recal.bam is my tumor bam

Since it is an exome data am using the bed file of the SureSelect V4 and also am using the dbSNP file to remove the variants from the output.

I am getting the following error.

WARNING [2015-01-14 17:09:00,626]: Using stringent normal for SQ which does not however contain indels!
CRITICAL [2015-01-14 17:09:00,724]: The following command failed with code 1: lofreq call-parallel --pp-threads 8 -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1 -l /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed --call-indels -a 0.100000 --use-orphan -B -N -A /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam -o /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_HG_normal_relaxed.vcf.gz
CRITICAL [2015-01-14 17:09:00,724]: Received the following on stderr:
INFO [2015-01-14 17:09:00,653]: Using 8 threads with following basic args: lofreq call -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1 --call-indels -a 0.100000 --use-orphan -B -N -A /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam

Traceback (most recent call last):

File "/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq2_call_pparallel.py", line 751, in <module>

main()

File "/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq2_call_pparallel.py", line 563, in main

bed_bins = [Region._make(x) for x in read_bed_coords(bed_file)]

File "/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq2_call_pparallel.py", line 106, in read_bed_coords

(chrom, start, end) = line.split("\t")[0:3]

ValueError: need more than 1 value to unpack

CRITICAL [2015-01-14 17:09:00,724]: Somatic SNV caller failed. Exiting

Can you please tell me where am getting wrong and how to get rid of this error. I guess the software I downloaded if fine for my system.

Regards,

Chris
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Andreas Wilm - 2015-01-15
  
  Hi Chris,
  
  this looks like an unhandled error triggered in the bed reading
  function. Is your bed file in any way unusual? Would you mind to share
  it, either here or via PM (wilma@gis.a-star.edu.sg)? The problem
  should be easy to fix then.
  
  Andreas
  
  On 15 January 2015 at 01:45, chris.cornor vd4mmind@users.sf.net wrote:
  
  Hi Andreas,
  
  I am trying to use lofreq for somatic indel calling using the somatic and
  --call-indels handler for my normal and tumor bam files. I downloaded the
  version lofreq_star-2.1.0_linux-x86-64.tgz. I am using it on our institute
  cluster Linux version 3.2.0-4-amd64. I have just unpacked the software and
  then tried to run it. The command am using is below
  
  /scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq somatic --threads 8 -n
  /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam
  -t
  /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998.realigned.recal.bam
  -f /scratch/GT/vdas/test_exome/exome/hg19.fa --call-indels -l
  /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed
  -d /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf -o
  /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_HG_
  
  N_S8980.realigned.recal.bam is my normal bam
  
  T_S7998.realigned.recal.bam is my tumor bam
  
  Since it is an exome data am using the bed file of the SureSelect V4 and
  also am using the dbSNP file to remove the variants from the output.
  
  I am getting the following error.
  
  WARNING [2015-01-14 17:09:00,626]: Using stringent normal for SQ which does
  not however contain indels!
  CRITICAL [2015-01-14 17:09:00,724]: The following command failed with code
  1: lofreq call-parallel --pp-threads 8 -d 101000 -f
  /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1
  -l
  /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed
  --call-indels -a 0.100000 --use-orphan -B -N -A
  /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam
  -o
  /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_HG_normal_relaxed.vcf.gz
  CRITICAL [2015-01-14 17:09:00,724]: Received the following on stderr:
  INFO [2015-01-14 17:09:00,653]: Using 8 threads with following basic args:
  lofreq call -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose
  --no-default-filter -b 1 --call-indels -a 0.100000 --use-orphan -B -N -A
  /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam
  
  Traceback (most recent call last):
  
  File
  "/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq2_call_pparallel.py",
  line 751, in <module>
  
  main()
  
  File
  "/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq2_call_pparallel.py",
  line 563, in main
  
  bed_bins = [Region._make(x) for x in read_bed_coords(bed_file)]
  
  File
  "/scratch/GT/softwares/lofreq_star-2.1.0/bin/lofreq2_call_pparallel.py",
  line 106, in read_bed_coords
  
  (chrom, start, end) = line.split("\t")[0:3]
  
  ValueError: need more than 1 value to unpack
  
  CRITICAL [2015-01-14 17:09:00,724]: Somatic SNV caller failed. Exiting
  
  Can you please tell me where am getting wrong and how to get rid of this
  error. I guess the software I downloaded if fine for my system.
  
  Regards,
  
  Chris
  
  Error running lofreq somatic and indel calling handler on normal tumor bam
  files
  
  Sent from sourceforge.net because you indicated interest in
  https://sourceforge.net/p/lofreq/discussion/general/
  
  To unsubscribe from further messages, please visit
  https://sourceforge.net/auth/subscriptions/
  
  --
  Andreas Wilm
  andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-15

I am not sure how I can share the bed file, can i host it up anywhere? its almost 180 mb I guess. it is the sure select bed file which I have been supplied with the kit. I also have a .list of the same file, can that be used or I have to use the .bed format itself?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-15

I have used the S03723314_Covered.bed file which you can download from the agilent website. And this does not work. So you recommend to just create a custom bed file with coordinates and then run the command?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-15

The format of the bed file looks like this

head -5 S03723314_Covered.bed

browser position chr1:762097-762270

track name="Covered" description="Agilent SureSelect DNA - SureSelect #Human All Exon V4- Genomic regions covered by probes" color=0,0,128

chr1 762097 762270 #ref|LINC00115,ens|ENST00000473798,ens|ENST00000536430,ref|NR_024321,mRNA|AK026292,mRNA|BC017762,mRNA|BC017762,mRNA|AK026292,ref|NR_024321,ens|ENST00000536430,ens|ENST00000473798,ref|LINC00115

chr1 861281 861490 #ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11

chr1 865591 865791 #ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ens|ENST00000341065,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000341065,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11

This is a standard bed format for the agilent kits.

Last edit: chris.cornor 2015-01-15

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Andreas Wilm - 2015-01-15
  
  Oh ok. That looks like an extension of the bed format. LoFreq (and
  samtools) expect something much simpler as defined here
  http://genome.ucsc.edu/FAQ/FAQformat.html#format1:
  1st column chromosome name
  2nd column region start (zero based)
  3rd column region end (inclusive)
  All other columns are optional.
  You have unexpected rows in your data needed for the UCSC track definition.
  
  Anyway, the following will produce a version of your file that should work:
  $ grep -v ^browser S03723314_Covered.bed | grep -v ^track > new.bed
  
  Try the created file called "new.bed" instead of the original bed
  file. Let me know if that works and I will make corresponding changes
  in the LoFreq code.
  
  Thanks,
  Andreas
  
  On 15 January 2015 at 19:15, chris.cornor vd4mmind@users.sf.net wrote:
  
  The format of the bed file looks like this
  
  head -5 S03723314_Covered.bed
  
  browser position chr1:762097-762270
  
  track name="Covered" description="Agilent SureSelect DNA - SureSelect Human
  All Exon V4- Genomic regions covered by probes" color=0,0,128
  chr1 762097 762270
  ref|LINC00115,ens|ENST00000473798,ens|ENST00000536430,ref|NR_024321,mRNA|AK026292,mRNA|BC017762,mRNA|BC017762,mRNA|AK026292,ref|NR_024321,ens|ENST00000536430,ens|ENST00000473798,ref|LINC00115
  chr1 861281 861490
  ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11
  chr1 865591 865791
  ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ens|ENST00000341065,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000341065,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11#
  
  This is a standard bed format for the agilent kits.
  
  Error running lofreq somatic and indel calling handler on normal tumor bam
  files
  
  Sent from sourceforge.net because you indicated interest in
  https://sourceforge.net/p/lofreq/discussion/general/
  
  To unsubscribe from further messages, please visit
  https://sourceforge.net/auth/subscriptions/
  
  --
  Andreas Wilm
  andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-15

Yes the tool is working now, I modified the bed file of the company to a general bed format as of mentioned in UCSC just keeping the chr start and end coordinates and it seem to run , am waiting for the output now. Just to ask you a thing, this vcf are zipped , to annotate I usually use VEP for retrieving maximum annotated variants but it does not work with pakced vcf. I will try to unzip it once the run is over. Just to ask you any annotator that can directly accept vcf zipped files?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-15

I am watching 3 different outout, 2 is of snvs ( one is relaxed and other is stringent) and one stringent for the indel file. I uncompressed the stringent and relaxed file of my output to see how the contents are placed. In some SNVs the QUAL filter does not have any values and it is having (.) and in other they have values. Usually I chose variants having QUAL above 50. ALso here I see the stringent SNV file has over 20k variants. in the info column I could see something as CONSVAR, may I know what it stands for? It seems I need to filter the vcf files as well from the lofreq output. I could not find much details on these front in any of the documentation. If still I had missed I would ask for apology. Please let me know then shall I go for the filter handler to filter the vcf files that are output here from LoFreq?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-15

Also I carefully noticed some log comments where I see these comments
WARNING [2015-01-15 11:59:20,322]: Using stringent normal for SQ which does not however contain indels!
CRITICAL [2015-01-15 14:44:52,604]: The following command failed with code 1: lofreq vcfset -a complement -2 /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf -1 /scratch/GT/vdas/pietro/exome_s
eq/results/lofreq_out/out_LG_somatic_final.snvs.vcf.gz --only-snvs -o /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_LG_somatic_final_minus-dbsnp.snvs.vcf.gz
CRITICAL [2015-01-15 14:44:52,604]: Received the following on stderr:
FATAL(lofreq_vcfset.c|main_vcfset:301): Couldn't load tabix index for /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf

CRITICAL [2015-01-15 14:44:52,604]: Somatic SNV caller failed. Exiting

Does this mean that the tool did not work properly? do I have to index even te dbsnp file?

Also in the log file which comes as an output I see the below comments which shows that the run was fine

lofreq call-parallel --pp-threads 8 -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1 -l /scratch/GT/vdas/referenceBed/hg19/ss_v4/Exon_SSV4_clean.bed --call-indels

-a 0.100000 --use-orphan -B -N -A /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam -o /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_HG_normal_relaxed.vcf.gz
stderr: INFO [2015-01-15 11:59:13,416]: Using 8 threads with following basic args: lofreq call -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1 --call-indels -a 0.10
0000 --use-orphan -B -N -A /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam
stderr:
stderr: INFO [2015-01-15 11:59:14,418]: Adding 27 commands to mp-pool
stderr: Number of substitution tests performed: 17946093
stderr: Number of indel tests performed: 424952
stderr: INFO [2015-01-15 12:51:21,951]: Copying concatenated vcf file to final destination

Can you please guide me where am going wrong? I mean we should not have so many SNVs or indels. Both are around 20k and 14k respectively after filtering for variants with QUAL score. So am in a fix now. I have already used this bam files on GATK and VarScan and I have got respectable counts for SNVs and indels but I want to make my analysis mroe strong using LoFreq indel caller to build a comprehensive list of genes that are indels. Also I have very few samples and I work individually on them.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Andreas Wilm - 2015-01-16
  
  Hi Chris,
  
  When you call somatic SNVs then you only need to look at the file that
  ends in "somatic_final.snvs.vcf.gz", or if you used dbsnp then
  "somatic_final_minus-dbsnp.snvs.vcf.gz". All the other files are more
  or less temporary and you should ignore them. See also
  http://csb5.github.io/lofreq/commands/#somatic
  
  You are getting a "tabix index" error because LoFreq expects your
  dbsnp file to be tabix indexed (see 'lofreq somatic --help'). This is
  to speed up things. Just run 'bgzip' and 'tabix' on the dbsnp vcf file
  to achieve this. This will also save you some disk space.
  
  Regarding the quality filtering: LoFreq filters variants for you
  already. Just use the files called final (see above). So there is no
  need for you to do that again. We call variants with more than 50%
  allele frequency consensus variants (CONSVAR) and don't assign a
  quality to them, which is why they get a "dot" (for missing) instead.
  These are "obvious" variants i.e. you could say they are high quality.
  You will want to keep them.
  
  You asked for a variant annotator that directly accepts zipped vcf
  files: I don't know if SNPeff etc. support this, but if it doesn't,
  then it's just a matter of running 'gunzip' on the file first.
  
  And finally the bed file issue: samtools actually does support the
  format you used and I fixed LoFreq to do the same. The fix will be
  included in the next version (2.1.2). For the sake of completeness,
  could you please send me a link to the bed file you used (you said you
  downloaded it from somewhere), so that I can test this thoroughly?
  
  Let me know if you need more information or if you have any more problems,
  Andreas
  
  On 16 January 2015 at 01:30, chris.cornor vd4mmind@users.sf.net wrote:
  
  Also I carefully noticed some log comments where I see these comments
  WARNING [2015-01-15 11:59:20,322]: Using stringent normal for SQ which does
  not however contain indels!
  CRITICAL [2015-01-15 14:44:52,604]: The following command failed with code
  1: lofreq vcfset -a complement -2
  /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf -1
  /scratch/GT/vdas/pietro/exome_s
  eq/results/lofreq_out/out_LG_somatic_final.snvs.vcf.gz --only-snvs -o
  /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_LG_somatic_final_minus-dbsnp.snvs.vcf.gz
  CRITICAL [2015-01-15 14:44:52,604]: Received the following on stderr:
  FATAL(lofreq_vcfset.c|main_vcfset:301): Couldn't load tabix index for
  /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf
  
  CRITICAL [2015-01-15 14:44:52,604]: Somatic SNV caller failed. Exiting
  
  Does this mean that the tool did not work properly? do I have to index even
  te dbsnp file?
  
  Also in the log file which comes as an output I see the below comments which
  shows that the run was fine
  
  lofreq call-parallel --pp-threads 8 -d 101000 -f
  /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1
  -l /scratch/GT/vdas/referenceBed/hg19/ss_v4/Exon_SSV4_clean.bed
  --call-indels
  
  -a 0.100000 --use-orphan -B -N -A
  /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam
  -o
  /scratch/GT/vdas/pietro/exome_seq/results/lofreq_out/out_HG_normal_relaxed.vcf.gz
  stderr: INFO [2015-01-15 11:59:13,416]: Using 8 threads with following basic
  args: lofreq call -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa
  --verbose --no-default-filter -b 1 --call-indels -a 0.10
  0000 --use-orphan -B -N -A
  /scratch/GT/vdas/pietro/exome_seq/results/N_S8980/N_S8980.realigned.recal.bam
  stderr:
  stderr: INFO [2015-01-15 11:59:14,418]: Adding 27 commands to mp-pool
  stderr: Number of substitution tests performed: 17946093
  stderr: Number of indel tests performed: 424952
  stderr: INFO [2015-01-15 12:51:21,951]: Copying concatenated vcf file to
  final destination
  
  Can you please guide me where am going wrong? I mean we should not have so
  many SNVs or indels. Both are around 20k and 14k respectively after
  filtering for variants with QUAL score. So am in a fix now. I have already
  used this bam files on GATK and VarScan and I have got respectable counts
  for SNVs and indels but I want to make my analysis mroe strong using LoFreq
  indel caller to build a comprehensive list of genes that are indels. Also I
  have very few samples and I work individually on them.
  
  Error running lofreq somatic and indel calling handler on normal tumor bam
  files
  
  Sent from sourceforge.net because you indicated interest in
  https://sourceforge.net/p/lofreq/discussion/general/
  
  To unsubscribe from further messages, please visit
  https://sourceforge.net/auth/subscriptions/
  
  --
  Andreas Wilm
  andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-16

Hi Andreas,
Thank you for your reply. I would like to say that I had a successful run. Actually while it was runnnig the *final.vcf.gz were still not generated so I could not see them. I would look carefully now on them. One note I would like to say is I do not see any file with somatic_final_minus-dbsnp.snvs.vcf.gz. Is this because my dbsnp file is not indexed? Should I index it and run it once more? I will now go through the outputs but meanwhile would like to inform you that I have used -d dbsnps file but I do not see any output with final_minus-dbsnps. So shall I run again with the indexed dbsnp and then making a zip file?

Also the bed file you are talking about is hosted in agilent website. You have to log in

https://earray.chem.agilent.com/earray/
If you are new to earray, you will first need to register and then it is free to use. In the top right corner of the screen choose Application type Sureselect Target enrichment. From there choose “Libraries” and then “Browse Libraries”. Choose the catalogue kit that you are interested in and click download. A list of the annotation files available will then appear, choose the file type you require and click download.

If you are facing issues let me know I can put my file in dropbox and link it to you. Thanks.
Chris

Last edit: chris.cornor 2015-01-16

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Andreas Wilm - 2015-01-19
  
  Hey Chris,
  
  yes, the file somatic_final_minus-dbsnp.snvs.vcf.gz is not there,
  because the dbsnp removal step failed, since your dbsnp was not
  indexed. I changed the code so that it will now complain directly.
  Anyway, you can just run bgzip and tabix on dbsnp and then rerun
  LoFreq somatic using the same parameters as before and add --continue.
  That will pick up things were they failed and continue. If you however
  unzipped some files while LoFreq was still running, then it's probably
  safest to remove all intermediate files and start all over again.
  
  Andreas
  
  On 16 January 2015 at 17:49, chris.cornor vd4mmind@users.sf.net wrote:
  
  Hi Andreas,
  Thank you for your reply. I would like to say that I had a successful run.
  Actually while it was runnnig the *final.vcf.gz were still not generated so
  I could not see them. I would look carefully now on them. One note I would
  like to say is I do not see any file with
  somatic_final_minus-dbsnp.snvs.vcf.gz. Is this because my dbsnp file is not
  indexed? Should I index it and run it once more? I will now go through the
  outputs but meanwhile would like to inform you that I have used -d dbsnps
  file but I do not see any output with final_minus-dbsnps. So shall I run
  again with the indexed dbsnp and then making a zip file?
  
  Also the bed file you are talking about is hosted in agilent website. You
  have to log in
  
  https://earray.chem.agilent.com/earray/
  If you are new to earray, you will first need to register and then it is
  free to use. In the top right corner of the screen choose Application type
  Sureselect Target enrichment. From there choose “Libraries” and then “Browse
  Libraries”. Choose the catalogue kit that you are interested in and click
  download. A list of the annotation files available will then appear, choose
  the file type you require and click download.
  
  If you are facing issues let me know I can put my file in dropbox and link
  it to you. Thanks.
  Chris
  
  Error running lofreq somatic and indel calling handler on normal tumor bam
  files
  
  Sent from sourceforge.net because you indicated interest in
  https://sourceforge.net/p/lofreq/discussion/general/
  
  To unsubscribe from further messages, please visit
  https://sourceforge.net/auth/subscriptions/
  
  --
  Andreas Wilm
  andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

chris.cornor - 2015-01-19

Hi Andreas,

Thank you very much for the reply. I would like to let you know that in the final indel file (which is my primary requirement now), I have annotated and there I found some indels with rsid after annotation which says that they are indeed present in dbsnp database so I guess rerunning it wont be that necessary right? However sometimes some variants even after discounting we see they do not show any rsid at vcf level but on annotation they do get some ids. So do you think at this point it will be necessary to do it all over again. My idea is to find variants in tumor and in its reprogrammed clone and match them , provided if some of them are not novel it is not an issue as of now. When I will go for tumor driving variants then probably discounting the dbsnp variants will be useful. That can also be done filtering for annotated variants which has some #rsid after annotating them. Please let me know your views on this.

Thanks,

Chris

Last edit: chris.cornor 2015-01-19

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Andreas Wilm - 2015-01-19
  
  Hey Chris,
  
  if you get a final vcf file, then there is no need to rerun LoFreq.
  
  Whether you want to remove dbsnp or not ,depends a bit on the question
  you want to answer. The default is be to remove dbsnp, simply to avoid
  cases where a germline call was erroneously missed in the normal,
  which then produces a (false) somatic call. If in doubt, go with the
  default!
  
  Andreas
  
  On 19 January 2015 at 17:48, chris.cornor vd4mmind@users.sf.net wrote:
  
  Hi Andreas,
  
  Thank you very much for the reply. I would like to let you know that in the
  final indel file (which is my primary requirement now, I have not annotated
  them and there I found some indels with rsid after annotation which says
  that they are indeed present in dbsnp database so I guess rerunning it wont
  be that necessary right? However sometimes some variants even after
  discounting we see they do not show any rsid at vcf level but on annotation
  they do get some ids. So do you think at this point it will be necessary to
  do it all over again. My idea is to find variants in tumor and in its
  reprogrammed clone and match them , provided if some of them are not novel
  it is not an issue as of now. When I will go for tumor driving variants then
  probably discounting the dbsnp variants will be useful. That can also be
  done filtering for annotated variants which has some #rsid after annotating
  them. Please let me know your views on this.
  
  Thanks,
  
  Chris
  
  Error running lofreq somatic and indel calling handler on normal tumor bam
  files
  
  Sent from sourceforge.net because you indicated interest in
  https://sourceforge.net/p/lofreq/discussion/general/
  
  To unsubscribe from further messages, please visit
  https://sourceforge.net/auth/subscriptions/
  
  --
  Andreas Wilm
  andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Error running lofreq somatic and indel calling handler on normal tumor bam files

Fast and sensitive variant-calling from sequencing data

Forums

Help

Error running lofreq somatic and indel calling handler on normal tumor bam files

browser position chr1:762097-762270

track name="Covered" description="Agilent SureSelect DNA - SureSelect #Human All Exon V4- Genomic regions covered by probes" color=0,0,128

chr1 762097 762270 #ref|LINC00115,ens|ENST00000473798,ens|ENST00000536430,ref|NR_024321,mRNA|AK026292,mRNA|BC017762,mRNA|BC017762,mRNA|AK026292,ref|NR_024321,ens|ENST00000536430,ens|ENST00000473798,ref|LINC00115

chr1 861281 861490 #ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11

chr1 865591 865791 #ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ens|ENST00000341065,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000341065,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11

lofreq call-parallel --pp-threads 8 -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1 -l /scratch/GT/vdas/referenceBed/hg19/ss_v4/Exon_SSV4_clean.bed --call-indels

Error running lofreq somatic and indel calling handler on normal tumor bam files

Fast and sensitive variant-calling from sequencing data

Forums

Help

Error running lofreq somatic and indel calling handler on normal tumor bam files document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

browser position chr1:762097-762270

track name="Covered" description="Agilent SureSelect DNA - SureSelect #Human All Exon V4- Genomic regions covered by probes" color=0,0,128

chr1 762097 762270 #ref|LINC00115,ens|ENST00000473798,ens|ENST00000536430,ref|NR_024321,mRNA|AK026292,mRNA|BC017762,mRNA|BC017762,mRNA|AK026292,ref|NR_024321,ens|ENST00000536430,ens|ENST00000473798,ref|LINC00115

chr1 861281 861490 #ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11

chr1 865591 865791 #ref|SAMD11,ccds|CCDS2.2,ens|ENST00000420190,ens|ENST00000437963,ens|ENST00000342066,ens|ENST00000341065,ref|NM_152486,mRNA|AF161376,mRNA|AF161376,ref|NM_152486,ens|ENST00000341065,ens|ENST00000342066,ens|ENST00000437963,ens|ENST00000420190,ccds|CCDS2.2,ref|SAMD11

lofreq call-parallel --pp-threads 8 -d 101000 -f /scratch/GT/vdas/test_exome/exome/hg19.fa --verbose --no-default-filter -b 1 -l /scratch/GT/vdas/referenceBed/hg19/ss_v4/Exon_SSV4_clean.bed --call-indels

Error running lofreq somatic and indel calling handler on normal tumor bam files