NGSEP / Discussion / Frequently Asked Questions: SingleSampleVariantsDetector

Mahmoud Bassyouni - 2022-01-05

Hi,
I am trying to run SingleSampleVariantsDetector on a single chromosome, chromosome 22. My BAM file contains only that chromosome, and the FASTA file also contains only that chromosome. I get this error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at ngsep.NGSEPcore.main(NGSEPcore.java:66)
Caused by: java.io.IOException: Inconsistent file header. Sequence chr1 not present in the reference sequences
    at ngsep.alignments.io.ReadAlignmentFileReader.loadHeader(ReadAlignmentFileReader.java:192)
    at ngsep.alignments.io.ReadAlignmentFileReader.init(ReadAlignmentFileReader.java:167)
    at ngsep.alignments.io.ReadAlignmentFileReader.<init>(ReadAlignmentFileReader.java:82)
    at ngsep.discovery.AlignmentsPileupGenerator.createReader(AlignmentsPileupGenerator.java:335)
    at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:302)
    at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:292)
    at ngsep.discovery.SingleSampleVariantsDetector.findSNVS(SingleSampleVariantsDetector.java:905)
    at ngsep.discovery.SingleSampleVariantsDetector.run(SingleSampleVariantsDetector.java:621)
    at ngsep.discovery.SingleSampleVariantsDetector.main(SingleSampleVariantsDetector.java:572)
    ... 5 more

It seems I need to specify which chromosome I am working with, but I cannot find a flag for that among the available options.
This is the command I ran; the variables refer to the file paths:

java -jar $NGSEP SingleSampleVariantsDetector -i $BAM -r $ref -o chr22_NGSEP -sampleId NA12877

Can you help with that, please?
Thanks,
  
- Jorge Duitama - 2022-01-06
  
  Dear Mahmoud
  
  Thanks for your interest in NGSEP. This error occurs when the reference genome used to run the variants detector is different from the one used to map the reads. If you only want variants in chr22, you can filter the bam. However, please keep the same reference file because, depending on how you filter the bam, the header could still list all the chromosomes, which makes the software fail. You can also use the option "-querySeq" of the SingleSampleVariantsDetector to call variants only on chromosome 22. Filtering the bam file beforehand is a bit quicker, but in any case, please use the complete reference genome.
  
  Let me know how things go
  

Mahmoud Bassyouni - 2022-01-09

Thanks for the reply, Jorge. The thing is that the "-querySeq" flag takes a string, not a file, so I guess it won't be viable to use on the terminal for the whole chromosome. Also, using the whole reference file didn't work: the tool reported that a chromosome was missing from the reference file. So I am not sure what I should do here...

- Jorge Duitama - 2022-01-10
  
  Hi Mahmoud
  
  I do not understand this issue. In this case you still need to provide the reference fasta file with the -r option. With the option -querySeq you tell the software that you only want to process one sequence ("chr22" in your case).
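
As a rough sketch of the suggestion above, the original command can be extended with -querySeq while still passing the full reference with -r. The jar name and file paths below are hypothetical placeholders, and the RUN variable and the call_variants function are only conveniences added here to print the command before running it. The name given to -querySeq is assumed to match the name used in both the fasta file and the bam header.

```shell
# Hypothetical paths; replace them with your own files.
NGSEP=NGSEPcore.jar          # assumption: path to the NGSEP jar
BAM=NA12877_chr22.bam
REF=full_reference.fa        # assumption: the same fasta that was used to map the reads

RUN=echo                     # dry run: only print the command; set RUN= to execute it

# call_variants BAM REF OUT_PREFIX SAMPLE_ID SEQ_NAME
call_variants() {
  $RUN java -jar "$NGSEP" SingleSampleVariantsDetector \
    -i "$1" -r "$2" -o "$3" -sampleId "$4" -querySeq "$5"
}

call_variants "$BAM" "$REF" chr22_NGSEP NA12877 chr22
```

Printing the command first makes it easy to confirm that -r still points to the complete reference and that only the -querySeq value restricts the analysis.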
  

Mahmoud Bassyouni - 2022-01-09

So now there is another update: I got another reference genome that supposedly includes all the decoys. I tried to run it using this command:

java -jar $NGSEP SingleSampleVariantsDetector -i /home/ionadmin/bassyouni/source_bam/NA12877_chr22.bam -r /home/ionadmin/bassyouni/GRCh38_full_analysis_set_plus_decoy_hla.fa -o chr22_NGSEP -sampleId NA12877

It failed with another error of the same type:

Jan 09, 2022 2:23:20 PM ngsep.discovery.SingleSampleVariantsDetector run
INFO: Loaded 3366 sequences
Jan 09, 2022 2:23:20 PM ngsep.discovery.SingleSampleVariantsDetector findSNVS
INFO: Finding variants
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at ngsep.NGSEPcore.main(NGSEPcore.java:66)
Caused by: java.io.IOException: Inconsistent file header. Sequence KN707606.1 not present in the reference sequences
    at ngsep.alignments.io.ReadAlignmentFileReader.loadHeader(ReadAlignmentFileReader.java:192)
    at ngsep.alignments.io.ReadAlignmentFileReader.init(ReadAlignmentFileReader.java:167)
    at ngsep.alignments.io.ReadAlignmentFileReader.<init>(ReadAlignmentFileReader.java:82)
    at ngsep.discovery.AlignmentsPileupGenerator.createReader(AlignmentsPileupGenerator.java:335)
    at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:302)
    at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:292)
    at ngsep.discovery.SingleSampleVariantsDetector.findSNVS(SingleSampleVariantsDetector.java:905)
    at ngsep.discovery.SingleSampleVariantsDetector.run(SingleSampleVariantsDetector.java:621)
    at ngsep.discovery.SingleSampleVariantsDetector.main(SingleSampleVariantsDetector.java:572)
    ... 5 more

So it says that the sequence KN707606.1 is not in the reference file, but I checked by running grep "KN707606.1" GRCh38_full_analysis_set_plus_decoy_hla.fa
and it was there:

>chrUn_KN707606v1_decoy AC:KN707606.1 gi:734691250 LN:2200 rl:decoy M5:20c768ac79ca38077e5012ee0e5f8333 AS:hs38d1

So what do you think might be wrong?
  
- Jorge Duitama - 2022-01-10
  
  Hi Mahmoud
  
  The issue is still the same because the name of the sequence in the fasta file is "chrUn_KN707606v1_decoy" and the name in the bam header is "KN707606.1". Ideally, you need to provide the exact reference sequence that was used to generate the bam file. However, some human genetics projects that generate bam files do not have a good standard for what the reference is and do not make it available, which is a pain for many people.
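
One way to see this kind of mismatch directly is to compare the @SQ lines of the bam header with the fasta index. The sketch below assumes two text files already exist: a header file written by samtools view -H, and a .fai file written by samtools faidx. The function name check_sq and the file names in the comments are made up for illustration. It prints every sequence in the bam header that is missing from the index or has a different length.

```shell
# check_sq FAI HEADER
#   FAI:    fasta index (sequence name in column 1, length in column 2)
#   HEADER: bam header as text, e.g. from: samtools view -H in.bam > header.sam
check_sq() {
  awk -F'\t' '
    NR == FNR { len[$1] = $2; next }      # first file: remember the fasta lengths
    /^@SQ/ {
      name = ""; ln = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) ln = substr($i, 4)
      }
      if (!(name in len))       print name "\tnot in reference"
      else if (len[name] != ln) print name "\tlength " ln " in bam, " len[name] " in fasta"
    }
  ' "$1" "$2"
}

# Possible use (placeholder file names):
#   check_sq reference.fa.fai header.sam
```

The fasta index keeps only the first word after ">" as the sequence name, so a grep that matches text later in the fasta header line (such as the AC: field) does not mean the names agree. Empty output from check_sq means every sequence in the header was found with the same length.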
  
  If you do not have access to the exact reference genome, an alternative is to use samtools reheader to generate a new bam file having in the header only the chromosomes that you want to process. You may also need samtools view to generate a bam file from a sam file. If you manage to do so, then you can again use the chromosome 22 as reference. Use samtools faidx to have a small file with the names of the sequences in the fasta file and make sure that they correspond exactly with those in the header of the bam file. Double check both names and lengths.
  
  Let me know how things go.
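
The header-filtering idea can be sketched on SAM text, where reads refer to sequences by name. This sketch goes through samtools view rather than samtools reheader, and it is only one possible way to do it. The function name keep_sq and the file names in the comments are hypothetical. It assumes that all reads to keep align to the single sequence named on the command line; reads whose mate maps to another sequence may still need attention.

```shell
# keep_sq NAME: read SAM text on stdin and drop every @SQ line except the one for NAME.
keep_sq() {
  awk -F'\t' -v keep="$1" '
    /^@SQ/ {
      for (i = 2; i <= NF; i++) if ($i == "SN:" keep) { print; break }
      next
    }
    { print }                             # other header lines and reads pass through
  '
}

# Possible use with samtools (placeholder file names; in.bam must be indexed,
# and chr22 stands for whatever name the bam header uses for chromosome 22):
#   samtools view -h in.bam chr22 | keep_sq chr22 | samtools view -b -o chr22_only.bam -
#   samtools faidx chr22.fa     # writes chr22.fa.fai, to compare names and lengths
```

Matching on the whole SN: field, not on a substring, keeps chr22 from also matching names such as chr2 or chr22_random.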
  
  - Mahmoud Bassyouni - 2022-01-10
    
    Alright, will do that and will get back to you if anything changes. Thank you so much for helping! Much appreciated.
    

Mahmoud Bassyouni - 2022-01-13

Thanks @jduitama, I found the reference genome that they used and it went through perfectly. Thanks again for helping, much appreciated!
