SingleSampleVariantsDetector

2022-01-05
2022-01-13
  • Mahmoud Bassyouni

    Hi,
    I am trying to run SingleSampleVariantsDetector on a single chromosome, chr22. My BAM file contains only that chromosome, and the FASTA file also contains only that chromosome. I get this error:

    Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at ngsep.NGSEPcore.main(NGSEPcore.java:66)
    Caused by: java.io.IOException: Inconsistent file header. Sequence chr1 not present in the reference sequences
        at ngsep.alignments.io.ReadAlignmentFileReader.loadHeader(ReadAlignmentFileReader.java:192)
        at ngsep.alignments.io.ReadAlignmentFileReader.init(ReadAlignmentFileReader.java:167)
        at ngsep.alignments.io.ReadAlignmentFileReader.<init>(ReadAlignmentFileReader.java:82)
        at ngsep.discovery.AlignmentsPileupGenerator.createReader(AlignmentsPileupGenerator.java:335)
        at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:302)
        at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:292)
        at ngsep.discovery.SingleSampleVariantsDetector.findSNVS(SingleSampleVariantsDetector.java:905)
        at ngsep.discovery.SingleSampleVariantsDetector.run(SingleSampleVariantsDetector.java:621)
        at ngsep.discovery.SingleSampleVariantsDetector.main(SingleSampleVariantsDetector.java:572)
        ... 5 more
    

    It seems I need to specify which chromosome I am working with, but I cannot find a flag for that among the available options.
    This is the command I ran; the variables hold the file paths:

    java -jar $NGSEP SingleSampleVariantsDetector -i $BAM -r $ref -o chr22_NGSEP -sampleId NA12877
    

    Can you help with that, please?
    Thanks,

     
    • Jorge Duitama

      Jorge Duitama - 2022-01-06

      Dear Mahmoud

      Thanks for your interest in NGSEP. This error occurs when the reference genome given to the variants detector is different from the one used to map the reads. If you only want variants in chr22, you can filter the bam. However, please keep the same reference file because, depending on how you filter the bam, the header could still list all the chromosomes, which makes the software fail. You can also use the option "-querySeq" of the SingleSampleVariantsDetector to call variants only on chromosome 22. Filtering the bam file beforehand is a bit quicker, but in either case please use the complete reference genome.

      Let me know how things go

       
  • Mahmoud Bassyouni

    Thanks for the reply, Jorge. The thing is that the "-querySeq" flag takes a string, not a file, so I guess it is not viable to use on the terminal for the whole chromosome. Using the whole reference file did not work either: the tool reported that a chromosome was missing from the reference file. So I am not sure what I should do here...

     
    • Jorge Duitama

      Jorge Duitama - 2022-01-10

      Hi Mahmoud

      I do not understand this issue. Even with -querySeq you still need to provide the reference fasta file with the -r option. The -querySeq option tells the software that you only want to process one sequence ("chr22" in your case).
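
      As a rough sketch of what that combination might look like, the snippet below adds -querySeq to the command from the first post. It only prints the command (a dry run) so it can be checked before running. The default values for NGSEP, BAM and ref are placeholders, not real paths.

      ```shell
      # Placeholder defaults (assumptions); point these at the real files.
      NGSEP=${NGSEP:-NGSEPcore.jar}
      BAM=${BAM:-NA12877_chr22.bam}
      ref=${ref:-reference.fa}   # the complete reference used for mapping

      # -querySeq takes the sequence name as a plain string, here "chr22".
      cmd="java -jar $NGSEP SingleSampleVariantsDetector -i $BAM -r $ref -o chr22_NGSEP -sampleId NA12877 -querySeq chr22"

      # Dry run: only show the command; paste it into the terminal once it looks right.
      echo "$cmd"
      ```

      The name given to -querySeq has to match a sequence name in both the bam header and the reference fasta.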

       
  • Mahmoud Bassyouni

    So there is an update: I got another reference genome that supposedly has all the decoys included. I tried to run it with this command:

    java -jar $NGSEP SingleSampleVariantsDetector -i /home/ionadmin/bassyouni/source_bam/NA12877_chr22.bam -r /home/ionadmin/bassyouni/GRCh38_full_analysis_set_plus_decoy_hla.fa -o chr22_NGSEP -sampleId NA12877

    It failed with another error of the same type:

    Jan 09, 2022 2:23:20 PM ngsep.discovery.SingleSampleVariantsDetector run
    INFO: Loaded 3366 sequences
    Jan 09, 2022 2:23:20 PM ngsep.discovery.SingleSampleVariantsDetector findSNVS
    INFO: Finding variants
    Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at ngsep.NGSEPcore.main(NGSEPcore.java:66)
    Caused by: java.io.IOException: Inconsistent file header. Sequence KN707606.1 not present in the reference sequences
        at ngsep.alignments.io.ReadAlignmentFileReader.loadHeader(ReadAlignmentFileReader.java:192)
        at ngsep.alignments.io.ReadAlignmentFileReader.init(ReadAlignmentFileReader.java:167)
        at ngsep.alignments.io.ReadAlignmentFileReader.<init>(ReadAlignmentFileReader.java:82)
        at ngsep.discovery.AlignmentsPileupGenerator.createReader(AlignmentsPileupGenerator.java:335)
        at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:302)
        at ngsep.discovery.AlignmentsPileupGenerator.processFile(AlignmentsPileupGenerator.java:292)
        at ngsep.discovery.SingleSampleVariantsDetector.findSNVS(SingleSampleVariantsDetector.java:905)
        at ngsep.discovery.SingleSampleVariantsDetector.run(SingleSampleVariantsDetector.java:621)
        at ngsep.discovery.SingleSampleVariantsDetector.main(SingleSampleVariantsDetector.java:572)
        ... 5 more
    

    So obviously it is telling me that the sequence KN707606.1 is not in the reference file, but I checked by grepping for it with grep "KN707606.1" GRCh38_full_analysis_set_plus_decoy_hla.fa
    and it was there:

    >chrUn_KN707606v1_decoy  AC:KN707606.1  gi:734691250  LN:2200  rl:decoy  M5:20c768ac79ca38077e5012ee0e5f8333  AS:hs38d1
    

    So what do you think might be wrong?

     
    • Jorge Duitama

      Jorge Duitama - 2022-01-10

      Hi Mahmoud

      The issue is still the same because the name of the sequence in the fasta file is "chrUn_KN707606v1_decoy" and the name in the bam header is "KN707606.1". Ideally, you need to provide the exact reference sequence that was used to generate the bam file. However, some human genetics projects do not follow a clear standard for the reference behind their bam files and do not make it available, which is a pain for many people.
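
      The grep from the earlier post matched because KN707606.1 appears in the description part of the header line (AC:KN707606.1), not as the sequence name. A minimal sketch of an exact-name check, assuming the sequence name is the first whitespace-delimited token after ">" (the function name fasta_has_seq and the demo file are made up for illustration):

      ```shell
      # fasta_has_seq FASTA NAME: succeed only if NAME is exactly the first token
      # of some ">" header line (the part before the first space or tab).
      fasta_has_seq() {
        awk -v name="$2" '
          /^>/ { id = substr($1, 2); if (id == name) found = 1 }
          END  { exit !found }
        ' "$1"
      }

      # Tiny demo file reusing the header line quoted above (the sequence is fake).
      demo=$(mktemp)
      printf '%s\n' '>chrUn_KN707606v1_decoy  AC:KN707606.1  gi:734691250  LN:2200' 'ACGT' > "$demo"

      if fasta_has_seq "$demo" KN707606.1; then
        echo "KN707606.1 is a sequence name"
      else
        echo "KN707606.1 is not a sequence name"
      fi
      ```

      On the demo file the check fails for KN707606.1 and succeeds for chrUn_KN707606v1_decoy, which is the same mismatch reported in the error.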

      If you do not have access to the exact reference genome, an alternative is to use samtools reheader to generate a new bam file whose header lists only the chromosomes that you want to process. You may also need samtools view to generate a bam file from a sam file. If you manage to do so, then you can again use chromosome 22 alone as the reference. Use samtools faidx to get a small index file listing the names of the sequences in the fasta file and make sure that they correspond exactly with those in the header of the bam file. Double check both names and lengths.

      Let me know how things go.
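
      The name-and-length check described above could be sketched as follows, assuming the bam header was saved as text with samtools view -H and the fasta index was written by samtools faidx (a .fai file with the sequence name and length in its first two columns). The function name, file names and demo values are placeholders.

      ```shell
      # sq_missing_from_fai HEADER.sam REF.fa.fai
      # Prints every "name<TAB>length" pair declared in the @SQ lines of the header
      # that has no identical pair in the .fai index. No output means they agree.
      sq_missing_from_fai() {
        awk -F'\t' '
          FILENAME == ARGV[1] { ref[$1 "\t" $2] = 1; next }   # first file: the .fai
          /^@SQ/ {
            sn = ""; ln = ""
            for (i = 2; i <= NF; i++) {
              if ($i ~ /^SN:/) sn = substr($i, 4)
              if ($i ~ /^LN:/) ln = substr($i, 4)
            }
            key = sn "\t" ln
            if (!(key in ref)) print key
          }
        ' "$2" "$1"
      }

      # The inputs would normally come from (not run here):
      #   samtools view -H sample.bam > header.sam
      #   samtools faidx reference.fa      # writes reference.fa.fai
      # Toy stand-ins so the sketch runs without samtools:
      hdr=$(mktemp); fai=$(mktemp)
      printf '@HD\tVN:1.6\n@SQ\tSN:chr22\tLN:50818468\n@SQ\tSN:KN707606.1\tLN:2200\n' > "$hdr"
      printf 'chr22\t50818468\t6\t60\t61\nchrUn_KN707606v1_decoy\t2200\t100\t60\t61\n' > "$fai"

      sq_missing_from_fai "$hdr" "$fai"
      ```

      With the toy files, only the KN707606.1 entry is reported, because its name differs from the one in the index even though the length is the same.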

       
      • Mahmoud Bassyouni

        Alright, I will do that and get back to you if anything changes. Thank you so much for helping! Much appreciated.


         
  • Mahmoud Bassyouni

    Thanks @jduitama, I found the reference genome that they used and it went through perfectly. Thanks again for helping, much appreciated!

     
