
MultisampleVariantsDetector and VCF output

2023-09-05
2023-09-06
  • Xavier Argout

    Xavier Argout - 2023-09-05

    Dear NGSEP team,
    I have just experienced a problem with the MultisampleVariantsDetector module.
    The analysis of 95 sorted BAM files (with headers) appeared to run fine and finished with the message "INFO: Multisample Variants Detector Completed". However, the generated VCF file contained only the header lines, with no data lines describing the SNP markers.
    The command line I used was
    java -jar NGSEPcore4.3.2.jar MultisampleVariantsDetector -r Theobromacacaocriollochr.v2.0.fna -maxAlnsPerStartPos 1000 .sorted.bam*.

    Also attached is a log file and VCF output.
    Any help would be greatly appreciated.

    Xavier

     

    Last edit: Xavier Argout 2023-09-05
  • Jorge Duitama

    Jorge Duitama - 2023-09-05

    Dear Xavier

    Thanks for your interest in NGSEP. The command has the asterisk in the wrong place (it should go before sorted.bam, as in *sorted.bam), but I guess it is just a typo. Based on the log, everything looks fine. Please share some information on how you generated the BAM files. Make sure that the reference used in MultisampleVariantsDetector is exactly the same reference used to align the reads. If possible, please send me the output of the following command:

    samtools view -h CCKM23-041.sorted.bam | head -n 1000

    Best regards

     
  • Xavier Argout

    Xavier Argout - 2023-09-05

    Dear Jorge,
    thank you for your prompt answer.
    Yes it was a typo in my message, sorry.
    Mapping was done with Bowtie2 (in paired mode) and the BAM files were generated with samtools, using this command line: samtools view -u CCKM23-041.header.sam | samtools sort -o CCKM23-041.sorted.bam
    Attached is the result of your command.
    Thanks again for your help.
    Xavier

     
  • Jorge Duitama

    Jorge Duitama - 2023-09-05

    Dear Xavier

    The BAM file looks fine. The only thing I notice is that the name of the reference file used for mapping is Theobroma_cacao_criollo_chr.v2.0.fna, while the name of the file used for variant calling is Theobromacacaocriollochr.v2.0.fna. Please use samtools faidx to verify whether the two files contain exactly the same genome (including chromosome names), as follows:

    samtools faidx Theobroma_cacao_criollo_chr.v2.0.fna
    samtools faidx Theobromacacaocriollochr.v2.0.fna

    If they are the same, please share with me the sequence of chr1 to make a small internal test. You can do that with faidx as well:

    samtools faidx Theobromacacaocriollochr.v2.0.fna chr1 > chr1.fa
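Once samtools faidx has written a .fai index next to each FASTA file, the comparison suggested above can be automated. The following is a minimal sketch, assuming the standard tab-separated .fai layout in which the first column is the sequence name and the second its length. The function names are illustrative and not part of samtools or NGSEP. Matching names and lengths is a necessary check only; it does not prove that the sequences themselves are identical.

```python
def read_fai(lines):
    """Parse .fai index lines into a {sequence name: length} dict."""
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # Column 1 is the sequence name, column 2 its length in bases.
        lengths[fields[0]] = int(fields[1])
    return lengths


def compare_references(fai_a, fai_b):
    """Return (names only in A, names only in B, names with different lengths)."""
    a, b = read_fai(fai_a), read_fai(fai_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    length_mismatch = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, length_mismatch
```

To use it, open the two .fai files and pass the file objects to compare_references; three empty lists mean the sequence names and lengths agree.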

     
  • Xavier Argout

    Xavier Argout - 2023-09-05

    Dear Jorge,
    it is the same reference file I used for mapping and for NGSEP.
    Please find attached chr1.fa

     

    Last edit: Xavier Argout 2023-09-05
  • Jorge Duitama

    Jorge Duitama - 2023-09-06

    Dear Xavier

    I looked more closely at the commands used to sort the alignments and ran a few tests, and it seems the issue arose at the sorting step. It looks like the samtools view command is not preserving the bowtie2 header, and samtools sort could then be removing the RG tag from the alignments. The net effect is that the alignments in your sorted BAM are missing a tag like this:

    RG:Z:CCKM23-041

    This tag is required by MultisampleVariantsDetector (and, as far as I remember, by GATK as well) to know which read group each alignment belongs to. I ran a small test adding this tag to a few alignments and the SNPs started showing up. To reproduce this, you can download the attached modified version of your alignments and run a command like this:

    java -Xmx4g -jar /path/to/NGSEPcore_4.3.2.jar MultisampleVariantsDetector -r chr1.fa -o testMultiSample.vcf -maxAlnsPerStartPos 1000 firstReads.sam
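The missing-tag condition described above can be checked before running the variant caller. The following is a minimal sketch, assuming the input is SAM text such as the output of samtools view -h saved to a file. The function name and the keys of the returned dict are illustrative, not part of any tool. It lists the read group IDs declared in @RG header lines and counts the alignments that carry no RG:Z: tag.

```python
def read_group_report(sam_lines):
    """Summarise read-group information found in SAM text lines."""
    header_ids = []
    total = 0
    missing = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: collect the ID of every @RG record.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        header_ids.append(field[3:])
            continue
        total += 1
        # The 11 mandatory columns come first; optional tags follow.
        if not any(field.startswith("RG:Z:") for field in fields[11:]):
            missing += 1
    return {
        "header_read_groups": header_ids,
        "alignments": total,
        "missing_rg": missing,
    }
```

If missing_rg equals alignments, or header_read_groups is empty, the file shows the symptom discussed in this thread.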

    If possible, I think it is simpler to use Picard SortSam (https://broadinstitute.github.io/picard/) to sort the alignments. Picard can read the SAM files from bowtie2 directly and can generate a BAM index within the same command. See our script runMappingBowtie in the training directory for further details.
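As an illustration of the Picard route, the sketch below assembles a SortSam command line that sorts by coordinate and writes a BAM index in the same run. It is only an assumption of how such a call might look, not a copy of the runMappingBowtie script: the jar path is a placeholder, the helper name is hypothetical, and the arguments use Picard's KEY=value spelling, which may need adjusting for the installed Picard version.

```python
import shlex


def picard_sortsam_command(picard_jar, sam_in, bam_out):
    """Build the argument list for a coordinate sort with index creation."""
    return [
        "java", "-jar", picard_jar, "SortSam",
        "I=" + sam_in,             # SAM file as written by the aligner
        "O=" + bam_out,            # sorted BAM to create
        "SORT_ORDER=coordinate",
        "CREATE_INDEX=true",       # write the BAM index alongside the output
    ]


# /path/to/picard.jar is a placeholder; the file names come from this thread.
cmd = picard_sortsam_command(
    "/path/to/picard.jar",
    "CCKM23-041.header.sam",
    "CCKM23-041.sorted.bam",
)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) to execute it, and looping over the sample names would cover all 95 files.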

    Let me know how things go

    Jorge

     
  • Xavier Argout

    Xavier Argout - 2023-09-06

    Dear Jorge,
    thank you for your advice. I re-sorted the Bowtie2 SAM files with Picard tools and MultisampleVariantsDetector now works!!!
    Problem solved!
    Thank you,
    Xavier

     





