NGSEP / Discussion / Frequently Asked Questions: MultisampleVariantsDetector and VCF output

Xavier Argout - 2023-09-05

Dear NGSEP team,
I have just experienced a problem with the MultisampleVariantsDetector module.
The analysis of 95 sorted BAM files (with headers) seemed to run fine and finished with the message "INFO: Multisample Variants Detector Completed". However, the generated VCF file contained only the header lines and no data lines with information about the SNP markers.
The command line I used was
java -jar NGSEPcore4.3.2.jar MultisampleVariantsDetector -r Theobromacacaocriollochr.v2.0.fna -maxAlnsPerStartPos 1000 .sorted.bam*.

Attached are the log file and the VCF output.
Any help would be greatly appreciated.

Xavier

Log_MultisampleVariantsDetector.gz

variants.vcf.gz

Jorge Duitama - 2023-09-05

Dear Xavier

Thanks for your interest in NGSEP. The command has a typo: the asterisk should go before sorted.bam (it should be *sorted.bam), but I guess that is just a typo in your message. Based on the log, everything looks fine. Please share some information on how you generated the BAM files. Make sure that the reference used in the MultisampleVariantsDetector is exactly the same reference used to align the reads. If possible, please send me the output of the following command:

samtools view -h CCKM23-041.sorted.bam | head -n 1000
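The output of that command shows the reference sequences the reads were aligned to (@SQ header lines), any read groups (@RG header lines) and the first alignment records. A minimal awk sketch for summarizing that text is shown below; the function name rg_summary is hypothetical (not part of NGSEP or samtools), and it assumes plain SAM text with the header is piped in on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize SAM text read from stdin, e.g.
#   samtools view -h sample.sorted.bam | head -n 1000 | rg_summary
# Prints the number of @SQ lines, the number of @RG header lines, and how many
# alignment records do or do not carry an RG:Z: tag.
rg_summary() {
  awk -F'\t' '
    /^@SQ/ { sq++; next }
    /^@RG/ { rg++; next }
    /^@/   { next }                       # other header lines (@HD, @PG, @CO)
    {
      tagged = 0
      for (i = 12; i <= NF; i++)          # optional tags start at column 12
        if ($i ~ /^RG:Z:/) { tagged = 1; break }
      if (tagged) tagged_n++; else untagged_n++
    }
    END {
      printf "SQ=%d RG_headers=%d with_RG=%d without_RG=%d\n", sq, rg, tagged_n, untagged_n
    }'
}
```

For example, samtools view -h CCKM23-041.sorted.bam | head -n 1000 | rg_summary reports with_RG=0 if none of the first records carry a read group tag.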

Best regards

Xavier Argout - 2023-09-05

Dear Jorge,
Thank you for your prompt answer.
Yes, it was a typo in my message, sorry.
Mapping was done with Bowtie2 (in paired-end mode) and the BAM files were generated with samtools using this command line:
samtools view -u CCKM23-041.header.sam | samtools sort -o CCKM23-041.sorted.bam
Attached is the output of your command.
Thanks again for your help.
Xavier

head.txt.gz

Jorge Duitama - 2023-09-05

Dear Xavier

The BAM file looks fine. The only thing I notice is that the reference file used for mapping is named Theobroma_cacao_criollo_chr.v2.0.fna, whereas the file used for variant calling is named Theobromacacaocriollochr.v2.0.fna. Please use samtools faidx to verify that the two files contain exactly the same genome (including chromosome names):

samtools faidx Theobroma_cacao_criollo_chr.v2.0.fna
samtools faidx Theobromacacaocriollochr.v2.0.fna
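Each faidx call writes an index next to its FASTA file (for example Theobromacacaocriollochr.v2.0.fna.fai) whose first two columns are the sequence name and its length. A minimal bash sketch for comparing those two columns between the indexes is shown below; the function name is hypothetical, and bash is assumed because of the process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether two .fai files (written by `samtools faidx`)
# list the same sequence names and lengths in the same order.
# Column 1 of a .fai file is the sequence name, column 2 is its length.
same_names_and_lengths() {
  local fai_a="$1" fai_b="$2"
  # diff exits with status 0 only if both name/length lists are identical
  if diff <(cut -f1,2 "$fai_a") <(cut -f1,2 "$fai_b") > /dev/null; then
    echo "same names and lengths"
  else
    echo "names or lengths differ"
    return 1
  fi
}
```

Usage: same_names_and_lengths Theobroma_cacao_criollo_chr.v2.0.fna.fai Theobromacacaocriollochr.v2.0.fna.fai. Matching names and lengths do not prove that the bases are identical; cmp -s on the two FASTA files checks byte identity, although different line wrapping would make it report a difference even for the same sequences.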

If they are the same, please share the sequence of chr1 with me so that I can run a small internal test. You can extract it with faidx as well:

samtools faidx Theobromacacaocriollochr.v2.0.fna chr1 > chr1.fa

Xavier Argout - 2023-09-05

Dear Jorge,
It is the same reference file that I used for mapping and for NGSEP.
Please find chr1.fa attached.

chr1.fa.gz

Jorge Duitama - 2023-09-06

Dear Xavier

I looked more closely at the commands used to sort the alignments and ran a few tests, and it seems that the issue arose at the sorting step. It looks like the samtools view command is not preserving the bowtie2 header, and then samtools sort could be removing the RG tag from the alignments. The net effect is that the alignments in your sorted BAM file are missing a tag like this:

RG:Z:CCKM23-041

This tag is required by the MultisampleVariantsDetector (and, as far as I remember, by GATK as well) to determine which read group each alignment belongs to. I ran a small test adding this tag to a few alignments, and the SNPs started to appear. To reproduce this, you can download the attached modified version of your alignments (firstReads.sam) and run a command like this:

java -Xmx4g -jar /path/to/NGSEPcore_4.3.2.jar MultisampleVariantsDetector -r chr1.fa -o testMultiSample.vcf -maxAlnsPerStartPos 1000 firstReads.sam
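For readers who want to repeat this kind of test on their own data, a minimal awk sketch that adds a read group to SAM text is shown below. It is a hypothetical helper, not the procedure used in this thread; it assumes that the read group ID and the sample name (SM) can both be the sample name, and that any @RG header already present uses that same ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper for quick tests: add a read group to SAM text on stdin.
# - If the header has no @RG line, inserts "@RG ID:<name> SM:<name>" right
#   before the first alignment record.
# - Appends RG:Z:<name> to every record that does not already have an RG tag.
# Assumption: an existing @RG header line uses the same ID as <name>.
add_read_group() {
  local rg_id="$1"
  awk -F'\t' -v OFS='\t' -v id="$rg_id" '
    /^@RG/ { has_rg_header = 1 }
    /^@/   { print; next }                # header lines pass through unchanged
    !done_header {
      if (!has_rg_header) print "@RG", "ID:" id, "SM:" id
      done_header = 1
    }
    {
      for (i = 12; i <= NF; i++)          # optional tags start at column 12
        if ($i ~ /^RG:Z:/) { print; next }
      print $0, "RG:Z:" id
    }'
}
```

For example, samtools view -h CCKM23-041.sorted.bam | head -n 1000 | add_read_group CCKM23-041 > firstReads.rg.sam produces a small tagged file for the command above. For real data, regenerating the alignments with read groups is the cleaner fix.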

If possible, I think it is simpler to use Picard SortSam (https://broadinstitute.github.io/picard/) to sort the alignments. Picard can take the SAM files from bowtie2 directly, and it can generate a BAM index within the same command. See our runMappingBowtie script in the training directory for further details.
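A mapping-plus-sorting step along these lines could look roughly like the sketch below. It is not the runMappingBowtie script; the function names, argument order and DRY_RUN switch are assumptions for illustration. bowtie2's --rg-id option writes an @RG header line and attaches RG:Z:<id> to every output record, and --rg adds further @RG fields such as SM. Depending on the Picard version, SortSam may expect the newer -I/-O option style instead of I=/O=, so check the documentation of the release you use.

```shell
#!/usr/bin/env bash
# Sketch: map paired reads with bowtie2 while recording a read group, then
# sort and index with Picard SortSam. All paths and names are placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

map_and_sort() {
  local sample="$1" index="$2" r1="$3" r2="$4" picard_jar="$5"
  # --rg-id sets the @RG ID and tags every record with RG:Z:<sample>;
  # --rg adds the SM field to the @RG header line.
  run bowtie2 --rg-id "$sample" --rg "SM:$sample" \
      -x "$index" -1 "$r1" -2 "$r2" -S "$sample.sam"
  # Coordinate-sort into BAM and create the index in the same call.
  run java -jar "$picard_jar" SortSam I="$sample.sam" O="$sample.sorted.bam" \
      SORT_ORDER=coordinate CREATE_INDEX=true
}
```

For example, DRY_RUN=1 map_and_sort CCKM23-041 /path/to/bowtie2_index reads_1.fastq.gz reads_2.fastq.gz /path/to/picard.jar prints the two commands without running them.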

Let me know how things go.

Jorge

firstReads.sam

Xavier Argout - 2023-09-06

Dear Jorge,
Thank you for your advice. I re-sorted the Bowtie2 SAM files with Picard tools and MultisampleVariantsDetector now works!!!
Problem solved!
Thank you,
Xavier

NGSEP (Next Generation Sequencing Experience Platform)

Forums

Help

MultisampleVariantsDetector and VCF output

MultisampleVariantsDetector and VCF output

NGSEP (Next Generation Sequencing Experience Platform)

Forums

Help

MultisampleVariantsDetector and VCF output document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

MultisampleVariantsDetector and VCF output