I have two unrelated questions regarding the operation of two modules for separate projects:
1- Does the "ReadsAligner" module work for mapping long reads (ONT) to a reference genome? Or does it only work with short reads?
2- I am using the "MultisampleVariantsDetector" module and would like to know which parameters I should adjust to call variants with a minimum of 5 reads of depth, and at least 15% of them being the reference allele.
I appreciate your attention.
Best regards, Diego
Hi Diego,
Yes, ReadsAligner works with long reads; at the moment it actually works better with long reads. Use the option -p to tell the software that you have ONT reads, and it will then choose the algorithm based on minimizers. If the reads have high error rates, you can reduce the k-mer length with the option -k to something like 15.
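As a rough illustration of why a shorter k-mer can help with noisy reads, the sketch below computes the chance that a k-mer contains no sequencing error. It assumes errors are independent with a fixed per-base rate, which is a simplification, and the function name and the example error rate are hypothetical. It is not how ReadsAligner itself chooses or scores minimizers.

```python
def error_free_kmer_probability(k: int, error_rate: float) -> float:
    """Probability that all k bases of a k-mer are read without error.

    Assumes independent errors with the same per-base error rate,
    which is only an approximation for real ONT reads.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    # Hypothetical per-base error rate of 10%, compared for two k-mer lengths.
    for k in (15, 25):
        p = error_free_kmer_probability(k, 0.10)
        print(f"k={k}: fraction of error-free k-mers is about {p:.3f}")
```

Under this simple model, shorter k-mers survive errors more often, so more of them can still match the reference exactly. The trade-off is that shorter k-mers are also less specific.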
About variant detection: we usually do not filter variants or genotype calls at the discovery and genotyping stages. The goal of MultisampleVariantsDetector is to build a raw genomic variation database with as much information as can be obtained from the aligned reads. Our command for filtering variants and genotype calls is VCFFilter. In this command you can use the option -minRD to remove genotype calls based on a plain minimum read depth, and the option -q to remove genotype calls based on quality score. Both we and other groups have shown in different papers that the latter option is better than a plain filter on the percentage of allele calls, because quality scores are the result of the Bayesian model, which takes read quality scores into account. Genotype calls that do not pass the filters become missing data. You can then use the option -m to remove variants with a large amount of missing data.
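The filtering logic described above can be sketched in Python as follows. This is only an illustration of the idea, not VCFFilter's code: the data layout, the function name and the parameter names (min_depth, min_quality, min_genotyped) are hypothetical stand-ins for what -minRD, -q and -m control, and the exact semantics of each option should be checked in the NGSEP documentation.

```python
from typing import Dict, List, Optional, Tuple

# One genotype call: (genotype, read_depth, quality). None marks missing data.
Call = Optional[Tuple[str, int, int]]
Variant = Dict[str, Call]  # sample name -> genotype call


def filter_calls(
    variants: List[Variant],
    min_depth: int,
    min_quality: int,
    min_genotyped: int,
) -> List[Variant]:
    """Mask weak genotype calls, then drop variants with too few calls left."""
    kept: List[Variant] = []
    for variant in variants:
        filtered: Variant = {}
        for sample, call in variant.items():
            passes = (
                call is not None
                and call[1] >= min_depth
                and call[2] >= min_quality
            )
            # A call failing either threshold becomes missing data.
            filtered[sample] = call if passes else None
        genotyped = sum(1 for c in filtered.values() if c is not None)
        # Keep the variant only if enough samples still have a call.
        if genotyped >= min_genotyped:
            kept.append(filtered)
    return kept


if __name__ == "__main__":
    # Made-up calls for two samples at two variant sites.
    example = [
        {"s1": ("0/1", 10, 40), "s2": ("1/1", 3, 50)},
        {"s1": ("0/0", 2, 10), "s2": ("0/1", 4, 60)},
    ]
    # Minimum depth of 5 as in the question; the quality threshold is arbitrary.
    for variant in filter_calls(example, min_depth=5, min_quality=20, min_genotyped=1):
        print(variant)
```

In this sketch the second site is dropped entirely because neither sample reaches the minimum depth, while the first site is kept with the low-depth call in s2 turned into missing data.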
Best regards
Jorge
Last edit: Jorge Duitama 2023-09-01