Hi there,
Good day to you. We’ve been exploring NGSEP with our GBS data, and I love its overall simplicity for the end user, but we’ve run into a snag. We use JoinMap for our linkage analysis, and it looks like we are getting some erroneous classification types during the conversion from VCF to JoinMap format, unless there is something we don’t understand about the allowable outputs of the program. For the segregation types nnxnp and lmxll, which should result in the classifications nn/np and lm/ll respectively, we are getting occasional mm and pp calls in the data; these should not exist, and JoinMap flags them as incorrect codes. Could you please explain these occurrences and how we should interpret them? Thank you for your time.
Best Regards,
Jacob Snelling,
Horticulture
4160 Agriculture and Life Sciences Building
Oregon State University
Corvallis, OR 97331-7304
Hi Jacob
Many thanks for your interest in NGSEP. We are glad to know that you found the software easy to use. My first guess on the issue you are describing is that the genotype calls that JoinMap flags as erroneous are actually erroneous in the VCF file. If possible, please share a filtered VCF file including one or more SNPs with erroneous genotype calls so that I can take a look. For different reasons, errors like this unfortunately happen in every variant calling pipeline. There are several ways to reduce them. The first I would recommend is to increase the minimum genotype quality score using the filter functionality. If the percentage of SNPs affected by errors is not too big, you can simply remove those SNPs; given that you have GBS data, you should have plenty of SNPs to make conservative filtering decisions and still build a dense genetic map. Also, JoinMap may have a specific function to remove only the erroneous data points or to transform them into heterozygous calls, which would be the most likely correct genotype for an erroneous homozygous data point. Finally, if you see too many errors in one specific SNP, this may reflect a genotyping error in one of the parents for that SNP.
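For reference, a minimal command-line sketch of that first filter. The -q flag and the standard-output convention are my assumptions based on the NGSEP 3.x FilterVCF options, and the threshold of 40 and the file names are only examples, so please check the FilterVCF documentation for the exact syntax:
# Drop genotype calls with quality (GQ) below 40; the filtered VCF is
# written to standard output (threshold and file names are examples)
java -jar NGSEPcore_3.2.0.jar FilterVCF -q 40 variants.vcf > variants_q40.vcf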
Let me know how things go.
Jorge
Hi Jorge,
I believe you are right that this stems from the VCF file or, more likely it seems, from the actual conversion script. I've gotten the same errors from Stacks v2 VCF outputs as well as from NGSEP used start to finish. I'm attaching an example of both the VCF and the corresponding JoinMap files. Just do a text search in the JoinMap file for pp or mm and then find the same variant in the VCF file. For CP-type crosses, the only allowable classifications for these segregation types in JoinMap should be nnxnp = nn, np, -- or lmxll = lm, ll, --; the pp and mm classifications should not be possible. Thank you for your help.
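As a quick way to locate the offending markers, something like this one-liner should work (the file name is a placeholder for the attached JoinMap file):
# Print, with line numbers, every line containing a standalone pp or mm code
grep -nwE "(pp|mm)" population_joinmap.txt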
Jacob
Thanks Jacob
I went over your files and I actually found an error that swapped the genotype information of the parents. Fortunately, this should not affect the construction of the genetic map, but it definitely looks odd. Giving some more thought to the main issue, I also decided to change the behavior of the converter when an inconsistent homozygous genotype is found in the VCF file: instead of exporting the error, NGSEP will now issue a warning in the log file and generate an unknown genotype call ("--"). This fix will formally appear in the next release. In the meantime, one quick fix would be to eliminate the SNPs with inconsistent genotypes, which can be done with the "-frs" option of FilterVCF (see the sketch after the build commands below). If you prefer to try the fixed (but still unstable) version already, you can clone the GitHub repository:
git clone https://github.com/NGSEP/NGSEPcore
Build the jar for version 3.2.1:
cd /path/to/NGSEPcore
make
Then run the converter again with NGSEPcore_3.2.1.jar. For functionalities other than this one, please keep using the official jar of the previous release (NGSEPcore_3.2.0.jar): although the probability of errors is not too big, version 3.2.1 has not yet formally passed the sanity tests that ensure everything is still working correctly.
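For that quick fix in the meantime, a sketch of the "-frs" filter; regions_inconsistent.txt is a hypothetical file listing the coordinates of the SNPs with inconsistent genotypes, one region per line (please check the FilterVCF documentation for the exact region file format):
# Remove all variants that fall inside the regions listed in the file
java -jar NGSEPcore_3.2.0.jar FilterVCF -frs regions_inconsistent.txt variants.vcf > variants_clean.vcf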
Let me know how things go.
Jorge
Thank you, Jorge. I'll let you know when I've made some progress. I'm having some issues with both compiling and Java after a recent system upgrade, so I haven't been able to test it out yet. We'll see what happens first: the next version, or working out the bugs in my system :)
Jacob
Hi Jorge. I just wanted to let you know that the JoinMap conversion is working properly after the most recent update. Thanks again for your help.
Thanks Jacob. It is great for us to know that the new version worked for you. Feel free to write back if you have further questions or issues with NGSEP.
Best regards
Hi Jorge!!
I hope this email finds you well. I was wondering whether, among NGSEP's options, there is a filtering option similar to GATK's QualByDepth.
Thanks in advance,
Julian.
Hi Julian
First of all, sorry for the delayed answer. I took a look at the QualByDepth calculation, and we definitely do not have a similar filter. Reading the documentation, normalizing QUAL by depth sounds counterintuitive to me, because in principle more evidence should translate into better quality. Their claim that "variants in regions with deep coverage can have artificially inflated QUAL scores" sounds more like an issue with the GATK model for calculating QUAL scores than an inherent aspect of the data.
In NGSEP the QUAL field is always less than or equal to 255, and it is calculated as the maximum GQ value over the genotyped samples. I think this calculation is consistent with the definition of QUAL in the VCF format, which is basically the probability of existence of the variant encoded as a Phred score. The QualByDepth filter may be an indirect way to filter out some variants within duplications. For that case, in NGSEP you can use a catalog of repetitive elements to filter out those regions directly. Moreover, if you have WGS data and the coverage distribution looks normal, you can call CNVs along with SNVs with the FindVariants command and then filter out the SNVs (or small indels) within the predicted CNVs.
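For illustration, the repeat-based filtering could look like the sketch below; repeats_catalog.txt stands for a hypothetical region file with the coordinates of the repetitive elements, reusing the same -frs region filter mentioned earlier in this thread:
# Drop variants located inside annotated repetitive elements
java -jar NGSEPcore_3.2.0.jar FilterVCF -frs repeats_catalog.txt variants.vcf > variants_norepeats.vcf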
If the question is related to the thread above, you can always use the structure of the population to filter variants by MAF and observed heterozygosity.
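For instance, in a biparental population you could keep only markers with sensible allele frequencies and heterozygosity. The -minMAF and -maxOH flags below are my assumptions from the NGSEP 3.x FilterVCF options, and the thresholds are only examples to be tuned to the expected segregation ratios:
# Keep variants with minor allele frequency >= 0.2 and observed
# heterozygosity <= 0.65 (example thresholds, not recommendations)
java -jar NGSEPcore_3.2.0.jar FilterVCF -minMAF 0.2 -maxOH 0.65 variants.vcf > variants_popfiltered.vcf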
Let me know your thoughts or further questions on this matter.
Jorge