Hello, is there any option for the MultisampleVariantsDetector to output the ADP field for SNVs? One option would be to run the VCFAnnotate command on the first VCF, but I am not sure whether I would get the ADP values that way.
Thanks!
Paula E
Hi Paula
Thanks for your interest in NGSEP. For SNVs, we decided a long time ago to use the BSDP field, which stores the read counts for A, C, G and T (in that order). The main advantage is that we retain counts for base calls that are probably erroneous. We do not record the ADP field for SNPs, but this information can be calculated with a bit of scripting if needed.
If your goal is to infer (tetraploid) allele dosages, I would recommend taking a look at the ACN field instead. If the variants detector is executed in polyploid mode, this field should contain the allele dosage estimate for each data point, based on our Bayesian model for genotype calling. Our distance matrix and diversity statistics calculations take this field into account. You can also check the GWASpoly option of our VCFConverter if you plan to do GWAS.
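As an illustration of the "bit of scripting" mentioned above, here is a minimal sketch that derives an AD-style value (reference depth, then alternate depth) for one sample from its BSDP value. The function name bsdp_to_ad is hypothetical and not part of NGSEP; it only assumes what is stated above, namely that BSDP lists counts for A, C, G and T in that order.

```python
# Minimal sketch (hypothetical helper, not part of NGSEP): derive the AD
# value for one sample of an SNV from its NGSEP BSDP value.
BASE_ORDER = "ACGT"  # BSDP stores read counts in this order


def bsdp_to_ad(ref: str, alt: str, bsdp: str) -> str:
    """Return "refDepth,altDepth[,...]" or "." if BSDP is missing."""
    if bsdp == ".":
        return "."
    counts = [int(c) for c in bsdp.split(",")]
    if len(counts) != len(BASE_ORDER):
        raise ValueError(f"expected 4 BSDP counts, got {bsdp!r}")
    depths = []
    # REF first, then each ALT allele in the order of the ALT column.
    for allele in [ref] + alt.split(","):
        allele = allele.upper()
        if len(allele) != 1 or allele not in BASE_ORDER:
            raise ValueError(f"not a single-base SNV allele: {allele!r}")
        depths.append(counts[BASE_ORDER.index(allele)])
    return ",".join(map(str, depths))
```

For example, bsdp_to_ad("T", "C", "0,11,0,4") returns "4,11". Counts for bases that are neither REF nor ALT (the probably erroneous calls that BSDP retains) are simply dropped, which is exactly the information AD cannot represent.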
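If you go the ACN route, a sketch like the following could turn ACN into an alternate-allele dosage per sample. It assumes, based only on the example lines later in this thread (for instance GT 0/1 with ACN 1,3 where most reads support ALT), that ACN lists the estimated copies of each allele in REF, ALT order. Those example lines also show ACN=4,0 on missing calls (./.), so the hypothetical helper alt_copies ignores ACN when GT is missing.

```python
def alt_copies(gt: str, acn: str):
    """Estimated number of non-reference allele copies, or None for a missing call.

    Assumption: ACN lists copies per allele in REF, ALT order.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        # Missing genotype: do not trust the ACN value reported for it.
        return None
    copies = [int(c) for c in acn.split(",")]
    # Sum every allele except the first (REF).
    return sum(copies[1:])
```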
Best regards
2022-11-30
Hello,
I also need the ADP field to run polyRAD, another software package that requires reference and alternate allele depths. Is there any way to get that? This software does not read the BSDP format.
Thank you!
Harpreet
Hi Harpreet
I checked both the polyRAD software and the VCF specification, because the AD format field was not formally included in the specification; it was a custom GATK field. PolyRAD looks like an alternative to NGSEP for variant calling, although it does not seem to take quality scores into account. Anyway, I just added a script to the NGSEP distribution that generates AD format fields for VCF files produced by NGSEP. Until the new version is formally released, you can try this script by following these instructions:
git clone https://github.com/NGSEP/NGSEPcore
cd NGSEPcore
make
java -cp NGSEPcore_4.2.2.jar ngsep.vcf.VCFGenerateADField input.vcf > output.vcf
Please note that NGSEPcore_4.2.2.jar is passed as part of the classpath (-cp), not run as an executable jar. Please also note that this jar should not be considered an official version of NGSEP, because it has not gone through our regular testing process. Once we release version 4.2.2 (by the end of this year), you will only need to run the last step, using the official jar of the released distribution. The script will also be available in future versions of the software.
It would be great if you could let us know your impressions of the polyRAD genotype calls compared with those provided directly by NGSEP in polyploid mode.
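For readers who cannot build the jar, the idea behind such a script can be sketched at the level of one VCF data line: append AD to the FORMAT column and compute it from BSDP for every sample. This is a hypothetical re-implementation for illustration, not the code of ngsep.vcf.VCFGenerateADField; it assumes a standard tab-separated VCF and leaves non-SNV records (indels, STRs) untouched.

```python
# Hypothetical sketch, not NGSEP's VCFGenerateADField: append an AD value
# to every sample of one tab-separated VCF data line, derived from BSDP
# (read counts for A, C, G, T in that order).
BASE_ORDER = "ACGT"


def add_ad_to_record(line: str) -> str:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "\t".join(fields)  # sites-only line: nothing to annotate
    ref, alt = fields[3], fields[4]
    fmt = fields[8].split(":")
    alleles = [ref.upper()] + alt.upper().split(",")
    # Leave the record untouched if it already has AD, lacks BSDP,
    # or is not a single-base variant (e.g. an indel or STR).
    if "AD" in fmt or "BSDP" not in fmt or any(
        len(a) != 1 or a not in BASE_ORDER for a in alleles
    ):
        return "\t".join(fields)
    bsdp_idx = fmt.index("BSDP")
    out = fields[:8] + [fields[8] + ":AD"]
    for sample in fields[9:]:
        values = sample.split(":")
        # VCF allows trailing sample fields to be dropped; pad them with
        # "." so that AD lands at the position declared in FORMAT.
        values += ["."] * (len(fmt) - len(values))
        bsdp = values[bsdp_idx]
        if bsdp == ".":
            ad = "."
        else:
            counts = [int(c) for c in bsdp.split(",")]
            ad = ",".join(str(counts[BASE_ORDER.index(a)]) for a in alleles)
        out.append(":".join(values + [ad]))
    return "\t".join(out)
```

Applied to a record like the first example later in this thread (with tabs restored), a sample 0/1:330,71,120:52:15:0,11,0,4:1,3 becomes 0/1:330,71,120:52:15:0,11,0,4:1,3:4,11.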
Best regards
Jorge
2022-12-06
Hello Jorge,
polyRAD is still not reading the AD field, even though I can see it in the newly generated VCF file. Maybe a header line describing the AD field is needed?
Thank you!
Harpreet
2022-12-06
Great, thanks so much!!
Harpreet
2022-12-06
One more question:
Is it possible to have DP=0 while two of the BSDP counts are positive, e.g. DP=0 and BSDP=0,3,0,1?
If so, how?
I am getting a few such cases in the VCF I generated using the DeNovoGBS tool.
Thanks again!
Harpreet
Dear Harpreet
About the header: although I do not think that is the issue, I just updated the script to generate the missing header line. To try this change, you need to update the repository and recompile, or just follow the four steps again.
About the inconsistency between the DP and BSDP fields, it is likely a bug that would probably take me some time to check and correct. If possible, please send me a couple of VCF lines where you see this error. I will also try to reproduce it with my local datasets.
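A FORMAT header line for AD could look like the one below, and a small helper can insert it just before the #CHROM line when it is missing. This is a sketch under assumptions: the Description wording is illustrative, the helper name add_ad_header is hypothetical, and Number=R (one value per allele, including REF) is the VCF 4.2+ convention, so older files may need Number=. instead.

```python
# Illustrative FORMAT header for AD; the Description wording is an assumption.
AD_HEADER = (
    '##FORMAT=<ID=AD,Number=R,Type=Integer,'
    'Description="Allelic depths for the ref and alt alleles in the order listed">'
)


def add_ad_header(header_lines: list[str]) -> list[str]:
    """Insert AD_HEADER just before the #CHROM line unless an AD header exists."""
    if any(line.startswith("##FORMAT=<ID=AD,") for line in header_lines):
        return list(header_lines)
    out = []
    for line in header_lines:
        if line.startswith("#CHROM"):
            # Meta-information lines must precede the #CHROM column header.
            out.append(AD_HEADER)
        out.append(line)
    return out
```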
Best regards
Jorge
2022-12-19
Thanks for your help again, Jorge!
Here are a couple of lines from the VCF file I generated using the DeNovoGBS tool from NGSEP where there is an inconsistency between the DP and BSDP fields:
173 74 . T C 255 . NS=108;AN=2;AFS=397,35;OH=0.19;MAF=0.08 GT:PL:GQ:DP:BSDP:ACN ./.:0,8,180:0:6:0,0,0,6:4,0 0/0:0,21,510:51:17:0,0,0,17:4,0 ./.:30,16,240:0:9:0,1,0,8:4,0 0/0:0,13,300:42:10:0,0,0,10:4,0 0/1:330,71,120:52:15:0,11,0,4:1,3 0/0:0,11,266:41:9:0,0,0,9:4,0 0/0:0,10,240:40:8:0,0,0,8:4,0 ./.:0,8,180:0:6:0,0,0,6:4,0 0/0:0,10,236:40:8:0,0,0,8:4,0 0/0:30,45,904:45:32:0,1,0,31:4,0 0/0:0,35,836:65:28:0,0,0,28:4,0 0/0:0,11,270:41:9:0,0,0,9:4,0 0/0:0,18,416:47:14:0,0,0,14:4,0 0/0:0,14,330:44:11:0,0,0,11:4,0 ./.:90,28,218:0:11:0,3,0,8:4,0 0/0:0,19,445:49:15:0,0,0,15:4,0 ./.:0,6,132:0:5:0,0,0,5:4,0 0/0:0,34,810:64:27:0,0,0,27:4,0 ./.:0,8,180:0:6:0,0,0,6:4,0 ./.:30,21,360:0:13:0,1,0,12:4,0 0/0:0,14,330:44:11:0,0,0,11:4,0 0/0:0,36,870:66:29:0,0,0,29:4,0 ./.:0,0,0:0:0:1,0,0,2:4,0 ./.:0,0,0:0:0:0,0,0,3:4,0
3853945 91 . G T 255 . NS=89;AN=2;AFS=317,39;OH=0.22;MAF=0.11 GT:PL:GQ:DP:BSDP:ACN ./.:90,44,630:0:24:0,0,21,3:4,0 ./.:0,0,0:0:0:0,0,1,0:4,0 ./.:60,17,120:0:6:0,0,4,2:4,0 ./.:0,0,0:0:0:0,0,2,0:4,0 ./.:0,8,180:0:6:0,0,6,0:4,0 ./.:0,9,206:0:7:0,0,7,0:4,0 ./.:0,0,0:0:0:0,0,0,0:4,0 ./.:0,0,0:0:0:0,0,1,0:4,0 ./.:0,8,180:0:6:0,0,6,0:4,0 0/1:312,81,480:255:27:0,0,16,11:2,2 0/0:0,11,270:41:9:0,0,9,0:4,0 0/0:0,20,476:50:16:0,0,16,0:4,0 ./.:30,17,270:0:10:0,0,9,1:4,0 0/1:265,59,120:55:13:0,0,4,9:1,3 0/0:0,11,270:41:9:0,0,9,0:4,0 0/0:0,10,240:40:9:1,0,8,0:4,0 0/0:0,10,240:40:8:0,0,8,0:4,0 0/0:0,34,787:64:27:0,0,27,0:4,0 ./.:60,46,792:0:29:0,0,27,2:4,0 ./.:60,31,446:0:17:0,0,15,2:4,0 0/0:0,10,240:40:8:0,0,8,0:4,0 ./.:0,0,0:0:0:0,0,1,0:4,0 0/0:0,23,535:52:18:0,0,18,0:4,0 0/0:1,13,260:42:10:0,0,10,0:4,0 ./.:30,11,120:0:5:0,0,4,1:4,0 ./.:60,18,150:0:7:0,0,5,2:4,0 0/1:150,36,210:84:12:0,0,7,5:2,2 0/0:0,20,458:50:16:0,0,16,0:4,0 ./.:0,0,0:0:0:0,0,2,0:4,0 ./.:192,44,60:0:9:0,0,2,7:4,0 0/1:177,53,350:94:19:0,0,13,6:3,1 0/0:0,13,296:42:10:0,0,10,0:4,0 0/0:0,11,270:41:9:0,0,9,0:4,0 ./.:0,0,0:0:0:0,0,3,3:4,0
I noticed these while trying to extract the AD information from the BSDP field.
Best,
Harpreet
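To list every data point affected by this DP/BSDP inconsistency, a check along these lines could be run over each record. It is a hypothetical helper (dp_bsdp_mismatches is not an NGSEP tool) that assumes a real tab-separated VCF; the lines pasted above have lost their tabs in the forum rendering. It reports the 0-based sample index, the DP value and the BSDP total whenever the two differ.

```python
def dp_bsdp_mismatches(line: str):
    """Return (sample_index, DP, BSDP total) for samples where DP != sum(BSDP)."""
    fields = line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")
    dp_i, bsdp_i = fmt.index("DP"), fmt.index("BSDP")
    mismatches = []
    for n, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Skip samples with dropped trailing fields or missing values.
        if max(dp_i, bsdp_i) >= len(values) or "." in (values[dp_i], values[bsdp_i]):
            continue
        dp = int(values[dp_i])
        total = sum(int(c) for c in values[bsdp_i].split(","))
        if dp != total:
            mismatches.append((n, dp, total))
    return mismatches
```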
Thanks Harpreet. At first sight it seems to be an issue with some missing data points with relatively low read depth. However, I will take a look at the code and see if I can find the issue. In the meantime, if needed, you can recalculate the total DP for the data points with DP=0 as the sum of the AD values.
Best regards
Jorge
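The interim workaround suggested above (recomputing DP as the sum of the AD values where DP is 0) could be sketched as follows. The helper fill_zero_dp and its counts_field parameter are hypothetical; passing counts_field="BSDP" would instead count all reads, including those supporting neither REF nor ALT. It assumes a tab-separated VCF that already contains the chosen field.

```python
def fill_zero_dp(line: str, counts_field: str = "AD") -> str:
    """Replace DP=0 by the sum of the counts in counts_field for each sample."""
    fields = line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")
    dp_i, c_i = fmt.index("DP"), fmt.index(counts_field)
    for k in range(9, len(fields)):
        values = fields[k].split(":")
        # Only touch samples that report DP=0 and have usable counts.
        if max(dp_i, c_i) < len(values) and values[dp_i] == "0" and values[c_i] != ".":
            values[dp_i] = str(sum(int(c) for c in values[c_i].split(",")))
            fields[k] = ":".join(values)
    return "\t".join(fields)
```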