Hi, I'm using NGSEP mainly for calling and filtering variants. Sometimes I would like to call variants using a BED file. I can of course read in a BED file and add the values to the NGSEP command, but this is a bit cumbersome.
Thanks for your interest in NGSEP. To do this or to offer an alternative, we would need more information on which type of data you are storing in BED files. In general, BED files describe genomic regions, and people use them for many different purposes.
Anonymous - 2022-08-09
NGSEP now has the following parameters:
-querySeq STRING : Call variants just for this sequence.
-first INT : Call variants just from this position in the given query sequence.
-last INT : Call variants just until this position in the given query sequence.
Great, thanks. We can definitely consider adding this feature in future versions. We have not done this before, mainly because we usually first build a complete VCF with all the information that we can extract from the reads, and then use the VCFFilter command to select regions for different applications. You can try this while we work on the new feature. You can also use samtools view to filter the alignments, and then run NGSEP on the filtered alignments file.
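In the meantime, both workarounds can be scripted. The following is only a sketch under stated assumptions: BED intervals are 0-based and half-open, and the script assumes that -first and -last take 1-based inclusive positions, which is not confirmed in this thread. The function names and file names are hypothetical. Only -querySeq, -first and -last come from the parameter list above; the rest of the NGSEP command line is left out on purpose. The second function only builds a samtools view command using its -L option, which keeps alignments overlapping the regions of a BED file.

```python
from typing import Iterable, List


def bed_to_region_options(bed_lines: Iterable[str]) -> List[List[str]]:
    """Turn BED records into one list of region options per interval."""
    regions = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the comment/header lines a BED file may contain.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED start is 0-based and BED end is exclusive.
        # Assumption: -first and -last are 1-based and inclusive.
        regions.append(
            ["-querySeq", chrom, "-first", str(start + 1), "-last", str(end)]
        )
    return regions


def samtools_filter_command(bed_path: str, in_bam: str, out_bam: str) -> List[str]:
    """Build (without running) a samtools command that keeps only the
    alignments overlapping the regions listed in the BED file."""
    return ["samtools", "view", "-b", "-L", bed_path, "-o", out_bam, in_bam]


if __name__ == "__main__":
    example_bed = ["chr1\t100\t200", "chr2\t0\t50\tgene_x"]
    for options in bed_to_region_options(example_bed):
        # Append these options to your own NGSEP variant calling command.
        print(" ".join(options))
    print(" ".join(samtools_filter_command(
        "regions.bed", "sample.bam", "sample.regions.bam")))
```

Each printed option list corresponds to one BED interval, so the variant caller would be run once per interval. The samtools command can be passed to subprocess.run if samtools is installed.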
Anonymous - 2022-08-17
Hi Jorge,
I was about to open a new discussion for my request, but I guess this discussion is appropriate for my need as well.
I am looking for a way to retrieve the genotype data of a whole-genome resequenced durum wheat (Triticum turgidum subsp. durum) at a set of >3,000 SNP loci that have been used to genotype a large population (259) of durum wheat accessions. I have the SNP positions of the Infinium (Illumina) array used in that study, and I need to compare those genotypes to the corresponding positions of my resequenced landrace.
Is NGSEP appropriate for this? If so, which module and parameters should be used?
Thanks a lot in advance for your advice.
Filippo
Hi Filippo
Your case is more complicated than the case described above. The first task would be to find the exact locations of the SNPs. Assuming that each SNP has a surrounding sequence, you can use any mapping tool (blast, bwa, bowtie2, etc.) to map the sequences to the genome. However, this will not automatically give you the position of the SNP. I am not aware of any tool able to map array SNPs to a genome (NGSEP certainly does not do that job).
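For a surrounding sequence that aligns end to end without gaps, the remaining step is simple arithmetic. A minimal sketch, assuming you know the 1-based leftmost reference position of the alignment, the length of the mapped sequence, the 0-based offset of the SNP inside that sequence, and the strand. The function names and arguments are hypothetical, and alignments with indels or clipping would need the CIGAR string to be taken into account instead.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snp_reference_position(aln_start: int, seq_length: int,
                           snp_offset: int, reverse_strand: bool) -> int:
    """Return the 1-based reference position of a SNP inside a mapped sequence.

    Assumes an ungapped, unclipped, end-to-end alignment.
    aln_start  : 1-based leftmost reference position of the alignment
    seq_length : length of the mapped surrounding sequence
    snp_offset : 0-based index of the SNP within that sequence
    """
    if not 0 <= snp_offset < seq_length:
        raise ValueError("snp_offset must lie inside the sequence")
    if reverse_strand:
        # The sequence was reverse-complemented to align, so index i
        # ends up at index seq_length - 1 - i of the aligned sequence.
        return aln_start + (seq_length - 1 - snp_offset)
    return aln_start + snp_offset


def allele_on_forward_strand(allele: str, reverse_strand: bool) -> str:
    """Complement a single-base allele when the sequence mapped in reverse."""
    return COMPLEMENT[allele.upper()] if reverse_strand else allele.upper()


if __name__ == "__main__":
    print(snp_reference_position(1000, 101, 20, reverse_strand=False))
    print(snp_reference_position(1000, 101, 20, reverse_strand=True))
    print(allele_on_forward_strand("a", reverse_strand=True))
```

It is worth checking a few results by hand against the reference sequence, since the reference base at the computed position should match one of the two alleles of the SNP.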
Once you manage to do that, I would recommend making a VCF file (not a BED file) to retain the reference and alternative allele of each SNP. That file can be provided to either the single sample variants detector or the multisample variants detector (option -knownVariants). In both cases, the command will genotype the specific locations provided in the VCF file.
Hope this helps
Jorge
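In case the VCF step is not obvious, here is a minimal sketch of writing a sites-only VCF from a list of SNPs. It assumes that for each SNP you already have the sequence name, the 1-based position, and the reference and alternative alleles on the forward strand of the reference. The function name, SNP identifiers, sequence names and file name are hypothetical. It follows only the general VCF layout (eight fixed columns) and has not been checked against what NGSEP expects for -knownVariants, so please verify the result with the NGSEP documentation.

```python
from typing import Iterable, Tuple

# (sequence name, 1-based position, SNP id, reference allele, alternative allele)
Snp = Tuple[str, int, str, str, str]


def write_sites_vcf(snps: Iterable[Snp], path: str) -> None:
    """Write a VCF with one record per SNP and no sample columns.

    Records are written in the order given, so sort them beforehand
    in the same sequence order as the reference genome.
    """
    with open(path, "w") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
        for chrom, pos, snp_id, ref, alt in snps:
            # QUAL, FILTER and INFO are unknown for array SNPs: use "."
            out.write(f"{chrom}\t{pos}\t{snp_id}\t{ref}\t{alt}\t.\t.\t.\n")


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = [
        ("chr1A", 1020, "snp_0001", "A", "G"),
        ("chr1A", 5080, "snp_0002", "C", "T"),
    ]
    write_sites_vcf(example, "array_snps.vcf")
```

The resulting file is what would be passed with -knownVariants, as described above.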
Anonymous - 2022-08-26
Hi Jorge,
thanks for your reply and help. I hoped there was a system to automatically map SNPs to a reference genome.
As for the second part of your suggestion (the VCF file), I do not understand how to manage it. Anyway, I'll find a solution.
Kind regards,
Filippo
Hi Filippo
No problem. Sorry for not being more helpful at this time. Let us know how things go.
Jorge