Hi there, I'm new to NGSEP but interested in comparing it with other GBS data pipelines. I am running a FASTQ file through the demultiplex step, but as far as I can see there doesn't seem to be any facility for quality control, i.e. removing reads whose quality score falls below a chosen threshold. Is this correct? And if so, is this a big drawback of using NGSEP on raw data compared with some of the other packages out there?
Thanks for your interest in NGSEP. It is true that we do not have a dedicated functionality for initial quality control. In our experience, the main issue with GBS data is 3' adapter contamination, so the demultiplexing facility has a specific field where you can provide the first base pairs of the adapter sequence; the demultiplexing tool then removes adapter contamination. We plan to offer this feature as a separate functionality for cases where the data is already demultiplexed. Beyond that, you can always use tools such as Trimmomatic to perform an initial quality filtering before demultiplexing.
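For illustration, a pre-demultiplexing quality filter with Trimmomatic might look like the sketch below. This is not part of NGSEP itself; the jar version and the file names are placeholders, and the thresholds (sliding-window Q20, minimum length 36) are just common starting points you would tune for your own data.

```shell
# Single-end quality filtering of a raw GBS lane before demultiplexing.
# trimmomatic-0.39.jar, lane_raw.fastq.gz and lane_filtered.fastq.gz are
# placeholder names -- substitute your own paths.
java -jar trimmomatic-0.39.jar SE -phred33 \
    lane_raw.fastq.gz lane_filtered.fastq.gz \
    SLIDINGWINDOW:4:20 \
    MINLEN:36
```

The filtered FASTQ can then be fed to the NGSEP demultiplex step as usual. Note that aggressive 5' trimming can remove the inline barcodes that demultiplexing relies on, so quality steps that clip the read start should be used with care here.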
For the reference-guided pipeline, low-quality reads are not a big issue because they normally do not map to the reference. For the de novo analysis that we just released, these reads could add some noise, but they are unlikely to create false clusters, so they should end up being excluded from the analysis as well.
In any case, it would be great if you could let us know the outcome of your comparison. Please also let us know if you have any issues running NGSEP.