Hi Adam, thanks again for your great work on PBJelly.
How can I trace back the locations that were gaps and were then filled by a PBJelly run?
Isn't there some sort of BED file somewhere that lists the start(s) and end(s) of each newly filled interval relative to the contig?
This would be very important since, e.g., with low-coverage PacBio reads one might want to remap high-confidence data (say, Illumina) and correct base calls in those intervals only, as they are likely affected by the high PacBio error rate (at low coverage).
This hasn't been output automatically since PBJelly was upgraded to do inter-scaffolding gap-filling. However, the information is there, so you can piece it together yourself.
The first file you'd look at is gap_fill_status.txt. This has two columns: the gap's name and the status of the fill.
From there, you can look at both the gapInfo.bed and the assembly/gapNameFolder/fillingMetrics.json. Those three files show nearly everything there is to know about an individual gap.
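As a rough illustration of piecing those files together, here is a minimal Python sketch that joins gap_fill_status.txt with gapInfo.bed. It rests on several assumptions that you should check against your own output: that gapInfo.bed uses the usual BED column order (sequence, start, end, name), that the status you want is literally called "filled", and that gap names can be matched directly between the two files. If the naming schemes differ, a name conversion step would be needed, which is why unmatched names are returned separately. The function names are made up for this example and are not part of PBJelly. Note also that the coordinates in gapInfo.bed presumably describe the gaps in the input assembly, not positions in the filled output.

```python
from pathlib import Path


def read_fill_status(path):
    """Return {gap_name: status} from the two-column status file."""
    status = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            status[fields[0]] = fields[1]
    return status


def read_gap_bed(path):
    """Return {gap_name: (seq, start, end)}.

    ASSUMPTION: columns are seq, start, end, name (standard BED order).
    """
    gaps = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue
        gaps[fields[3]] = (fields[0], int(fields[1]), int(fields[2]))
    return gaps


def gaps_with_status(status_path, bed_path, wanted="filled"):
    """List BED-style intervals for gaps whose status equals `wanted`.

    ASSUMPTION: "filled" is a hypothetical default status label, and gap
    names are identical in both files. Names that cannot be matched are
    returned in a second list so they are not silently dropped.
    """
    status = read_fill_status(status_path)
    gaps = read_gap_bed(bed_path)
    matched, unmatched = [], []
    for name, value in status.items():
        if value != wanted:
            continue
        if name in gaps:
            seq, start, end = gaps[name]
            matched.append((seq, start, end, name))
        else:
            unmatched.append(name)
    return sorted(matched), unmatched
```

Calling gaps_with_status("gap_fill_status.txt", "gapInfo.bed") from the PBJelly output directory would give a list of (sequence, start, end, name) tuples that can be written out as a BED file. The per-gap fillingMetrics.json would still be needed to work out how much sequence was actually inserted.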
When I get the time to do new development on PBJelly, I'll put this as something that should be created by PBJelly.