tg_index -C -> gap5 "export as CAF" renames reads
Hi James,
assuming one has a file "bla.caf" and does the following:
tg_index -C bla.caf
gap5 bla.0.g5d
In gap5: File -> Export sequences -> as CAF
then all the reads in the resulting "bla.0.caf" file will have been renamed by appending a dot and then some (gap5 internal?) number. I'm not sure whether this behaviour is intentional, but I filed it as a "bug" because I don't think reads should be renamed on export. Export as SAM, for example, does not do that, while export as ACE does something else (it appends ".f" and ".r").
Best,
Bastien
Looking at this code, it appears a bit dumb at present. If your read name doesn't contain a dot, it assumes the name lacks suffixes and that fwd/rev pairs will therefore have the same name. Then it just appends ".<record_id>".
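To make the behaviour described above concrete, here is a minimal Python sketch of the rule. It is not the gap5 source; the function name export_name is hypothetical, and leaving names that already contain a dot untouched is an assumption drawn only from the description above.

```python
def export_name(name: str, record_id: int) -> str:
    """Return the name a read would be exported under (illustrative only).

    Assumption: a name that already contains a dot is taken to carry a
    suffix and is left unchanged; any other name gets ".<record_id>".
    """
    if "." in name:
        return name
    return f"{name}.{record_id}"


if __name__ == "__main__":
    # A read without a dot is renamed, one with a dot is kept.
    print(export_name("read1", 42))
    print(export_name("read1.f", 43))
```

Under this rule every dot-free read name is altered, whether or not it actually clashes with another name, which matches what the report observes.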
Ideally it'd spot duplicates and resolve the conflicts, but it may be very slow to implement. I'll experiment. I've already found a related bug as contig names can clash with reading names too.
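As a rough illustration of what "spot duplicates and resolve the conflicts" could look like, here is a small Python sketch. It is only one possible approach under stated assumptions, not what gap5 does or will do: the function resolve_names and its arguments are hypothetical, reads are modelled as (name, record_id) pairs, and it needs a full counting pass over all names first, which is where the extra cost would come from.

```python
from collections import Counter


def resolve_names(reads, contig_names=()):
    """Rename only the reads whose names actually clash (illustrative only).

    reads:        iterable of (name, record_id) pairs (hypothetical model)
    contig_names: names already taken by contigs
    """
    reads = list(reads)
    counts = Counter(name for name, _ in reads)  # first pass: count names
    reserved = set(contig_names)                 # contig names may clash too
    used = set()
    exported = []
    for name, record_id in reads:
        candidate = name
        # Append the record id only when the plain name is not unique.
        if counts[name] > 1 or name in reserved or name in used:
            candidate = f"{name}.{record_id}"
        used.add(candidate)
        exported.append(candidate)
    return exported


if __name__ == "__main__":
    reads = [("r1", 10), ("r1", 11), ("r2", 12), ("ctg", 13)]
    print(resolve_names(reads, contig_names=["ctg"]))
```

With this approach a uniquely named read keeps its original name, and only clashing reads (or reads sharing a name with a contig) get the record id appended.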