I was just wondering what the plans are for gap4 and gap5. I have had a quick look through the website and software docs but can't see anything.
Are they going to be merged, or is gap4 going to remain for Sanger sequencing and gap5 for NGS?
To be honest I don't really mind, I just wanted to know the plan.
The original intention was that I'd just update gap4 to cope with more reads, but it proved such a thorny problem (i.e. gap4 was so old and crufty) that it was simply not practical. I had to do some radical redesign of how data is stored: separating sequences from their position/orientation, etc. Gap4 also had the notion of data being in a list - reading 1 joins to reading 2, which joins to reading 3, etc. Gap5 has a hierarchical tree to hold data, turning many algorithms from O(N) to O(log(N)) complexity. Put simply, they're never going to be merged, as they're so different at their hearts.
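To make the list-versus-tree point concrete, here is a toy sketch in Python. It is not how either program is actually written; the names (Node, build_chain, build_tree, absolute_pos) are made up for illustration, and it assumes each item stores its position relative to the thing it hangs off. In the chain every reading hangs off its left neighbour, so finding where the last reading sits means walking all N links. In the tree every node hangs off a parent, so the same lookup only walks one root-to-leaf path.

```python
class Node:
    """A stored item whose position is relative to the node it hangs off."""
    def __init__(self, rel_pos, parent=None):
        self.rel_pos = rel_pos   # offset from the parent (or left neighbour)
        self.parent = parent     # None for the first reading / the tree root


def build_chain(positions):
    """List layout: reading i is positioned relative to reading i-1."""
    nodes, prev, prev_pos = [], None, 0
    for p in positions:
        node = Node(p - prev_pos, prev)
        nodes.append(node)
        prev, prev_pos = node, p
    return nodes


def build_tree(positions, lo=0, hi=None, parent=None, parent_pos=0, leaves=None):
    """Tree layout: a balanced binary tree over sorted positions.

    Each node starts at positions[lo] and stores that start relative to its
    parent's start. Returns the leaves in left-to-right order.
    """
    if hi is None:
        hi = len(positions)
    if leaves is None:
        leaves = []
    node = Node(positions[lo] - parent_pos, parent)
    if hi - lo == 1:
        leaves.append(node)          # one reading left: this is a leaf
    else:
        mid = (lo + hi) // 2
        build_tree(positions, lo, mid, node, positions[lo], leaves)
        build_tree(positions, mid, hi, node, positions[lo], leaves)
    return leaves


def absolute_pos(node):
    """Sum relative offsets back to the start; also count the links walked."""
    pos = steps = 0
    while node is not None:
        pos += node.rel_pos
        node = node.parent
        steps += 1
    return pos, steps


positions = [i * 10 for i in range(1024)]   # 1024 made-up reading positions
chain = build_chain(positions)
leaves = build_tree(positions)

print(absolute_pos(chain[-1]))    # (10230, 1024): walks every reading
print(absolute_pos(leaves[-1]))   # (10230, 11): walks one path, ~log2(N)
```

The same idea helps with edits: in this sketch, shifting everything under one tree node means changing a single rel_pos, rather than touching every reading to its right.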
Gap5 will *eventually* replace gap4, but clearly people aren't going to move away from gap4 and start using gap5 for smaller projects (where gap4 works just fine) until there are no losses in functionality. So I don't expect the complete replacement of gap4 to happen any time soon. There are just so many functions and tools in gap4 which I haven't implemented in gap5, not to mention that gap4 is obviously far better tested and relatively bug-free.
One thing I forgot - gap5 works fine with Sanger sequencing data, if you don't mind the current lack of proper alignment between sequences and traces. (The traces are displayed, but not always at the correct point; fixing that is on the to-do list.) We have quite a lot of mixed Illumina + capillary projects in gap5 now.
When we were doing this in gap4 we had to store only fragmented Illumina consensus sequences rather than the raw reads themselves.