Has anyone tried to assemble 454 sequence data using gap4? If so, how many reads, and how did it perform?
Plant Research International,
Use MIRA [1] for assembling 454 data if you would like to work with gap4.
It produces CAF [2] (among other formats), which can easily be converted to gap4.
AFAIK gap4 cannot assemble 454 data directly.
[1] = http://chevreux.org/projects_mira.html
[2] = http://www.sanger.ac.uk/Software/formats/CAF/
But gap4 gets slow with many reads (I have assembled some 450,000 ESTs) and
even slower with many tags.
That's why we're eagerly waiting for a (working) gap5... :-)
For what it's worth, I'm not intending gap5 to be a sequence assembler. I'll have to add some basic assembly capabilities at some stage to assemble in finishing reads, but writing a full-blown short-read assembler is something multiple teams of people are toiling over, and there's no way I can compete solo while also writing an editor and viewer.
Instead, my plan is to be able to import already-assembled data in a variety of formats.
PS. As for assembly of 454 data, I think the Celera Assembler has a mixed assembly mode, as does Newbler. We have 454 (ACE) to gap4 conversion tools here, but they're being worked on again and I don't have access to the latest source for those yet.
Older variants though can be found at
Actually, my starting database obtained by MIRA (http://chevreux.org/projects_mira.html) is in CAF or ACE format.
I start my conversions from CAF2GAP --> GAP2BAF --> tg_index --> GAP5.
Is there a more direct way to do this?
Sorry for my previous mail. I only have the CAF file, and I do not know how to give it to caf2baf. From the README it seems that caf2baf needs the output of gap2caf via a pipe:
gap2caf -project xx -version 0 -ace xxx.gap | caf2baf > xxx.baf
But if I already have the CAF file, how do I give it to caf2baf?
Thank you, and sorry for the confusion.
caf2baf < in > out
or less efficiently: cat in | caf2baf > out
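The `<` redirection hands the file to caf2baf as its standard input directly, while the `cat` form adds a second process and copies the data through a pipe. If you drive the conversion from a script, here is a minimal Python sketch of the same redirection. `run_filter` is a hypothetical helper, and the only assumption about caf2baf is the one shown above: it reads CAF on stdin and writes BAF on stdout with no arguments. The demo uses a one-line Python filter as a stand-in so it runs without caf2baf installed.

```python
import os
import subprocess
import sys
import tempfile


def run_filter(cmd, in_path, out_path):
    """Run a stdin-to-stdout filter, the equivalent of the shell form `cmd < in > out`."""
    # The open input file becomes the child's stdin directly, so no extra
    # `cat` process has to copy the data through a pipe.
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        # check=True raises CalledProcessError if the filter exits non-zero.
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)


# Intended use, assuming caf2baf is installed and on PATH:
#   run_filter(["caf2baf"], "xxx.caf", "xxx.baf")

# Self-contained demo: a tiny Python filter stands in for caf2baf.
upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "w") as f:
        f.write("acgt\n")
    run_filter(upper, src, dst)
    with open(dst) as f:
        result = f.read()
print(result)
```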
The reason I produced BAF (which probably won't stay around; it's a stop-gap until a decent caf2sam comes along) is that CAF is simply HIDEOUS to parse. Specifically, there are no requirements on data being in any particular order, so the only real way to parse it involves loading the entire file into memory. That's too inefficient for modern assemblies.
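To make the ordering problem concrete, here is a minimal Python sketch over a simplified, CAF-like layout (an assumption for illustration, not the full CAF grammar): each block starts with a `Type : name` header and ends at a blank line, and the `Sequence`, `DNA` and `BaseQuality` blocks for one read may be scattered anywhere in the file. The names `parse_caf_like` and `complete_reads` are hypothetical. Because the last block belonging to any read can turn up at the very end of the input, the parser has to hold every block in memory before it can tell which reads are complete.

```python
from collections import defaultdict


def parse_caf_like(lines):
    """Group blocks of a simplified CAF-like file by object name.

    Blocks for the same name may appear in any order, so nothing can be
    emitted early: all blocks are buffered until the input is exhausted.
    """
    objects = defaultdict(dict)  # name -> {block type -> list of body lines}
    current = None               # (type, name) of the block being read
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            current = None       # a blank line ends the current block
            continue
        if current is None:
            # Header line, e.g. "DNA : read1".
            block_type, _, name = line.partition(" : ")
            current = (block_type.strip(), name.strip())
            objects[current[1]].setdefault(current[0], [])
            continue
        objects[current[1]][current[0]].append(line)
    return dict(objects)


def complete_reads(objects, required=("Sequence", "DNA", "BaseQuality")):
    """Names that have every required block; only answerable after a full parse."""
    return sorted(n for n, blocks in objects.items() if all(t in blocks for t in required))


example = """DNA : read1
ACGT

Sequence : read2
Is_read

Sequence : read1
Is_read

DNA : read2
TTGA

BaseQuality : read1
30 30 30 30
"""

objs = parse_caf_like(example.splitlines(keepends=True))
print(complete_reads(objs))  # read2 has no BaseQuality block, so only read1 is complete
```

A format that keeps each read's data in one contiguous record would let a reader emit and discard reads as it goes, which is presumably what makes a stop-gap like BAF cheaper to load.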