Leanne - 2007-07-27

I have recently become involved in a new sequencing project.  I used to use Staden about 4-5 years ago.  Now I have tried to bring in more than 1500 sequences, running them through pregap4 and then gap4 with a new installation of 1.7.0 on Red Hat, and I am getting some weird results.  For example, some sequences are clipped to 50 bases even though they look fine.  Others are not clipped at all and run out to 1400+ bases, giving huge regions that obviously match poorly.  Some "aligned" sequences are a base off in the automatically generated contigs.  When I right-click on a match from the Find Internal Joins contig display, the sequences don't come up aligned, and I have to click the "align" button manually.

I don't know exactly where to start figuring out what is wrong.  I have tried assembling repeatedly and get similar results each time.  These are AB1 reads.  I have tried with and without phred, although the quality clip was by confidence, as the AB files have a confidence value in them.  I am using gap4/phrap.  Maybe I should try it without phrap?

Is anyone else seeing these problems?  Is it our installation, or some parameter I am setting wrong?  I have never dived into a project with this much data to start with.  This project has already been analyzed and some finishing started, so there is a combination of vector primers (transposon-mediated methodology on a BAC) and finishing primers, and also alternative chemistry.  Of course, the file names were not created in any systematic way, so I am treating all the sequences the same.