I have sequencing reads generated using a HyperMu Transposon Kit; however, I need to remove the transposon sequence from the reads before adding them to my database. I have tried 'vector clipping' in pregap4 version 1.4b1.
I then bring the sequences into gap4 version 4.9 using 'normal shotgun assembly'. When I look at the reads in gap4, the transposon sequence is still there, just highlighted bright pink. This is not the pink that indicates sequence that has been removed; this sequence is still there and interferes with the formation of contigs.
These programs are running on an Apple computer with Mac OS X version 10.3.9.
Thanks in advance for any assistance that you can give me.
My usual way of working around this problem (and I've seen it many many times!) is to completely remove the offending sequence. In the contig editor, click on the very last base at the end of the offending read. Then hold control and press an arrow key, choosing the arrow that moves the cursor INTO the offending read.
So, if you are at the left end of the read, press control and right, and if you are at the right end, press control and left.
This has the effect of masking/removing the bases from the contig and they won't interfere with assemblies again.
Removing the offending sequence in the contig editor once it is in the database sounds good. However, after the sequences are cleaned, how do I get them to assemble with the other reads in the database? I guess what I am asking is: if I use shotgun assembly on all of the reads in the database, will that work, given that I am not bringing in new reads?
Easy: Click on the View menu and choose Find Internal Joins.
Now this bit is a touch tricky. The options allow you to "probe" with either all contigs or probe with a single contig. By default, it selects probe with all contigs. Also by default, it searches all other contigs. This is why it's tricky: if you have a large database, your all-against-all searches can take a long time.
Other options: two alignment methods: quick and sensitive. They do what it says on the tin. The quick one is quick and the sensitive one is really slow but really sensitive. Play around with the options and see what you get.
One tip: by default, it sets the maximum mismatch at 30%. I don't know about you, but I never join anything with 30% mismatch. Dropping down the max mismatch makes the search go a bit faster and regardless, you won't need to see those results. The speed increase is especially true for an all-against-all on a large database.
However, the 30% mismatch is useful for short overlaps...
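To see why a generous mismatch limit matters more for short overlaps, here is a rough Python sketch. It assumes the two overlapping stretches are already aligned and the same length, and it ignores pads entirely, so it is only an illustration of the arithmetic and not how gap4 computes its own figure. The function name and the example sequences are made up.

```python
def percent_mismatch(a: str, b: str) -> float:
    """Percentage of positions that differ between two aligned, equal-length overlaps."""
    if len(a) != len(b):
        raise ValueError("aligned overlaps must be the same length")
    if not a:
        raise ValueError("overlap is empty")
    mismatches = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return 100.0 * mismatches / len(a)


# Made-up 20-base overlap with 3 differing positions.
short_a = "ACGTACGTACGTACGTACGT"
short_b = "ACGTACCTACGTAGGTACGA"
print(percent_mismatch(short_a, short_b))  # 15.0
```

Three miscalled bases in a 20-base overlap is already 15%, whereas the same three errors in a 300-base overlap would be 1%. So a tight cutoff can hide short but genuine overlaps, typically at the poorer-quality ends of reads.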
Manually clipping and then using find internal joins to put the data back together is a valid solution, but it's not ideal when you've got hundreds or more sequences to do.
Getting pregap4 to clip it properly is a better solution. However I've no idea quite why your transposons aren't being removed. Pregap4 itself uses a program called vector_clip which can be run on the command line. "vector_clip -t -T" (the -T seems to be undocumented) will work in test mode (it doesn't update any experiment file entries) and also displays the alignments it performs (the -T bit). This can be handy for manually playing to see what the impact of different parameters will be.
I can't really help with this though, unless I have both the transposon vector sequence and a sample of your files. Even then it wouldn't be quick as I've got lots of other stuff to do too.
You may also wish to just take the easy option out and try crossmatch for vector removal. It's not an ideal approach as it doesn't care what it throws away as long as it looks like something (anything!) in the vector file, but generally that's sufficient.
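To give a feel for the "mask anything that looks like vector" idea, here is a minimal Python sketch. It is NOT crossmatch's algorithm (which does proper alignments); it just masks every exact k-mer of the read that also occurs in the vector on either strand. The function names, the value of k and the sequences are all invented for illustration, and the vector string is a placeholder, not the real HyperMu sequence.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mask_vector(read: str, vector: str, k: int = 12) -> str:
    """Replace with 'X' every base of the read covered by a k-mer found in the vector."""
    read, vector = read.upper(), vector.upper()
    # Collect the vector k-mers from both strands.
    kmers = set()
    for strand in (vector, reverse_complement(vector)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    masked = list(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:  # always test against the unmasked read
            masked[i:i + k] = "X" * k
    return "".join(masked)


# Placeholder sequences, made up for this example.
vector = "GATCCGTTAGCAAT"
read = "ACGTACGT" + vector + "TTTTGGGG"
print(mask_vector(read, vector, k=8))
```

Like crossmatch, this does not care where in the read the match sits, so a genuine insert that happens to share a k-mer with the vector would be masked too. Exact matching also means a single sequencing error inside the transposon leaves up to k bases unmasked around it.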
Thanks for your help. I brought the sequences into Gap4 and removed the transposon sequence using the "control and arrow keys", as suggested by eKstreme - pierrefar. I was then able to get the reads to join/form contigs by repeating the normal shotgun assembly. Then using the output information in the output window I can manually join any remaining reads into contigs.
However, I have encountered one problem. In some cases a contig would form in which one of the reads was in the -1 orientation instead of the +1. (How can I get the complement of a sequence?)
Also, could you tell me how to create a file of my transposon sequence so that the vector_clip program will recognize this sequence in the future?
I really appreciate all of the help you have given me.
To complement a sequence: Search for it, and when it appears, right-click and select template display. At the bottom, there is a thick black bar representing the sequence. Right-click on it and select "complement". Tada.
Vector removal: You just add the sequence to the \Program Files\Staden Package\tables\vectors folder (not fasta, just a raw sequence file, call it hypermu.seq). Then select that as your
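If the transposon sequence is only to hand as a FASTA file, something like the following Python sketch would strip the header and leave plain sequence. The file names are hypothetical, and wrapping at 60 bases per line is my assumption to stay on the safe side; I have not checked what line lengths vector_clip accepts.

```python
from pathlib import Path


def fasta_to_raw(fasta_text: str) -> str:
    """Drop FASTA header lines and blank lines, returning the bases as one string."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def wrap_sequence(seq: str, width: int = 60) -> str:
    """Break a sequence into lines of at most `width` bases."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def write_raw_vector(fasta_path: str, raw_path: str, width: int = 60) -> None:
    """Read a single-entry FASTA file and write it back out as plain sequence."""
    raw = fasta_to_raw(Path(fasta_path).read_text())
    Path(raw_path).write_text(wrap_sequence(raw, width) + "\n")


# Hypothetical usage; both file names are placeholders:
# write_raw_vector("hypermu.fasta", "hypermu.seq")
```

This assumes one sequence per FASTA file; with several entries the bases would simply be run together.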
There are several ways to complement a contig. Either use the main gap4 menu and select Complement Contig, or, if you know which contig it is in the contig selector, highlight it and right-click to get the menu, which also contains Complement Contig.
Note that you cannot complement a single reading unless it also happens to be the only reading in the contig. The reason is that obviously that would break alignments to other readings, so think about everything at a contig level and it becomes obvious (I hope).
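A toy Python model may make the contig-level view clearer. This is only a sketch and has nothing to do with gap4's real data structures: the `Read` class, its fields and the example sequences are all invented. Complementing the contig reverse-complements every reading and mirrors its position, so the overlaps still line up.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Read:
    name: str
    start: int   # 0-based position of the read's left end in the contig
    seq: str     # bases as displayed in the contig
    strand: int  # +1 or -1


def complement_contig(reads: list[Read]) -> list[Read]:
    """Flip every read and mirror its position about the contig length."""
    length = max(r.start + len(r.seq) for r in reads)
    flipped = [
        Read(r.name, length - (r.start + len(r.seq)), reverse_complement(r.seq), -r.strand)
        for r in reads
    ]
    return sorted(flipped, key=lambda r: r.start)


# Two made-up reads overlapping by three bases (TAC).
contig = [Read("a", 0, "ACGTAC", +1), Read("b", 3, "TACGG", -1)]
for r in complement_contig(contig):
    print(r.name, r.start, r.seq, r.strand)
```

After the flip, read b starts the contig and read a overlaps it on the bases GTA, which is the reverse complement of the original TAC overlap. Had only read a been flipped, its bases would no longer match read b in the shared columns, which is the broken alignment described above.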