report hits in sense orientation although matched minus
Hi,
I have an EST sequence in plus orientation, but it matches the minus strand of the chromosome. With "-R 1" I can speed up the SIBsim4 run and force it to align to the minus strand only. However, I cannot affect how the result is printed. I would like the result printed in the same orientation as my query input. Currently I have to read the alignments from bottom right to upper left while complementing the bases.
Thanks!
M.
Hi,
Thanks for your message.
If I understand correctly, you would like an option to print the reverse strand of the DNA sequence rather than the reverse strand of the RNA when the match is on the opposing strand? I can look into this, but it is not completely straightforward to implement due to the way things are currently coded.
Hi Chris, yes. I believe most users use SIBsim4 to align mRNA sequences to a chromosome. They therefore expect to read the match in "sense" orientation, regardless of whether the match is to the minus or the plus strand of the chromosome.
I have not looked into your code, but I believe you can keep the implementation and just reverse the final output (print the reverse-complemented sequence and swap the coordinates). That should be trivial. Thanks!
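A minimal sketch of the output reversal suggested above, written in Python for illustration only. It is not SIBsim4 code: the helper names `reverse_complement` and `flip_exons` are hypothetical, and "swap the coordinates" is assumed here to mean listing the exons in reverse order with start and end exchanged, so that they read in the mRNA's direction.

```python
# Hypothetical helpers for illustration; not part of SIBsim4.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_exons(exons):
    """Reorder exon coordinates to follow the mRNA orientation.

    exons: list of (start, end) pairs in ascending genomic order.
    Returns the pairs in reverse order, each with start and end swapped
    (assumed interpretation of "swap the coordinates").
    """
    return [(end, start) for start, end in reversed(exons)]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(flip_exons([(100, 200), (300, 400)]))
```

The alignment itself would stay untouched; only the printed sequence and the coordinate listing change.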
Ok
I agree it would make sense. I'll look into implementing this.
Hi Chris,
I just discovered there is a new 0.20 release, congratulations. Any news on this feature request? ;-) Or is there a chance to get, say, a separate FASTA-formatted file with the sequence of the chromosomal match, split line by line into exons? Please note that the current -A 4 output includes the GTA...CAG in the sequence, which is not what I want included in the output. ;)
Martin
Hi Martin,
The current code does not lend itself easily to reversing the matches before printing.
I intend to implement that functionality in a perl script.
I'll post here when it's ready to test/use...
Cheers,
Christian
Hi Martin,
I produced a script that processes SIBsim4 -A=4 output and turns the alignment around:
http://sibsim4.cvs.sourceforge.net/viewvc/*checkout*/sibsim4/SIBsim4/force_mRNA_plus_strand?revision=1.1
Please try and let me know how it goes.
Cheers,
- Christian
Hi Christian,
thank you for your work on this. The script seems to reverse-complement the sequence correctly. However, it keeps the "(complement)" line in the output, the regions of each exon are in the same order as before, and the "<-" arrows are unchanged as well. In brief, the summary in the -A 4 output is untouched, which is confusing. There used to be an extra trailing newline at the very end of the last output.
In addition, could the bases corresponding to an intron be in lowercase? For example:
gtc...cag
>>>...>>>
ctg...tac
<<<...<<<
Finally, the best for me would be to get a FASTA file of the chromosomal regions, with looong lines split by "\n" at every exon end. A proposal for an "-A 5" switch, I think. ;-)
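One possible reading of this proposal, sketched in Python under stated assumptions: the function name `exons_to_fasta`, its parameters, and the header format are all hypothetical, exon coordinates are taken to be 1-based and inclusive on the plus strand, and introns are simply left out. No "-A 5" switch exists in SIBsim4; this only illustrates the requested layout of one exon per line.

```python
# Hypothetical illustration of the requested layout; not SIBsim4 output.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def exons_to_fasta(name, genome, exons, minus_strand=False):
    """Build a FASTA record with one line per exon.

    name:   text for the FASTA header (assumed format).
    genome: chromosomal sequence as a plain string.
    exons:  (start, end) pairs, 1-based inclusive, ascending, plus strand.
    minus_strand: if True, emit exons in mRNA order, reverse-complemented.
    """
    # Slice each exon out of the genome; intron bases are skipped.
    pieces = [genome[start - 1:end] for start, end in exons]
    if minus_strand:
        # Last genomic exon comes first in the mRNA orientation.
        pieces = [reverse_complement(p) for p in reversed(pieces)]
    return ">" + name + "\n" + "\n".join(pieces) + "\n"


if __name__ == "__main__":
    genome = "AAACCCGGGTTT"
    print(exons_to_fasta("demo", genome, [(1, 3), (7, 9)], minus_strand=True), end="")
```

Each line after the header then ends exactly at an exon boundary, whichever strand the match is on.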
Hi Martin,
It is correct that the only piece that is touched by the script is the alignment itself, which is what I thought you wanted to see in the mRNA orientation.
I'm not sure I fully understand your -A 5 proposal. A couple of examples, on both the forward and reverse strands, of exactly what you would expect to see in the output would go a long way toward clarifying things...