From: Ben O. <bo...@gm...> - 2015-06-09 16:51:51
Hello,

I am a computer science student, currently working on a new bioinformatics project. One of the problems I must solve in order to proceed with my project is the following:

"Given m sets of strings (N1, ..., Nm), find all the maximal m-shared substrings."

To explain, these are:

(1) substrings that appear at least once in *each* set;
(2) with length greater than "l", that cannot be extended without breaking (1);
(3) we must be able to attribute each such substring to all its occurrences in all the sets.

So I am thinking that the -mum option is the right starting point for me, since the first two points essentially mean that I want to find MUMs (right?) with a minimum length of l.

*But*, as you can see in point (1), I want only MUMs that are shared at least once in *each* string. As I understand it, if I use -mum with N1 as the reference and N2, N3, ..., Nm as the queries, it will output the MUMs shared between N1 and N2, the MUMs shared between N1 and N3, and so on up to the MUMs shared between N1 and Nm. This is not what I want!

Can you please help me figure out what to do?

Thanks a lot!
Ben
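
P.S. To make conditions (1) to (3) concrete, here is a naive brute-force sketch in Python of the output I am after. It is only an illustration and not something MUMmer provides: the function names `substrings` and `maximal_shared_substrings` and the parameter `min_len` (standing in for "l", treated here as an inclusive minimum length) are made up for this example. It enumerates every substring, so it is only usable on tiny inputs.

```python
def substrings(s, min_len):
    """All distinct substrings of s with length >= min_len."""
    return {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}


def maximal_shared_substrings(sets, min_len):
    """Map each maximal shared substring to its occurrences.

    `sets` is a list of m lists of strings. An occurrence is a tuple
    (set_index, string_index, start_position), all 0-based.
    """
    # Condition (1): keep substrings found in at least one string of every set.
    shared = None
    for strings in sets:
        subs = set()
        for s in strings:
            subs |= substrings(s, min_len)
        shared = subs if shared is None else shared & subs
    shared = shared or set()

    # Condition (2): drop any substring that can be extended by one
    # character on either side and still be shared by every set.
    alphabet = {c for strings in sets for s in strings for c in s}
    maximal = [
        t for t in shared
        if not any(c + t in shared or t + c in shared for c in alphabet)
    ]

    # Condition (3): record every occurrence, including overlapping ones.
    result = {}
    for t in sorted(maximal):
        occurrences = []
        for set_idx, strings in enumerate(sets):
            for str_idx, s in enumerate(strings):
                pos = s.find(t)
                while pos != -1:
                    occurrences.append((set_idx, str_idx, pos))
                    pos = s.find(t, pos + 1)
        result[t] = occurrences
    return result
```

For example, with the three sets `[["ACGTACGT", "TTGACA"], ["GACGTT"], ["CCACGTA", "GACA"]]` and `min_len = 3`, this reports ACGT and GAC. ACG and CGT are also present in every set, but they are dropped because they can be extended to ACGT.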