From: William P. Y. H. <wil...@gm...> - 2005-12-31 09:46:26
Attachments:
strfind.m
|
The strfind function is well written, so I had little trouble updating it to be compatible with Matlab. Now I think it's quite ready for submission into Octave, so I would like Alois, as the original author, and others to comment. If there are no comments, I will post it to bu...@oc... and let John include it in Octave. The file is attached. -- William Poetra Yoga Hadisoeseno |
From: Alois S. <alo...@tu...> - 2006-01-02 11:07:55
|
William Poetra Yoga Hadisoeseno wrote:
> The strfind function is well written, so I had little trouble updating
> it to be compatible with Matlab. Now I think it's quite ready for
> submission into Octave, so I would like Alois, as the original author,
> and others to comment. If there are no comments, I will post it to
> bu...@oc... and let John include it in Octave.
>
> The file is attached.
>
> --
> William Poetra Yoga Hadisoeseno

Thanks for including the documentation and test functions and converting it to Octave style.

Concerning the additional support of cellstrings, I suggest the changes as attached. Moreover, the result of IDX = strfind(STR,PATTERN) should be a vector, not a cell with a vector element. The attached version does this.

I'll check the changes into Octave-forge. You can post it to bug-octave and ask John to include it.

Alois |
From: William P. Y. H. <wil...@gm...> - 2006-01-03 09:22:17
Attachments:
strfind.m
|
On 1/2/06, Alois Schloegl <alo...@tu...> wrote:
> William Poetra Yoga Hadisoeseno wrote:
>
> Concerning the additional support of cellstrings, I suggest the changes as
> attached. Moreover, the result of
> IDX = strfind(STR,PATTERN) should be a vector, not a cell with a
> vector element. The attached version does this.
>

Actually, my version returns a vector. It works by 'generalizing' the input, like this:

1. If the first arg is a string or char matrix, then put it into a cell array and take note of this.
2. Now the object we have is a cellstr (otherwise an error would be generated). For every element in the cellstr, check for PATTERN and put the starting indexes in IDX.
3. If the first arg was a string or char matrix, then take the vector inside IDX and assign it to IDX.

This way, we can avoid recursion of strfind(). What do you think?

> I'll check the changes into Octave-forge. You can post it to
> bug-octave and ask John to include it.
>

OK :)

I fixed the help message and changed isstr to ischar in my version (attached). What do you think about it?

--
William Poetra Yoga Hadisoeseno |
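The three steps listed above can be sketched roughly as follows. This is not the attached strfind.m: the name strfind_wrapped and the flag was_char are made up for illustration, and the search loop is a simple stand-in that assumes each string is a single row (char matrices with several rows are not handled).

```octave
1;  % script-file marker so the function below can be defined in the same file

% Hypothetical sketch of the "generalize the input" approach.
function idx = strfind_wrapped (text, pattern)
  % Step 1: wrap a plain string in a cell array and remember that we did.
  was_char = ischar (text);
  if (was_char)
    text = {text};
  elseif (! iscellstr (text))
    error ("strfind_wrapped: TEXT must be a string or a cellstr");
  endif

  % Step 2: search every element of the cellstr.
  idx = cell (size (text));
  for i = 1:numel (text)
    s = text{i};
    % Candidate start positions, narrowed one pattern character at a time.
    pos = 1:(length (s) - length (pattern) + 1);
    for k = 1:length (pattern)
      pos = pos(s(pos + k - 1) == pattern(k));
    endfor
    idx{i} = pos;
  endfor

  % Step 3: unwrap again if the caller passed a plain string.
  if (was_char)
    idx = idx{1};
  endif
endfunction
```

With a plain string the sketch returns a numeric vector; with a cellstr it returns a cell array of the same shape, one vector per string.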
From: Alois S. <alo...@tu...> - 2006-01-02 11:17:26
Attachments:
strfind.m
|
William Poetra Yoga Hadisoeseno wrote:
> The strfind function is well written, so I had little trouble updating
> it to be compatible with Matlab. Now I think it's quite ready for
> submission into Octave, so I would like Alois, as the original author,
> and others to comment. If there are no comments, I will post it to
> bu...@oc... and let John include it in Octave.
>
> The file is attached.
>
> --
> William Poetra Yoga Hadisoeseno

I forgot the attachment; here it is.

Alois

-----

Thanks for including the documentation and test functions and converting it to Octave style.

Concerning the additional support of cellstrings, I suggest the changes as attached. Moreover, the result of IDX = strfind(STR,PATTERN) should be a vector, not a cell with a vector element. The attached version does this.

I'll check the changes into Octave-forge. You can post it to bug-octave and ask John to include it.

Alois |
From: Alois S. <alo...@tu...> - 2006-01-03 09:49:58
|
William Poetra Yoga Hadisoeseno wrote:
> On 1/2/06, Alois Schloegl <alo...@tu...> wrote:
>> William Poetra Yoga Hadisoeseno wrote:
>>
>> Concerning the additional support of cellstrings, I suggest the changes as
>> attached. Moreover, the result of
>> IDX = strfind(STR,PATTERN) should be a vector, not a cell with a
>> vector element. The attached version does this.
>
> Actually, my version returns a vector.

OK, I see: you need an extra "if" and a flag variable.

> It works by 'generalizing' the input, like this:
> 1. If the first arg is a string or char matrix, then put it into a
> cell array and take note of this.
> 2. Now the object we have is a cellstr (otherwise an error would be
> generated). For every element in the cellstr, check for PATTERN and
> put the starting indexes in IDX.
> 3. If the first arg was a string or char matrix, then take the vector
> inside IDX and assign it to IDX.
> This way, we can avoid recursion of strfind(). What do you think?

I prefer the recursive version. It gives cleaner and shorter code and needs fewer temporary variables.

>> I'll check the changes into Octave-forge. You can post it to
>> bug-octave and ask John to include it.

source-octave (instead of bug-octave) would be more appropriate.

> OK :)
>
> I fixed the help message and changed isstr to ischar in my version
> (attached). What do you think about it?

I changed isstr to ischar, too.

> --
> William Poetra Yoga Hadisoeseno

Best,
Alois |
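For comparison, the recursive structure Alois prefers might look roughly like the sketch below. It is not his attached file: the name strfind_recursive is hypothetical, and the search loop is a simple stand-in that assumes row strings only.

```octave
1;  % script-file marker so the function below can be defined in the same file

% Hypothetical sketch of the recursive approach.
function idx = strfind_recursive (text, pattern)
  if (ischar (text))
    % Plain string: search directly, with no wrapping and no flag variable.
    idx = 1:(length (text) - length (pattern) + 1);
    for k = 1:length (pattern)
      idx = idx(text(idx + k - 1) == pattern(k));
    endfor
  elseif (iscellstr (text))
    % Cellstr: call the same function once per element.
    idx = cell (size (text));
    for i = 1:numel (text)
      idx{i} = strfind_recursive (text{i}, pattern);
    endfor
  else
    error ("strfind_recursive: TEXT must be a string or a cellstr");
  endif
endfunction
```

The string branch does no extra work, which is the speed argument made for this variant; the cost is one additional function call per cell element.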
From: William P. Y. H. <wil...@gm...> - 2006-01-04 10:50:18
|
On 1/3/06, Alois Schloegl <alo...@tu...> wrote:
> I prefer the recursive version. It gives cleaner and shorter code and
> needs fewer temporary variables.
>

I see your point. Actually our points of view are different: you view it from the "cellstr as an extra functionality" viewpoint, while I view it from the "cellstr as a generalization of string" viewpoint. Well, I don't know, but I'm sometimes reluctant to do something recursively unless absolutely necessary. And our approaches both have advantages and disadvantages.

Your code's advantage is: char matrix is handled as fast as before; disadvantage: uses recursion.

My code's advantage is: no recursion; disadvantage: char matrix is handled a bit slower.

Actually there's another way to do it; I first thought of this idea, but later preferred my current code: handle char matrix and cellstr differently (like your code), but don't use recursion for cellstr. Instead, use a loop (like mine) for cellstr. This has the advantage that simple arguments (char matrices) are handled without loss of speed (inherited from your code) and that cellstr is handled without recursion (inherited from my code). The disadvantage is that the searching code is duplicated, which might cause maintenance problems later on.

I still prefer my code, though :p

> >>I'll check the changes into Octave-forge. You can post it to
> >>bug-octave and ask John to include it.
> >>
> source-octave (instead of bug-octave) would be more appropriate.
>

I've never posted to source-octave, and it seems that the list is very quiet...

--
William Poetra Yoga Hadisoeseno |
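The third option described above (separate branches, a loop instead of recursion, and the search code written out twice) could be sketched like this. The name strfind_dup is hypothetical and the search loop is an illustrative stand-in for row strings, not code from either attachment.

```octave
1;  % script-file marker so the function below can be defined in the same file

% Hypothetical sketch: no wrapping, no recursion, duplicated search code.
function idx = strfind_dup (text, pattern)
  lp = length (pattern);
  if (ischar (text))
    % First copy of the search code: plain string.
    idx = 1:(length (text) - lp + 1);
    for k = 1:lp
      idx = idx(text(idx + k - 1) == pattern(k));
    endfor
  elseif (iscellstr (text))
    idx = cell (size (text));
    for i = 1:numel (text)
      % Second copy of the same search code: one cell element at a time.
      s = text{i};
      pos = 1:(length (s) - lp + 1);
      for k = 1:lp
        pos = pos(s(pos + k - 1) == pattern(k));
      endfor
      idx{i} = pos;
    endfor
  else
    error ("strfind_dup: TEXT must be a string or a cellstr");
  endif
endfunction
```

The two inner loops must be kept in sync by hand, which is the maintenance concern raised in the message.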
From: William P. Y. H. <wil...@gm...> - 2006-02-19 03:45:45
|
On 1/4/06, William Poetra Yoga Hadisoeseno <wil...@gm...> wrote:
> On 1/3/06, Alois Schloegl <alo...@tu...> wrote:
> > I prefer the recursive version. It gives cleaner and shorter code and
> > needs fewer temporary variables.
> >
>
> I see your point. Actually our points of view are different: you view
> it from the "cellstr as an extra functionality" viewpoint, while I
> view it from the "cellstr as a generalization of string" viewpoint.
> Well, I don't know, but I'm sometimes reluctant to do something
> recursively unless absolutely necessary. And our approaches both have
> advantages and disadvantages.
>
> Your code's advantage is: char matrix is handled as fast as before;
> disadvantage: uses recursion.
>
> My code's advantage is: no recursion; disadvantage: char matrix is
> handled a bit slower.
>
> Actually there's another way to do it; I first thought of this idea,
> but later preferred my current code: handle char matrix and cellstr
> differently (like your code), but don't use recursion for cellstr.
> Instead, use a loop (like mine) for cellstr. This has the advantage
> that simple arguments (char matrices) are handled without loss of
> speed (inherited from your code) and that cellstr is handled without
> recursion (inherited from my code). The disadvantage is that the
> searching code is duplicated, which might cause maintenance problems
> later on.
>
> I still prefer my code, though :p
>
> > >>I'll check the changes into Octave-forge. You can post it to
> > >>bug-octave and ask John to include it.
> > >>
> > source-octave (instead of bug-octave) would be more appropriate.
> >
>
> I've never posted to source-octave, and it seems that the list is very quiet...
>

I'm replying to my own mail here and adding John and bu...@oc... to the list. I haven't seen a reply from Alois since Jan 4...

Now I've tested 4 different ways of implementing strfind, with test_strfind.m:

  rep = 5;
  text = {{"How much wood would a woodchuck chuck"; "if a woodchuck could chuck wood?"};
          "Find the starting indices of the pattern string"};
  pattern = {"wood"; "in"};
  for p = 1:2
    for i = 1:4
      cd (num2str (i));
      av = 0;
      for j = 1:rep
        t = cputime;
        for k = 1:1000
          strfind (text{p}, pattern{p});
        endfor
        av += cputime - t;
      endfor
      av/rep
      cd ("..");
    endfor
  endfor

and the output is:

  ans = 2.5420
  ans = 2.5500
  ans = 2.5600
  ans = 2.5740
  ans = 0.89800
  ans = 0.90200
  ans = 0.90400
  ans = 0.89800

The first four are for the test with text as a cellstr, the next four are for the test with text as a string. The differences between implementations are:

Implementation 1: The code for finding the pattern is duplicated, once in the case of text as a string, and once more in the case of text as a cellstr. Wins when text is a cellstr.

Implementation 2: The code is put in a private function __strfind_string__, called in the case of text as a string, and called for every string in text if it is a cellstr. Loses when text is a cellstr, because __strfind_string__ is called many times.

Implementation 3: (This is my initial implementation based on Alois' old implementation) If text is a string, it is converted into a cellstr containing one string. Then the search for pattern is applied, with text now being a cellstr. Lastly, if text was a string, the single content of idx is extracted. This is slow if text is a string (because of some variable copying and stuff).

Implementation 4: (This is Alois' new implementation based on Implementation 3) If text is a string, pattern is searched. If it is a cellstr, strfind is recursively called for every string in it. This is slow when text is a cellstr, because of the recursion.

The implementations are attached as text attachments. I personally prefer implementation 1, but for inclusion to octave I think John should choose one.

-- William Poetra Yoga Hadisoeseno |
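Implementation 2 in the message above puts the search in a private helper. Since the body of __strfind_string__ is not shown in the thread, the sketch below uses a made-up helper, strfind_string_sketch, with a simple row-string search as a stand-in; strfind_helper is likewise a hypothetical name.

```octave
1;  % script-file marker so the functions below can be defined in the same file

% Hypothetical stand-in for the private helper: search one row string.
function idx = strfind_string_sketch (text, pattern)
  idx = 1:(length (text) - length (pattern) + 1);
  for k = 1:length (pattern)
    idx = idx(text(idx + k - 1) == pattern(k));
  endfor
endfunction

% Hypothetical sketch of the dispatcher: one copy of the search code,
% reached through a function call in both branches.
function idx = strfind_helper (text, pattern)
  if (ischar (text))
    idx = strfind_string_sketch (text, pattern);
  elseif (iscellstr (text))
    idx = cell (size (text));
    for i = 1:numel (text)
      idx{i} = strfind_string_sketch (text{i}, pattern);
    endfor
  else
    error ("strfind_helper: TEXT must be a string or a cellstr");
  endif
endfunction
```

This keeps a single copy of the search code, at the price of one helper call per string.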
From: Dmitri A. S. <das...@gm...> - 2006-02-20 20:18:32
|
On 2/18/06, William Poetra Yoga Hadisoeseno <wil...@gm...> wrote:
...
> Implementation 4: (This is Alois' new implementation based on
> Implementation 3) If text is a string, pattern is searched. If it is a
> cellstr, strfind is recursively called for every string in it. This is
> slow when text is a cellstr, because of the recursion.
>
> The implementations are attached as text attachments. I personally
> prefer implementation 1, but for inclusion to octave I think John
> should choose one.
>

FWIW: In my experience I usually need to use strfind on text that is a string. Normally this is part of parsing some external data file with a not well-defined structure. The text is usually a cellstr when there is a known or well-defined structure, and one can use this knowledge to optimise the search. So my vote would go to Implementation 4 -- I would like to have the best performance for the most common case rather than the best performance in the most general case.

> --
> William Poetra Yoga Hadisoeseno
>

Sincerely,
Dmitri.
-- |
From: William P. Y. H. <wil...@gm...> - 2006-02-21 13:01:21
|
On 2/21/06, Dmitri A. Sergatskov <das...@gm...> wrote:
> >
> > The implementations are attached as text attachments. I personally
> > prefer implementation 1, but for inclusion to octave I think John
> > should choose one.
> >

Reading my own words, it's like "John should choose 1". What I mean is, I think John should choose one of the above.

>
> FWIW: In my experience I need to use strfind usually on text that is
> a string. Normally this is a part of parsing of some external data file
> with a not well-defined structure.
> The text is a cellstr usually when there is a known or well-defined
> structure and one can use this knowledge to optimise the search.
> So my vote would go for the Implementation 4 -- I would like to have
> the best performance for the most common case rather than the best
> performance in the most general case.
>

Well, Implementation 1 has the best performance in both cases I tested above... Although I must admit that the code for the pattern search (5 lines) is duplicated (once for string, once for cellstr).

--
William Poetra Yoga Hadisoeseno |