From: William P. Y. H. <wil...@gm...> - 2006-02-19 03:45:45
On 1/4/06, William Poetra Yoga Hadisoeseno <wil...@gm...> wrote:
> On 1/3/06, Alois Schloegl <alo...@tu...> wrote:
> > I prefer the recursive version. It gives cleaner and shorter code and
> > needs fewer temporary variables.
> >
>
> I see your point. Actually our points of view are different: you view
> it from the "cellstr as an extra functionality" viewpoint, while I
> view it from the "cellstr as a generalization of string" viewpoint.
> Well, I don't know, but I'm sometimes reluctant to do something
> recursively unless absolutely necessary. And our approaches both have
> advantages and disadvantages.
>
> Your code's advantage is: char matrix is handled as fast as before;
> disadvantage: uses recursion.
>
> My code's advantage is: no recursion; disadvantage: char matrix is
> handled a bit slower.
>
> Actually there's another way to do it; I first thought of this idea,
> but later preferred my current code: handle char matrix and cellstr
> differently (like your code), but don't use recursion for cellstr.
> Instead, use a loop (like mine) for cellstr. This has the advantage
> that simple arguments (char matrices) are handled without loss of
> speed (inherited from your code) and that cellstr is handled without
> recursion (inherited from my code). The disadvantage is that the
> searching code is duplicated, which might cause maintenance problems
> later on.
>
> I still prefer my code, though :p
>
> > >>I'll check in the changes into Octave-forge. You can post it to
> > >>bug-octave and ask John to include it.
> > >>
> > source-octave (instead of bug-octave) would be more appropriate.
> >
>
> I've never posted to source-octave, and it seems that the list is
> very quiet...
>

I'm replying to my own mail here and adding John and bu...@oc... to the
recipients. I haven't seen a reply from Alois since Jan 4...

Now I've tested 4 different ways of implementing strfind, with
test_strfind.m:

  rep = 5;
  text = {{"How much wood would a woodchuck chuck";
           "if a woodchuck could chuck wood?"};
          "Find the starting indices of the pattern string"};
  pattern = {"wood"; "in"};
  # p = 1: text is a cellstr; p = 2: text is a string
  for p = 1:2
    # directories 1..4 each hold one implementation of strfind
    for i = 1:4
      cd (num2str (i));
      av = 0;
      for j = 1:rep
        t = cputime;
        for k = 1:1000
          strfind (text{p}, pattern{p});
        endfor
        av += cputime - t;
      endfor
      # average CPU time per 1000 calls
      av/rep
      cd ("..");
    endfor
  endfor

and the output is:

  ans = 2.5420
  ans = 2.5500
  ans = 2.5600
  ans = 2.5740
  ans = 0.89800
  ans = 0.90200
  ans = 0.90400
  ans = 0.89800

The first four are for the test with text as a cellstr (implementations
1 to 4, in order), the next four are for the test with text as a string
(again implementations 1 to 4).

The differences between implementations are:

Implementation 1:
The code for finding the pattern is duplicated, once in the case of text
as a string, and once more in the case of text as a cellstr. Wins when
text is a cellstr.

Implementation 2:
The code is put in a private function __strfind_string__, called in the
case of text as a string, and called for every string in text if it is a
cellstr. Loses when text is a cellstr, because __strfind_string__ is
called many times.

Implementation 3:
(This is my initial implementation based on Alois' old implementation)
If text is a string, it is converted into a cellstr containing one
string. Then the search for pattern is applied, with text now being a
cellstr. Lastly, if text was a string, the single element of idx is
extracted. This is slow if text is a string (because of the extra
variable copying).

Implementation 4:
(This is Alois' new implementation based on Implementation 3)
If text is a string, pattern is searched. If it is a cellstr, strfind is
recursively called for every string in it. This is slow when text is a
cellstr, because of the recursion.

The implementations are attached as text attachments. I personally
prefer implementation 1, but for inclusion in Octave I think John should
choose one.

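For readers without the attachments, the structure of Implementation 1 could look roughly like the sketch below. This is not the attached code: the name strfind_sketch, the variable names and the matching loop are all made up for illustration, and it assumes a non-empty pattern and row-vector strings only. The point is just to show the search being written out twice, once per branch, with a loop instead of recursion for the cellstr case.

```octave
1;  # make this a script file, so the function can be defined inline

# Hypothetical sketch in the style of Implementation 1 (not the attached code).
function idx = strfind_sketch (text, pattern)
  if (! ischar (pattern) || isempty (pattern))
    error ("strfind_sketch: pattern must be a non-empty string");
  endif
  m = length (pattern);
  if (ischar (text))
    # Branch 1: plain string, searched directly.
    n = length (text);
    # candidate start positions: where the first character matches
    idx = find (text(1:max (n-m+1, 0)) == pattern(1));
    # keep only candidates whose k-th character also matches
    for k = 2:m
      idx = idx(text(idx+k-1) == pattern(k));
    endfor
  elseif (iscellstr (text))
    # Branch 2: cellstr; the same search is duplicated inside a loop,
    # so no function call or recursion happens per element.
    idx = cell (size (text));
    for c = 1:numel (text)
      s = text{c};
      n = length (s);
      hits = find (s(1:max (n-m+1, 0)) == pattern(1));
      for k = 2:m
        hits = hits(s(hits+k-1) == pattern(k));
      endfor
      idx{c} = hits;
    endfor
  else
    error ("strfind_sketch: text must be a string or a cellstr");
  endif
endfunction

# Usage: a string gives a vector of indices, a cellstr gives a cell array.
idx_str = strfind_sketch ("How much wood would a woodchuck chuck", "wood")
idx_cell = strfind_sketch ({"abcabc"; "xyz"}, "bc")
```

The duplicated block is exactly the maintenance cost mentioned earlier in the thread: any fix to the matching loop has to be made in both branches.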
-- 
William Poetra Yoga Hadisoeseno