From: William P. Y. H. <wil...@gm...> - 2006-02-19 03:45:45
On 1/4/06, William Poetra Yoga Hadisoeseno <wil...@gm...> wrote:
> On 1/3/06, Alois Schloegl <alo...@tu...> wrote:
> > I prefer the recursive version. It gives cleaner and shorter code and
> > needs less temporary variables.
> >
>
> I see your point. Actually our points of view differ: you view
> it from the "cellstr as an extra functionality" viewpoint, while I
> view it from the "cellstr as a generalization of string" viewpoint.
> Well, I don't know, but I'm sometimes reluctant to do something
> recursively unless absolutely necessary. And our approaches both have
> advantages and disadvantages.
>
> Your code's advantage is: char matrix is handled as fast as before;
> disadvantage: uses recursion.
>
> My code's advantage is: no recursion; disadvantage: char matrix is
> handled a bit slower.
>
> Actually there's another way to do it; I first thought of this idea,
> but later preferred my current code: handle char matrix and cellstr
> differently (like your code), but don't use recursion for cellstr.
> Instead, use a loop (like mine) for cellstr. This has the advantage
> that simple arguments (char matrices) are handled without loss of
> speed (inherited from your code) and that cellstr is handled without
> recursion (inherited from my code). The disadvantage is that the
> searching code is duplicated, which might cause maintenance problems
> later on.
>
> I still prefer my code, though :p
>
> > >>I'll check in the changes into Octave-forge. You can post it to
> > >>bug-octave and ask John to include it.
> > >>
> > source-octave (instead of bug-octave) would be more appropriate.
> >
>
> I've never posted to source-octave, and it seems that the list is very quiet...
>
I'm replying to my own mail here and adding John and
bu...@oc... to the list. I haven't seen a reply from Alois since Jan 4...
Now I've tested 4 different ways of implementing strfind, with test_strfind.m:
rep = 5;   # number of timing repetitions to average over
# text{1} is a cellstr, text{2} is a plain string
text = {{"How much wood would a woodchuck chuck";
         "if a woodchuck could chuck wood?"};
        "Find the starting indices of the pattern string"};
pattern = {"wood"; "in"};
for p = 1:2
  for i = 1:4
    cd (num2str (i));   # directory i holds implementation i
    av = 0;
    for j = 1:rep
      t = cputime;
      for k = 1:1000
        strfind (text{p}, pattern{p});
      endfor
      av += cputime - t;
    endfor
    av/rep   # average CPU time for 1000 calls
    cd ("..");
  endfor
endfor
and the output is:
ans = 2.5420
ans = 2.5500
ans = 2.5600
ans = 2.5740
ans = 0.89800
ans = 0.90200
ans = 0.90400
ans = 0.89800
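The first four are for the test with text as a cellstr, the next four
are for the test with text as a string; within each group the values
are for implementations 1 to 4, in that order. Each value is the CPU
time in seconds for 1000 calls, averaged over 5 repetitions.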
The differences between the implementations are:

Implementation 1: The code for finding the pattern is duplicated, once
for the case of text as a string, and once more for the case of text as
a cellstr. Wins when text is a cellstr.

Implementation 2: The code is put in a private function
__strfind_string__, which is called once if text is a string, and once
for every string in text if it is a cellstr. Loses when text is a
cellstr, because __strfind_string__ is called many times.

Implementation 3: (This is my initial implementation, based on Alois'
old implementation.) If text is a string, it is converted into a
cellstr containing one string. Then the search for pattern is applied,
with text now being a cellstr. Lastly, if text was originally a string,
the single element of idx is extracted. This is slow if text is a
string (because of some variable copying and stuff).

Implementation 4: (This is Alois' new implementation, based on
Implementation 3.) If text is a string, pattern is searched for
directly. If it is a cellstr, strfind is called recursively for every
string in it. This is slow when text is a cellstr, because of the
recursion.
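
For readers without the attachments, here is a rough sketch of how three of these dispatch styles might look. It is not the attached code: the names find_in_string, strfind_helper, strfind_wrap and strfind_recursive are hypothetical, and the naive search loop is only a stand-in for whatever the real implementations do. strfind_helper follows the structure of Implementation 2, strfind_wrap that of Implementation 3, and strfind_recursive that of Implementation 4; Implementation 1 would look like strfind_helper with the search loop written out inline in both branches.

```octave
1;  # makes this file a script, so the functions below can be defined inline

# Hypothetical stand-in for the search code: start indices of pattern
# in a single string, found by naive comparison at every position.
function idx = find_in_string (text, pattern)
  idx = [];
  n = length (pattern);
  for k = 1:(length (text) - n + 1)
    if (strcmp (text(k:k+n-1), pattern))
      idx(end+1) = k;
    endif
  endfor
endfunction

# Helper-function style (structure of Implementation 2): no recursion,
# the helper is called once per string.
function idx = strfind_helper (text, pattern)
  if (ischar (text))
    idx = find_in_string (text, pattern);
  elseif (iscellstr (text))
    idx = cell (size (text));
    for i = 1:numel (text)
      idx{i} = find_in_string (text{i}, pattern);
    endfor
  else
    error ("strfind_helper: text must be a string or a cellstr");
  endif
endfunction

# Wrap-and-unwrap style (structure of Implementation 3): a string is
# first wrapped in a one-element cellstr, then unwrapped at the end.
function idx = strfind_wrap (text, pattern)
  was_string = ischar (text);
  if (was_string)
    text = {text};
  elseif (! iscellstr (text))
    error ("strfind_wrap: text must be a string or a cellstr");
  endif
  idx = cell (size (text));
  for i = 1:numel (text)
    idx{i} = find_in_string (text{i}, pattern);
  endfor
  if (was_string)
    idx = idx{1};
  endif
endfunction

# Recursive style (structure of Implementation 4): a cellstr is handled
# by calling the function itself on every element.
function idx = strfind_recursive (text, pattern)
  if (ischar (text))
    idx = find_in_string (text, pattern);
  elseif (iscellstr (text))
    idx = cell (size (text));
    for i = 1:numel (text)
      idx{i} = strfind_recursive (text{i}, pattern);
    endfor
  else
    error ("strfind_recursive: text must be a string or a cellstr");
  endif
endfunction

# Usage with the test data from test_strfind.m
s = "Find the starting indices of the pattern string";
c = {"How much wood would a woodchuck chuck";
     "if a woodchuck could chuck wood?"};
idx_s = strfind_recursive (s, "in");
idx_c = strfind_helper (c, "wood");
```

All three variants should return the same indices for the same input; they differ only in how many function calls and variable copies each path costs, which is what the timings above measure.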
The implementations are attached as text attachments. I personally
prefer implementation 1, but for inclusion in Octave I think John
should choose one.
--
William Poetra Yoga Hadisoeseno