[OctDev] Re: strfind.m


On 1/4/06, William Poetra Yoga Hadisoeseno <wil...@gm...> wrote:
> On 1/3/06, Alois Schloegl <alo...@tu...> wrote:
> > I prefer the recursive version. It gives cleaner and shorter code and
> > needs fewer temporary variables.
> >
>
> I see your point. Actually our points of view are different: you view
> it from the "cellstr as an extra functionality" viewpoint, while I
> view it from the "cellstr as a generalization of string" viewpoint.
> Well, I don't know, but I'm sometimes reluctant to do something
> recursively unless absolutely necessary. And our approaches both have
> advantages and disadvantages.
>
> Your code's advantage is: char matrix is handled as fast as before;
> disadvantage: uses recursion.
>
> My code's advantage is: no recursion; disadvantage: char matrix is
> handled a bit slower.
>
> Actually there's another way to do it; I first thought of this idea,
> but later preferred my current code: handle char matrix and cellstr
> differently (like your code), but don't use recursion for cellstr.
> Instead, use a loop (like mine) for cellstr. This has the advantage
> that simple arguments (char matrices) are handled without loss of
> speed (inherited from your code) and that cellstr is handled without
> recursion (inherited from my code). The disadvantage is that the
> searching code is duplicated, which might cause maintenance problems
> later on.
>
> I still prefer my code, though :p
>
> > >>I'll check in the changes into Octave-forge. You can post it to
> > >>bug-octave and ask John to include it.
> > >>
> > source-octave (instead of bug-octave) would be more appropriate.
> >
>
> I've never posted to source-octave, and it seems that the list is very quiet...
>

I'm actually replying to my own mail here and adding John and
bu...@oc... to the list. I haven't seen a reply from Alois since Jan 4...

Now I've tested 4 different ways of implementing strfind, with test_strfind.m:

rep = 5;   # number of timing runs to average over

# text{1} is a cellstr, text{2} is a plain string
text = {{"How much wood would a woodchuck chuck";
         "if a woodchuck could chuck wood?"};
        "Find the starting indices of the pattern string"};
pattern = {"wood"; "in"};

for p = 1:2
  for i = 1:4
    # enter directory i so that its strfind.m is the one being called
    cd (num2str (i));
    av = 0;
    for j = 1:rep
      t = cputime;
      for k = 1:1000
        strfind (text{p}, pattern{p});
      endfor
      av += cputime - t;
    endfor
    av/rep   # average CPU time for 1000 calls
    cd ("..");
  endfor
endfor

and the output is:

ans = 2.5420
ans = 2.5500
ans = 2.5600
ans = 2.5740
ans = 0.89800
ans = 0.90200
ans = 0.90400
ans = 0.89800

The first four are for the test with text as a cellstr, the next four
are for the test with text as a string.
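
To put the numbers in proportion, here is a small sketch that computes each implementation's slowdown relative to the fastest one in the same test. It assumes the output lines appear in implementation order 1 to 4, as the loop in test_strfind.m suggests; the variable names are made up for illustration.

```octave
# Timings copied from the output above, assumed to be in implementation order 1 to 4.
t_cellstr = [2.5420 2.5500 2.5600 2.5740];
t_string  = [0.89800 0.90200 0.90400 0.89800];

# Percentage slowdown of each implementation relative to the fastest one.
slow_cellstr = 100 * (t_cellstr - min (t_cellstr)) / min (t_cellstr);
slow_string  = 100 * (t_string  - min (t_string))  / min (t_string);

printf ("cellstr: %.2f%% %.2f%% %.2f%% %.2f%%\n", slow_cellstr);
printf ("string:  %.2f%% %.2f%% %.2f%% %.2f%%\n", slow_string);
```

On these figures the largest gap is about 1.3% for the cellstr test and about 0.7% for the string test.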

The differences between implementations are:

Implementation 1: The code for finding the pattern is duplicated, once
in the case of text as a string, and once more in the case of text as
a cellstr. Wins when text is a cellstr.

Implementation 2: The code is put in a private function
__strfind_string__, called in the case of text as a string, and called
for every string in text if it is a cellstr. Loses when text is a
cellstr, because __strfind_string__ is called many times.

Implementation 3: (This is my initial implementation based on Alois'
old implementation) If text is a string, it is converted into a
cellstr containing one string. Then the search for pattern is applied,
with text now being a cellstr. Lastly, if text was a string, the
single element of idx is extracted. This is slow if text is a
string (because of the extra variable copying).

Implementation 4: (This is Alois' new implementation based on
Implementation 3) If text is a string, pattern is searched. If it is a
cellstr, strfind is recursively called for every string in it. This is
slow when text is a cellstr, because of the recursion.
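
For readers without the attachments, here is a minimal sketch of the two dispatch strategies described for Implementations 3 and 4. It is not the attached code: the names find_in_string, strfind_wrap and strfind_recursive are hypothetical, the search kernel is a deliberately naive loop, and each string is assumed to be a single row (no char matrices) with a non-empty pattern.

```octave
1;  # mark this file as a script so the functions below can be defined inline

# Hypothetical search kernel: start indices of pat in the single-row string str.
function idx = find_in_string (str, pat)
  n = length (pat);
  idx = [];
  for k = 1:(length (str) - n + 1)
    if (strcmp (str(k:k+n-1), pat))
      idx(end+1) = k;
    endif
  endfor
endfunction

# Implementation-3 style: wrap a string in a cell, loop, unwrap at the end.
function idx = strfind_wrap (text, pattern)
  was_string = ischar (text);
  if (was_string)
    text = {text};
  endif
  idx = cell (size (text));
  for i = 1:numel (text)
    idx{i} = find_in_string (text{i}, pattern);
  endfor
  if (was_string)
    idx = idx{1};
  endif
endfunction

# Implementation-4 style: search a string directly, recurse for each cellstr element.
function idx = strfind_recursive (text, pattern)
  if (ischar (text))
    idx = find_in_string (text, pattern);
  else
    idx = cell (size (text));
    for i = 1:numel (text)
      idx{i} = strfind_recursive (text{i}, pattern);
    endfor
  endif
endfunction

txt = {"How much wood would a woodchuck chuck";
       "if a woodchuck could chuck wood?"};
a = strfind_wrap (txt, "wood");
b = strfind_recursive (txt, "wood");
disp (isequal (a, b));   # both strategies should return the same indices
```

The wrap version pays for the extra cell creation and unwrapping even on a plain string, while the recursive version pays one extra function call per cellstr element, which matches the trade-off described above.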

The implementations are attached as text attachments. I personally
prefer implementation 1, but for inclusion in Octave I think John
should choose one.

--
William Poetra Yoga Hadisoeseno
