Thread: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

[gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Colin P. A. <co...@co...> - 2008-01-25 14:23:08

Classes RX_PCRE_MATCHER and RX_PCRE_BYTE_CODE_CONSTANTS violate the
single choice principle as they both contain giant inspect statements 
on the operation code. And the list of operation codes is defined in
yet a third class (RX_PCRE_BYTE_CODE_CONSTANTS).

The pure OO way would be to have a class for the concept of the machine
operation and a descendant class for each operation. But as these
classes can't be expanded, there is a cost associated with this,
compared to using a 32-bit integer to represent instructions (which is
a reasonable model of a 32-bit microprocessor instruction; that is,
it's the model an assembler programmer has).
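
To make the trade-off concrete, here is a minimal sketch of the integer
representation, written in Python rather than Eiffel. The op codes
OP_CHAR, OP_ANY and OP_END and both routines are hypothetical and are
not taken from the Gobo classes named above. The point it illustrates:
every routine that interprets the byte code repeats the same choice
over the op code.

```python
# Hypothetical op codes for illustration; not the real Gobo constants.
OP_CHAR, OP_ANY, OP_END = 0, 1, 2

def run(program, subject):
    """Match `program` (a flat list of integers) at the start of `subject`."""
    pc = pos = 0
    while True:
        op = program[pc]
        if op == OP_CHAR:            # first choice over the op code
            if pos >= len(subject) or ord(subject[pos]) != program[pc + 1]:
                return False
            pc, pos = pc + 2, pos + 1
        elif op == OP_ANY:
            if pos >= len(subject):
                return False
            pc, pos = pc + 1, pos + 1
        elif op == OP_END:
            return True
        else:
            raise ValueError(f"unknown op code {op}")

def disassemble(program):
    """Render the program as text; repeats the choice over the op code."""
    out, pc = [], 0
    while pc < len(program):
        op = program[pc]
        if op == OP_CHAR:            # second choice over the same op codes
            out.append(f"CHAR {chr(program[pc + 1])!r}")
            pc += 2
        elif op == OP_ANY:
            out.append("ANY")
            pc += 1
        elif op == OP_END:
            out.append("END")
            pc += 1
        else:
            raise ValueError(f"unknown op code {op}")
    return out

# Byte code for the pattern "a.b"
program = [OP_CHAR, ord("a"), OP_ANY, OP_CHAR, ord("b"), OP_END]
```

Adding an op code here means editing both routines; with one class per
operation it would mean adding one class.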

I have to decide which way to go for the Unicode engine I am working
on.

Any opinions one way or the other?
-- 
Colin Adams
Preston Lancashire

Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: CRISMER Paul-G. <Pau...@gr...> - 2008-01-25 14:53:12

Hello Colin,

What is the "cost" you are ready to pay for?

- OO / Single choice:
  cost of non-expandedness + dynamic binding (time)
  benefit of readability (development effort)

- Integer (instruction number) + giant inspect
  cost - readability (development effort)
  benefit - run-time efficiency (time)

The only way to know the difference between both solutions is to write
them both and measure the run-time cost.

- A mixed solution would be the following:
  * a class INSTRUCTION and its descendants model your instruction set
  * at startup you fill an instruction table with one instruction object
    per op code
  * the program is a sequence of integers that refer to the appropriate
    instruction in the instruction table
  * execution is something like this: instructions.item (op_code).execute
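
The mixed solution above could be sketched as follows, in Python rather
than Eiffel. Everything here is an assumption made for illustration:
the op codes, the Machine class holding the matcher state, and the
three instruction classes. It is not the Gobo implementation.

```python
from abc import ABC, abstractmethod

OP_CHAR, OP_ANY, OP_END = 0, 1, 2   # hypothetical op codes

class Machine:
    """Mutable matcher state shared by all instructions."""
    def __init__(self, program, subject):
        self.program = program
        self.subject = subject
        self.pc = 0
        self.pos = 0
        self.done = False
        self.matched = False

    def fail(self):
        self.done = True

class Instruction(ABC):
    @abstractmethod
    def execute(self, m):
        """Advance machine `m` by one step."""

class Char(Instruction):
    def execute(self, m):
        # The expected character code is the operand following the op code.
        if m.pos < len(m.subject) and ord(m.subject[m.pos]) == m.program[m.pc + 1]:
            m.pc += 2
            m.pos += 1
        else:
            m.fail()

class AnyChar(Instruction):
    def execute(self, m):
        if m.pos < len(m.subject):
            m.pc += 1
            m.pos += 1
        else:
            m.fail()

class End(Instruction):
    def execute(self, m):
        m.done = True
        m.matched = True

# Filled once at startup: index = op code, one shared object per operation.
INSTRUCTIONS = [Char(), AnyChar(), End()]

def run(program, subject):
    m = Machine(program, subject)
    while not m.done:
        INSTRUCTIONS[m.program[m.pc]].execute(m)
    return m.matched
```

The program stays a compact list of integers, so no object is allocated
per instruction occurrence; the price is one table lookup plus one
dynamically bound call per step. Whether that is measurably slower than
the inspect would still have to be benchmarked.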

Hope this helps.

My personal taste is the following:
- first model a well-designed (OO) solution and make it work
- if measurable performance problems arise, then optimize.

Best regards,

Paul G. Crismer


Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Eric B. <er...@go...> - 2008-01-25 16:12:38

CRISMER Paul-Georges wrote:
> What is the "cost" you are ready to pay for?
> 
> - OO / Single choice :
>   cost of non-expandedness + dynamic binding (time)
>   benefit of readability (development effort)
> 
> - Integer (instruction number) + giant inspect
>   cost - readability (development effort)
>   benefit - run-time efficiency (time)

Assuming that from the client point of view they
have the same interface (only the implementation
differs), as a client of the library I'm only
concerned with speed and memory usage, especially
in the case of regexps. When there is a trade-off to be
made, the client should be the winner. That's what
happens for the Eiffel language itself: put the burden
on the compiler writers, not on the language users.
Now you might say that compiler writers should make
it so that the cost of non-expandedness + dynamic binding
is not noticeable compared to a giant inspect ;-)

-- 
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com

Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Colin P. A. <co...@co...> - 2008-01-25 16:36:39

>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    Eric> Assuming that from the client point of view they have the
    Eric> same interface (only the implementation differs), as a
    Eric> client of the library I'm only concerned in speed and memory
    Eric> usage.

So I should wrap a C library then?

    Eric> Now you might say that compiler writers should make
    Eric> it so that the cost of non-expandedness + dynamic binding
    Eric> should not be noticeable compared to giant inspect ;-)

I'd rather say: allow inheritance of expanded types.
What are the issues apart from space layout (which could be solved by
forbidding conforming inheritance if attributes are added, or any
function/attribute redefinition, and use non-conforming inheritance
syntax for these cases)?
-- 
Colin Adams
Preston Lancashire

Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Emmanuel S. [ES] <ma...@ei...> - 2008-01-25 16:42:08

> So I should wrap a C library then?

I would not agree there, everything should be done in Eiffel because in the
long term it is better. We can achieve very good performance with Eiffel
when things are written with performance in mind.

Manu

Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Colin P. A. <co...@co...> - 2008-01-25 17:27:43

>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    >> So I should wrap a C library then?

I wasn't actually serious about that, by the way (I don't actually
know of an available one, for a start, and I'm in favour of everything
in pure Eiffel).

    Eric> What I had in mind is more something like EiffelParse
    Eric> vs. geyacc.

Can you expand on that statement please?
-- 
Colin Adams
Preston Lancashire

Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Eric B. <er...@go...> - 2008-01-25 18:34:30

Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
> 
>     >> So I should wrap a C library then?
> 
> I wasn't actually serious about that

Yes, I knew that you would prefer assembly code ;-)

> 
>     Eric> What I had in mind is more something like EiffelParse
>     Eric> vs. geyacc.
> 
> Can you expand on that statement please?

I don't know the current status, but 10 years ago using
EiffelParse to write parsers (yooc was there to help) produced
very slow parsers. EiffelParse has a very nice object-oriented
design for the parser implementation. I don't know if it's
related or not, but it's slow.

On the other hand, geyacc was written to produce ugly
non-object-oriented parsers, based on zillions of integers
put in tables or used in inspect statements. The resulting
parser looks more or less like what the parser would look like
if generated by yacc in C, but with an Eiffel syntax.
I don't remember the exact results of the benchmarks, but
parsers generated by geyacc were much faster than those
written with EiffelParse.
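
The "integers in tables" style can be illustrated with a tiny
table-driven recognizer for optionally signed integers, sketched in
Python. This shows the style only: it is not geyacc's output or table
layout, and the states, symbol classes and the accepts function are
hypothetical.

```python
# Symbol classes (columns of the transition table).
DIGIT, SIGN, OTHER = 0, 1, 2

# States (rows): 0 = start, 1 = after sign, 2 = in digits; -1 = error.
TRANSITIONS = [
    [2,  1, -1],   # state 0
    [2, -1, -1],   # state 1
    [2, -1, -1],   # state 2
]
ACCEPTING = [False, False, True]

def classify(ch):
    """Map one character to its symbol class."""
    if "0" <= ch <= "9":
        return DIGIT
    if ch in "+-":
        return SIGN
    return OTHER

def accepts(text):
    """Run the table-driven automaton over `text`."""
    state = 0
    for ch in text:
        state = TRANSITIONS[state][classify(ch)]
        if state < 0:
            return False
    return ACCEPTING[state]
```

All the behaviour lives in the integer tables; the driver loop never
changes, which is what makes this style fast and also hard to read.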

Of course the geyacc solution is usable only if the clients
of the generated parsers don't have to dive into the code
of the parsers themselves. We should see it as a black-box.

I think that a regexp library has the same criteria. That's
typically something that clients want to be fast. And clients
see it as a black-box (not that many people decide to dive
into the code of regexp implementation to understand how it
works).

-- 
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com

Re: [gobo-eiffel-develop] Object-oriented byte-codes for regular expressions?

From: Eric B. <er...@go...> - 2008-01-25 17:22:39

Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
> 
>     Eric> Assuming that from the client point of view they have the
>     Eric> same interface (only the implementation differs), as a
>     Eric> client of the library I'm only concerned in speed and memory
>     Eric> usage.
> 
> So I should wrap a C library then?

What I had in mind is more something like EiffelParse vs. geyacc.

-- 
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com