Re: [gobo-eiffel-develop] Regular expression syntax

>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:

>>>>> "Eric" == Eric Bezault <er...@go...> writes:
    Eric> UTF-8 byte representation? Even if it's a multibyte, you can
    Eric> replace [\i:] by (multibyte1|...|multibyten|:) and likewise
    Eric> for similar regexp constructs.

    Colin> While that is possible, I think the resultant string for
    Colin> \i, \c and similar properties will be of the order of 20KB.

    Colin> I don't know how the regular expression engine works, but
    Colin> if it needs to compare a space character (for instance)
    Colin> with each of 20000 characters in order to reject a test, I
    Colin> think it will be far too inefficient.

I wrote a test program to measure this.

For XML 1.1, the equivalent of \c is 3830417 bytes long.

That is far more than the roughly 20KB I estimated, so something
seems to be wrong with the test program. Can anyone see where the
fault is? (I know it's not the most efficient way of doing it.)

class TEST

inherit

	UC_UNICODE_FACTORY

	UC_UNICODE_CONSTANTS

	XM_UNICODE_CHARACTERS_1_1

	KL_IMPORTED_STRING_ROUTINES

create

	make

feature {NONE} -- Initialization

	make is
		-- Test byte count of equivalent regexp to [\c]+.
		local
			i: INTEGER
			l_regexp: STRING
		do
			from
				l_regexp := ""
				i := 1
			until
				i > maximum_unicode_character_code
			loop
				if is_name_char (i) then
					l_regexp := STRING_.appended_string (l_regexp, new_unicode_string_filled_code (i, 1))
				end
				i := i + 1
			end
			print (utf8.to_utf8 (l_regexp).count.out + "%N")
		end

end
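
One way to cross-check the figure without going through the Gobo
classes is to sum the UTF-8 lengths over the NameChar ranges directly.
This is a minimal sketch in Python, assuming the ranges below are
transcribed correctly from the NameStartChar and NameChar productions
of the XML 1.1 Recommendation. The function and constant names are
made up for illustration and are not part of any library.

```python
# Cross-check: UTF-8 size of all XML 1.1 NameChar code points, concatenated.
# Ranges are inclusive (first, last) pairs, assumed to match the
# NameStartChar and NameChar productions of the XML 1.1 Recommendation.
NAME_START_CHAR_RANGES = [
    (0x3A, 0x3A),                     # ":"
    (0x41, 0x5A),                     # A-Z
    (0x5F, 0x5F),                     # "_"
    (0x61, 0x7A),                     # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
NAME_CHAR_EXTRA_RANGES = [
    (0x2D, 0x2E),                     # "-" and "."
    (0x30, 0x39),                     # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def utf8_length(code):
    """Number of bytes UTF-8 uses to encode one code point."""
    if code < 0x80:
        return 1
    if code < 0x800:
        return 2
    if code < 0x10000:
        return 3
    return 4


def name_char_counts():
    """Map each UTF-8 length (1..4) to the number of NameChar code points."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for first, last in NAME_START_CHAR_RANGES + NAME_CHAR_EXTRA_RANGES:
        for code in range(first, last + 1):
            counts[utf8_length(code)] += 1
    return counts


def total_utf8_bytes(counts):
    """Total bytes needed to write every counted code point once."""
    return sum(length * count for length, count in counts.items())


if __name__ == "__main__":
    counts = name_char_counts()
    for length in sorted(counts):
        print(length, "byte(s):", counts[length], "code points")
    print("total bytes:", total_utf8_bytes(counts))
```

If the total lands near the 3830417 bytes reported above, the size
would be inherent in the character class rather than a fault in the
loop: the range #x10000-#xEFFFF alone covers 917504 code points at
four bytes each.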
-- 
Colin Adams
Preston Lancashire