From: <no...@so...> - 2002-01-02 12:01:08
Bugs item #496456, was opened at 2001-12-24 03:18
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=110894&aid=496456&group_id=10894

Category: 41. Regexp
Group: = 8.3
Status: Open
Resolution: None
Priority: 3
Submitted By: Sergey Kuzmin (kofeinik)
Assigned to: Donal K. Fellows (dkf)
Summary: regexp engine takes too much RAM

Initial Comment:
TCL version: 8.3
Platform: windows

The regexp engine takes too much RAM.
After executing the code from the example (see attached file),
the total size of allocated RAM stays at around 24 megabytes,
which is too much (IMHO) for this small array.

----------------------------------------------------------------------

>Comment By: Donal K. Fellows (dkf)
Date: 2002-01-02 04:00

Message:
Logged In: YES
user_id=79902

It seems that the RE compiler allocates some fairly large
objects during its processing (during the coloring of the
graph), and in a complex RE these may occupy quite a lot of
memory; maybe we need to write a custom allocator for these
things? The alternative is to figure out a way to use
smaller color arrays, but I don't understand the code
anything like well enough to be able to do that at this
point. :^(

The details can be seen in regcustom.h and regguts.h if
you're interested. (My guess is that the problems really
stem from the way that support for characters wider than a
byte was added, but that's just a guess at this point.)

----------------------------------------------------------------------

Comment By: Sergey Kuzmin (kofeinik)
Date: 2001-12-27 08:01

Message:
Logged In: YES
user_id=286083

By concatenating the REs I want to reduce the time needed
for matching, because Tcl compiles an RE only the first time
it is used, and subsequent matches run much faster.
I agree that this is a typical tradeoff, but I can't
agree with the proportion 23KB -> 10-17MB. Am I wrong?

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2001-12-27 07:36

Message:
Logged In: YES
user_id=79902

Experiment indicates that replacing ( with (?: shaves around
5MB off the size of the image. I don't know how much memory
the RE engine needs to allocate for all that stuff, though I
suspect that you don't want to use a single RE for this; the
initial compile time gets a bit humungous. Matching each RE
line separately keeps the space required *much* smaller and
makes the initial compilation much shorter, but it lengthens
the time needed to check an unlisted site once compilation
is done. Looks like a time-space tradeoff...

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2001-12-27 04:47

Message:
Logged In: YES
user_id=79902

The problem is probably related to the quantity of capturing
parentheses involved. There may be some linkage to the use
of arrays (which are *truly* not appropriate for this use;
lists are much more like what is needed) and the retention
of all the regexps used in building the main one too.
Indeed, I could rewrite the code to build the regexp quite
quickly (though I've no idea how large the single resulting
RE would actually be, or whether there would be any
performance gain through doing this after the initial
loading phase.)

There is possibly a need for a flag to say that no capturing
at all should be done by a regexp (making "RE(RE)RE"
equivalent to "RE(?:RE)RE"), which might be useful in
situations such as this. If that's really the case, then
this is a FRQ, not a bug per se.

I'll have a look at this in detail after the holidays.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2001-12-24 06:41

Message:
Logged In: YES
user_id=148712

Bug replicated on linux with tcl8.3.3: VSZ is 17MB in the
original version, only 3MB when taking the [string length]
version.
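
For readers following along, the two ideas raised in the comments above (swapping ( for (?: and matching each pattern separately instead of building one large RE) can be sketched roughly as follows. This is only an illustration under assumptions: the pattern list, the URL, and the helper name matchesAny are made up, the script attached to the report is not reproduced here, and the sketch makes no claim about actual memory use.

```tcl
# Hypothetical patterns standing in for the list from the attached file.
set patterns {{ads\.example\.com} {banner[0-9]+\.example\.net} {track\.example\.org}}

# Capturing form: each alternative is wrapped in ( ), so the engine has to
# track one subexpression per alternative.
set parts {}
foreach p $patterns { lappend parts "($p)" }
set capturing [join $parts |]

# Non-capturing form: the same alternatives wrapped in (?: ).
set parts {}
foreach p $patterns { lappend parts "(?:$p)" }
set nonCapturing [join $parts |]

set url "http://banner42.example.net/img.gif"

# Both forms agree on whether the string matches.
puts [regexp -- $capturing $url]      ;# 1
puts [regexp -- $nonCapturing $url]   ;# 1

# Only the capturing form reports subexpressions (full match + 3 groups).
puts [llength [regexp -inline -- $capturing $url]]     ;# 4
puts [llength [regexp -inline -- $nonCapturing $url]]  ;# 1

# Alternative: try each pattern on its own. Each compiled RE stays small,
# at the cost of one match attempt per pattern for a string that is not listed.
proc matchesAny {patterns str} {
    foreach p $patterns {
        if {[regexp -- $p $str]} { return 1 }
    }
    return 0
}
puts [matchesAny $patterns $url]                       ;# 1
puts [matchesAny $patterns "http://www.example.com/"]  ;# 0
```

Which variant is preferable depends on the time-space tradeoff described above; measuring both on the real pattern list is the only way to tell.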