From: <duv...@ya...> - 2005-07-29 08:01:37
Dear all,

A little update on the previous post.

First, there was a typo: the second case should match (instead of **NOT**). The regular expression is now fixed as follows:

    regex = re.compile(r'(<!--(\s*\S+\s*)+-->)', re.I)

This works fine on all the samples I made, but as soon as the regular expression is run against a real-life file (150+ lines), I get the following exception:

    RuntimeError: maximum recursion limit exceeded

Extremely annoying. As my HTML pages cannot be guaranteed to be XHTML compliant, I cannot use SAX to solve the parsing problem (as explained in the O'Reilly Jython book, page 173).

Is the only working solution to implement my own 'finder' on a line-per-line basis? Has anybody found a smarter way to do this?

\T,

--- duv...@ya... wrote:
> Dear all,
>
> I am confronted with the following problem in regular expression formulation.
> I basically want to match non-empty comments in HTML documents (Grinder-related matching).
> In an HTML file :
> 1°. The following 4 should not match :
>   <!---->   (no character)
>   <!-- -->   (one space)
>   <!-- 
>      -->   (space + TAB + carriage-return + spaces)
>   <!-- 	-->   (a space + a TAB)
> 2°. This one should match :
>   <!-- a -->
>
> but \S+ seems to match the ASCII 32 character (space), as the example below shows.
>
> What am I doing wrong?
>
> \T,
>
> ------------ Sample ------------------------
> import re
>
> regex = re.compile(r'[\s*\S+\s*]+', re.I)
> regexWithCharacterOnly = re.compile(r'\S+', re.I)
>
> if __name__ == '__main__' :
>     txt = ' \t'   # One 'space' followed by one TAB
>     match = regex.search(txt)
>     if (match!=None):
>         print 'Match found : [ %i, %i]\n first character idx : %i --> "%s"' % (match.pos,
>                 match.endpos,
>                 ord(txt[match.pos]), txt)
>
>     match = regexWithCharacterOnly.search(txt)
>     if (match!=None):
>         print 'Match found : [ %i, %i], first character idx : %i --> "%s"' % (match.pos, match.endpos,
>                 ord(txt[match.pos]), txt)
>     else:
>         print 'No match found against "\\S+"'

Any fool can write code that a computer can understand.
Good programmers write code that humans can understand.
    Martin Fowler

T. : +32 (0)2 742 05 94
M. : +32 (0)497 44 68 12
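On the recursion error: a likely cause is the repeated group in (\s*\S+\s*)+, since the sre engine of that era appears to spend one level of recursion per repetition of a group, so a long comment exhausts the limit. What follows is only a minimal sketch of two ways around it, not tested on Jython. The names NONEMPTY_COMMENT and find_nonempty_comments are hypothetical and nothing here comes from the Grinder API. The first is a pattern without any repeated group; the second is a small 'finder' built on plain string searching, which does not depend on the regex engine at all and works on the whole text rather than line by line.

```python
import re

# Hypothetical pattern: no repeated group. After '<!--' it skips whitespace,
# requires one non-whitespace character that does not start the closing '-->',
# then lazily consumes everything up to the first '-->'.
# re.S lets '.' cross line breaks, so multi-line comments are covered.
NONEMPTY_COMMENT = re.compile(r'<!--\s*(?!-->)\S.*?-->', re.S)


def find_nonempty_comments(html):
    """Return a list of (start, end, text) for comments with a non-blank body.

    Hypothetical helper using only str.find, so no regex recursion is involved.
    """
    found = []
    pos = 0
    while True:
        start = html.find('<!--', pos)
        if start == -1:
            break
        end = html.find('-->', start + 4)
        if end == -1:
            break  # unterminated comment: stop scanning
        body = html[start + 4:end]
        if body.strip():  # at least one non-whitespace character
            found.append((start, end + 3, html[start:end + 3]))
        pos = end + 3
    return found
```

Calling find_nonempty_comments(page_text) on the full page text (page_text being whatever string holds the HTML) returns the span and text of each non-empty comment, including comments that spread over several lines. As for the quoted sample: the square brackets make [\s*\S+\s*] a character class that accepts any character, and match.pos / match.endpos are the bounds of the search rather than of the match (match.start() and match.end() give those), which would explain the reported ASCII 32.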