jregex - regular expressions for Java / Discussion / Help: Bug or bad regex?

I want to parse log files. The difficulty is that a log entry may end with free-form text, and that text may span multiple lines. Every entry starts with a level keyword (DEBUG, INFO, etc.), so the only marker for the end of an entry is the start of the next one, and the last entry has nothing after it. Consider the sample input:

// begin sample --

DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] lots of text
that may be
over multiple lines
DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] more random text
that may be over
multiple lines

// end sample --

So I tried this regex...

(\w*) (.*)(?=DEBUG)

This worked OK, but could not match the last entry, because it runs to the end of the file and is not followed by another "DEBUG". So then I tried this:

(?(?=.*DEBUG) ( ((\w*) (.*)(?=DEBUG)) ) | ((\w*) (.*)) )

This was closer, but still didn't work. The output was:

//-- begin output --

Groups:
0: <DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] lots of text
that may be
over multiple lines
>
1: <DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] lots of text
that may be
over multiple lines
>
2: <DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] lots of text
that may be
over multiple lines
>
3: <DEBUG>
4: < 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] lots of text
that may be
over multiple lines
>
5: -
6: -
7: -

Groups:
0: <>
1: <>
2: <>
3: <>
4: <>
5: -
6: -
7: -

Groups:
0: <EBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] more random text
that may be over
multiple lines
>
1: -
2: -
3: -
4: -
5: <EBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] more random text
that may be over
multiple lines
>
6: <EBUG>
7: < 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] more random text
that may be over
multiple lines
>

Groups:
0: <>
1: -
2: -
3: -
4: -
5: <>
6: <>
7: <>

// end output --

This was obtained after hitting "apply" four times. I expected the first "apply" to parse all of log entry 1 and the second "apply" to parse all of log entry 2. As you can see, that didn't happen: the second and fourth matches are empty. Also, on the third "apply" the match does not start with "DEBUG" as expected but with "EBUG".
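To make the expected behaviour concrete, here is a minimal sketch of what I am after. It uses java.util.regex rather than jregex, so it says nothing about how jregex itself behaves. It is only an illustration, under the assumption that a lazy match ending at "start of the next entry or end of input" is an acceptable way to describe one entry. The class name LogEntrySplitter and the level names WARN and ERROR are made up for the example; only DEBUG and INFO are known from my logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jregex.
class LogEntrySplitter {
    // Assumed level list: DEBUG and INFO come from the logs, WARN and ERROR are guesses.
    // MULTILINE lets ^ match at the start of every line; DOTALL lets . cross line breaks.
    // The lazy (.*?) stops at the first line that starts a new entry, or at end of input (\z).
    private static final Pattern ENTRY = Pattern.compile(
        "^(DEBUG|INFO|WARN|ERROR) (.*?)(?=^(?:DEBUG|INFO|WARN|ERROR) |\\z)",
        Pattern.MULTILINE | Pattern.DOTALL);

    /** Returns one {level, remainder} pair per log entry found in the input. */
    public static List<String[]> split(String log) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            entries.add(new String[] { m.group(1), m.group(2) });
        }
        return entries;
    }

    public static void main(String[] args) {
        String log =
            "DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] lots of text\n"
            + "that may be\n"
            + "over multiple lines\n"
            + "DEBUG 2004-01-23 11:29:58,818 {anonymous} [Servlet.Engine.Transports:9] more random text\n"
            + "that may be over\n"
            + "multiple lines\n";
        // Print each entry's level and its (possibly multi-line) remainder.
        for (String[] entry : split(log)) {
            System.out.println("level=" + entry[0]);
            System.out.println("rest=" + entry[1].trim());
        }
    }
}
```

With this approach each call to find() plays the role of one "apply": the first should return entry 1 and the second entry 2, including the last entry that has no following "DEBUG". A continuation line that happens to begin with a level keyword and a space would be treated as a new entry, which I can live with for now.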

Is my regex bad (probably), or is this a bug in jregex? Once I have this working I will build on it to parse out the important information in each log entry.

Thanks for the help
Later
Rob
