How to match "End of buffer"/"End of string"?

Help
Jordi
2004-12-29
2013-05-14
  • Jordi
    2004-12-29

    Hi,

    I'm new to regex and this is my first post, so maybe the solution is obvious, but I couldn't find it on Google...

    I need to parse the multiline output of a command. Every line ends with a \n except the last one, which ends at the end of the buffer (the "\0"
    character).  The output I need to parse is something like:

      "text1 this is a multiple-word text\n
      text2 another text"
    (the second line does not have a newline)

    As a result I want only two sub-expressions per line, using a regex like:

    (\w+)\s+([^\n]+)\n

    The first submatch should be the first word ("text1" and "text2"), while the
    second submatch should be the rest of the line ("this is a multiple-word
    text" and "another text").

    In my program I use regex_search with the boost::match_continuous option.
    All the other regex objects are created with the default options.
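To make the setup concrete, here is a minimal sketch of the loop described above. It is an assumption-laden stand-in, not the poster's program: it uses std::regex (ECMAScript syntax) instead of Boost.Regex so that it compiles on its own, and the helper name parse_lines is hypothetical. Each search is anchored where the previous match ended via match_continuous, so the loop stops at the first line the pattern cannot match.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): applies `pattern`
// repeatedly, each time anchored at the end of the previous match, and
// collects (first word, rest of line) from sub-matches 1 and 2.
std::vector<std::pair<std::string, std::string>>
parse_lines(const std::string& text, const std::string& pattern) {
    std::vector<std::pair<std::string, std::string>> out;
    const std::regex re(pattern);  // default (ECMAScript) syntax
    std::smatch m;
    auto pos = text.cbegin();
    while (pos != text.cend() &&
           std::regex_search(pos, text.cend(), m, re,
                             std::regex_constants::match_continuous)) {
        out.emplace_back(m[1].str(), m[2].str());
        pos = m[0].second;  // continue right after the whole match
    }
    return out;
}
```

Called with the two-line sample and the pattern (\w+)\s+([^\n]+)\n, this sketch returns only the "text1" entry, because the second line has no trailing \n for the pattern to consume.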

    The first line matches the regex without any problem, but the second
    line does not, because it does not end with a "\n".  I'm unable to find a
    regex that matches both possible "ends of line" (the \n or the
    \0 character).

    I've tried some expressions without success:

    1.- First I tried to match either \n or \0 using:

    (\w+)\s+([^\n\x00]+)([\n\x00])

    but it seems the \x00 is not part of the buffer, so the second line does not
    match.

    2.- Then I tried to use "$", without success. (By the way, I
    assumed "$" would work like "\n", but it does not match the "end of line"
    character. When should I use it?)

    3.- On Google I found that I should use "\z" or "\Z".  I tried both, but
    they didn't work: the last line of the text never matches! (I suppose I
    need to set some extra option on the regex object for "\z" or "\Z"
    to work.)

    Finally I found a workaround using the regex:

    (\w+)\s+([^\n]+)\n*

    and now it works, but I would like to find a way to match the end of
    buffer/end of string.  Any ideas?
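One way to express "newline or end of input" explicitly is an alternation between \n and an end anchor. The sketch below assumes std::regex with its default ECMAScript syntax, where "$" matches at the end of the input; it is not Boost.Regex, and whether "$", "\z" or "\Z" behave the same way under the poster's Boost version and flags is not verified here. The helper name parse_with_end_anchor is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (first word, rest of line) for each line,
// accepting either "\n" or the end of the input as the line terminator.
std::vector<std::pair<std::string, std::string>>
parse_with_end_anchor(const std::string& text) {
    // (?:\n|$) consumes a newline if there is one, otherwise it
    // requires that the match ends exactly at the end of the input.
    static const std::regex re(R"((\w+)\s+([^\n]+)(?:\n|$))");
    std::vector<std::pair<std::string, std::string>> out;
    std::smatch m;
    auto pos = text.cbegin();
    while (pos != text.cend() &&
           std::regex_search(pos, text.cend(), m, re,
                             std::regex_constants::match_continuous)) {
        out.emplace_back(m[1].str(), m[2].str());
        pos = m[0].second;  // next search starts after this line
    }
    return out;
}
```

Unlike the \n* workaround, this form states the intent directly: a line is terminated by exactly one newline or by the end of the buffer.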

    Thanks in advance,

    Jordi