mgrep - A Multiline grep Implementation Wiki

Brought to you by: lenardpi

Description

mgrep is a simple command-line utility that lets you search for both single-line and multiline patterns. Multiline patterns are useful when the text you are looking for can only be identified by several consecutive lines taken together. Log file analysis is a typical application that needs this functionality.

mgrep is used much like the Unix grep, but it does not implement all of grep's functionality, so it is not a grep replacement. The features that are available, however, have the same meaning and behaviour as in grep.

Supported Options

Only a subset of grep's options is supported. Those that are supported should have the same meaning and behaviour as in grep. Currently, the following (most commonly used) options are available.

Option	Description
-h	Display brief help.
--help	Display verbose help.
-V, --version	Show version information.
-i, --ignore-case	Perform case-insensitive matching.
-v, --invert-match	Select non-matching lines. (Not allowed with -c when using multiline patterns.)
-c, --count	Count number of matches. (Not allowed with -v when using multiline patterns.)
-l, --files-with-matches	List matching files only.
-n, --line-number	Display line numbers for matching lines.
-q, --quiet, --silent	Suppress normal output.

Multiple options can be used simultaneously, and the single-character versions can be combined: e.g. "-ic", "-i -c" and "--ignore-case -c" all mean the same thing.
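
The equivalence of the three spellings can be checked with a short script. This is only a sketch: the `tool` variable and the sample file are made up for illustration, and the script falls back to grep when mgrep is not on the PATH, on the assumption (stated above) that both tools treat these options identically.

```shell
# Prefer mgrep; fall back to grep (assumed to behave the same for -i and -c).
if command -v mgrep >/dev/null 2>&1; then tool=mgrep; else tool=grep; fi

# Hypothetical sample data: two lines contain "apple" in some capitalization.
sample=$(mktemp)
printf 'Apple\napple pie\ncherry\n' > "$sample"

# The same request written in three different ways.
a=$("$tool" -ic apple "$sample")
b=$("$tool" -i -c apple "$sample")
c=$("$tool" --ignore-case -c apple "$sample")

# All three counts should be identical.
printf '%s %s %s\n' "$a" "$b" "$c"
rm -f "$sample"
```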

Usage Examples

Note: Windows seems to expect quotation marks (") instead of apostrophes (') around command line parameters, so under Windows, write mgrep "some pattern" instead of mgrep 'some pattern'.

Single-line Patterns

These are exactly the same as with grep. Some examples:

:::bash
# Search for lines with "Romeo" in the file shakespeare-romeo_and_juliet:
mgrep Romeo shakespeare-romeo_and_juliet

:::bash
# Count the number of include directives in each .cc file:
mgrep -c '#include' *.cc

:::bash
# List files containing "John Doe", regardless of capitalization:
mgrep -il 'John Doe' *.cc

:::bash
# List lines that have either "apple" or "cherry" in them:
mgrep 'apple\|cherry' fruits

:::bash
# Backreferences
#
# The pattern '\(.\)MIDDLE\1' in mgrep will match "xMIDDLEx"...
echo "xMIDDLEx" | mgrep '\(.\)MIDDLE\1'
# ... but not "xMIDDLEy":
echo "xMIDDLEy" | mgrep '\(.\)MIDDLE\1'
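
In scripts, the same check is often combined with -q so that only the exit status is used. The sketch below assumes that mgrep follows grep's exit status convention (zero when something matched, non-zero otherwise); this page does not state that explicitly, so treat it as an assumption. The `check` helper and the `tool` variable are hypothetical, and grep is used as a stand-in when mgrep is not installed.

```shell
# Prefer mgrep; fall back to grep as a stand-in for single-line patterns.
if command -v mgrep >/dev/null 2>&1; then tool=mgrep; else tool=grep; fi

# check STRING: report whether STRING matches the backreference pattern.
# Relies on the (assumed) grep-style exit status of the search tool.
check() {
    if printf '%s\n' "$1" | "$tool" -q '\(.\)MIDDLE\1'; then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

check xMIDDLEx
check xMIDDLEy
```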

Multiline Patterns

You can denote a line break with "$^" in the patterns for mgrep.

:::bash
# Search the file misty_mountains for blocks having "dragon" in the first line
# and "sun" in the second:
mgrep 'dragon.*$^.*sun' misty_mountains

:::bash
# Count the blocks in which two consecutive lines each contain a case label,
# in all .cpp files:
mgrep -c 'case.*:.*$^.*case.*:' *.cpp

:::bash
# List blocks having "Mars" in the first line, anything in the second, 
# and "planet." at the beginning of the third line:
mgrep 'Mars.*$^.*$^planet\.' solar_system

:::bash
# List blocks having "Mars" in the first line, anything in the second, 
# and "planet." *anywhere* in the third line:
mgrep 'Mars.*$^.*$^.*planet\.' solar_system

:::bash
# Count the number of four-line blocks that start with "Lion" in the first line, 
# contain anything in the second line, an empty line as the third, and 
# "tiger." at the end of the fourth line (ignoring case):
mgrep -ic '^Lion.*$^.*$^$^.*tiger\.$' *

:::bash
# List those 2-line blocks that have either "red" or "green" in the first line,
# and either "stone" or "wood" in the second:
mgrep 'red\|green.*$^.*stone\|wood' craftsmanship

Note that in multiline patterns, line breaks have higher precedence than alternatives (see the last example above). The pattern is first split into single-line patterns at each "$^", and only then are alternatives evaluated within each single-line pattern (i.e. it is a "multiline pattern of alternatives", not an "alternative of multiline patterns").
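
To make the precedence rule concrete, the following sketch emulates the last example with plain grep by applying the two single-line patterns to consecutive lines. `match_pair` is a hypothetical helper written for this illustration; it is not part of mgrep and does not claim to reflect how mgrep is implemented. It assumes a grep that understands `\|` in basic regular expressions (GNU grep does).

```shell
# match_pair LINE1_PATTERN LINE2_PATTERN FILE
# Print every pair of consecutive lines in FILE where the first line matches
# LINE1_PATTERN and the second line matches LINE2_PATTERN.
match_pair() {
    prev=''
    have_prev=0
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$have_prev" -eq 1 ] &&
           printf '%s\n' "$prev" | grep -q -e "$1" &&
           printf '%s\n' "$line" | grep -q -e "$2"; then
            printf '%s\n%s\n' "$prev" "$line"
        fi
        prev=$line
        have_prev=1
    done < "$3"
}

# 'red\|green.*$^.*stone\|wood' split at "$^" gives these two patterns:
sample=$(mktemp)
printf 'a red door\nmade of wood\ngreen paint\non metal\n' > "$sample"
match_pair 'red\|green.*' '.*stone\|wood' "$sample"
rm -f "$sample"
```

With GNU grep, only the pair "a red door" / "made of wood" is printed. If alternatives had the higher precedence instead, the pattern would read as "red", or "green.*$^.*stone", or "wood", and any single line containing "wood" would already count as a match.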

Installation

Linux

Compilation from Source

You will need the Boost.Regex library for compiling mgrep. Install it on your system before you proceed.
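
Before running configure, it can help to check whether the Boost.Regex headers are visible at all. The sketch below looks for `boost/regex.hpp` in a list of include directories. `find_boost_regex` is a hypothetical helper, and the default directories are assumptions about a typical Linux layout; your distribution may install Boost elsewhere.

```shell
# find_boost_regex [DIR...]
# Print the first directory that contains boost/regex.hpp and return 0;
# return 1 if none of the directories does.
find_boost_regex() {
    # Default search locations (assumed; adjust for your system).
    [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
    for dir in "$@"; do
        if [ -f "$dir/boost/regex.hpp" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

if found=$(find_boost_regex); then
    echo "Boost.Regex headers found under $found"
else
    echo "Boost.Regex headers not found; install them before running ./configure" >&2
fi
```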

Download mgrep-<version>.tar.gz, extract it, then change to the directory containing the source. You can then compile and install it the usual way:

:::bash
tar xvzf mgrep-<version>.tar.gz
cd mgrep-<version>
./configure --prefix=<destination directory>
make 
make install

If --prefix is omitted, the target directory will default to /usr/local/bin.

You also have the option to build mgrep in debug mode. In this case, debug messages are written to standard error during execution. To compile mgrep in debug mode, use the following command.

:::bash
make debug

If you want to build a static binary, you can do it like this:

:::bash
make static

Some regression tests are also bundled with the source code. You can run them with

:::bash
make test

Please note that the regression test will not pass if you have built the debug binary, so build a non-debug variant if you intend to run the suite.

You can uninstall mgrep either by manually removing the mgrep binary from the installation directory or by running

:::bash
make uninstall

You can remove all files generated during compilation by running

:::bash
make clean

If you want to clean everything that is generated by configure (Makefile, config.log, etc.) and make, run

:::bash
make purify

You will need to re-run configure if you want to build mgrep again afterwards.

Using the Pre-Built Static Binary

If you don't have Boost.Regex on your machine, you cannot compile mgrep. Still, you can download the statically built binary (for 64-bit architecture starting from v1.1.3, for 32-bit before that), and simply copy it to the destination folder of your liking:

:::bash
tar xvzf mgrep-<version>-static.tar.gz -C <destination directory>

Windows

The easiest way is to download the pre-built binary in mgrep-<version>-binary-win64.zip (*-win32.zip before v1.1.3). Just extract the exe from the zip file, and use it to your liking.

If you want to compile mgrep yourself, you need to download the source code (mgrep-<version>.tar.gz) and install the Boost.Regex library. Then build it in your favorite development environment. (You can also find Dev-C++ solution files in the win directory along with a brief README file, which may be of some help.)

Other Operating Systems

As of now, there are no build instructions for other platforms, so you need to work out the build yourself. The source code should nevertheless compile wherever the C++ standard library and Boost.Regex are installed.

Version Numbering

The version format and semantics used in mgrep are described below.

Beta Versions

Beta versions are development versions intended to be tested by the community. They are stable enough to be tried by end-users, but may include bugs that should not be present in a stable release.

The version numbering of such releases follows the scheme betaN, where N is a positive integer.

There is no restriction whatsoever on the contents of a beta package.

Stable Versions

Stable versions follow the major.minor.patch scheme.
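
The two version formats (betaN and major.minor.patch) can be told apart with a pair of regular expressions. The following is a minimal sketch; `classify_version` is a hypothetical helper, not something shipped with mgrep, and it assumes that stable version parts are plain decimal numbers.

```shell
# classify_version VERSION
# Print "beta" for betaN (N a positive integer), "stable" for
# major.minor.patch, and "invalid" (with exit status 1) for anything else.
classify_version() {
    if printf '%s\n' "$1" | grep -Eq '^beta[1-9][0-9]*$'; then
        echo beta
    elif printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo stable
    else
        echo invalid
        return 1
    fi
}

classify_version beta3    # prints "beta"
classify_version 1.1.3    # prints "stable"
```

Strings such as "beta0" or "1.4" fit neither scheme and are reported as invalid.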

Patches include bug fixes or other smaller amendments (e.g. changes to text files or code modifications without apparent effect to the user).

Minor versions indicate changes containing new functionality. They may contain bug fixes as well.

Major versions denote significant code modifications, e.g. complete refactoring, or non-backward compatible changes (these will hopefully be very rare). They may contain both additional functionality and bug fixes.

When a higher level number in the version is incremented, all subsequent numbers are reset to zero (e.g. 1.3.2 may be followed by 1.4.0 but not by 1.4.2).
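
The increment rule can be written down as a small function. This is an illustrative sketch only: `bump_version` is a hypothetical helper, and it assumes a well-formed major.minor.patch input without leading zeros.

```shell
# bump_version VERSION PART
# Print the version that follows VERSION when PART (major, minor or patch)
# is incremented; all numbers to the right of PART are reset to zero.
bump_version() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}
    patch=${rest#*.}
    case $2 in
        major) major=$((major + 1)); minor=0; patch=0 ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        patch) patch=$((patch + 1)) ;;
        *) echo "unknown part: $2" >&2; return 1 ;;
    esac
    printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

bump_version 1.3.2 patch   # prints 1.3.3
bump_version 1.3.2 minor   # prints 1.4.0
bump_version 1.3.2 major   # prints 2.0.0
```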

The first stable release after beta versions is 1.0.0.

The versioning logic outlined above tries to follow the Semantic Versioning scheme detailed here: http://semver.org/.