rex-text-tool Wiki

Shell tool for processing text files with regular expressions

Brought to you by: raybert
rex 1.4.0 (2018-09-17)

Usage: rex [options] <exp> [<inputs>]

Rex is a tool for processing text using regular expressions.  Two specific
tools are provided: a substitution tool (similar to rpl, or to the 's///'
(subst) command of sed and vi) and a parsing tool that extracts fields from
an input string and formats them (similar to cut, but using regular
expressions).  Extended regular expressions are used in all cases.

Rex processes single lines of text from files, strings (given on the
command line) or stdin.  One or more regular files can be listed on the
command line, or one or more strings can be specified using the -s switch (but
the two cannot be mixed).  If neither is given, standard input is processed.

Common Options:

  -s <string>

      A string to process.  The -s option may be specified multiple times and
      each is treated as a separate line of input (in the order listed on the
      command line).  This option may not be combined with files or stdin.

  --all,-a

      Find all matches on each line.  The pattern search is repeated until the
      entire line is exhausted.  By default searching stops after the first
      match on each line.
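
      The difference between the default and --all can be sketched in Python
      as follows.  This is only an illustration of the behavior described
      above, not rex's implementation; the function name find_matches is
      hypothetical, and Python's re syntax differs in places from the
      extended regular expressions that rex uses.

```python
import re

def find_matches(pattern, line, find_all=False):
    """Return the matched substrings of one line.

    By default only the first match is returned; with find_all=True the
    search is repeated until the line is exhausted (modelling --all).
    """
    if find_all:
        return [m.group(0) for m in re.finditer(pattern, line)]
    m = re.search(pattern, line)
    return [m.group(0)] if m else []

print(find_matches("[a-z]+", "--cat--bat--rat"))
print(find_matches("[a-z]+", "--cat--bat--rat", find_all=True))
```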

  --ignore-case,-i

      Ignore alphabetic case in regular expressions.

  --verbose,-v

      Display verbose output.  Note that this may be counter-productive when
      processing stdin.

  --line-numbers,-l

      Display line numbers.

  --color,-c

      Force output to be colorized.  Colorizing is normally enabled
      automatically if stdout is a tty.

  --no-color,-C

      Disable colorized output.

Substitution Tool

    The substitution tool applies a replacement string, specified with the
    --subst switch, to each line that matches the given regular expression.
    The replacement string can have group references as well as certain
    backslash character translations and operators.

    All processed lines are printed to stdout or a file, regardless of
    whether they were affected by substitutions or not.  If the input is
    stdin or one or more strings, the lines are printed to stdout.  If the
    input is a file, the file is re-written with the new text.  If the
    --backup switch is used, the original file is renamed with a tilde
    appended to its file name.

    If the --preview switch is used, rex instead produces output on stdout in
    a format similar to diff: the original line is prefixed with a minus sign
    and the replacement line is prefixed with a plus sign along with file
    names and line numbers (when these apply).

Options Specific to the Substitution Tool:

    --subst,--rep,-r <string>

        Enables the Substitution Tool and defines the replacement string.
        All matches on all input lines will be substituted by the expansion
        of <string>.  <string> may contain group references, backslash
        character translations and operators (see below).

        When substitutions are performed on a file, the file is re-written
        with the substitutions and the original file is overwritten (a backup
        can be made using the --backup switch).

        When substitutions are performed on stdin or on strings specified
        using the -s switch, the results are written to stdout.

        If --verbose is used, all substituted lines are written to stdout
        using a diff-like format.  (Note that this is counter-productive with
        stdin.)

        If --preview is used, all would-be substituted lines are written to
        stdout using a diff-like format and no changes are actually made.

        <string> may contain group references (sometimes called 'back
        references').  These consist of a backslash followed by a single digit.
        The digits 1-9 refer to groups in the regular expression, in the order
        that they appear, and are replaced by the text each group matched.
        Any given group may match (and subsequently substitute) a blank string,
        but it is an error to refer to a group that does not exist.  The digit
        0 refers to the entire match (note that some tools use the ampersand
        for this purpose but rex does not recognize it).  If --all is used,
        group references refer individually to each single match of the
        regular expression (rather than the entire line, in the case of
        multiple matches in a single line).

        <string> may contain certain operators that do not themselves insert
        any text but which affect how other text is inserted.  Presently the
        uppercase (\U), lowercase (\L) and word case (\W) operators are
        supported: they will cause the case of the next group substitution to
        be modified accordingly.

        <string> may contain certain backslash character translations, as
        shown in this table:

            \n     newline
            \r     carriage-return
            \t     tab
            \f     form feed
            \b     backspace
            \z     ASCII 0

        Any character in <string> that is preceded by a backslash but is not
        identified above is output verbatim without the backslash.  To insert
        a literal backslash, a double backslash must be used.  A backslash at
        the end of a line is output literally.
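
        The expansion rules above can be sketched in Python as follows.  This
        is a minimal model of the documented behavior, not rex's actual code:
        the function name expand is hypothetical, and treating word case (\W)
        as "capitalize the first letter of each word" is an assumption.

```python
import re

# Backslash character translations from the table above.
TRANSLATIONS = {"n": "\n", "r": "\r", "t": "\t", "f": "\f", "b": "\b", "z": "\0"}
# Case operators; each applies to the next group substitution only.
# Assumption: word case (\W) behaves like Python's str.title.
CASE_OPS = {"U": str.upper, "L": str.lower, "W": str.title}

def expand(template, match):
    """Expand a rex-style replacement string against a regex match object."""
    out = []
    pending = None  # case operator waiting for the next group reference
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "\\" or i + 1 == len(template):
            # Ordinary character, or a trailing backslash (output literally).
            out.append(ch)
            i += 1
            continue
        code = template[i + 1]
        i += 2
        if code in "0123456789":
            # match.group raises IndexError for a group that does not exist;
            # a group that exists but did not take part in the match is blank.
            text = match.group(int(code)) or ""
            if pending is not None:
                text = pending(text)
                pending = None
            out.append(text)
        elif code in CASE_OPS:
            pending = CASE_OPS[code]
        elif code in TRANSLATIONS:
            out.append(TRANSLATIONS[code])
        else:
            # Unknown escape: emit the character without the backslash.
            # This also turns a double backslash into a single one.
            out.append(code)
    return "".join(out)

m = re.search(r"([a-z]+)-([a-z]+)", "foo-bar")
print(expand(r"\2 \U\1", m))  # prints: bar FOO
```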

    --preview,-p

        Do not actually make any changes.  Would-be substitutions are
        displayed using a diff-like format.

    --backup,-b

        When performing a substitution on a file, make a backup of the file.
        Files are backed up by renaming them with a tilde (~) appended.  If
        a file with the same name as the backup file already exists, the
        corresponding input file will not be processed (a warning is issued).
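
        The backup rule can be illustrated with a short Python sketch.  It
        only models the behavior described above; the function name
        backup_then_rewrite and the wording of the warning are hypothetical
        and are not taken from rex.

```python
import os

def backup_then_rewrite(path, new_text):
    """Rename path to path~ and write new_text to path.

    Returns False, leaving both files untouched, if path~ already exists.
    """
    backup = path + "~"
    if os.path.exists(backup):
        # The input file is skipped rather than clobbering an older backup.
        print(f"warning: {backup} already exists; skipping {path}")
        return False
    os.rename(path, backup)
    with open(path, "w") as f:
        f.write(new_text)
    return True
```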

    --context-lines,-x <num-lines>

        Enable the display of context lines when previewing a set of
        substitutions and specify the number of context lines to display.
        Context lines are the unchanged lines that surround the line or lines
        that contain a substitution.  <num-lines> indicates the number of
        context lines to display both before and after the changed lines.
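
        As a rough illustration of how <num-lines> selects what is shown, the
        Python sketch below computes the line numbers to display for a set of
        changed lines.  The function name lines_to_show is hypothetical, and
        merging overlapping context ranges is an assumption about how the
        preview is laid out.

```python
def lines_to_show(changed, total, context):
    """Return sorted 1-based line numbers: changed lines plus their context.

    changed: line numbers that contain a substitution
    total:   number of lines in the input
    context: number of unchanged lines to show before and after each change
    """
    shown = set()
    for n in changed:
        first = max(1, n - context)
        last = min(total, n + context)
        shown.update(range(first, last + 1))
    return sorted(shown)

print(lines_to_show([5], total=10, context=2))  # prints: [3, 4, 5, 6, 7]
```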

Parse Tool

    The parse tool searches each input line for one or more matching
    sub-strings (or 'fields').  Normally the regular expression is searched
    for once on each line but if --all is used the line is searched repeatedly
    until the line is exhausted.  By default, each matching line is printed
    with the matching parts highlighted (when colorization is enabled).  If
    --only-matching is used then only the matching parts are printed (with
    each match/group on a separate line).

    Note that so far this is very similar to grep.  The output format can be
    altered by the remaining options (some of which are mutually exclusive;
    i.e. only one can be used per invocation).

    If --single-line is used (and groups are present) then all group matches
    are printed on a single line separated by spaces.

    If --groups is used (and groups are present) only the matching groups are
    output.  Note that --all and --groups have a significant effect on exactly
    what gets parsed and output.

    --csv generates a CSV-like output format.

    --names can generate a shell-variable-like output format or, if used in
    conjunction with --csv, can define a header line for CSV output.

    More details on these options follow:

Options Specific to the Parse Tool:

    --only-matching,-o

        Display only the matching parts of each line (with each match/group on
        a separate line).  By default the entire line is printed with matches
        highlighted (when colorization is enabled).

    --single-line,-1

        Causes all matches from the same line to be output on a single line
        separated by spaces.

    --groups,-g

        Output only the matching groups instead of the whole match.

    --csv

        Enables output to be formatted like a CSV file.  The results for each
        matching input line are printed on a single line with the fields
        separated by commas.  Any field that contains a comma is written in
        double quotes (embedded quotes are doubled, as per CSV standards).
        A simple example:

            $ rex --all '[a-z]+' --csv -s '--cat--bat--rat'
            cat,bat,rat
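
        The quoting rule can be sketched in Python as follows.  This follows
        the description above literally (only fields containing a comma are
        quoted) and is not rex's implementation; the names csv_field and
        csv_line are hypothetical.

```python
def csv_field(text):
    """Quote a field only if it contains a comma, doubling embedded quotes."""
    if "," in text:
        return '"' + text.replace('"', '""') + '"'
    return text

def csv_line(fields):
    """Join the fields parsed from one input line into one CSV-like line."""
    return ",".join(csv_field(f) for f in fields)

print(csv_line(["cat", "bat", "rat"]))  # prints: cat,bat,rat
```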

    --names,-n <name>[,<name> ...]

        Defines a list of names that is used to augment the results.  If
        --names is used in conjunction with --csv the names are output on the
        first line and serve as the CSV header defining the column names.
        Otherwise, --names enables the shell-variable-like output format.

        In this format, each <name> is printed in sequence followed by an
        equals sign ('=') and the corresponding matching text in double
        quotes.  Ideally, the number of names given is equal to the number of
        parsed fields.  If there are fewer names than fields, the extra fields
        are not printed.  If there are more names than fields, the extra names
        are printed with a blank string ("") after the equals sign.  The
        intention is to produce a set of definitions formatted like shell
        variables which may be 'sourced' by a shell script.

        A simple example:

            $ rex --all '[a-z]+' --names 'CAT,BAT,RAT' -s '--cat--bat--rat'
            CAT="cat"
            BAT="bat"
            RAT="rat"

        An example using groups:

            $ rex --groups '([a-z]+)--([a-z]+)--' --names 'CAT,BAT,RAT' \
                -s '--cat--bat--rat'
            CAT="cat"
            BAT="bat"
            RAT=""

        A CSV example:

            $ rex --all '[a-z]+' --csv --names 'CAT,BAT,RAT' -s '--cat--bat--rat'
            CAT,BAT,RAT
            cat,bat,rat
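
        The way names are paired with fields in the shell-variable-like
        format can be sketched in Python as follows.  The function name
        shell_vars is hypothetical, and the sketch does not escape special
        characters inside values, since this text does not say how rex
        handles them.

```python
def shell_vars(names, fields):
    """Pair each name with the field in the same position.

    Extra fields (fewer names than fields) are dropped; extra names
    (more names than fields) are given a blank value.
    """
    lines = []
    for i, name in enumerate(names):
        value = fields[i] if i < len(fields) else ""
        lines.append(f'{name}="{value}"')
    return lines

for line in shell_vars(["CAT", "BAT", "RAT"], ["cat", "bat"]):
    print(line)
```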

    --format,-f <output-format>

        Defines a custom format for outputting parsed fields.  <output-format>
        works exactly like a replacement string, as used in substitution mode
        (see --subst), except that only the matching text is output and no
        error is generated for undefined group references, which expand to a
        blank string (this allows for partial matches on some lines).  When a
        custom format is in use, the matches are output exclusively by way of
        the expanded <output-format> string.

        A simple example:

            $ rex --all '[a-z]+' --format '\3:\U\2:\1' -s '--cat--bat--rat--'
            rat:BAT:cat