rex-text-tool Wiki
Shell tool for processing text files with regular expressions
Brought to you by:
raybert
rex 1.4.0 (2018-09-17)
Usage: rex [options] <exp> [<inputs>]
Rex is a tool for processing text using regular expressions. Two specific
tools are provided: a substitution tool (similar to rpl or sed and vi's
's///' (subst) commands) and a parsing tool which can be used to parse fields
out of an input string and format them (similar to cut but using regular
expressions). Extended regular expressions are used in all cases.
Rex processes single lines of text from either files, strings (given on the
command line) or stdin. One or more regular files can be listed on the
command line or one or more strings can be specified using the -s switch (but
the two cannot be mixed). If neither is given, standard input is processed.
Common Options:
-s <string>
A string to process. The -s option may be specified multiple times and
each is treated as a separate line of input (in the order listed on the
command line). This option may not be combined with files or stdin.
--all,-a
Find all matches on each line. The pattern search is repeated until the
entire line is exhausted. By default, searching stops after the first
match on each line.
--ignore-case,-i
Ignore alphabetic case in regular expressions.
--verbose,-v
Display verbose output. Note that this may be counter-productive when
processing stdin.
--line-numbers,-l
Display line numbers.
--color,-c
Force output to be colorized. Colorizing is normally enabled
automatically if stdout is a tty.
--no-color,-C
Disable colorized output.
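The difference between the default search and --all can be sketched in Python as follows. This is an illustration only, not rex's implementation: the function name find_matches is hypothetical, and Python's re module is not a POSIX extended regular expression engine, so only simple patterns are comparable.

```python
import re

def find_matches(pattern, line, find_all=False, ignore_case=False):
    """Return the matched substrings found in `line`.

    find_all models --all; ignore_case models --ignore-case.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    if find_all:
        # Repeat the search until the line is exhausted.
        return [m.group(0) for m in regex.finditer(line)]
    # Default: stop after the first match on the line.
    first = regex.search(line)
    return [first.group(0)] if first else []

print(find_matches("[a-z]+", "--cat--bat--rat"))                 # ['cat']
print(find_matches("[a-z]+", "--cat--bat--rat", find_all=True))  # ['cat', 'bat', 'rat']
print(find_matches("[a-z]+", "--CAT--bat", ignore_case=True))    # ['CAT']
```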
Substitution Tool
The substitution tool applies a replacement string, specified with the
--subst switch, to each line that matches the given regular expression.
The replacement string can have group references as well as certain
backslash character translations and operators.
All processed lines are printed to stdout or a file, regardless of
whether they were affected by substitutions or not. If the input is
stdin or one or more strings, the lines are printed to stdout. If the
input is a file, the file is re-written with the new text. If the
--backup switch is used, the original file is renamed with a tilde
appended to its file name.
If the --preview switch is used, rex instead produces output on stdout in
a format similar to diff: the original line is prefixed with a minus sign
and the replacement line is prefixed with a plus sign along with file
names and line numbers (when these apply).
Options Specific to the Substitution Tool:
--subst,--rep,-r <string>
Enables the Substitution Tool and defines the replacement string.
All matches on all input lines will be substituted by the expansion
of <string>. <string> may contain group references, backslash
character translations and operators (see below).
When substitutions are performed on a file, the file is re-written
with the substitutions and the original file is overwritten (a backup
can be made using the --backup switch).
When substitutions are performed on stdin or on strings specified
using the -s switch, the results are written to stdout.
If --verbose is used, all substituted lines are written to stdout
using a diff-like format. (Note that this is counter-productive with
stdin.)
If --preview is used, all would-be substituted lines are written to
stdout using a diff-like format and no changes are actually made.
<string> may contain group references (sometimes called 'back
references'). These consist of a backslash followed by a single digit.
The digits 1-9 refer to groups in the regular expression, in the order
that they appear, and are replaced by the text that group matched. Any
given group may match (and subsequently substitute) a blank string, but it
is an error
to refer to a group that does not exist. The digit 0 refers to the
entire match (note that some tools use the ampersand for this purpose
but rex does not recognize it). If --all is used, group references
refer individually to each single match of the regular expression
(rather than the entire line, in the case of multiple matches in a
single line).
<string> may contain certain operators that do not themselves insert
any text but which affect how other text is inserted. Presently the
uppercase (\U), lowercase (\L) and word case (\W) operators are
supported: they will cause the case of the next group substitution to
be modified accordingly.
<string> may contain certain backslash character translations, as
shown in this table:
\n newline
\r carriage-return
\t tab
\f form feed
\b backspace
\z ASCII 0
Any character in <string> that is preceded by a backslash but is not
identified above is output verbatim without the backslash. To insert
a literal backslash a double backslash must be used. A backslash at
the end of a line is output literally.
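The replacement-string rules described above (group references, case operators and backslash translations) can be modelled with a short Python sketch. It is not rex's actual code: the function name expand is hypothetical, str.title is only an assumption about what "word case" means, and Python's re engine stands in for rex's extended regular expressions.

```python
import re

# Backslash character translations, as listed in the table above.
TRANSLATIONS = {"n": "\n", "r": "\r", "t": "\t", "f": "\f", "b": "\b", "z": "\0"}
# Case operators; using str.title for \W is an assumption.
CASE_OPS = {"U": str.upper, "L": str.lower, "W": str.title}

def expand(template, match):
    """Expand a replacement string against one regex match."""
    out = []
    pending = None  # case operator waiting for the next group reference
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "\\" or i + 1 == len(template):
            out.append(ch)  # ordinary character, or a trailing backslash
            i += 1
            continue
        nxt = template[i + 1]
        i += 2
        if nxt in "0123456789":
            n = int(nxt)
            if n > match.re.groups:
                raise ValueError("reference to nonexistent group \\%d" % n)
            text = match.group(n) or ""  # a group may match nothing
            if pending:
                text = pending(text)
                pending = None
            out.append(text)
        elif nxt in CASE_OPS:
            pending = CASE_OPS[nxt]
        elif nxt in TRANSLATIONS:
            out.append(TRANSLATIONS[nxt])
        else:
            out.append(nxt)  # unknown escape: drop the backslash

    return "".join(out)

m = re.search(r"([a-z]+)-([a-z]+)", "cat-bat")
print(expand(r"\2-\U\1", m))  # bat-CAT
```

To apply such an expansion to a whole line, one could pass it as a callback, for example re.sub(pattern, lambda m: expand(template, m), line, count=1) for the default first-match behaviour.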
--preview,-p
Do not actually make any changes. Would-be substitutions are
displayed using a diff-like format.
--backup,-b
When performing a substitution on a file, make a backup of the file.
Files are backed up by renaming them with a tilde (~) on the end. If
a file with the same name as the backup file already exists, the
corresponding input file will not be processed (a warning is issued).
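The backup rule can be sketched in Python as shown here. The function name backup_then_rewrite and the warning text are hypothetical; the sketch only mirrors the documented behaviour (rename with a trailing tilde, skip the file if the backup name is taken) and is not taken from rex's source.

```python
import os
import sys

def backup_then_rewrite(path, new_text):
    """Rename `path` to `path~`, then write `new_text` to `path`.

    Returns False and leaves everything untouched if the backup
    name already exists.
    """
    backup = path + "~"
    if os.path.exists(backup):
        print("warning: %s exists; skipping %s" % (backup, path), file=sys.stderr)
        return False
    os.rename(path, backup)          # the original becomes the backup
    with open(path, "w") as f:       # the rewritten text takes the original name
        f.write(new_text)
    return True
```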
--context-lines,-x <num-lines>
Enable the display of context lines when previewing a set of
substitutions and specify the number of context lines to display.
Context lines are the unchanged lines that surround the line or lines
that contain a substitution. <num-lines> indicates the number of
context lines to display both before and after the changed lines.
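One way to picture how context lines are chosen is the following Python sketch. The function name lines_to_show is hypothetical, and merging overlapping context ranges is an assumption borrowed from how diff tools usually behave; the text above does not specify it.

```python
def lines_to_show(changed, total, context):
    """Return the sorted 1-based line numbers to display: each changed
    line plus `context` unchanged lines before and after it."""
    shown = set()
    for n in changed:
        lo = max(1, n - context)       # do not run past the start of the file
        hi = min(total, n + context)   # or past the end
        shown.update(range(lo, hi + 1))
    return sorted(shown)

print(lines_to_show(changed=[5, 6], total=20, context=2))  # [3, 4, 5, 6, 7, 8]
```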
Parse Tool
The parse tool searches each input line for one or more matching
sub-strings (or 'fields'). Normally the regular expression is searched
for once on each line, but if --all is used the search is repeated
until the line is exhausted. By default, each matching line is printed
with the matching parts highlighted (when colorization is enabled). If
--only-matching is used then only the matching parts are printed (with
each match/group on a separate line).
Note that so far this is very similar to grep. The output format can be
altered by the remaining options (some of which are mutually exclusive;
i.e. only one can be used per invocation).
If --single-line is used (and groups are present) then all group matches
are printed on a single line separated by spaces.
If --groups is used (and groups are present) only the matching groups are
output. Note that --all and --groups have a significant effect on exactly
what gets parsed and output.
--csv generates a CSV-like output format.
--names can generate a shell-variable-like output format or, if used in
conjunction with --csv, can define a header line for CSV output.
More details on these options follow:
Options Specific to the Parse Tool:
--only-matching,-o
Display only the matching parts of each line (with each match/group on
a separate line). By default the entire line is printed with matches
highlighted (when colorization is enabled).
--single-line,-1
Causes all matches from the same line to be output on a single line
separated by spaces.
--groups,-g
Output only the matching groups instead of the whole match.
--csv
Enables output to be formatted like a CSV file. The results for each
matching input line are printed on a single line with the fields
separated by commas. Any field that contains a comma is written in
double quotes (embedded quotes are doubled, as per CSV standards).
A simple example:
$ rex --all '[a-z]+' --csv -s '--cat--bat--rat'
cat,bat,rat
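The quoting rule can be illustrated with Python's csv module. This is a sketch of the convention rather than rex's implementation: the helper name csv_line is hypothetical, and the csv module also quotes fields that contain a double quote or a newline even without a comma, which the text above does not mention.

```python
import csv
import io

def csv_line(fields):
    """Format one row of fields as a single CSV line (no trailing newline)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()[:-1]  # drop the line terminator

print(csv_line(["cat", "bat", "rat"]))    # cat,bat,rat
print(csv_line(["a,b", 'say "hi", ok']))  # "a,b","say ""hi"", ok"
```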
--names,-n <name>[,<name> ...]
Defines a list of names that is used to augment the results. If
--names is used in conjunction with --csv the names are output on the
first line and serve as the CSV header defining the column names.
Otherwise, --names enables the shell-variable-like output format.
In this format, each <name> is printed in sequence followed by an
equals sign ('=') and the corresponding matching text. Ideally, the
number of names given is equal to the number of parsed fields. If
there are fewer names than fields the extra fields are not printed.
If there are more names than fields the extra names are printed with a
blank string ("") after the equals sign. The intention is to produce
a set of definitions formatted like shell variables which may be
'sourced' by a shell script.
A simple example:
$ rex --all '[a-z]+' --names 'CAT,BAT,RAT' -s '--cat--bat--rat'
CAT="cat"
BAT="bat"
RAT="rat"
An example using groups:
$ rex --groups '([a-z]+)--([a-z]+)--' --names 'CAT,BAT,RAT' \
-s '--cat--bat--rat'
CAT="cat"
BAT="bat"
RAT=""
A CSV example:
$ rex --all '[a-z]+' --csv --names 'CAT,BAT,RAT' -s '--cat--bat--rat'
CAT,BAT,RAT
cat,bat,rat
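The pairing of names with fields in the shell-variable-like format can be sketched in Python. The function name shell_assignments is hypothetical and the sketch is not rex's code; it also does not escape double quotes inside a value, since the text above does not say how rex treats them.

```python
def shell_assignments(names, fields):
    """Pair each name with its field. Extra fields are dropped and
    names without a field get an empty string."""
    lines = []
    for i, name in enumerate(names):
        value = fields[i] if i < len(fields) else ""
        lines.append('%s="%s"' % (name, value))
    return lines

# Three names but only two parsed fields, as in the groups example.
print("\n".join(shell_assignments(["CAT", "BAT", "RAT"], ["cat", "bat"])))
```

With the inputs shown, the last name is printed with an empty value, matching the groups example above.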
--format,-f <output-format>
Defines a custom format for outputting parsed fields. <output-format>
works exactly like a replacement string, as used in substitution mode
(see --subst), except that only the matching text is output and no
error is generated for undefined group references, which expand to a
blank string (this allows for partial matches on some lines). When a
custom format is in use, the matches are output exclusively by way of
the expanded <output-format> string.
A simple example:
$ rex --all '[a-z]+' --format '\3:\U\2:\1' -s '--cat--bat--rat--'
rat:BAT:cat