Download Latest Version Frodo.jar (133.6 kB)
Email in envelope

Get an email when there's a new version of Frodo, the LOTR

Home
Name Modified Size InfoDownloads / Week
README 2011-03-02 6.3 kB
Manual.txt 2011-03-02 8.7 kB
directives.txt 2011-03-02 25.0 kB
Frodo.jar 2011-03-01 133.6 kB
Totals: 4 Items   173.6 kB 0
Frodo, the LOTR
====================

Frodo is a Line-Oriented Text Rewriter, a tool for scripting bulk
rewrites of line-oriented text data. It can be used for reformatting
data, or for other editing tasks more challenging than
global-search-and-replace.

The easiest way to explain is just to give some examples.  Suppose
you have a data file that looks like this:

    last="Ambrose" first="Alice" bdate="1885-04-22"
    last="Baldwin" first="Bernard" bdate="1887-09-13"
    last="Corbin" first="Charles" bdate="1879-11-03"
    last="Dalton" first="Dennis" bdate="1881-01-07"

That's nice and all, but it would be more convenient if it were in
CSV format, with maybe a header that gave names to the column.  This
is a perfect job for Frodo.  Try this script on it.

    insert "LAST,FIRST,BDATE"
    while (
        replace /last="(.*)" first="(.*)" bdate="(.*)"/ |{1},{2},{3}|
        next
    )

You'll get something that looks like this.

    LAST,FIRST,BDATE
    Ambrose,Alice,1885-04-22
    Baldwin,Bernard,1887-09-13
    Corbin,Charles,1879-11-03
    Dalton,Dennis,1881-01-07

This is probably pretty obvious, except for the "replace" directive.

The first argument - the part between the slashes (/) - is a regular
expression.  Because Frodo is a Java application, these are the
java.util.regex.Pattern regular expressions.  The parentheses mark
capture groups.  When the regex matches the current line, the parts
of the string inside the parentheses are captured and saved to be used
later - for example, by the second argument.

The second argument of the replace directive - the part between the
vertical bars (|) - is a string format as defined by the
java.text.MessageFormat class.  The numbers inside the braces indicate
arguments which are replaced by the corresponding capture groups from
the previous regex match.  The entire line is then replaced by this
newly formatted string.

If we wanted, we could make the script slightly more readable by
giving names to the all the strings.

    define CsvHeader "LAST,FIRST,BDATE"
    define OldFormat /last="(.*)" first="(.*)" bdate="(.*)"/
    define NewFormat |{1},{2},{3}|

    insert CsvHeader
    while (
        replace OldFormat NewFormat
        next
    )

Let's amp it up to 0.5.  Suppose we want the output in XML format,
rather than CSV.  This script will do it.

    define InputFormat     /last="(.*)" first="(.*)" bdate="(.*)"/
    insert "<records>"
    while (
        match InputFormat
        insert "  <record>"
        insert |    <last>{1}</last>|
        insert |    <first>{2}</first>|
        insert |    <bdate>{3}</bdate>|
        insert "  </record>"
        remove
    )
    append "</records>"

The "match" directive captures the parts of the line for use by the
inserts.  Note that all lines are inserted before the current line
being edited.  When the current line is finally removed, all lines
after it move up in the list, and so the next line after it
automatically becomes the new current line.  Once we've processed all
lines, the match directive will fail because there is no current line,
and the while loop will exit, allowing the append directive to be
executed.

Here's the output.

    <records>
      <record>
        <last>Ambrose</last>
        <first>Alice</first>
        <bdate>1885-04-22</bdate>
      </record>
      <record>
        <last>Baldwin</last>
        <first>Bernard</first>
        <bdate>1887-09-13</bdate>
      </record>
      <record>
        <last>Corbin</last>
        <first>Charles</first>
        <bdate>1879-11-03</bdate>
      </record>
      <record>
        <last>Dalton</last>
        <first>Dennis</first>
        <bdate>1881-01-07</bdate>
      </record>
    </records>

Finally, let's amp it up to about a 1.

Suppose we want to convert the XML back to CSV.  Check out this script.

    define CSV_Header "LAST,FIRST,BDATE"
    define XML  /\s*<last>(.*)</last>\s*<first>(.*)</first>\s*<bdate>(.*)</bdate>"
    define CSV  |{1},{2},{3}|

    insert CSV_Header
    match "<records>" remove
    while (
        match "<record>" remove
        catenate 2 -- join the current line with the next two lines
        replace XML CSV
        next
        match "</record>" remove
    )
    match "</records>" remove

You might notice here that comments begin with a double dash and
continue to the end of the line.

Also notice that terminators are not required to mark the end of a
directive.  Each directive begins with a keyword, and the next keyword
marks the start of the next directive.  This means, for example, that

    match "<records>" remove

is the same as

    match "<records>"
    remove

So why do we match "<records>" when there are no capture groups to
extract?  Every directive either succeeds or fails, and a sequence of
directive continue only as long as every one of them succeeds.  If the
current line - in this case the first line - doesn't match the regex,
then the directive will fail and Frodo will stop processing the script.

It's possible to provide an alternate sequence of directives in case a
sequence doesn't complete.  For example,

    match "<records>"
    remove
    else
    log "Didn't find a <records> tag at the start of the file"
    fail

In this example, if both the match and remove directives succeed,
then neither the log or the fail directive will be attempted.  If the
match fails, or if the match succeeds but the remove fails, then Frodo
will continue at the "log" directive.  The remove will only be attempted
if the match succeeds.

Obviously, there's a lot more to Frodo than this.  See the programmer
manual for more information


How to Execute Frodo
====================

Frodo is a Java application.  Make sure you have Java installed.
Download the Frodo jar file from the dist folder, and run this command:

    java -jar frodo.jar <script-file> [ <input-file> [ <output-file> ] ]

The first argument, the script file is required.  If the output file
is not specified, the output will be written to the console.  If the
input file is also not specified, then the input will be read from the
console.
Source: README, updated 2011-03-02