Menu

Tree [1fc891] master /
 History

HTTPS access


File Date Author Commit
 .mailmap 2016-01-05 Steffen (Daode) Nurpmeso Steffen (Daode) Nurpmeso [e058f4] Bump v0.8.7; copyright 2016; move to sdaoden.eu
 README 2022-12-22 Steffen Nurpmeso Steffen Nurpmeso [1fc891] README: clarify term "input checksum"
 config.rc 2022-11-09 Steffen Nurpmeso Steffen Nurpmeso [d5a69f] New source code style
 footer 2015-05-06 Steffen (Daode) Nurpmeso Steffen (Daode) Nurpmeso [728b49] Update Copyright to 2015, modeline adjustments
 header 2019-04-19 Steffen Nurpmeso Steffen Nurpmeso [c41f6f] config.rc,header: use new better <^> expansion,...
 s-web42 2022-11-09 Steffen Nurpmeso Steffen Nurpmeso [d5a69f] New source code style
 template.html-w42 2020-05-12 Steffen Nurpmeso Steffen Nurpmeso [89105f] Adjust copyright for 2020, style tweaks
 test.sh 2021-01-16 Steffen Nurpmeso Steffen Nurpmeso [96ba9d] test.sh: add another test

Read Me

S-Web42
#######

:Author:    Steffen (Daode) Nurpmeso
:Contact:   steffen@sdaoden.eu
:Copyright: ISC license
:Date:      1997 - 2005, 2010, 2012 - 2022
:Version:   0.9.4
:Status:    Red and Hot!

.. _`S-Web42`: https://www.sdaoden.eu/code.html#s-web42
.. _`S-SymObj`: https://www.sdaoden.eu/code.html#s-symobj
.. _`perl(1)`: http://www.perl.org

.. Expansions used in the text: file expansion-trigger RE, ..

.. |ssym| replace:: `S-SymObj`_
.. |ssymscm| replace:: ``https://git.sdaoden.eu/scm/s-symobj.git``
.. |ssymcgit| replace:: ``https://git.sdaoden.eu/cgit/s-symobj.git``

.. |fextre| replace:: ``(.*?)(-w42(?:-(x|[icewpatsm]+))?(-new)?)$``
.. |checksum| replace:: MD5 checksum

Overview
========

`S-Web42`_ is one more option to manage your website.  It offers the
possibility to expand a directory hierarchy of input files, some of
which may consist of a mixture of PIs, (uninterpreted) (X)HTML or XML
markup and normal text.  In those which do, PIs can be defined and
undefined, they may be defined to take arguments (think format strings),
which can be, e.g., a handy way to create complex tables; `perl(1)`_
code can be evaluated, files can be included, also recursively; simple
control statements can be used to conditionalize file content, and
some simple MarkLo tags aid in make editing life a bit easier, too.

The ``s-web42`` converter is a `perl(1)`_ script, i.e., it needs an
installed Perl, version 5.8.1 or above.  It also requires the authors
|ssym| module -- you may install it as a regular part of your `perl(1)`_
installation by issuing the command ``$ cpan S-SymObj``.
Alternatively you may also get it from http://www.CPAN.org or clone the
git repository from |ssymscm| (browse it at |ssymcgit|).

.. contents::

Synopsis
========

::

   s-web42 [-v[v]]
   s-web42 [-v[v]] [--no-rc] --no-update-cache (or: --nuc)
   s-web42 [-v[v]] [--no-rc] --force-rebuild [--nuc]
   s-web42 [-v[v]] [--no-rc] --expand-one [FILE] (or: --eo [FILE])

   perl -C -I/PATH/TO/SymObj.pm s-web42

   PERL5LIB=/PATH/TO/SymObj.pm
   export PERL5LIB
   s-web42

Use ``--no-rc`` to suppress reading of an existent ``config.rc`` file.
The ``--no-update-cache`` option can be used to suppress an update of
the cache database -- i.e., rerunning ``s-web42`` again will generate
the very same output.  With the ``--force-rebuild`` option an existing
cache database can be ignored so that effectively everything is rebuild
from scratch; this mode may be combined with ``--no-update-cache``.

The ``--expand-one`` option will read the programs standard input or
``FILE``, if one was specified, expand and filter it just as described
below, and write the resulting content to the programs standard output.
This is an isolated and special mode in that none of the described
website management actions are performed, except of reading in the
optional configuration file, as described below.

The ``-v`` option can be used to gain some verbosity, using it twice
will be even more verbose; if used in conjunction with ``--expand-one``
these messages will go to the standard error instead.  The trailing
two examples show how to extend the `perl(1)`_ ``@INC`` path so that
the required `S-SymObj`_ module will be found without installing that
in a regular place, a task that often requires administrator privileges.
And please be aware that no effort is put in parsing command line
arguments: one needs to use the very format shown above.

.. note::
   S-Web42 is charset agnostic.  It reads and writes files, and simply
   reuses the character encoding that the user placed in the
   ``locale(7)`` environment, for example in ``LC_ALL``.

WhatIs?
=======

S-Web42 consists of only one file: ``s-web42``.  In the repository
there is also the script ``test.sh``, which is the unit test of S-Web42,
and ``header``, ``footer``, ``template.html``, ``hook`` and
``config.rc``, but these form a (somewhat primitive and rather identical
to the authors very own website) usage example only, they are not
required for operation.

Once invoked, the converter script ``s-web42`` requires the subdirectory
``site`` to exist in the current working directory, since that is used
as the input tree.  It will also assume it can use the filenames
``cache.dat`` (the cache database), ``cache.old`` (database state
before last run), and, temporarily, ``*.tmp`` in there.  The generated
shell archive will always be ``w42-update.sh``, and it will not have
any executable bits set.

If the file ``config.rc`` is found in the working directory, and the
``--no-rc`` command line option has not been used, it will be read --
the `Assignments`_ of PI variables seen there will form the outermost
context and will thus be inherited by all files under ``site`` (PIs
can be overridden and undefined for anything *deeper* in the hierarchy
only).  Here, and *only* here is it possible to specify some very
`Special PI variables`_.

The presence of a file ``hook`` is also recorded, and it will be used
as a per-directory fallback hook in all those ``site`` subdirectories
which do not provide their own.  The optional per-directory hooks can
be used to create and/or modify the contents of the directory (and
subdirectories and only) on the fly.  It is a fatal error if such a
hook is not an executable program or if it does not exit successfully.
Hooks will be given one command line argument, and that is the current
working directory from within which the hook is run.

Then the converter will enter the ``site`` directory and recursively
parse the tree therein.  An existent per-directory hook is run at that
point, and before deeper levels of the hierarchy are entered.  Note
that the directory entries ``SCCS``, ``CVS`` and anything which starts
with a dot (thus also ``.git`` and ``.RCS``) are completely ignored
by S-Web42, except that files may be included when placed in the
`WHITELIST`_ variable of ``config.rc``.  It is not possible to savely
create directories on the fly except in deeper, not yet parsed
hierarchy levels below the current directory, too.

.. note::
   S-Web42 will not handle paths with embedded quotation marks, i.e.,
   neither ``"`` nor ``'`` characters may be used for directory- and
   filenames.  You should possibly be conservative about filenames in
   general, mostly in respect to embedded whitespace, as the implemented
   quoting rules are primitive, yet sufficient.

After the entire tree has been traversed like that once, the content
of all directories will be reread, and the resulting tree will be
compared against the version recorded in the cache database.  Anything
which is missing or which seems to be modified will be rebuild, either
as a bitwise copy of the source or as the result of S-Web42 filtering,
as below.  Filtering will be applied to all files which' name includes
the string ``-w42``, or, to be exact, which name' end with a string
that matches |fextre|.

S-Web42 assumes that a file is modified if its modification timestamp
is different from that recorded in the cache database (but see
`IGNORE_MODTIME`_), and if its |checksum| is different from the recorded
one.  One can force rebuilding of S-Web42-filtered files on a per-file
base by using a filename that includes the suffix ``-new``, as in
``index.html-w42-new`` -- neither modification time nor input checksums
will matter for such files, they will *always* be rebuild.
Input checksum of S-Web42-filtered files means that the configuration
and the `Assignments`_ part have been fully expanded, but anything
thereafter is treated bitwise.

.. note::
   Files included via the ``<?include?>`` PI are *not* checked, i.e.,
   no dependency tracking is performed.
   The `Assignments`_ directive ``?include?`` will however be resolved
   and therefore affects the checksum that may cause file rebuilding.

Any file rebuilt non-bitwise is then checksummed again, and will *only*
be part of the result if the |checksum| of the rebuilt target is
different from the checksum stored in the cache database, but not
otherwise.  (The only exception to this rule is if the user uses the
``--force-rebuild`` command line option, as that will rebuild everything
from scratch.)

So, what will the result look like?  One may be astonished, but the
result will be generated as a shell archive with ``uuencode(1)``\ d
(and, optionally, `COMPRESS`_\ ed) members.  These archives can either
be used to update a local mirror (``$ sh w42-update.sh --local
TARGET-PATH``) or as a ``sftp(1)`` batchfile (``$ sh w42-update.sh
--sftp TMP-PATH | sftp -b - user@host[:dir]; rm -rf TMP-PATH``).  In
the latter case ``TMP-PATH`` must be some temporary path that can be
used by the archive to unpack its contents therein (it will issue a
``mkdir TMP-PATH``, i.e., create that directory as necessary).

During operation, the generated shell archive will try to ignore any
errors, continuing its operation until all commands have been issued.
E.g., in ``sftp(1)`` batch mode, the termination-on-error is suppressed
by prefixing commands with a hyphen ``-``.  Any error, and some other
informational messages will be logged to standard error, however.

File and directory removals will also be properly handled.  However,
if the cache database (a textfile) is lost, then the only solution is
to delete the target directory manually and to rebuild it with a new
from-scratch archive.

On filenames and -content
=========================

S-Web42 does not look into files unless their name ends with a special
suffix.  If that is not seen, files will be treated as bins of undefined
binary content and handled bitwise.  If, on the other hand, the filename
ends with the suffix |fextre|, then it will be subject to content
filtering, as described in the rest of this document.  Let us inspect
that cryptic expression:

``(.*?)``
   S-Web42 does not care, simply name any file the way you want, with
   or without a file extension.  It does not matter.

``(-w42 ... )$`` -- S-Web42 filter trigger
   If, after the ignored part, the string ``-w42`` is seen, as in
   ``index.html-w42``, then this file is flagged as being subject to
   S-Web42 content filtering.

   .. note::
      The trailing S-Web42 specific part will be stripped from the
      filename so that, e.g., an input file ``index.html-w42-new``
      will produce an output file ``index.html``.

``(?:-(x|[icewpatsm]+))?`` -- Mode configuration
   It is possible to fine tune the behaviour of S-Web42 content filtering
   and expansion by continuing the ``-w42`` suffix with a hyphen ``-``
   and then either the letter ``x`` or any combination of the letters
   ``icewpatsm``.  This is described in detail in the section
   `About.Filter`_.  An example of such a filename would be
   ``index.html-w42-cea``.

(-new)? -- Forced rebuild trigger
   If the suffix ends with a trailing ``-new`` part, then both, the
   modification time and the files input checksum will not be used to
   decide whether a file has been modified or not.  Instead it will
   always be rebuild, and the decision whether to include the file in
   the result or not is based upon the checksum of the generated file.
   A filename example would be ``index.php-w42-x-new``.

Moreover, all files that are subject to S-Web42 content filtering (and
``config.rc`` but this is a special case in that S-Web42 will complain
if it contains anything else but the `Assignments`_ part) have to
comply to a very specific content layout scheme, henceforth called
a S-Web42 *context*:

   | `Assignments`_
   | ``<?begin?>``
   | `Content`_
   | ``<?end?>``
   | `Ignored`_

About.Filter
------------

These filter operations will be performed on the `Assignments`_ part
of *all* *contexts* which are subject to filtering.  They will also be
applied to the `Content`_ part unless the mode configuration suffix was
``-x`` or otherwise excluded these actions:

*Drop of trailing whitespace* (cannot be disabled)
   A lines' trailing whitespace is discarded.

*Drop of introductional whitespace* (disable mode: ``i``)
   A lines' leading whitespace is discarded.  This step will always be
   performed on follow lines after escaped newlines, as below.

*Handling of shell style comments* (disable mode: ``c``)
   If the first non-whitespace character of a line is a number sign
   ``#``, then this line is a comment and as such discarded.

*Escaping of newlines* (disable mode: ``e``)
   If the last character of a non-comment line is a backslash that is
   not itself escaped by a(n uneven number of) backslash(es), then the
   next line is joined with the current line after its leading whitespace
   has been discarded.

*Wiping away empty lines* (disable mode: ``w``)
   If a line is empty then remember it was there but ignore it otherwise.

These filter operations will be performed on the `Content`_ part unless
the mode configuration excluded them:

*PI expansion* (disable mode: ``p``)
   Processing Instructions will be expanded.

These filter operations will be performed on the `Content`_ part unless
the mode configuration suffix was ``-x`` or otherwise excluded these
actions:

*Automatic paragraphs* (disable mode: ``a``)
   If a textblock is surrounded by empty lines it will be enclosed
   in a ``<p></p>`` pair unless the block seems to be enclosed in
   a tag, or unless so-called "mode-switching" PIs are used within
   it.  I.e., no automatic paragraph will be provided for a otherwise
   perfectly legal textblock if an ``<?include?>`` directive is
   contained therein, or ``<?perl?>`` or even a ``<?pre?>``.
   Automatic paragraphs are ment for human friendly editing of,
   well, paragraphs, not for fancy markup.  By starting such a
   paragraph with a special trigger character sequence several
   different kinds of markup can be generated automatically:

   ``= Text``
      Generates a heading; ``=`` generates a ``h1``, ``==`` a ``h2``
      etc., and ``======`` generates a ``h6``.

   ``_ Text``
      Generates a blockquote.

   ``* Text``
      Generates a bullet list.

   ``DecimalDigits. Text``
      Generates a numbered list with an item that uses a value
      that equals "DecimalDigits".

   ``@ Text1 @ Text2``
      Generates a definition list.  "Text1" will be the content of the
      definition term, and "Text" will form the body of the item.

   ``-----``
      5 or more hyphen characters create a separating horizontal rule.
      This is a bit special because the textblock must solely consist
      of this single line.

   Example::

      == It is not a Wiki!

      You would not believe what i saw:

      * Cats

      * Mice

      * Birds

      -------

      _ Wow!

      _ Or Wuff-Wuff!

      =>

      <h2>It is not a Wiki!</h2>
      <p>You would not believe what i saw:</p>
      <ul><li><p>Cats</p></li><li><p>Mice</p></li><li><p>Birds</p></li></ul>
      <hr />
      <blockquote><p>Wow!</p><p>Or Wuff-Wuff!</p></blockquote>

*Tagsoup joining* (disable mode: ``t`` *TODO: not yet implemented*)
   The DOM standard (http://www.w3.org/TR/DOM-Level-3-Core) is used
   (by browsers and such poor software) to transform site content to
   DOM objects.  Unfortunately code like::

      </p>
      <p>

   (may) result(s) in a useless DOM object covering the newline in
   between the two tags.  To circumvent that S-Web42 tries to join
   tags like these together.

*Whitespace normalization* (disable mode: ``s``)
   Once the line content is fully expanded (leading and) trailing
   whitespace is removed (again) and multiple adjacent whitespace
   characters are squeezed to a single (ASCII) space character.

*MarkLo expansion* (disable mode: ``m``)
   Some *MarkLo* markers will be converted to markup.  Expanded content
   is reevaluated until no more expansion is possible.  *MarkLo*
   detection and expansion is neither performed across newline boundaries
   nor PI occurrences, and there is no possibility to escape *MarkLo*
   expansion (via backslash escaping for example) except by turning it
   off entirely.  It is possible to embed a closing brace by escaping it
   like that, however::

      \c{CONTENT} -> <tt>CONTENT</tt>
      \i{CONTENT} -> <em>CONTENT</em>
      \b{CONTENT} -> <strong>CONTENT</strong>
      \u{CONTENT} -> <u>CONTENT</u>

      # (These are bit special, but nice to use)
      \a{NAME}    -> <a name="NAME"></a>
      \l{LINK}    -> <a href="LINK">LINK</a>

      \i{I \b{really \u{{love\}} you}, baby!}
      ->
      <em>I <strong>really <u>{love}</u> you</strong>, baby!</em>

Finally a ``-w42`` file content example::

   WHO = Ziggy
   <?begin?>

   Yo S-Web42.

   <?WHO?>.
   # This is a comment line.
      # Yet another comment line
    This LN \
      use\
         s \i{NL} escaping.\\\\

   <?end?>
   ->
   <p>Yo S-Web42.</p><p>Ziggy. This LN uses <em>NL</em> escaping.\\</p>

But -- feel free::

   <?begin?>\i{Hello}, S-Web42.<?end?>
   ->
   <em>Hello</em>, S-Web42.

Assignments
-----------

The content is prefixed by the (PI) variable assignment block. ::

   var1 = content of var1
   var2 ?= conditional assignment (if not yet defined)
   var3 += content (assigned or) added to var3
   var4 @= var4 will be an array, this is the first value
   var4 @= the var4 array gains another value
   var5 ?@= conditional array assignment (if not yet defined)
   var5 @= append a member to the now anyway existent var5 array
   var6 = <em>markup</em><?def foo<>stupid example?><?foo?><?undef foo?>
   ?include? = path

All PI variables defined like that may contain any content, including
complete PIs and markup.  There is no technical difference in between
these and stuff defined via `def`_ and `defa`_ (and `defx`_) with the
exceptional possibility to include the PI start and close tags ``<?``
and ``?>``, respectively (as shown in the ``var6`` example).  S-Web42
PI and variable names may consist of alphabetical characters, digit
characters and the hyphen (``-``).  Note that the case matters, just
as usual for XML processing.

``=``
   Assignment to variable.
``+=``
   Assign to non-existent variable, else append value to it.
``?=``
   Assign to variable, but only if that not yet exists.
``@=``
   Array assignment -- create array as necessary and push a value onto
   it.
``?@=``
   Push value onto array, but only if that is to be newly created.

The ``?include?`` directive can be used to include assignment directives
from other files, also recursively.  It is not allowed to start the
`Content`_ section whilst doing so.

   .. note::
      If the path starts with a tilde ``~``, that will be replaced by
      the value of the environment variable ``HOME``.  Likewise, if
      the path starts with a plus sign ``+``, then that will be replaced
      with the content of the environment variable ``WEB42INC``.  Note
      it is really that simple.

   .. note::
      By **definition** there is really no notion of a "chroot".  One
      may leave ``site`` absolutely or relatively, just as desired.

Content
-------

Everything in between the ``<?begin?>`` and ``<?end?>`` PIs is expanded
and will thus produce real output.  Line content is recursively expanded
until no further expansion is possible.

Ignored
-------

Any content after the ``<?end?>`` PI is not parsed at all.

Special PI variables
====================

There are a few PI variables which are treated in a special way, either
because they can only be set in ``config.rc``, or they are readonly
PIs that are provided automatically, or because they are used by
`Processing Instructions (PIs)`_ as content-injection hooks.

_`COMPRESS`
   **Only recognized in config.rc**.
   If used, it can be set to any of ``gzip``, ``bzip2``, ``xz`` and
   ``lzma``; it thus specifies a un-/compression method to be used
   before ``uuencode(1)``\ ing shell archive members.  Note that the
   corresponding `perl(1)`_ module is lazy loaded upon request: at the
   time of this writing only ``gzip`` and ``bzip2`` are shipped with
   a standard `perl(1)`_ installation.

_`IGNORE_MODTIME`
   **Only recognized in config.rc**.
   It set, causes file modification times to be ignored when deciding
   whether a file has to be updated or not.  Maybe necessary if
   ``git(1)`` is used as a source code control system.

_`WHITELIST`
   **Only recognized in config.rc**.
   A set of file globs of files to include in the output even if they
   would be normally ignored, eg, because they start with a dot.

_`MODTIME_AUTC`, _`MODTIME_ALOCAL`, _`MODTIME_SUTC`, _`MODTIME_SLOCAL`
   **Readonly**.
   Here ``_A`` stands for "array" and ``_S`` for "string".  These are
   readonly PIs that correspond to the modification time of the currently
   processed (outermost) file, in UTC and LOCAL time, respectively.
   The order of the arrays is: 0=year, 1=month, 2=day, 3=hour, 4=minute,
   5=second.  The entry at index 6 is the string "UTC" for the UTC
   versions and the offset from UTC in the ISO 8601:2000 standard
   format (+hhmm or -hhmm) otherwise.  The strings use the format
   "YYYY-MM-DD HH:MM:SS" and, again, the UTC version appends the string
   " UTC" whereas the local version appends the string " +-ZONEOFFSET".

   .. note::
      If the ``Time::Piece`` `perl(1)`_ module cannot be loaded (it is
      believed to be a standard module since Perl version 5.10), then
      S-Web42 will only provide UTC values, even for the local
      versions.  I.e., the local ones are only aliases, then.

_`NOW_AUTC`, _`NOW_ALOCAL`, _`NOW_SUTC`, _`NOW_SLOCAL`
   **Readonly**.
   Identical to `MODTIME_AUTC`_ and friends, as above, but expand to
   the current time instead.

_`WWW_PREFIX`, _`WWW_SUFFIX`
   **Injection**.
   Will be injected before and after expansions of `href`_ and `hreft`_,
   respectively.  Default to the empty string.

Processing Instructions (PIs)
=============================

It follows the list of predefined processing instructions.  PIs marked
"paired" in the list below are special in that they need a closing
``end`` tag -- to, e.g., end the paired PI ``<?perl?>`` the PI
``<?perl end?>`` is necessary.

For `def`_, `defa`_ (also via implicit array vivification, as below)
and `defx`_ the same name restrictions apply to the introduced PI
variable name as has been documented for `Assignments`_.  Note that
the content that is assigned to PI variables created by those PIs may
not contain PIs themselves, of course.  It is however possible to use
the pseudo-tags ``<^`` and ``^>`` as aliases there, which will be
automatically converted to ``<?`` and ``?>``, respectively, and via a
simple regular expression, once the PI variable is expanded; to create
an empty tag ``<>``, the alias ``<^>`` has been provided for completeness
sake.  See `def`_ and `defx`_ for examples.

_`def`
   Define a PI which expands to its value part when used. ::

      <?def var1<>varcontent, may contain <em>tagsoup</em>?>
      <?def var2<>any <> content but PI start and end tags?>
      <?def var3<><^def var4<>es^><^var4^><^undef var4^>?>
      ...
      <?var1?>
      <?var2?>
      T<?var3?><?pi-if var4?>T
      ->
      varcontent, may contain <em>tagsoup</em>
      any <> content but PI start and end tags
      TesT

_`defa`
   Define or extend a PI that serves as an array.  Separate members
   are indicated by placing the empty tag ``<>`` in the value content
   (which is different to ``def``, which would simply expand the empty
   tag).  Individual array members may be accessed by "calling" the
   array with the desired member index (starting at 0) as an argument::

      <?defa arrnam<>m 1?>
      <?defa arrnam<>m 2?>
      <?defa arrnam<>m 3<>m 4?>
      ...
      <p><?arrnam 0?><?arrnam 1?>\
         <?arrnam 2?><?arrnam 3?></p>
      ->
      <p>m 1m 2m 3m 4</p>

   One may loop over an entire array by giving *loop* as the first
   argument.  In this mode two additional, optional arguments may be
   given; the first will be injected before the value, the second
   after the value.  E.g.::

      <p><?arrnam loop<><b><></b>?></p>
      ->
      <p><b>m 1</b><b>m 2</b><b>m 3</b><b>m 4</b></p>

   If the first argument is instead one of *unshift* and *push*, then
   the remaining arguments are joined to a single one and inserted at
   the front/the back of the array, respectively, auto-vivificating
   the array as necessary.  Likewise, an argument of one of *shift*
   and *pop* removes the first/the last member of the array, causing
   a log message in verbose mode if the array does not have any members.
   And an argument *undef-empty* will `undef`_ the array if it is
   empty.  E.g.::

      <?test push<>entry 1?>
      <?test unshift<>entry 0?>
      <?test push<>entry 2?>
      <?test loop<><> ?><br />
      <?test pop?>
      <?test shift?>
      <?test loop<><> ?><br />
      <?test pop?>
      <?test loop<><> ?><br />
      <?test undef-empty?>
      ->
      entry 0 entry 1 entry 2 <br />entry 1 <br /><br />

_`defx`
   Define a PI which takes arguments (think format strings).  Arguments
   are indicated by placing the empty tag ``<>`` in the value content
   -- those tags will be replaced by the corresponding user-supplied
   argument when used (which is different to ``def``, which would
   simply expand the empty tag). ::

      <?defx var2<>Expanded <> var <> content <>?>
         <?defx note<><em><></em>: <>.?>
      <?defx subscription<><^note For subscribers only<^><>^>?>

      ...
      <?var2 arg1<>arg2<>arg3?>
      <?subscription nonexistent list?>
      ->
      Expanded arg1 var arg2 content arg3
      <em>For subscribers only</em>: nonexistent list.

_`href`, _`hreft`, _`lref`, _`lreft`
   These purely convenience PIs expand to (X)HTML hyperlinks; the first
   two should be used to create links which leave the site (see
   `WWW_PREFIX`_ and `WWW_SUFFIX`_), the other two for site-local ones.
   The versions without the ``t`` suffix take one argument, the others
   will also create a ``title`` attribute and thus require two
   arguments. ::

      <?href http://www.netbsd.org?>
      -> <a href="http://www.netbsd.org">www.netbsd.org</a>
      <?hreft http://www.opensource.org<><em>Lots</em> of licenses?>
      -> <a href="http://www.opensource.org" title="Lots of licenses"
         ><em>Lots</em> of licenses</a>

_`ifdef`, _`ifndef`, _`else`, _`fi`
   Simple conditional control statements which test for (non)existence
   of a variable (PI) and process the enclosed block only if the
   condition is true.  (You may also "test" for ``0``, which evaluates
   as not defined.) ::

      <p>
      <?ifndef HOMEPAGE?>
       <?lreft index.html<>[HOME]?>
      <?else?>
       Jo-ho hooo -- welcome on my homepage, dude!
      <?fi?>
      </p>

_`include`, _`xinclude`, _`raw_include`, _`frank_include`
   The PI ``<?include?>`` can be used to include a file that itself
   will be subject to the same expansion and filtering that is in use
   for the including file, i.e., it must form a valid S-Web42 *context*.
   Paths are interpreted relative to the source directory of the file
   which uses the include directive.

   ``<?xinclude?>`` is similar to ``<?include?>``, except that the
   included file is expected to (implicitly) consist of `Content`_ only,
   so that its inclusion may modify the PI environment of the *current*
   *context*.

   The ``<?raw_include?>`` PI will simply include the given file raw
   and **without any** S-Web42 processing.  ``<?frank_include?>`` will
   also include the given file raw, but it will process the lines and
   escape ``&``, ``<`` and ``>`` characters, so that HTML parsers will
   be able to display the (rather) raw content as desired. ::

      HOME = ../index.html
      <?begin?>
      <?include ../../header?>

      Hello World, v2.

      <pre>
      <?frank_include /etc/passwd?>
      </pre>

      <?include ../../footer?>
      <?end?>

   .. note::
      If the path starts with a tilde ``~``, that will be replaced by
      the value of the environment variable ``HOME``.  Likewise, if
      the path starts with a plus sign ``+``, then that will be replaced
      with the content of the environment variable ``WEB42INC``.  Note
      it is really that simple.

   .. note::
      By **definition** there is really no notion of a "chroot".  One
      may leave ``site`` absolutely or relatively, just as desired.

_`mode`
   This PI can be used to change the modes described in `About.Filter`_
   on the fly; the required argument must either be ``%``, in which
   case the previously active mode is restored, or a combination of
   the filter mode configuration characters.  Mode changes are inherited
   by deeper contexts, but they will not be propagated to outer ones.
   Note that PI expansion mode cannot be disabled like that.  Be aware -
   you should really know what you are doing when you use this PI.

_`perl`, _`sh`, _`xperl`, _`xsh`
   These are paired statements which can be used to embed ``perl(1)``
   or ``sh(1)`` code, respectively.  The code will be executed in a
   subprocess, with its working directory set to that of the *topmost*
   *context*, i.e., the file that is currently being produced output for.

   The output that the subprocess produces on its standard output
   channel will be subject to the same expansion and filtering that is
   in use for the surrounding *context*.  In fact the output is expected
   to form a valid S-Web42 *context* except for the *x*\ -versions,
   which are expected to produce `Content`_ output (implicitly) only,
   and which thus modify the PI environment of the *current* *context*.

   The code is subject to the normal S-Web42 processing before it is
   passed through to the subprocess that runs the interpreter.  I.e.,
   PI expansion can be used to "pass arguments" through.  This raises
   the question how a valid S-Web42 *context* can be produced if PIs
   are expanded during code evaluation.  Well, it turns out that S-Web42
   injects four variables automatically:

   =========  =============
   ``PIS``    ``<?``
   ``PIE``    ``?>``
   ``BEGIN``  ``<?BEGIN?>``
   ``END``    ``<?END?>``
   =========  =============

   (Since it is not necessary to quote the closing ``?>`` of a PI
   ``$PIE`` is provided only for completeness sake.) So -- unfortunately
   one has to perform the uncomfortable task of PI building via string
   concatenation to avoid unwanted data expansions.  E.g.::

      <?perl?>
      my $gmd = gmtime;
      my $lmd = localtime;
      my $user = fetch_user();

      print <<__EOT__;
      USER = $user
      ${BEGIN}
      ${PIS}include +header?>
      <h1>Hi ${PIS}USER${PIE}!</h1>

      It is ${lmd}.
      That is ${gmd} UTC!

      ${PIS}include +footer?>
      ${END}
      __EOT__
      <?perl end?>

      # This works:
      $ echo '<?begin?>START<?sh?>' \
      > 'printf "${PIS}BEGIN?>IN${PIS}END?>" | tr [:upper:] [:lower:]' \
      > '<?sh end?>END<?end?>' | s-web42 --no-rc --eo

      # An <?eval PERL-EXPR?> may at some time be a regular PI..
      $ echo '<?begin?><?defx eval<><^xperl^><><^xperl end^>?>\
      > <?eval my $t=gmtime; print $t?><?end?>' | s-web42 --no-rc --eo

   .. note::
      The entire -- expanded -- script is read into memory before
      being passed to the interpreter that runs in the subprocess and
      will ``eval`` the script.  These PIs cannot be nested
      (especially not with each other).

_`pi-if`
   If it is unsure whether a PI (variable) exists, use this PI to
   "invoke" it, instead of the PI itself.  I.e., pass arguments etc.
   just as you would do if you would use the PI directly, but give the
   name of the PI as the first argument.  ``pi-if`` will not cause
   errors but only produce some log messages if the used PI does not
   exist.

_`pre`, _`xpre`, _`xcdata`
   The "preformatted" paired PIs.  While these PIs are active *none* of
   the filters that have been documented in `About.Filter`_ are active
   except for the PI expansion filter (indeed they create a new
   *context* for their content).  Since that is mostly of interest inside
   of HTML ``<pre>`` tags, the expansion of the ``<?xpre?>`` PI contains
   this tag already.  And the ``<?xcdata?>`` PI enwraps the content in
   a CDATA section in addition.

   .. note::
      Leading whitespace on the line of the ``<?xpre end?>`` will of
      course be copied through to the output.
      This may look odd in an otherwise beautifully indented source.

_`undef`
   Undefine an user defined array or PI.

   .. note::
      This undefines in the current *context* **only**!  I.e., if
      ``file1`` defines the PI ``DOIT`` and then includes ``file2``,
      then an ``<?undef DOIT?>`` from within ``file2`` will **not**
      affect ``file1``, but only ``file2`` and deeper *contexts*.

News
====

0.8.1
-----

* The ``NOW_*`` series of PI variables has been added.

* New PIs: ``<?frank_include?>``, ``<?xcdata?>``.

* Fixed bug in the template ``header`` file.

0.8.2
-----

* Automatic paragraphs have been diversed and now support different
  target markups, like headings, blockquotes and lists.

0.8.3
-----

* Automatic paragraphs now also support definition lists.

* Added anchor and hyperlink MarkLo support.

0.8.4
-----

* Thanks to the suggestion of Dave Mitchell from perl-porters the
  MarkLo regular expression is now proof against user input syntax
  errors and no longer enters a rather endless loop if a closing brace
  has been forgotten.

0.8.5
-----

* The unit tests should now work with all shells.

* Automatic paragraphs are now able to join successive instances: e.g.,
  X successive bullet list items are now joined into a single bullet
  list, instead of producing X bullet lists.

0.8.6
-----

* The automatic paragraph commit and tag v0.8.5 were premature.
  A bit better now.

0.8.7
-----

* The special PI variable ``WWW`` variable has been removed, instead
  there are now `WWW_PREFIX`_ and `WWW_SUFFIX`_.

* We are a bit more relaxed regarding uudecode(1) POSIX compatibility,
  and now give a ``-o /dev/stdout`` command line option instead of
  requiring that the in-stream /dev/stdout variant is properly
  understood.

0.8.8
-----

* Fix the test which was broken in v0.8.7 after not being updated to
  reflect the change to `WWW_PREFIX`_ and `WWW_SUFFIX`_.

* MarkLo is now implemented recursively, and closing braces can be
  embedded by escaping them with a backslash.

0.8.9
-----

* Bugfix!  Or, at least that is what i think today, it may have been
  desired (we explicitly warn), but let's just not transport ``mode``
  changes to outer contexts.

0.9.0
-----

* S-Web42 no longer needs external programs for creating the shell
  archive.  (My performance increasement was pretty dramatical.)

* The ``<?mode?>`` PI has been rethought and using it for adjustments
  no longer works on a global basis, but changes will not affect outer
  *contexts*.

  Before that rewrite a long standing bug of ``<?mode?>`` had been fixed
  which mattered for false use cases where the reverting ``<?mode %?>``
  had been forgotten.

0.9.1
-----

* Add support for variables with empty values in the assignment block,
  as well as for empty arrays.

* Improve recursive expansion of <^> constructs within normal PIs.
  Especially handy for arrays.
  The shipped config templates make use of that and use <^TOPDIR^>
  prefix for URLs, so that an entire site can be driven with a single
  template instance.

* We no longer ignore files starting with a dot when they are placed in
  the new WHITELIST variable of ``config.rc``.

0.9.2
-----

* Improve recursive expansion of <^> constructs even further.
  This time with test case.  Should now just do fine everywhere, but
  i am very far out this perl code, on the other hand.

0.9.3
-----

* Add an ``?include?`` directive for the `Assignments`_ section.

0.9.4
-----

* Use POSIX::setlocale() instead of relying on PERL5OPT=-C.
  The latter roots in old habits originating in perl(1) evolution
  problems in the years 2002-3 (5.8.0 and 5.8.1).   Time to move on.

* Back to tabulator coding style.  Use 120(+) columns, not 79.

Thanks
======

Thanks to Ian Abbott, who alluded the missing ``<?xcdata?>`` and
``<?frank_include?>`` PIs (at least indirectly through statements on
the ``tz AT iana DOT org`` mailing list).

Thanks to Dave Mitchell from perl-porters for giving the hint on
backtracing that made the fancy recursive regular expression that is
used for MarkLo proof against user syntax misses.  (I never use and
do not really understand these super-fancy extended regular expressions
beyond assertions, but remembered the example from the documentation
and was too lazy to write the hand-driven recursive parser for that,
so i tried it.)

Future Directions
=================

Add a Markdown compatible syntax block.

With automatic parapraphs too many flushs occur, and tagsoup joining is
not yet implemented.

The convertion phase can be parallelized without much (in fact almost
no) effort; add a special ``MAXJOBS`` or similar special ``config.rc``
PI variable to control this, then.

An ``<?eval? PERL-EXPR?>`` could at some time become a regular PI,
that simply evaluates in the *current* context.  Maybe lazy start a
concurrent child and communicate with that for this purpose (pay once
once needed).  If parallelized, one such evalizer for each worker
thread/process.

.. vim:set ft=rst:s-ts-mode
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.