#17 Too much white-space in subcomponent/method output.

closed-invalid
nobody
compiler (3)
1
2015-01-14
2005-09-13
No

This is more of an annoying behavior than a bug.

The compiler inserts (probably) unintended whitespace
in subcomponents and methods.

Here's an example:
<%flags>
inherit = None
</%flags>
<%def mycomp>
% m.write("Binary data")
</%def>
<& mycomp &>

which produces "\nBinary data" (note the leading "\n").

This behavior is particularly troublesome in
subcomponents/methods which produce binary data
(say, a GIF image).
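To see why a single stray newline matters for binary output, here is a minimal Python sketch (not Myghty code) that checks the six-byte GIF signature. Only the signature values are real GIF format; the trailing sample bytes are invented for illustration and do not form a complete image.

```python
# The two valid GIF header signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(data: bytes) -> bool:
    """Return True if data starts with a valid GIF header signature."""
    return data[:6] in GIF_SIGNATURES


# Illustrative payload: a GIF header followed by placeholder bytes.
image = b"GIF89a" + b"\x01\x00\x01\x00"

# What the template effectively emits: the stray newline, then the image bytes.
output = b"\n" + image

print(looks_like_gif(image))   # True
print(looks_like_gif(output))  # False: the leading newline shifts the header
```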

It can be worked around using m.clear_buffer() (which,
btw, does not seem to be mentioned in the
documentation). (Using the trim attribute is not a
safe workaround for subcomponents which produce binary
data, since the binary data may begin or end with
white-space characters.)
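To illustrate the trim caveat, here is a small Python sketch that uses bytes.strip() as a stand-in for trimming. It assumes trim behaves like stripping leading and trailing white-space, which may not match Myghty's exact implementation; the payload bytes are invented for the example.

```python
# Payload whose first byte is 0x20 (space) and last byte is 0x0A (newline).
payload = b"\x20\x00\x7fbinary\x0a"

# Output with the template's unintended leading newline.
emitted = b"\n" + payload

# Stripping removes the stray newline, but also real payload bytes at both ends.
trimmed = emitted.strip()

print(trimmed == payload)            # False
print(len(payload) - len(trimmed))   # 2 (bytes of real data lost)
```

Discarding exactly what the template added, as clear_buffer() does, avoids this because it never inspects the payload's own bytes.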

My guess as to the best fix is to make the lexer ignore
all white-space which is not contiguous with either
some non-white-space text, or a substitution or
component call.
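A minimal sketch of the proposed rule, assuming a hypothetical lexer that has already split a component into (kind, text) tokens and merged adjacent literal text. The token kinds ("text", "tag", "code", "subst", "call") and the function name are illustrative only, not Myghty's internals.

```python
from typing import List, Optional, Tuple

# Hypothetical token: (kind, text), kind in {"text", "tag", "code", "subst", "call"}.
Token = Tuple[str, str]

# Neighbors that make adjacent white-space "intended" under the proposed rule.
SIGNIFICANT = {"text", "subst", "call"}


def drop_insignificant_whitespace(tokens: List[Token]) -> List[Token]:
    """Drop white-space-only text not adjacent to text, substitutions or calls."""
    result: List[Token] = []
    for i, (kind, text) in enumerate(tokens):
        if kind == "text" and text.strip() == "":
            prev_kind: Optional[str] = tokens[i - 1][0] if i > 0 else None
            next_kind: Optional[str] = tokens[i + 1][0] if i + 1 < len(tokens) else None
            # Bounded only by block tags, code lines, or the start/end of the stream.
            if prev_kind not in SIGNIFICANT and next_kind not in SIGNIFICANT:
                continue
        result.append((kind, text))
    return result


# The mycomp example: the newline after <%def mycomp> is the only literal text.
tokens = [("tag", "<%def mycomp>"), ("text", "\n"),
          ("code", 'm.write("Binary data")'), ("tag", "</%def>")]
print([kind for kind, _ in drop_insignificant_whitespace(tokens)])  # ['tag', 'code', 'tag']
```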

Discussion

  • Geoffrey T. Dairiki

    • priority: 5 --> 1
     
  • Mike Bayer

    Mike Bayer - 2005-09-13

    Logged In: YES
    user_id=1100624

    try adding a backslash like so:

    <%flags>
    inherit = None
    </%flags>
    <%def mycomp>\
    % m.write("Binary data")
    </%def>
    <& mycomp &>

    backslashes are used to escape out excess newlines.
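    Conceptually, the backslash acts as a continuation marker that is removed together with the newline it precedes. A minimal Python sketch of that idea with re.sub; it assumes the escape is applied to literal template text after "%" lines and tags have been recognized (so the "%" line still starts a line), and it is not Myghty's actual lexer code.

```python
import re

# A backslash immediately followed by a newline.
_ESCAPED_NEWLINE = re.compile(r"\\\n")


def strip_escaped_newlines(text_chunk: str) -> str:
    """Remove backslash-newline pairs from a chunk of literal template text."""
    return _ESCAPED_NEWLINE.sub("", text_chunk)


# The literal text between "<%def mycomp>" and the "%" line is a lone backslash-newline.
print(repr(strip_escaped_newlines("\\\n")))        # ''
print(repr(strip_escaped_newlines("a \\\nb\n")))   # 'a b\n'
```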

    also what works equally well here is the "trim" flag:

    <%flags>
    inherit = None
    </%flags>
    <%def mycomp trim="both">
    % m.write("Binary data")
    </%def>
    <& mycomp &>

    which filters whitespace from the component output.

    that said, why use a markup template to output a GIF?
    better to write to the stream directly from a regular
    module, or at least from a template that is only an <%init>
    section, no?

     
  • Mike Bayer

    Mike Bayer - 2005-09-13
    • status: open --> closed-invalid
     
  • Geoffrey T. Dairiki

    Logged In: YES
    user_id=45814

    Okay, the backslash works. Thanks for the tip.

    The problem re-appears however if one uses a python block:

    <%def mycomp>\
      <%python>
      m.write("Binary data")
      </%python>
    </%def>

    (The backslash hides the newline, but not the indent on the
    next line. Of course, one could get around this by not
    indenting the %python block. OTOH, it seems clear that in
    code like the above, no output white-space is intended. Note
    also that in the current code there is an asymmetry: the
    white-space between the <%def> and <%script> tags gets
    output, while the white-space between the </%script> and
    </%def> tags does not.)

    There are many cases besides generating GIFs where extra
    white-space from a template may be significant, for example
    when generating the content of a <script> or even a <pre>
    tag.

    The unintended white-space can also bite when a
    subcomponent or method is intended to return a value rather
    than output it. Consider:

    <%def get_url>
    <%python>
    return "http://some/url"
    </%python>
    </%def>
    <a href="<% m.comp('get_url') | h %>">a link</a>

    which results in surprising white space within the href
    attribute. Yes, once you figure out what's going on, you
    can work around it, but this behavior does seem to violate
    the "principle of least surprise."
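    The surprise comes from m.comp() doing two things at once: the component's literal white-space is written to the output as a side effect, and only afterwards does the substitution write the returned value. A toy model in Python; MockRequest, its methods, and get_url are hypothetical stand-ins rather than Myghty's request object, and the "| h" escaping filter is left out.

```python
from typing import Callable, List


class MockRequest:
    """Toy stand-in for a template request: an output buffer plus comp()."""

    def __init__(self) -> None:
        self.buffer: List[str] = []

    def write(self, text: str) -> None:
        self.buffer.append(text)

    def comp(self, component: Callable[["MockRequest"], str]) -> str:
        # Running the component may write output *and* return a value.
        return component(self)

    def output(self) -> str:
        return "".join(self.buffer)


def get_url(m: MockRequest) -> str:
    m.write("\n")              # the newline after <%def get_url> in the template
    return "http://some/url"   # the value returned from the <%python> block


m = MockRequest()
m.write('<a href="')
m.write(m.comp(get_url))       # the <% m.comp('get_url') %> substitution
m.write('">a link</a>')
print(repr(m.output()))        # '<a href="\nhttp://some/url">a link</a>'
```

    Because the argument to write() is evaluated first, the component's newline lands in the buffer before the URL, i.e. inside the href attribute.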

    It seems like a relatively simple fix to ignore whitespace
    which is bounded on both ends by either block tags or python
    code lines. I also can't imagine that making this change
    will break much existing code.

    (Trim="both" is not safe for binary data, since the binary
    data may start with white-space characters.)

     
  • Mike Bayer

    Mike Bayer - 2005-09-13

    Logged In: YES
    user_id=1100624

    you don't have to indent the <%python> tag:

    <%def get_url>
    <%python>
    return "http://some/url"
    </%python>
    </%def>
    <a href="<% m.comp('get_url') | h %>">a link</a>

    also, if a component is just going to return a value and not
    have any content output, use an <%init> tag instead:

    <%def get_url>
    <%init>
    return "http://some/url"
    </%init>
    </%def>
    <a href="<% m.comp('get_url') | h %>">a link</a>

    You probably should be using <%init> for binary content,
    combined with some buffer-clearing stuff I found in the
    Mason FAQ. I generally defer to Mason on issues like these,
    and the FAQ makes similar suggestions, including a future
    task for a "trim leading and trailing whitespace" option,
    which I have already provided via the "trim" tag. For
    binary files it makes the suggestion to use clear_buffer()
    combined with abort():

    http://www.masonhq.com/?FAQ:Components#h-why_does_my_output_have_extra_newlines_whitespace_and_how_can_i_get_rid_of_it_
    http://www.masonhq.com/?FAQ:Components#h-i_m_trying_to_generate_an_image_or_other_binary_file__but_it_seems_to_be_getting_corrupted_

    This also works in Myghty and looks like this:

    <%def mycomp>
    <%init>
    m.clear_buffer()
    m.write("Binary data")
    m.abort()
    </%init>
    </%def>
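    A toy model of why the clear_buffer()/abort() pair is robust: clear_buffer() discards whatever has been emitted so far (including stray white-space), and abort() stops the rest of the template from adding more. All names below are hypothetical; in particular, modeling abort() as raising an exception that the runner treats as a normal finish is an assumption for illustration.

```python
from typing import Callable, List


class AbortRequest(Exception):
    """Hypothetical stand-in for whatever abort() raises to end the request."""


class MockRequest:
    """Toy request object with a byte buffer, clear_buffer() and abort()."""

    def __init__(self) -> None:
        self.buffer: List[bytes] = []

    def write(self, data: bytes) -> None:
        self.buffer.append(data)

    def clear_buffer(self) -> None:
        self.buffer.clear()  # discard everything emitted so far

    def abort(self) -> None:
        raise AbortRequest()  # stop executing the rest of the template


def run(template: Callable[[MockRequest], None]) -> bytes:
    """Execute a template function; an abort counts as a normal early finish."""
    m = MockRequest()
    try:
        template(m)
    except AbortRequest:
        pass
    return b"".join(m.buffer)


def image_page(m: MockRequest) -> None:
    m.write(b"\n")          # white-space already emitted earlier in the request
    m.clear_buffer()        # <%init>: throw that output away
    m.write(b"Binary data")
    m.abort()               # nothing after this point reaches the buffer
    m.write(b"\n")          # e.g. trailing template white-space; never written


print(run(image_page))      # b'Binary data'
```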

    As far as modifying the Lexer, I'm not thrilled about changing
    the code's default behavior, because you are asking the
    template engine to make arbitrary decisions about the
    content inside of tags without specification by the user,
    whereas the backslash provides an explicit way to indicate
    that a particular newline character inside the tags should
    not be displayed. It's one simple way to trim out newlines
    anywhere: not just around tags, but anywhere within
    content where the source needs a line break but the
    output does not.

    I would consider making it an optional parameter to Lexer if
    you provided a working patch, and I would most prefer it in
    the form of a subclass or "mixin" of Lexer, since you'd
    probably have to rewrite several very large regular
    expressions that are better placed in some other file, thus
    leaving the original Lexer pretty much unchanged. As there
    is no test suite for the Lexer right now, and there are a
    *lot* of potential errors when changing the regexps, the
    parameter would be "at the user's risk" for a long time
    unless a decent test suite could be built up.
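    One way to read the "subclass or mixin" suggestion, sketched in Python: the stock lexer exposes a post-processing hook that keeps today's behavior, an opt-in mixin overrides it, and unit tests pin down both behaviors. Every class here (Lexer, WhitespaceTrimMixin, CannedLexer, TrimmingLexer) and the (kind, text) token format are hypothetical; the real Myghty Lexer is regexp-based and is not modeled.

```python
import unittest
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical, not Myghty's internal format


class Lexer:
    """Hypothetical stand-in for the stock lexer; left unchanged."""

    def tokenize(self, source: str) -> List[Token]:
        raise NotImplementedError("the real regexp-based tokenizer goes here")

    def postprocess(self, tokens: List[Token]) -> List[Token]:
        return tokens  # stock behavior: emit all white-space as-is

    def lex(self, source: str) -> List[Token]:
        return self.postprocess(self.tokenize(source))


class WhitespaceTrimMixin:
    """Opt-in: drop white-space-only text bounded on both sides by tags or code."""

    _BOUNDARY = {"tag", "code", None}  # None = start or end of the token stream

    def postprocess(self, tokens: List[Token]) -> List[Token]:
        tokens = super().postprocess(tokens)
        kept: List[Token] = []
        for i, (kind, text) in enumerate(tokens):
            if kind == "text" and not text.strip():
                before: Optional[str] = tokens[i - 1][0] if i > 0 else None
                after: Optional[str] = tokens[i + 1][0] if i + 1 < len(tokens) else None
                if before in self._BOUNDARY and after in self._BOUNDARY:
                    continue
            kept.append((kind, text))
        return kept


class CannedLexer(Lexer):
    """Test double: returns pre-built tokens instead of parsing source."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens

    def tokenize(self, source: str) -> List[Token]:
        return list(self.tokens)


class TrimmingLexer(WhitespaceTrimMixin, CannedLexer):
    """Stock lexer plus the opt-in white-space rule."""


DEF_TOKENS: List[Token] = [
    ("tag", "<%def mycomp>"), ("text", "\n  "),
    ("code", 'm.write("Binary data")'), ("tag", "</%def>"),
]


class WhitespaceTrimTests(unittest.TestCase):
    def test_stock_lexer_keeps_whitespace(self) -> None:
        self.assertEqual(CannedLexer(DEF_TOKENS).lex(""), DEF_TOKENS)

    def test_mixin_drops_whitespace_between_blocks(self) -> None:
        out = TrimmingLexer(DEF_TOKENS).lex("")
        self.assertNotIn(("text", "\n  "), out)
        self.assertEqual(len(out), 3)

    def test_mixin_keeps_whitespace_next_to_substitution(self) -> None:
        tokens = [("tag", "<%def x>"), ("text", " "), ("subst", "name")]
        self.assertEqual(TrimmingLexer(tokens).lex(""), tokens)


if __name__ == "__main__":
    unittest.main(exit=False)
```

    Keeping the rule in a mixin leaves the stock Lexer's output untouched, so the option stays "at the user's risk" only for code that explicitly opts in.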

     
  • Geoffrey T. Dairiki

    Logged In: YES
    user_id=45814

    Using the <%init> block instead of the <%python> block does
    make me happier.

    > you don't have to indent the <%python> tag

    Yes, but what if I want to? I find it quite
    counter-intuitive that merely indenting the <%python> block
    causes my code to crap out.

    The fact that Mason has the same behavior is, I will
    concede, a pretty strong argument for leaving things the way
    they are.

    I started to look into modifying the lexer and got scared.
    So I regrouped and have now hacked up a modified version
    of the stock compiler which seems to do the trick. I've
    attached it below, if you'd like to take a look.