#4711 regsub example doesn't perform as advertized

obsolete: 8.6b1
closed-fixed
5
2010-09-10
2010-09-10
Jasper
No

It's this one:

Convert all non-ASCII and Tcl-significant characters into \u escape sequences by using regsub and subst in combination:

# This RE is just a character class for everything "bad"
set RE {[][{};#\\\$\s\u0080-\uffff]}

# We will substitute with a fragment of Tcl script in brackets
set substitution {[format \\\\u%04x [scan "\\&" %c]]}

# Now we apply the substitution to get a subst-string that
# will perform the computational parts of the conversion.
set quoted [subst [regsub -all $RE $string $substitution]]

===========================================
The RE for "bad" characters includes \s, which matches all whitespace characters, including newline. However, putting a \ before a newline and then calling subst on the string does not preserve the newline: the backslash-newline pair is replaced by a space, so the whole procedure turns every newline into \u0020.

Not sure how to get this to work properly! Little help?
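
The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the sample value of string is made up for illustration, and everything else is copied from the example as quoted.

```tcl
# Hypothetical sample input: two characters separated by a newline
set string "a\nb"

# The example exactly as quoted in the report
set RE {[][{};#\\\$\s\u0080-\uffff]}
set substitution {[format \\\\u%04x [scan "\\&" %c]]}
set quoted [subst [regsub -all $RE $string $substitution]]

# regsub leaves backslash-newline inside the scan argument; the Tcl
# parser collapses that pair to a single space, so scan sees a space
# (0x20) instead of the newline (0x0a).
puts $quoted   ;# prints a\u0020b
```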

Discussion

  • Donal K. Fellows

    Good catch. Newlines need to be handled specially. :-(

     
  • Donal K. Fellows

    • assigned_to: pvgoran --> dkf
     
  • Donal K. Fellows

    Updated example text is as below.

    # This RE is just a character class for almost everything "bad"
    set RE {[][{};#\\\$ \r\t\u0080-\uffff]}

    # We will substitute with a fragment of Tcl script in brackets
    set substitution {[format \\\\u%04x [scan "\\&" %c]]}

    # Now we apply the substitution to get a subst-string that
    # will perform the computational parts of the conversion. Note
    # that newline is handled specially through string map since
    # backslash-newline is a special sequence.
    set quoted [subst [string map {\n {\\u000a}} \
            [regsub -all $RE $string $substitution]]]

     
  • Donal K. Fellows

    Fixed in HEAD and 8.5 branch (alas, just missed the train for 8.5.9...)

     
  • Donal K. Fellows

    • status: open --> closed-fixed
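
One way to check the updated example from the discussion is a round trip: quote a string, then let subst undo the \u escapes and compare with the input. This is a sketch, not part of the fix itself. The sample input and the restored variable are made up for illustration, and the hyphen in the character range is assumed to be a plain ASCII "-".

```tcl
# Hypothetical sample input: newline, space and brackets
set string "a\nb \[c\]"

# Updated example: newline is no longer in the character class
set RE {[][{};#\\\$ \r\t\u0080-\uffff]}
set substitution {[format \\\\u%04x [scan "\\&" %c]]}

# Newlines are mapped to a literal \\u000a before subst runs, so
# they never pass through the backslash-newline rule.
set quoted [subst [string map {\n {\\u000a}} \
        [regsub -all $RE $string $substitution]]]
puts $quoted   ;# prints a\u000ab\u0020\u005bc\u005d

# Round trip: with commands and variables disabled, subst only
# expands the backslash escapes, which should restore the input.
set restored [subst -nocommands -novariables $quoted]
puts [expr {$restored eq $string}]   ;# prints 1
```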