#555 Tidy not ready for Internationalized Domain Names (IDN)


Tidy does not allow me to make links using
Internationalized Domain Names (IDN), like this:
<a href="http://www.räksmörgås.se/">A link</a>

Tidy converts this to:
<a href="http://www.r%C3%A4ksm%C3%B6rg%C3%A5s.se/">A link</a>

The result is a link that cannot be loaded by Mozilla
or Opera.
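
To illustrate the difference, here is a minimal Python sketch (not Tidy code) that contrasts percent-escaping the UTF-8 bytes of the host name, which matches the output shown above, with converting the host to its ASCII "xn--" form, which is what IDN-aware software resolves. The variable names are made up for this example, and Python's built-in "idna" codec follows the older IDNA 2003 rules, so treat the result as an approximation of what a browser would do.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = "http://www.räksmörgås.se/"
parts = urlsplit(url)

# Percent-escape the UTF-8 bytes of the host name; this reproduces
# the form of the href that Tidy emitted in the report above.
percent_host = quote(parts.hostname)

# Convert each label of the host name to its ASCII-compatible form.
# Assumption: the stdlib "idna" codec (IDNA 2003) is close enough here.
ascii_host = parts.hostname.encode("idna").decode("ascii")

percent_url = urlunsplit(
    (parts.scheme, percent_host, parts.path, parts.query, parts.fragment)
)
idn_url = urlunsplit(
    (parts.scheme, ascii_host, parts.path, parts.query, parts.fragment)
)

print(percent_url)  # the percent-escaped form, as in Tidy's output
print(idn_url)      # the ASCII form of the same host name
```

Only the second form uses plain ASCII host labels, so it needs no escaping at all; the first form leaves percent signs in the host component.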

My configuration is:
char-encoding: latin1
enclose-text: yes
wrap: 0
quiet: true
output-xhtml: no
add-xml-decl: no
gnu-emacs: true

If I add "fix-uri:false" to my config, Tidy will output
a warning:
Warning: <a> improperly escaped URI reference

I find this warning inappropriate, since the URI is
valid. Tidy should not issue this warning for the
domain-name URI component if the domain is valid
according to the IDN rules.

I have a strict no-warning policy, so when Tidy issues
a warning it causes my publishing machinery to halt. If
I somehow suppress this particular warning I run the
risk of ignoring cases when other parts of a URI are
truly illegal.


  • Per Ångström

    Per Ångström - 2004-03-28

    Test case

  • Björn Höhrmann

  • Geoff

    Geoff - 2016-04-01

    Thanks for the report... now so long ago... sorry for the delay...

    The Tidy source has moved to https://github.com/htacg/tidy-html5, and the site to http://www.html-tidy.org/

    There has been some discussion on this. The most recent and closest to this is Issue #378.

    At present Tidy will escape such characters unless you add the config option --fix-uri no. For your idn-test.html sample you also need to add --char-encoding latin1, since modern Tidy defaults to utf-8.

    However, if you want to pursue this, or you find another Tidy bug or have a feature request, then please file an issue, together with the sample HTML and config used. If you find, fix, and test the feature in a Tidy fork, you can issue a Pull Request. Always appreciated... thanks...

    Tidy needs your support...

    Meantime closing this here as out-of-date...

