Is there a way to "pretty print" a document without HTML Tidy changing anything except spacing/formatting? In other words, no new tags and no other non-whitespace changes. Users of my product, CSE HTML Validator, have asked for this, and I'd like to integrate something like it.


  • Rebeccah

    Rebeccah - 2010-10-28

    I second this request. I frequently have to compare HTML pages across sandbox/development copies, production copies, and revision control system copies. I use Tidy together with my file comparison program to format the files identically, so that I can compare their content more effectively.

    For this purpose, I do not need broken HTML fixed (especially if the page does in fact work for its intended users); I just need everything formatted the same way.
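    A whitespace-only formatter of the kind requested here can be sketched without Tidy. The Python below is an illustrative sketch, not part of Tidy or CSE HTML Validator, and `reindent`, `TOKEN` and `VOID` are hypothetical names. It splits the input into comments, tags and text runs (concatenating the tokens reproduces the input exactly), drops the whitespace between tokens, and prints each token on its own line, indented by nesting depth. Because it fixes nothing, it assumes explicit end tags: an omitted `</p>` or `</li>` leaves the indentation one level too deep, and a `<` inside `<script>` or a `>` inside an attribute value can confuse the simple tokenizer. Putting inline elements on separate lines can also change rendered spacing.

```python
import re

# Elements that never take an end tag, so they must not increase the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# A comment, a <!...> declaration, a tag, a run of text, or a stray "<".
# Every character of the input belongs to exactly one token.
TOKEN = re.compile(r"<!--.*?-->|<![^>]*>|<[^>]+>|[^<]+|<", re.S)
TAG_NAME = re.compile(r"<([A-Za-z][A-Za-z0-9:-]*)")


def reindent(html: str, indent: str = "  ") -> str:
    """Put each token on its own line, indented by depth; nothing else changes."""
    lines = []
    depth = 0
    for token in TOKEN.findall(html):
        text = token.strip()
        if not text:
            continue  # whitespace between tokens is replaced by newline + indent
        if text.startswith("</"):
            depth = max(depth - 1, 0)  # dedent before printing the end tag
            lines.append(indent * depth + text)
            continue
        lines.append(indent * depth + text)
        name = TAG_NAME.match(text)
        if name and name.group(1).lower() not in VOID and not text.endswith("/>"):
            depth += 1  # an ordinary start tag opens a new level
    return "\n".join(lines)
```

    Since only whitespace differs, removing all whitespace from the input and from the output (for example with `re.sub(r"\s", "", s)`) gives identical strings, which is a cheap way to check the "no non-whitespace changes" requirement.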



  • Rebeccah

    Rebeccah - 2010-12-13

    1. Is that even a Windows application? Most Windows apps on SourceForge are not archived as tarballs, and the support documentation usually mentions Windows somewhere.

    2. I'm not looking for a word-by-word diff; a block diff is fine. I just want the HTML or XML to be formatted with indentation so that I can see the context. My current diff program (which I really like, BTW) lets text formatters/prettifiers be plugged in, and it ships with HTML Tidy preconfigured as a plug-in. But because Tidy tries to correct errors, I can't use the program's built-in merge feature while Tidy is active without ending up with unintended changes.

    When I get some time, I may port a JavaScript prettifier I found online, which does almost exactly what I want, to Java (my hammer) so that I can call it from the command line and plug it into my diff program. I say "almost" because it lacks Tidy's ability to sort attributes.

    Ideally, Tidy would have additional command-line options to disable some of the changes it makes, so that I could keep the attribute sorting, the indentation, and the closing of tags like <br/> and <img/> (which Microsoft normally leaves unclosed), and leave everything else alone.
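    Attribute sorting on its own can be done without touching attribute values. The Python below is a hypothetical sketch (`sort_attributes`, `START_TAG` and `ATTR` are illustrative names, not Tidy's API): it reorders the attributes of a single start tag alphabetically by lower-cased name and copies each attribute's text, including its quoting, verbatim. The only other change is that the whitespace between attributes is normalized to single spaces; end tags, comments and declarations are returned untouched. It assumes the tag is reasonably well formed (no `>` inside attribute values).

```python
import re

# A start tag: "<name", the attribute text, optional trailing space and "/", then ">".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9:-]*)(.*?)(\s*/?)>", re.S)
# One attribute exactly as written: a name, optionally followed by "=value",
# where the value is double-quoted, single-quoted or unquoted.
ATTR = re.compile(r"""([^\s"'=/>]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?""")


def sort_attributes(tag: str) -> str:
    """Reorder a start tag's attributes by name, keeping each one's text verbatim."""
    m = START_TAG.fullmatch(tag)
    if not m:
        return tag  # end tags, comments and declarations are returned unchanged
    name, attr_text, tail = m.groups()
    # sorted() is stable, so repeated attribute names keep their original order.
    attrs = sorted(ATTR.finditer(attr_text), key=lambda a: a.group(1).lower())
    return "<" + name + "".join(" " + a.group(0) for a in attrs) + tail + ">"
```

    For example, `sort_attributes('<img src="a.png" alt="A">')` returns `<img alt="A" src="a.png">`. Applied to every start tag in a document, this gives a stable attribute order for diffing while leaving values exactly as written.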

  • Geoff

    Geoff - 2016-04-03

    Thanks for the feature request... now so long ago... sorry for the delay...

    The Tidy source has moved to https://github.com/htacg/tidy-html5, and the website to http://www.html-tidy.org/

    Wow, that is a tall order - no text changes! My first thought would be no. Tidy loads a document into its own DOM-like tree of nodes, so the original text is completely lost by the time the node tree is pretty-printed. That is particularly true for text content, which has all newlines removed and runs of extra whitespace reduced to a single space, and so on...

    One option for Rebeccah would be to run Tidy on the old document as well, and compare the two tidied documents. That is roughly what I do when comparing pages on my website...
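    The "tidy both, then compare" workaround can be scripted. Below is a minimal Python sketch; `tidy_text`, `tidy_diff` and `TIDY_CMD` are hypothetical names, and the particular Tidy options (quiet, indented, alphabetical attribute sorting, warnings hidden, output forced even on errors, no generator meta tag) are assumptions to check against `tidy -help-config` for your build. The formatter is passed in as a parameter so another prettifier can be swapped in.

```python
import difflib
import subprocess
from typing import Callable, List

# Assumed Tidy command line; adjust the options for your own setup.
TIDY_CMD = ["tidy", "-q", "-i",
            "--sort-attributes", "alpha",
            "--show-warnings", "no",
            "--force-output", "yes",
            "--tidy-mark", "no"]


def tidy_text(html: str) -> str:
    """Run Tidy on a string via stdin and return the tidied document."""
    # Tidy exits with 1 for warnings and 2 for errors, so a non-zero status
    # is not treated as a failure here; diagnostics go to stderr.
    result = subprocess.run(TIDY_CMD, input=html, capture_output=True, text=True)
    return result.stdout


def tidy_diff(old: str, new: str,
              fmt: Callable[[str], str] = tidy_text) -> List[str]:
    """Format both documents the same way, then return a unified diff."""
    a = fmt(old).splitlines(keepends=True)
    b = fmt(new).splitlines(keepends=True)
    return list(difflib.unified_diff(a, b, fromfile="old", tofile="new"))
```

    For example, `print("".join(tidy_diff(old_html, new_html)))` prints a unified diff of the two tidied copies. Because both sides go through the same corrections, fixes Tidy applies to identical broken markup mostly cancel out of the diff; they are still present in the tidied text, though, so this suits comparing rather than merging changes back.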

    However, if you want to pursue this, or you find another Tidy bug or have a feature request, please file an issue together with sample HTML and the config used. If you implement, fix, and test the feature in a Tidy fork, you can then open a Pull Request. Always appreciated... thanks...

    Tidy needs your support...

    Meantime closing this here as out-of-date...

    PS: Mariusz Pekala, thanks for the WDiff link... this looks interesting, for other reasons...

  • Geoff

    Geoff - 2016-04-03
    • status: open --> closed-out-of-date
    • Group: --> Current - all platforms

