#343 Pretty Print without fixing/changing text


Is there a way to "pretty print" a document without HTML Tidy changing anything except spacing/formatting? That means no new tags or other "non-whitespace changes". I'd like to integrate something like this with my product, CSE HTML Validator (per user request).


  • Rebeccah

    I second this request. I frequently have to compare HTML pages across sandbox/development copies, production copies, and revision control system copies. I use Tidy together with my file comparison program to format the files identically, so that I can compare their content more effectively.

    For this purpose, I do not need to fix broken HTML (especially if the page does in fact work for its intended users), I just need it all formatted the same.



  • @rebeccah: Maybe 'wdiff' is the tool you are looking for? See: http://www.gnu.org/software/wdiff/

  • Rebeccah

    1. Is that even a Windows application? Most Windows apps on SourceForge are not archived as tarballs, and the support documentation usually mentions Windows somewhere.

    2. I'm not looking for a word-by-word diff; a block diff is fine. I just want the HTML or XML to be formatted with indentation so that I can see the context. My current diff program (which I really like, BTW) allows text formatters/prettifiers to be plugged in, and its distribution includes HTML Tidy already configured as a plug-in. But because Tidy tries to correct errors, I can't use the built-in merging capabilities at the same time as Tidy, or I end up with unintended changes.

    When I get some time, I may port a JavaScript prettifier that I found online, which does almost exactly what I want, to Java (my hammer) so that I can call it from the command line and plug it into my diff program. I say "almost" because it lacks Tidy's ability to sort the attributes.

    My ideal scenario would be for Tidy to have additional command-line options that disable some of the changes it makes. That way I could keep the attribute sorting, the indentation, and the closing of tags like <br/> and <img/> that Microsoft normally doesn't close, and leave everything else alone.
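
For illustration only, here is a minimal Python sketch of the kind of "formatting-only" pass discussed in this thread: it indents tags and sorts attributes, and never adds or removes a tag. It is not how Tidy works and not an existing Tidy option. The names `pretty`, `TOKEN`, `START`, `REST`, and `VOID` are made up for this example. It assumes no attribute value contains a literal `>`, and it does not special-case `<pre>`, `<script>`, or `<style>` content. It also does not close `<br>` or `<img>`; tags that are already self-closed are kept that way.

```python
import re

# Void elements never take a closing tag, so they must not increase depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# One attribute: a name, optionally followed by = and a quoted/unquoted value.
ATTR_SRC = r"""[^\s=/>]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?"""
ATTR = re.compile(ATTR_SRC)
# Everything after the tag name: attributes, then an optional closing slash.
REST = re.compile(r"(?:\s+" + ATTR_SRC + r")*\s*(/?)\s*")
START = re.compile(r"<([A-Za-z][^\s/>]*)(.*)>", re.S)
# Comments, tags, runs of text, or a stray "<" (kept as text, never dropped).
TOKEN = re.compile(r"<!--.*?-->|<[^>]*>|[^<]+|<", re.S)


def attr_name(attr):
    """Return the lowercased name part of a raw attribute string."""
    return re.split(r"[\s=]", attr, maxsplit=1)[0].lower()


def pretty(html, indent="  "):
    """Re-indent markup and sort attributes without adding or removing tags."""
    lines, depth = [], 0
    for tok in TOKEN.findall(html):
        if not (len(tok) > 1 and tok.startswith("<") and tok.endswith(">")):
            text = " ".join(tok.split())      # collapse whitespace in text
            if text:
                lines.append(indent * depth + text)
            continue
        if tok.startswith("</"):              # end tag: dedent, keep verbatim
            depth = max(depth - 1, 0)
            lines.append(indent * depth + tok)
            continue
        m = START.fullmatch(tok)
        if m is None:                         # comment, doctype, <?...?>
            lines.append(indent * depth + tok)
            continue
        name, rest = m.groups()
        parsed = REST.fullmatch(rest)
        if parsed is None:                    # unusual tag: leave it untouched
            out, closed = tok, tok.endswith("/>")
        else:
            closed = bool(parsed.group(1))
            attrs = sorted(ATTR.findall(rest), key=attr_name)
            out = "<" + " ".join([name] + attrs) + (" /" if closed else "") + ">"
        lines.append(indent * depth + out)
        if not closed and name.lower() not in VOID:
            depth += 1
    return "\n".join(lines)


if __name__ == "__main__":
    print(pretty('<div id="x" class="a"><p>Hello <b>world</b><br/></p>'
                 '<img src="i.png" alt="">'))
```

Note that the unclosed `<div>` in the usage example stays unclosed: the sketch only moves whitespace and reorders attributes. On markup with missing end tags the indentation will drift, which is the price of not repairing anything.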