When an (X)HTML file has non-compliant characters inside a script tag, tidy usually encloses the script contents in CDATA. However, when a CDATA section already appears inside the script tag, the garbage persists and the output is not valid XML.
I ran this with: tidy -asxml -output /tmp/clean2.html ~/www/dirty2.html
Version: HTML Tidy for Linux released on 7 December 2008
Hopefully this is an old bug. I apologize for not testing on the latest version.
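
For anyone who wants to check whether a given output file parses as XML, here is a minimal sketch in Python. It only tests well-formedness, not validity against a DTD. The function name is_well_formed and the two sample strings are made up for illustration; they are not the contents of dirty2.html and not actual tidy output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise.

    Hypothetical helper: checks well-formedness only, not DTD validity.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Illustrative sample: script contents wrapped in CDATA, so the raw
# "<" and "&&" are hidden from the XML parser.
WRAPPED = (
    "<html><head><script>//<![CDATA[\n"
    "if (a < b && c) {}\n"
    "//]]></script></head><body/></html>"
)

# Illustrative sample: the same script without CDATA, so "<" and "&&"
# are seen as broken markup.
UNWRAPPED = (
    "<html><head><script>"
    "if (a < b && c) {}"
    "</script></head><body/></html>"
)

print(is_well_formed(WRAPPED))
print(is_well_formed(UNWRAPPED))
```

To test a real file, read it in binary mode and pass the bytes, for example is_well_formed(open("/tmp/clean2.html", "rb").read()). Note that a named entity such as &nbsp; can also make this check fail, because the parser does not load the XHTML DTD, so a False result is a hint rather than proof of this particular bug.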