Gavin Lee - 2006-12-16

I'm trying to write a Python application that parses the XML output of qstat, but I'm getting parse errors on certain servers because of "invalid characters" or "invalid token" inside some tags.

To test, I used the "-of" switch of qstat and ran xmlwf on the output files.
I managed to narrow it down to a couple of servers where my app is choking. I get different errors when using the "-utf8" switch than without it.
The errors I see in my app are the same as the errors in xmlwf; I think that is because they both use expat to parse the XML.
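
For anyone who wants to reproduce this without xmlwf, here is a minimal sketch using Python's bundled expat binding (xml.parsers.expat). The helper name check_well_formed and the trimmed two-line sample document are made up for illustration; only the sv_hostname element is taken from the real qstat output.

```python
import xml.parsers.expat

def check_well_formed(data):
    """Return None if data parses, else a (line, column, message) tuple."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None

# Hypothetical, trimmed-down stand-in for a qstat output file.
sample = (b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
          b'<rule name="sv_hostname">-=&#1; www.ROCKETARENA.de &#1;=- RA3-Server</rule>')

print(check_well_formed(sample))
```

The line and column will differ from the xmlwf numbers quoted below because the sample is not the full file.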

error from xmlwf using "iso-8859-1" on server Q3S 213.219.238.20:30720
qstat.xml:21:30: reference to invalid character number

the element:
<rule name="sv_hostname">-=&#1; www.ROCKETARENA.de &#1;=- RA3-Server</rule>

error from xmlwf using same server but in "utf8":
qstat.xml:5:10: not well-formed (invalid token)

this time it's a different element:
<name>-=^A www.ROCKETARENA.de ^A=- RA3-Server</name>

error from xmlwf using a different server in "iso-8859-1":
qstat.xml:22:29: reference to invalid character number

the element:
<rule name="sv_hostname"> &#2; ^2P^7laneta ^2M^7oscow</rule>

different element again but in "utf8":
<name> ^B ^2P^7laneta ^2M^7oscow</name>

Sorry for the long post, but I can't figure out how to work around this in Python. Could this be a bug in qstat's XML output?
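
As far as I can tell, XML 1.0 does not allow control characters such as 0x01 and 0x02 at all, neither as raw bytes nor as numeric references like &#1;, so any conforming parser will reject these files. One possible workaround is to strip those characters before parsing. The following is only a rough sketch: the function name sanitize_qstat_xml is made up, and it assumes the "^A" and "^B" shown above are the raw bytes 0x01 and 0x02 as displayed by my editor, and that the file is in ISO-8859-1 or UTF-8 (where bytes below 0x20 never occur inside a multi-byte sequence).

```python
import re
import xml.etree.ElementTree as ET

# Raw bytes XML 1.0 forbids: C0 controls other than tab, LF and CR.
_RAW_CONTROL = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]')
# Numeric character references, hex (&#x1;) or decimal (&#1;).
_CHAR_REF = re.compile(rb'&#(?:x([0-9A-Fa-f]+)|([0-9]+));')

def _is_allowed(codepoint):
    """True if the code point matches the XML 1.0 Char production."""
    return (codepoint in (0x9, 0xA, 0xD)
            or 0x20 <= codepoint <= 0xD7FF
            or 0xE000 <= codepoint <= 0xFFFD
            or 0x10000 <= codepoint <= 0x10FFFF)

def _drop_bad_ref(match):
    """Keep a valid character reference, drop an invalid one."""
    hex_digits, dec_digits = match.group(1), match.group(2)
    codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return match.group(0) if _is_allowed(codepoint) else b''

def sanitize_qstat_xml(data):
    """Remove raw control bytes and invalid character references (hypothetical helper)."""
    data = _RAW_CONTROL.sub(b'', data)
    return _CHAR_REF.sub(_drop_bad_ref, data)

raw = b'<rule name="sv_hostname">-=&#1; www.ROCKETARENA.de &#1;=- RA3-Server</rule>'
root = ET.fromstring(sanitize_qstat_xml(raw))
print(root.text)
```

This throws the offending characters away rather than preserving them, which is fine for my purposes since they are not printable anyway, but it does mean the parsed hostname no longer matches the server's string byte for byte.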

Best regards
Gavin