Background: I want to examine the HTML that comes out of my web application
(via a web server) for the purpose of creating an "end-user" regression
test suite that simulates a surfer visiting my site. I want the examination
to be granular and structured, as opposed to relying on plain substring
matches, which break on harmless changes such as whitespace or attribute order.
I was contemplating converting my site output to XHTML so that I could suck
up the results with a Python XML parser and go from there. Along the way, I
found a small SGMLParser example that can read an HTML file (well, it was
slightly broken), and starting from there created HTMLTag.py, which can read
ordinary HTML and return a Pythonic data structure.
No XHTML conversion needed.
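To illustrate the general idea only, here is a minimal sketch of reading
ordinary HTML into a small tree that a test can inspect. It is NOT the
HTMLTag API: the names Node, TreeBuilder, parse_html, find_all and text are
hypothetical, and because the sgmllib module was removed in Python 3 the
sketch assumes the standard library's html.parser instead of SGMLParser.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they must not stay open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical tree node; not the HTMLTag class from WebUtils."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects and text strings, in order

    def find_all(self, name):
        """Return all descendant nodes with this tag name, in document order."""
        found = []
        for child in self.children:
            if isinstance(child, Node):
                if child.name == name:
                    found.append(child)
                found.extend(child.find_all(name))
        return found

    def text(self):
        """Concatenate all text beneath this node."""
        return "".join(c if isinstance(c, str) else c.text()
                       for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from parser events, tolerating sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open tag with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse_html(source):
    """Parse an HTML string and return the root Node of the tree."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


page = ('<html><body><h1 class="title">Hello</h1>'
        '<p>One<br>Two</p><a href="/next">Next</a></body></html>')
root = parse_html(page)
link = root.find_all("a")[0]
print(link.attrs["href"], link.text())
```

A regression test can then assert on tags and attributes (for example, that
the page contains exactly one link whose href is "/next") instead of
searching the raw output for a substring.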
The first version of HTMLTag is in Webware.WebUtils. There is a unit test
in WebUtils/Tests.
This is in pretty good shape for a first release:
* I fed the front page of the WebKit examples to it with no problem.
* The unit tests pass.
* There are thorough doc strings with information and examples.
I imagine the API will expand with more conveniences as I begin to apply
this in my testing.
You can grab this out of Webware, or be sneaky and just grab it directly
out of the CVS web browser if (gasp) you're not hooked into Webware CVS:
http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/Webware/WebUtils/HTMLTag.py?cvsroot=Webware&sortby=date
Feedback is welcome,
-Chuck