Anonymous
-
2014-03-19
Hello,
I'm running into unexpected behavior when parsing HTML in which an attribute's value is either omitted or not wrapped in quotes. I'm not certain it's a bug, but it doesn't look like intended behavior either.
Code:
from htmldom import htmldom

htmlInput = """<form>
    <select name="country_code">
        <option value="GB" selected>United Kingdom</option>
        <option value="AL">Albania</option>
    </select>
</form>"""

dom = htmldom.HtmlDom().createDom(htmlInput)
form = dom.find("form")

print("Countries:")
for option in form.find("select[name=country_code] > option"):
    value = option.attr("value")
    text = option.text()
    print(" {0} = {1}".format(value, text))
Output:
Countries:
 AL = selected>United Kingdom
 AL = Albania
I expected the first element to have the value "GB" and the text "United Kingdom".
If I do this:
<option value="GB" selected>United Kingdom</option>
or
<option value="GB" selected=>United Kingdom</option>
or
<option value="GB" selected=something>United Kingdom</option>
then the issue exists.
However, if I do this:
<option value="GB" selected="">United Kingdom</option>
or
<option value="GB">United Kingdom</option>
then I get the expected result.
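As a cross-check, here is a minimal sketch using Python's standard-library html.parser instead of htmldom. It is not a fix for htmldom, only a way to show the result I'd expect from the same markup. The names OptionCollector and parse_options are my own, made up for this illustration.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, text) pairs for every <option> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.options = []    # finished (value, text) pairs
        self._value = None   # "value" attribute of the currently open <option>
        self._text = None    # text chunks of the open <option>, or None if none is open

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # attrs is a list of (name, value) pairs; a bare attribute
            # such as `selected` arrives with the value None.
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._text is not None:
            self.options.append((self._value, "".join(self._text).strip()))
            self._text = None


def parse_options(html):
    """Return the (value, text) pairs of all <option> elements in html."""
    parser = OptionCollector()
    parser.feed(html)
    parser.close()
    return parser.options


if __name__ == "__main__":
    sample = (
        '<select name="country_code">'
        '<option value="GB" selected>United Kingdom</option>'
        '<option value="AL">Albania</option>'
        "</select>"
    )
    print("Countries:")
    for value, text in parse_options(sample):
        print(" {0} = {1}".format(value, text))
```

With this parser, the bare `selected` and the unquoted `selected=something` forms both yield the value "GB" and the text "United Kingdom" for the first option, which is the behavior I was expecting from htmldom.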
Thank you for any help you can provide!
Anonymous