
#5 HTMLDOM cannot correctly parse HTML elements without a closing tag

1.0
open
nobody
bug (2)
2014-09-03
2014-09-03
Anonymous
No

HtmlDom cannot correctly parse elements that are not closed with an end tag.
For example:

<AREA SHAPE="RECT" COORDS="2,2,95,30" HREF="../index.shtml" alt="Home">

According to the HTML standard this is valid markup: `<area>` is a void element and must not have an end tag. htmldom, however, fails to parse it correctly.
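A tree builder that knows the void elements simply never pushes them onto its stack of open elements, so they cannot swallow the markup that follows. The sketch below illustrates this with Python's standard `html.parser.HTMLParser`; `VoidAwareTreeBuilder`, `Node` and `outline` are hypothetical names for illustration, not part of htmldom, and text content is ignored to keep it short.

```python
from html.parser import HTMLParser

# Void elements as listed in the WHATWG HTML Living Standard: they never have
# children and must not have an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class Node:
    """Minimal element node: tag name, attributes and child elements."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []


class VoidAwareTreeBuilder(HTMLParser):
    """Builds an element tree, never opening void elements.

    HTMLParser lowercases tag and attribute names, so "<AREA>" arrives as "area".
    """

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)  # only non-void elements can contain children

    def handle_endtag(self, tag):
        # Close the nearest open element with this name (and anything opened
        # inside it); stray end tags such as "</area>" match nothing and are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def outline(node, depth=0):
    """Return the element tree as indented tag names, one per line."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.tag)
        lines.extend(outline(child, depth + 1))
    return lines


builder = VoidAwareTreeBuilder()
builder.feed('<body><MAP NAME="top_nav_map"><AREA alt="Home"><AREA alt="Kits">'
             '</MAP><h1>Hello</h1></body>')
builder.close()
print("\n".join(outline(builder.root)))
```

With this rule the two `area` elements come out as siblings inside `map`, and `h1` is a sibling of `map` inside `body`.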

Steps to reproduce:

```python
from htmldom import htmldom

# <AREA> elements without end tags, as allowed by the HTML standard
dom = htmldom.HtmlDom().createDom("""
<body>
<MAP NAME="top_nav_map">
    <AREA SHAPE="RECT" COORDS="2,2,95,30" HREF="../index.shtml" alt="Home">
    <AREA SHAPE="RECT" COORDS="99,2,220,30" HREF="../Components.shtml" alt="Components">
    <AREA SHAPE="RECT" COORDS="224,2,319,30" HREF="../HardwareMain.shtml" alt="Hardware">
    <AREA SHAPE="RECT" COORDS="324,2,402,30" HREF="../Boards.shtml" alt="Boards">
    <AREA SHAPE="RECT" COORDS="406,2,477,30" HREF="../BooksMain.shtml" alt="Books">
    <AREA SHAPE="RECT" COORDS="482,2,535,30" HREF="../Kits.shtml" alt="Kits">
</MAP>
<h1>Hello</h1>
</body>
""")
body = dom.find("body")
print(body.html())
```

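This code prints the output below. Every `<area>` is nested inside the previous one, and the `<h1>`, which follows `</MAP>` in the input, ends up inside the last `<area>`. The expected result is six sibling `<area>` elements inside `<map>`, followed by `<h1>` as a sibling of `<map>`: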

```
<body>
    <map NAME="top_nav_map">
        <area COORDS="2,2,95,30" SHAPE="RECT" alt="Home" HREF="../index.shtml">
            <area COORDS="99,2,220,30" SHAPE="RECT" alt="Components" HREF="../Components.shtml">
                <area COORDS="224,2,319,30" SHAPE="RECT" alt="Hardware" HREF="../HardwareMain.shtml">
                    <area COORDS="324,2,402,30" SHAPE="RECT" alt="Boards" HREF="../Boards.shtml">
                        <area COORDS="406,2,477,30" SHAPE="RECT" alt="Books" HREF="../BooksMain.shtml">
                            <area COORDS="482,2,535,30" SHAPE="RECT" alt="Kits" HREF="../Kits.shtml">
                            </area>
                            <h1>
                                Hello
                            </h1>
                        </area>
                    </area>
                </area>
            </area>
        </area>
    </map>
</body>
```
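One plausible explanation for this staircase (a guess, not checked against htmldom's source): the parser pushes every start tag onto its stack of open elements, including void elements such as `<area>`, and ignores an end tag unless it matches the innermost open element. Under that rule `</MAP>` arrives while an `<area>` is innermost, so it is dropped and `<h1>` is attached to the last `<area>`. The sketch below models this hypothetical behaviour with the standard `html.parser.HTMLParser`; `NaiveTreeBuilder` and `depths` are illustrative names only, and the input is shortened to three `<area>` elements.

```python
from html.parser import HTMLParser


class NaiveTreeBuilder(HTMLParser):
    """Hypothetical model of the faulty behaviour (not htmldom's actual code):
    every start tag is opened, and an end tag only closes the innermost
    element if the names match; otherwise it is silently ignored."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])  # nodes are (tag, children) tuples
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)  # <area> is opened like any other tag

    def handle_endtag(self, tag):
        if self.stack[-1][0] == tag:
            self.stack.pop()  # "</map>" arrives while <area> is innermost: ignored


def depths(node, depth=0):
    """Return (tag, depth) pairs in document order."""
    out = []
    for child in node[1]:
        out.append((child[0], depth))
        out.extend(depths(child, depth + 1))
    return out


html = ('<body><map><area alt="Home"><area alt="Components">'
        '<area alt="Kits"></map><h1>Hello</h1></body>')
builder = NaiveTreeBuilder()
builder.feed(html)
builder.close()
for tag, depth in depths(builder.root):
    print("    " * depth + tag)
```

Each `area` ends up one level deeper than the previous one and `h1` sits inside the last `area`, matching the shape of the htmldom output. If this is indeed the cause, skipping the push for void elements would be enough to fix it.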
