#207 Issue with empty tag not closed using '/>'

Milestone: v2.30
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-06-19
Created: 2018-07-16
Private: No

I had an issue with an HTML document that contained link elements in the body, such as:

<body>
<link rel="shortcut icon" href="/favicon.ico">
</body>

Notice that it is closed by '>' and not '/>'.

This messes up the parsing of the HTML completely.

I fixed this by updating the HtmlTokenizer like this:

In tagStart:

Keep the opened TagInfo in outer scope:

        TagInfo tagInfo = null;

        if (tagName != null) {
            ITagInfoProvider tagInfoProvider = cleaner.getTagInfoProvider();
            tagInfo = tagInfoProvider.getTagInfo(tagName);
            if ( (tagInfo == null && !props.isOmitUnknownTags() && props.isTreatUnknownTagsAsContent() && !isReservedTag(tagName) && !props.isNamespacesAware()) ||
                 (tagInfo != null && tagInfo.isDeprecated() && !props.isOmitDeprecatedTags() && props.isTreatDeprecatedTagsAsContent()) ) {
                content();
                return;
            }
        }

Do not open a special context for empty tags; directly inject an EndTagToken instead:

            if ( isChar('>') ) {
                go();
                if (tagInfo == null || !tagInfo.isEmptyTag()) {
                    if (props.isUseCdataFor(tagName)) {
                        _isSpecialContext = true;
                        _isSpecialContextName = tagName;
                    }
                } else {
                    addToken(new EndTagToken(tagName));
                }
            } else if ( startsWith("/>") ) {
                go(2);
                //
                // If the tag is self-closing, add an end tag token here to avoid
                // encapsulating the following content. See issue #93.
                //
                addToken(new EndTagToken(tagName));
            }
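
To illustrate the idea outside of HtmlCleaner, here is a minimal stand-alone sketch. It is not the library's HtmlTokenizer: the class name VoidTagSketch, the EMPTY_TAGS set and the string-based tokens are all hypothetical and made up for this example. It assumes attribute values never contain '>' and it ignores text content. It only shows the rule from the patch above: when an empty tag is closed with '>' instead of '/>', emit the end tag token right away so the following content is not nested inside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; this is NOT HtmlCleaner's HtmlTokenizer.
class VoidTagSketch {

    // Assumed subset of HTML empty (void) elements, for illustration only.
    static final Set<String> EMPTY_TAGS =
            Set.of("link", "br", "img", "meta", "hr", "input");

    // Returns "<name>" for each start tag and "</name>" for each end tag.
    // Simplification: text content is skipped and '>' inside attribute
    // values is not supported.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) break;
            int close = html.indexOf('>', open);
            if (close < 0) break;
            String inner = html.substring(open + 1, close).trim();
            pos = close + 1;
            if (inner.isEmpty()) continue;

            // Explicit end tag such as </body>.
            if (inner.startsWith("/")) {
                tokens.add("</" + inner.substring(1).trim().toLowerCase() + ">");
                continue;
            }

            // Start tag, possibly written as <name ... />.
            boolean selfClosed = inner.endsWith("/");
            if (selfClosed) {
                inner = inner.substring(0, inner.length() - 1).trim();
            }
            String name = inner.split("\\s+")[0].toLowerCase();
            tokens.add("<" + name + ">");

            // Same rule as in the patch: an empty tag gets its end tag
            // immediately, whether it was closed with "/>" or just ">".
            if (selfClosed || EMPTY_TAGS.contains(name)) {
                tokens.add("</" + name + ">");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<body><link rel=\"shortcut icon\" href=\"/favicon.ico\"></body>";
        System.out.println(tokenize(html));
    }
}
```

With the sample from this ticket, the link start tag is followed directly by its own end tag, so the closing body tag is no longer swallowed by an unclosed link element.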

Discussion

  • Scott Wilson - 2018-07-24

    Thanks Anthony,

    I'll apply your patch and rerun the test cases.

  • Scott Wilson - 2019-09-04
    • Group: v 2.7 --> v2.24

  • Scott Wilson - 2020-04-29
    • Group: v2.24 --> v2.25

  • Scott Wilson - 2021-09-24
    • Group: v2.25 --> v2.26

  • Scott Wilson - 2023-04-29
    • Group: v2.26 --> v2.29

  • Scott Wilson - 2023-06-19
    • Group: v2.29 --> v2.30
