HtmlCleaner / Bugs / #207 Issue with empty tag not closed using '/>'

#207 Issue with empty tag not closed using '/>'

Milestone: v2.30

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2023-06-19

Created: 2018-07-16

Creator: Anthony Pessy

Private: No

I ran into a problem with an HTML document that had link elements inside the body, such as:

<body>
<link rel="shortcut icon" href="/favicon.ico">
</body>

Notice that it is closed by '>' and not '/>'.

This messes up the parsing of the HTML completely.

I fixed this by updating the HtmlTokenizer like this:

In tagStart:

Keep the opened TagInfo in outer scope:

        TagInfo tagInfo = null;

        if (tagName != null) {
            ITagInfoProvider tagInfoProvider = cleaner.getTagInfoProvider();
            tagInfo = tagInfoProvider.getTagInfo(tagName);
            if ( (tagInfo == null && !props.isOmitUnknownTags() && props.isTreatUnknownTagsAsContent() && !isReservedTag(tagName) && !props.isNamespacesAware()) ||
                 (tagInfo != null && tagInfo.isDeprecated() && !props.isOmitDeprecatedTags() && props.isTreatDeprecatedTagsAsContent()) ) {
                content();
                return;
            }
        }

Do not open a special context for empty tags; directly inject an EndTagToken instead:

            if ( isChar('>') ) {
                go();
                if (tagInfo == null || !tagInfo.isEmptyTag()) {
                    if (props.isUseCdataFor(tagName)) {
                        _isSpecialContext = true;
                        _isSpecialContextName = tagName;
                    }
                } else {
                    addToken(new EndTagToken(tagName));
                }
            } else if ( startsWith("/>") ) {
                go(2);
                //
                // If the tag is self-closing, add an end tag token here to avoid
                // encapsulating the following content. See issue #93.
                //
                addToken(new EndTagToken(tagName));
            }
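
For anyone who wants to see the intended behaviour in isolation, here is a minimal standalone sketch of the decision made in the patch above. It does not use HtmlCleaner at all: the class name EmptyTagSketch, the tokensFor helper, the EMPTY_TAGS set (a stand-in for TagInfo.isEmptyTag()) and the string tokens are all hypothetical, and the special-context (CDATA) branch is left out. It only shows that an empty tag closed with '>' should produce the same tokens as one closed with '/>'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EmptyTagSketch {
    // Hypothetical stand-in for TagInfo.isEmptyTag(): a few HTML elements that never have content.
    static final Set<String> EMPTY_TAGS =
            new HashSet<>(Arrays.asList("link", "br", "img", "meta", "hr", "input"));

    /**
     * Returns the tokens a tokenizer following the patched logic would emit
     * for a start tag named tagName that ends with closer (">" or "/>").
     */
    static List<String> tokensFor(String tagName, String closer) {
        List<String> tokens = new ArrayList<>();
        tokens.add("START:" + tagName);

        boolean selfClosed = "/>".equals(closer);
        boolean emptyTag = EMPTY_TAGS.contains(tagName);

        // Self-closing syntax or a known empty tag: close it immediately so
        // the content that follows is not swallowed as its child.
        if (selfClosed || emptyTag) {
            tokens.add("END:" + tagName);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokensFor("link", ">"));   // [START:link, END:link]
        System.out.println(tokensFor("link", "/>"));  // [START:link, END:link]
        System.out.println(tokensFor("div", ">"));    // [START:div]
    }
}
```

The point of the sketch is that the end token for an empty tag is emitted at tokenizing time, regardless of which closing syntax the author used, while ordinary tags such as div stay open until their own end tag is seen.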

Discussion

Scott Wilson - 2018-07-24

Thanks Anthony,

I'll apply your patch and rerun the test cases.

Scott Wilson - 2019-09-04

Group: v 2.7 --> v2.24

Scott Wilson - 2020-04-29

Group: v2.24 --> v2.25

Scott Wilson - 2021-09-24

Group: v2.25 --> v2.26

Scott Wilson - 2023-04-29

Group: v2.26 --> v2.29

Scott Wilson - 2023-06-19

Group: v2.29 --> v2.30