Originally this looked like a problem with the lexer's
handling of attribute values. I even wrote a test case to
replicate it:
/**
 * Bug #1465471 Problem with \" in meta tags
 * @throws ParserException
 */
public void testEscapedQuoteInContent () throws ParserException
{
    // In Java source \\\" yields a back-slash followed by a quote,
    // i.e. the attribute value contains back-slash "escaped" quotes.
    String content = "This is test data \\\" and more test.\\\" ha";
    createParser(
        "<meta name=\"description\" content=\"" + content + "\"/>"
    );
    parseAndAssertNodeCount (1);
    assertType ("meta tag", MetaTag.class, node[0]);
    MetaTag metaTag = (MetaTag)node[0];
    assertStringEquals(
        "meta content",
        content,
        metaTag.getMetaContent ()
    );
}
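However, escaping double quotes with a back-slash is not
valid HTML. A back-slash has no special meaning inside an
attribute value, so a double-quoted value ends at the first
double quote. If double quotes are to be included in a value,
either of the character references &quot; or &#34; can be
used, see: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2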
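To illustrate why back-slash escaping cannot work, here is a minimal Java
sketch of how a double-quoted attribute value is delimited, compared with
entity escaping. The helpers contentValue() and escapeForDoubleQuotes() are
hypothetical, written only for this comment, and are not part of the
HTMLParser API; the scanner is deliberately naive and only looks for the
first content="..." attribute.

```java
public class QuotedAttributeDemo {

    // Hypothetical helper (not HTMLParser API): returns the value of the first
    // double-quoted content="..." attribute in a tag. A back-slash has no
    // special meaning in HTML, so the value simply ends at the next '"'.
    static String contentValue(String tag) {
        String marker = "content=\"";
        int start = tag.indexOf(marker);
        if (start < 0) {
            throw new IllegalArgumentException("no double-quoted content attribute");
        }
        start += marker.length();
        int end = tag.indexOf('"', start);
        if (end < 0) {
            throw new IllegalArgumentException("unterminated attribute value");
        }
        return tag.substring(start, end);
    }

    // Hypothetical helper: escapes text for use inside a double-quoted value.
    // '&' is replaced first so the '&' of "&quot;" is not escaped again.
    static String escapeForDoubleQuotes(String text) {
        return text.replace("&", "&amp;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Back-slash "escaping", as in the bug report: the value is cut short.
        String bogus = "<meta name=\"description\" content=\"This is test data \\\" and more\"/>";
        System.out.println(contentValue(bogus)); // prints: This is test data \

        // Entity escaping, which HTML 4 allows: the whole value survives.
        String valid = "<meta name=\"description\" content=\""
                + escapeForDoubleQuotes("This is test data \" and more") + "\"/>";
        System.out.println(contentValue(valid)); // prints: This is test data &quot; and more
    }
}
```

The value read from the valid tag still contains &quot;; turning it back
into a literal quote is part of interpreting the attribute value, not of
finding where it ends.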
The value of the content attribute of the META tag is
CDATA, described at: http://www.w3.org/TR/html4/types.html#type-cdata
which places additional constraints on how values are
interpreted, but does not change the handling of quotes.
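For reference, HTML 4.01 (section 6.2) says user agents should interpret
CDATA attribute values by replacing character entities with characters,
ignoring line feeds, and replacing each carriage return or tab with a single
space. The rough Java sketch below applies those rules; interpretCdata() is a
hypothetical helper, and it only knows a few named entities (quot, amp, lt,
gt) plus decimal and hexadecimal numeric references. None of these steps
gives the back-slash any meaning.

```java
import java.util.Map;

public class CdataValueDemo {

    // Illustrative subset of HTML 4 entities only; a real decoder needs the
    // full entity table. Map.of requires Java 9 or later.
    private static final Map<String, String> ENTITIES = Map.of(
            "quot", "\"", "amp", "&", "lt", "<", "gt", ">");

    // Hypothetical helper: interprets a raw CDATA attribute value.
    static String interpretCdata(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\n') {
                i++;                      // line feeds are ignored
            } else if (c == '\r' || c == '\t') {
                out.append(' ');          // CR and tab become a single space
                i++;
            } else if (c == '&') {
                int semi = raw.indexOf(';', i);
                String replacement = semi < 0 ? null : decodeReference(raw.substring(i + 1, semi));
                if (replacement == null) {
                    out.append(c);        // unrecognised reference: keep the '&'
                    i++;
                } else {
                    out.append(replacement);
                    i = semi + 1;
                }
            } else {
                out.append(c);            // everything else, including '\', is literal
                i++;
            }
        }
        return out.toString();
    }

    // Decodes "quot", "#34", "#x22" and similar; returns null if unrecognised.
    private static String decodeReference(String name) {
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(name.substring(2), 16)));
            }
            if (name.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(name.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            return null;                  // malformed number or invalid code point
        }
        return ENTITIES.get(name);
    }

    public static void main(String[] args) {
        System.out.println(interpretCdata("This is test data &quot; and more test.&#34; ha"));
        // prints: This is test data " and more test." ha
    }
}
```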
This report is therefore rejected, unless the submitter would
like to raise an RFE to handle bogus attribute values.
user_id=605407