HTML Tidy

Tracker: Bugs

5 Multiple URLs in profile incorrectly modified - ID: 1264455
Last Update: Comment added ( arnaud02 )

Multiple URLs in the profile attribute of a head
element are incorrectly converted into a single URL
with escaped whitespace, for example

<head profile="http://gmpg.org/xfn/11
http://dublincore.org/documents/dcq-html/">

becomes

<head
profile="http://gmpg.org/xfn/11%20http://dublincore.org/documents/dcq-html/
">

and a warning is issued:
line 2 column 1 - Warning: <head> escaping malformed URI reference

HTML 4.01 reference:
http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.1

profile = uri [CT]
This attribute specifies the location of one or
more meta data profiles, separated by white space. For
future extensions, user agents should consider the
value to be a list even though this specification only
considers the first URI to be significant.
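
The two readings of the attribute value can be sketched in Python. This is only an illustration, not Tidy's code: the function names are hypothetical, and the whitespace handling is an assumption chosen to reproduce the output quoted above.

```python
from typing import List
from urllib.parse import quote


def escape_as_single_uri(value: str) -> str:
    """Hypothetical helper: treat the whole value as ONE URI.

    Whitespace runs (including the line break) are collapsed to a single
    space and then percent-encoded, which merges the two profile URIs.
    """
    collapsed = " ".join(value.split())
    return quote(collapsed, safe=":/#?&=%")


def split_as_uri_list(value: str) -> List[str]:
    """Hypothetical helper: treat the value as a whitespace-separated list."""
    return value.split()


profile = "http://gmpg.org/xfn/11\nhttp://dublincore.org/documents/dcq-html/"

print(escape_as_single_uri(profile))
print(split_as_uri_list(profile))
```

Under the single-URI reading the two profiles end up joined by %20; under the list reading they stay separate and the first one remains usable.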


Submitted: Klaus Johannes Rusch ( krusch ) - 2005-08-19 23:08
Priority: 5
Status: Open
Resolution: Wont Fix
Assigned To: Björn Höhrmann
Category: HTML/XHTML Parser
Group: Current - all platforms
Visibility: Public


Comments ( 10 )

Date: 2007-01-04 21:10
Sender: arnaud02 (Project Admin)


See query January 2007:
http://sourceforge.net/mailarchive/forum.php?thread_id=31313699&forum_id=1650

A possible solution is to make fix-uri an AutoBool defaulting to "auto",
where "auto" is identical to "yes" except for the profile attribute of the
head element. To me, it is yet another adaptation to make users' lives
easier.
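
A minimal sketch of how such a tri-state setting could be evaluated, written in Python for brevity. Everything here is hypothetical: the function name, the string values, and the decision logic only illustrate the proposal and do not reflect Tidy's source or any existing option value.

```python
def should_escape_uri(fix_uri: str, element: str, attribute: str) -> bool:
    """Decide whether a URI attribute value should be escaped.

    fix_uri is assumed to be one of "yes", "no", or "auto" (hypothetical).
    "auto" behaves like "yes" except for <head profile="...">.
    """
    mode = fix_uri.lower()
    if mode == "yes":
        return True
    if mode == "no":
        return False
    if mode == "auto":
        is_head_profile = (
            element.lower() == "head" and attribute.lower() == "profile"
        )
        return not is_head_profile
    raise ValueError(f"unknown fix-uri value: {fix_uri!r}")


print(should_escape_uri("auto", "head", "profile"))  # profile is left alone
print(should_escape_uri("auto", "a", "href"))        # other URIs are escaped
```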


Date: 2006-02-25 15:44
Sender: nobody

Logged In: NO

http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.1


Date: 2005-08-21 21:02
Sender: krusch

Logged In: YES
user_id=365576

Yes, I make assumptions based on typical usage of the profile
attribute. All profiles I have seen (things like Dublin
Core and XFN) use URIs without encoded whitespace, and I
assume the same holds for the majority of profiles.

--fix-uri applies to all URIs, not to the specific case of
the profile attribute which allows, depending on your
reading, multiple values whereas all other attributes that
take a URI do not.


Date: 2005-08-21 19:52
Sender: hoehrmann (Project Admin)

Logged In: YES
user_id=188003

You make assumptions about what the author had in mind.
What if it's profile="http://example.org/profile no 2"? And
again, you can configure this using the --fix-uri option.


Date: 2005-08-21 12:11
Sender: krusch

Logged In: YES
user_id=365576

I agree there is no fix that will please everyone given the
ambiguity in the standard.

The current behaviour, however, is the only one of the three
options that results in a broken profile URL that a
conforming user agent cannot use any more (a user agent
would interpret A B as A, would interpret A as A, but cannot
interpret A%20B correctly).
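
The comparison of the three options can be made concrete with a small Python sketch. It assumes a user agent that follows the HTML 4.01 wording (split on white space, use only the first URI); the function and the dictionary labels are hypothetical names for illustration.

```python
def first_profile(value):
    """Hypothetical user agent: split on white space, honor the first URI."""
    parts = value.split()
    return parts[0] if parts else None


candidates = {
    "kept as a list": "A B",
    "truncated": "A",
    "escaped": "A%20B",
}

results = {label: first_profile(value) for label, value in candidates.items()}

for label, uri in results.items():
    print(f"{label}: {uri}")
```

Only the escaped form yields something other than A, which is the point made above.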



Date: 2005-08-20 23:43
Sender: hoehrmann (Project Admin)

Logged In: YES
user_id=188003

Well, I don't think changing it to profile="a" helps here.
We'd still get bug reports for treating profile="a b" as an
error, and if we changed Tidy to allow it we would get bug
reports for not treating it as an error.


Date: 2005-08-20 23:17
Sender: krusch

Logged In: YES
user_id=365576


Input (valid according to the wording of the HTML 4.01
specification but not matching the profile=uri pattern)
<head profile="A B">

Intended interpretation according to the wording of the HTML
4.01 specification (only the first value is honored):
<head profile="A">

Yes, you can find arguments for both <head profile="A"> and
<head profile="A B"> in the HTML specification, which
obviously is not consistent here, and it is unlikely that an
agreement will be reached.
Given the interpretation instructions that only the first
value is currently honored and additional values are
reserved for future use, dropping the extra URIs would be
reasonable, though, and would not change the behaviour of a
conforming user agent.

<head profile="A%20B"> neither matches the intent of the
document author, nor the interpretation instructions in the
specification.



Date: 2005-08-20 18:06
Sender: hoehrmann (Project Admin)

Logged In: YES
user_id=188003

Well, this specific issue has come up many, many times in the
past. My conclusion is that we don't do anything about it
until the HTML WGs get around to clarifying their specs. You
can easily avoid Tidy's behavior by using the --fix-uri
option.


Date: 2005-08-20 14:06
Sender: krusch

Logged In: YES
user_id=365576


The description does indicate, though, that while only one URI
is currently considered significant, more than one can be
provided.

Stripping additional URLs might be reasonable, since that is
what browsers are supposed to do and what the DTD and later
specifications support. Converting "url1 url2" to
"url1%20url2" is not, since the result is a URL that is
formally correct but no longer valid.

With the restriction to one URI and the lack of a namespace-like
mapping of meta elements to profile definitions, the profile
attribute looks pretty useless anyway, but at least Tidy
should not break the profile definition.



Date: 2005-08-19 23:15
Sender: hoehrmann (Project Admin)

Logged In: YES
user_id=188003

It says "profile = uri" and that's what's been implemented
in Tidy ever since. This is a known problem with the HTML
and XHTML specifications. I'm happy to change Tidy's
behavior once the W3C clarifies this issue. I think it's
pretty clear that in HTML 4.01 and document types that
build on it only a single URI is allowed and the above just
describes error handling behavior. This is quite evident
from even recent specifications like the M12N in XML Schema
where the anyURI type is used here instead of list types.


Changes ( 6 )

Field          Old Value         Date              By
close_date     2005-08-19 23:15  2005-08-20 14:06  krusch
status_id      Pending           2005-08-20 14:05  krusch
close_date     -                 2005-08-19 23:15  hoehrmann
status_id      Open              2005-08-19 23:15  hoehrmann
resolution_id  None              2005-08-19 23:15  hoehrmann
assigned_to    nobody            2005-08-19 23:15  hoehrmann