From: Jorge A. C. <jc...@ne...> - 2000-12-23 13:02:30
Hi everyone!

Once upon a time ;-) there was BUG#95 (Entities parsing within URLs), and now there's BUG#114 (basically the same), and a lot of email and doc research has been done.

The matter is somewhat complicated, and although I explained it before, when re-reading my post I found it harder to follow than it seemed while writing it. I hope this new explanation is better...

* The URL (URI) RFCs don't say a word about entities. The only escaping mechanism allowed there is hexadecimal (i.e. %xx sequences). So according to this source, entities MUST NOT be parsed.

* Up to HTML 3.2 this wasn't an issue, so no clue was found here (AFAIK).

* When HTML (4.0 or 4.01) was reformulated as an SGML application, the entity-parsing problem crept in. It wasn't there until this point. SGML says entities MUST be parsed. And the URL RFC says that characters MUST be escaped according to the context in which they're used.

So, URLs inside an HTML document may have entities in them, and we should parse them: the entities belong to the HTML context, and once they are decoded, what remains is a plain URL that follows the URL RFC's own %xx escaping. This problem is BUG#114.

Jorge.-
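
To illustrate the conclusion above, here is a minimal sketch in Python (not the project's code). It assumes the two escaping layers are handled in order: entities are decoded first because the URL sits inside an HTML attribute, and %xx sequences are decoded afterwards by the URL machinery. The helper name `url_from_attribute` and the example.com address are made up for this example.

```python
from html import unescape
from urllib.parse import urlsplit, parse_qs


def url_from_attribute(raw_attr: str) -> str:
    """Decode entities in a raw HTML attribute value (hypothetical helper).

    This handles only the HTML/SGML layer; %xx sequences are left
    untouched because they belong to the URL layer.
    """
    return unescape(raw_attr)


# Attribute value as it would appear in the HTML source:
# "&amp;" is an entity (HTML layer), "%26" is a hex escape (URL layer).
raw = "http://example.com/search?q=a%26b&amp;lang=en"

# Step 1: entity parsing yields the actual URL.
url = url_from_attribute(raw)

# Step 2: ordinary URL processing decodes the %xx sequences.
query = parse_qs(urlsplit(url).query)
```

After step 1 the entity has become a literal `&` separating the two query parameters, while `%26` stays escaped until the query string itself is parsed, so the value of `q` ends up as `a&b`.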