Re: [tclwebtest] HTML entities should be substituted in urls
From: Grzegorz A. H. <gr...@ef...> - 2003-02-25 15:28:03
On Tue, Feb 25, 2003 at 09:12:02AM +0000, Tilmann Singer wrote:
> > http://validator.w3.org/docs/errors.html#bad-entity
>
> I don't find any information on this topic in this link,

Well, the first list item states that using the character '&' inside
an href is invalid HTML, and that it should be replaced with '&amp;'.
If you go to slashdot and take a look at the source, they use this
'&amp;' in their urls, and the browser is meant to substitute '&amp;'
with '&' in the http request. Another explanation here:

http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html#h-B.2.2

I've experimented with explicitly sending '&amp;' in a manual http
request, and slashdot still recognises it and correctly parses the
parameter separator. However, this doesn't happen with our aolserver:
if I request http://something?q=bla&amp;t=bla, the server answers that
the parameter 't' wasn't found, probably because aolserver is
detecting 'amp;t' as the next parameter.

Without the '&amp;' our pages are invalid HTML. With it (and without
the committed patch) tclwebtest doesn't work on our servers. Maybe
using translate_entities is overkill, because only a few entities have
to be replaced (IIRC those for >, < and &).

> but shouldn't links be URL-encoded instead - e.g. " " becomes
> %20 etc.? In my opinion this should be done automatically by
> the tcl http package, but it isn't done - just checked it. It
> is done automatically by a browser (checked it with Konqueror:
> http://localhost:9000/Some Url/ becomes "GET /Some%20Url/
> HTTP/1.1").

AFAIK this is the standard escape sequence for ASCII characters below
33 and over 126, and it's an additional substitution tclwebtest could
perform, though I still haven't been bitten by this case.

-- 
 Grzegorz Adam Hankiewicz, gr...@ef.... Tel: +34-94-472 35 89.
 eFaber SL, Maria Diaz de Haro, 68, 2 http://www.efaber.net/
 48920 Portugalete, Bizkaia (SPAIN)
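For illustration only, here is a rough sketch of the two substitutions
discussed above, written in Python rather than Tcl so it can be run
standalone. It is not the committed tclwebtest patch. The function name
href_to_request_url and the chosen set of "safe" characters are my own
assumptions. The first step undoes the HTML entity escaping of the href
value, and the second percent-encodes characters such as spaces the way
a browser would before sending the request.

```python
from html import unescape
from urllib.parse import quote


def href_to_request_url(href: str) -> str:
    """Convert an href value as written in HTML source into the URL a
    client would actually request (hypothetical helper, not tclwebtest code).
    """
    # Step 1: undo HTML entity escaping, so '&amp;' becomes '&' again.
    decoded = unescape(href)
    # Step 2: percent-encode characters that may not appear raw in a URL.
    # The 'safe' set is an assumption: it leaves the usual URL delimiters
    # and any already present %XX escapes untouched.
    return quote(decoded, safe=":/?&=#%")


if __name__ == "__main__":
    print(href_to_request_url("http://something?q=bla&amp;t=bla"))
    print(href_to_request_url("http://localhost:9000/Some Url/"))
```

With the two example URLs from this thread, the first call yields a
query string separated by a plain '&', and the second turns the space
into %20.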