From: Bernd E. <eid...@we...> - 2006-07-12 08:28:25
|
Hi,

maybe you can give me a hint about what's going on...

Background: My app uses a registered filter and proc to handle requests to files. ADP files (UTF-8) are ns_adp_parse'd (including data from the database, UTF-8) and ns_returned (without -binary). I don't map ADP files via the config; additionally I have ns_unregister_adp calls on GET, HEAD, POST.

My default config says:

--------------------------------------------
section ns/encodings
.adp: utf-8
.html: utf-8

section ns/mimetypes
.adp: text/html; charset=utf-8
.html: text/html; charset=utf-8

section ns/parameters
outputcharset: utf-8
urlcharset: utf-8
preferredcharsets: utf-8
--------------------------------------------

This scenario works.

Now, if I want to automatically change the output encoding from utf-8 to, let's say, iso-8859-15, I try to do it like this:

--------------------------------------------
section ns/parameters
outputcharset: iso8859-15

section ns/mimetypes
.adp: text/html; charset=iso-8859-15
.html: text/html; charset=iso-8859-15
--------------------------------------------

In my test case, 'string length' on the parsed adp string gives me 7109 (characters), 'string bytelength' gives 7147 bytes, the 'Content-Length' header is 7147, and wget stops after byte 7109 (Opera, for example, requests the page twice, haha, I lost one day figuring out why):

string length: 7109 = bytes returned
string bytelength: 7147 = Content-Length header

If I now 'ns_return -binary' those 7109 Tcl ByteArray bytes and request them (7109 bytes + Content-Length 7109) via wget (that works now, and Opera is happy again), I can

recode iso-8859-15..utf-8 testpage.html
("hey, recode, assume it's iso, transform it to utf-8")

but not

recode utf8..iso-8859-15 testpage.html

(testpage.html is created from the parsed adp).

I agree, I'm confused. Where should I look for the bug? In my configuration/app or in the server? Of course, using the -binary switch is not the proper solution.

Thanks!
Bernd.
|
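The two numbers in the report above can be reproduced outside the server. The following is a minimal sketch in plain C, not NaviServer or Tcl code; the helper names `utf8_char_count` and `utf8_byte_count` are made up for illustration. It counts characters and UTF-8 bytes of the same string. The difference 7147 - 7109 = 38 would be consistent with, for example, 38 characters that need two bytes in UTF-8 but only one byte in ISO-8859-15, although the actual page content is not known.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the Unicode code points in a NUL-terminated UTF-8 string by
 * skipping continuation bytes (bit pattern 10xxxxxx).  This roughly
 * corresponds to what Tcl's 'string length' reports.
 */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/*
 * Count the bytes of the UTF-8 representation, which is roughly what
 * Tcl's 'string bytelength' reports.
 */
size_t utf8_byte_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

For a string made only of ASCII and Latin characters, the character count is also the number of bytes the page occupies in a single-byte encoding such as ISO-8859-15, which is why a Content-Length taken from the UTF-8 byte count comes out too large.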
From: Michael L. <mic...@gm...> - 2006-08-08 09:40:33
|
I'm new to NaviServer and I'm not quite sure about the intended use of ns_conn encoding. There are three situations where it is possible to change the encoding by hand:

1. When using ns_write, you can set the encoding with ns_startcontent -type. You could also use ns_conn encoding, but ns_startcontent was designed only for this purpose, so I think it would be better to use this function.

2. Inside ADP files one can use ns_conn encoding, but then it is necessary to change the Content-Type header explicitly. The function ns_adp_mimetype sets the Content-Type header and changes the encoding, so using this function is easier and less dangerous.

3. ns_return expects a type argument which is used to create the Content-Type header. If you specify a charset, NaviServer automatically loads the corresponding Tcl encoding. So here, too, there is no need to change the encoding with ns_conn encoding (by the way, if you do so, the result is very confusing).

To me it looks like the use cases for ns_conn encoding are rare, e.g. if you use it in a context where the client knows exactly what encoding to expect. The documentation should mention the above alternatives and warn against using ns_conn encoding if you "don't know all side effects."

Michael
|
From: Zoran V. <zv...@ar...> - 2006-08-14 14:48:44
|
On 08.08.2006, at 11:40, Michael Lex wrote:

> For me it looks like the use cases for ns_conn encoding are rare, e.g.
> if you use it in a context where the client knows exactly what
> encoding to expect. The documentation should mention the above
> alternatives and warn not to use ns_conn encoding if you "don't know
> all side effects."

Let's put it this way: if you were to write all that from scratch, what would you do? Or, if you were allowed to revamp the existing interface(s), what would you remove/add?

To be honest, we always serve utf-8 and never had any need to change the encodings, hence I have (up to today) largely avoided looking at that code... But I see it deserves some cleanup.

Cheers
Zoran
|
From: Michael L. <mic...@gm...> - 2006-08-14 16:30:11
|
What I would do is remove ns_conn encoding or make it read-only. It is not really necessary, as it can be replaced by ns_startcontent or ns_adp_mimetype and is (largely) ignored by ns_return. And it confuses programmers (like me). The problem is backwards compatibility.

Sorry ... I forgot a possible use: when you want to change the encoding while sending streamed content (ADP file or ns_write), you have to use ns_conn encoding.

So if it is not possible (or sensible) to remove the function, I would be grateful if there were a hint in the (future) documentation that one should be VERY careful with the use of ns_conn encoding, and that ns_return is completely independent of anything you set with ns_conn encoding (if there's a default OutputCharset).
|
From: Zoran V. <zv...@ar...> - 2006-08-14 16:44:20
|
On 14.08.2006, at 18:30, Michael Lex wrote:

> ns_conn encoding and that
> ns_return is completely independant of anything you set with ns_conn
> encoding (if there's a default OutputCharset)

Hmmm??? Is this really so? I mean, I would expect [ns_return] to ignore any optional encoding stuff and delegate everything to [ns_conn], similar to how you use channels in Tcl. After all, you do not [puts] with an encoding. You just [puts], and [fconfigure channel -encoding] sets the channel to the desired encoding. I mean, if I wrote that, I'd do it that way.

You will have to give me some hints on how you use all those commands. What we do is just [ns_return], and we never fiddle with that -binary switch (hey, I didn't even know it existed). Also, we never use [ns_conn] to manipulate any encoding, so everything is "default" and just works. So we never had to meddle with encodings (thankfully).

Cheers
Zoran
|
From: Michael L. <mic...@gm...> - 2006-08-14 17:11:25
|
> Is this really so? I mean, I would expect the [ns_return]
> to ignore any optional encoding stuff and delegate all
> to [ns_conn] in the similar way how you use channels in Tcl.

An encoding set via [ns_conn] is only respected by [ns_return] if no OutputCharset is defined and no charset is given in the type argument of [ns_return].

> After all, you do not [puts] with an encoding. You just [puts]
> and the [fconfigure channel -encoding] sets the channel to the
> desired encoding.
> I mean, If I wrote that, I'd do it so.

I'd do it like this, too. And in fact this is why I was so confused about [ns_conn encoding].

> You will have to give me some hints on how you use all
> those commands.

We don't use them at the moment. We set the encoding(s) in the config and with [ns_return]. Bernd asked me to write encoding tests, and that's why I stumbled over the [ns_conn] command.
|
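The rule described above can be written down as a small lookup. This is only a sketch of the observed behaviour, not the server's code: the function name `effective_charset` and its parameters are hypothetical, and the relative order of the type charset and OutputCharset is an assumption, since the thread only states that the [ns_conn] encoding comes last.

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose the charset used for a response.
 *
 *   type_charset    charset given in the type argument of ns_return, or NULL
 *   output_charset  configured OutputCharset, or NULL
 *   conn_encoding   encoding set via [ns_conn encoding], or NULL
 *
 * Assumed order: type charset first, then OutputCharset; the connection
 * encoding is used only when both of the others are absent.
 */
const char *effective_charset(const char *type_charset,
                              const char *output_charset,
                              const char *conn_encoding)
{
    if (type_charset != NULL && type_charset[0] != '\0') {
        return type_charset;
    }
    if (output_charset != NULL && output_charset[0] != '\0') {
        return output_charset;
    }
    return conn_encoding; /* may be NULL: no conversion requested */
}
```

Seen this way, a value set with [ns_conn encoding] is silently overridden in any installation that configures OutputCharset, which matches the confusion reported in this thread.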
From: Zoran V. <zv...@ar...> - 2006-08-14 17:22:02
|
On 14.08.2006, at 19:11, Michael Lex wrote:

> Encoding set via [ns_conn] is only respected by [ns_return] if no
> OutputCharset is defined and no charset is defined with the
> type-argument of [ns_return].

Strange! I see that there are too many knobs to tweak! We must reduce that, absolutely. I mean, this whole thing is complicated enough per se without us allowing people to turn just about every place upside down, leading to absolute confusion (the state that I'm in now).

I will have to sit down for a while and check this in detail, as I do not think that it must be that complicated and versatile. The only thing you should need is to set the default encodings (per config file) and *eventually* be able to override that at runtime, preferably *only* using [ns_conn]. This would make sense to me. Somebody has (yet) to persuade me that this is not enough!

BTW, thank you very much for poking into this can of worms... I believe we will have to remove all those worms, or throw away the can and get ourselves a new one!

Cheers
Zoran
|
From: Bernd E. <eid...@we...> - 2006-07-12 13:21:23
|
Hi,

it looks like the mechanism of changing the encoding works as expected, but the computation of the correct Content-Length does not.

I downloaded the same page with 'ns_return', once as UTF-8 and once as ISO-8859-15. The download of the latter stops at the correct number of bytes, but the Content-Length header is larger (the "UTF-8" size), so wget barfs, Opera reloads the page, etc.

(The "recoding" is then only a test of the encoding. And see, recoding the UTF-8 page to ISO results in the same byte length at which the wget download stops, a.k.a. "breaks": the "ISO content length".)

Maybe this came with adding the -binary switch?

tclresp.c:

    if (binary) {
        data = (char *) Tcl_GetByteArrayFromObj(dataObj, &len);
        result = Ns_ConnReturnData(conn, status, data, len, type);
    } else {
        data = Tcl_GetStringFromObj(dataObj, &len);
        result = Ns_ConnReturnCharData(conn, status, data, len, type);
    }
|
From: Bernd E. <eid...@we...> - 2006-07-12 16:19:12
|
On Wednesday, 12 July 2006 15:23, Bernd Eidenschink wrote:
> it looks like the mechanism of changing the encoding works as expected, but
> the computation of the correct contentlength not.

OK. I think this is what is going on:

1. NsTclReturnObjCmd
   If ns_return is called _without_ the -binary switch, Ns_ConnReturnCharData is called. The length is computed from the UTF-8 representation.

2. Ns_ConnReturnCharData
   calls

3. ReturnCharData
   Knows what output encoding to use. If it is not UTF-8, it will call Ns_WriteCharConn

   ...BUT BEFORE...

4. Ns_ConnSetRequiredHeaders
   calls Ns_ConnSetLengthHeader (sets the Content-Length output header)

   for

5. Ns_ConnQueueHeaders
   calls Ns_ConnConstructHeaders, where
   "Update the response length value directly from the header to be sent, i.e., don't trust programmers"

So the Content-Length is set. NOW:

6. Ns_WriteCharConn is called, which calls Ns_ConnWriteChars, which with the help of Ns_ConnWriteVChars encodes the data via "Tcl_UtfToExternal" to the final encoding.

I may be fundamentally wrong, but it seems to me that the length in the final encoding, whatever that turns out to be, should be computed earlier...

What do you think?

Bernd.
|
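The ordering problem in the call chain above comes down to measuring the body before conversion instead of after it. The sketch below illustrates the alternative order, convert first and then take the length, in self-contained C. It is not NaviServer code: `utf8_to_latin1` is a hypothetical stand-in for the real conversion done through Tcl_UtfToExternal, and it only handles code points below U+0100 (the ISO-8859-1 range), replacing everything else with '?'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert UTF-8 input to a single-byte encoding.  Returns the number of
 * output bytes, or (size_t)-1 if the output buffer is too small.
 * The return value is the length a Content-Length header should carry.
 */
size_t utf8_to_latin1(const char *in, size_t inlen, char *out, size_t outcap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char c = (unsigned char) in[i];
        unsigned char b;

        if (c < 0x80) {
            /* ASCII: one byte in, one byte out. */
            b = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < inlen
                   && ((unsigned char) in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            b = (unsigned char) (((c & 0x03) << 6)
                                 | ((unsigned char) in[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Not representable here: emit '?' and skip the sequence. */
            b = '?';
            i += 1;
            while (i < inlen && ((unsigned char) in[i] & 0xC0) == 0x80) {
                i++;
            }
        }
        if (o >= outcap) {
            return (size_t) -1;
        }
        out[o++] = (char) b;
    }
    return o;
}
```

With this order the header value and the number of bytes written always agree, at the cost of buffering the converted body before any header is queued.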
From: Vlad S. <vl...@cr...> - 2006-07-13 02:03:46
|
I think the only solution here will be to use chunked encoding to output encoded content; in this case Content-Length should be removed from the headers.

Ns_ConnWriteVChars handles that, so it should be changed, I guess.

--
Vlad Seryakov
571 262-8608 office
vl...@cr...
http://www.crystalballinc.com/vlad/
|
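For readers unfamiliar with the chunked transfer coding suggested above: each chunk is sent as its size in hexadecimal, CRLF, the data, and CRLF, and a zero-size chunk ends the body, so no Content-Length is needed. The following is a minimal sketch of that framing in plain C; `format_chunk` is a hypothetical helper for illustration and not a NaviServer function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Frame one HTTP/1.1 chunk as "<hex-size>\r\n<data>\r\n" into out.
 * Returns the number of bytes written, or -1 if out is too small.
 * Calling it with len == 0 yields "0\r\n\r\n", the terminating chunk
 * followed by an empty trailer section.
 */
int format_chunk(const char *data, size_t len, char *out, size_t outcap)
{
    int hdr;

    assert(data != NULL && out != NULL);
    hdr = snprintf(out, outcap, "%zx\r\n", len);
    if (hdr < 0 || (size_t) hdr + len + 2 > outcap) {
        return -1;
    }
    memcpy(out + hdr, data, len);
    memcpy(out + hdr + len, "\r\n", 2);
    return hdr + (int) len + 2;
}
```

Because the size is attached to each chunk after conversion, the sender never has to know the total encoded length in advance, which is the property that sidesteps the mismatch discussed in this thread.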
From: Vlad S. <vl...@cr...> - 2006-07-13 02:10:29
|
Or you can try to replace ReturnCharData in return.c and see if this works (I did not test it):

    static int
    ReturnCharData(Ns_Conn *conn, int status, CONST char *data, int len,
                   CONST char *type, int sendRaw)
    {
        Conn        *connPtr = (Conn *) conn;
        int          result;
        Tcl_Encoding enc;
        Tcl_DString  type_ds;
        int          new_type = NS_FALSE;

        if (conn->flags & NS_CONN_SKIPBODY) {
            data = NULL;
            len = 0;
        }
        if (len < 0) {
            len = data ? strlen(data) : 0;
        }
        if (len > 0 && !sendRaw) {
            /*
             * Make sure we know what output encoding (if any) to use.
             */
            NsComputeEncodingFromType(type, &enc, &new_type, &type_ds);
            if (new_type) {
                type = Tcl_DStringValue(&type_ds);
            }
            if (enc != NULL) {
                connPtr->encoding = enc;
                /* Encoded length is unknown up front, so send chunked. */
                connPtr->flags |= NS_CONN_WRITE_CHUNKED;
            } else if (connPtr->encoding == NULL) {
                sendRaw = NS_TRUE;
            }
        }

        Ns_ConnSetRequiredHeaders(conn, type, len);
        Ns_ConnQueueHeaders(conn, status);

        if (sendRaw) {
            result = Ns_WriteConn(conn, data, len);
        } else {
            result = Ns_WriteCharConn(conn, data, len);
        }
        if (result == NS_OK) {
            result = Ns_ConnClose(conn);
        }
        if (new_type) {
            Tcl_DStringFree(&type_ds);
        }
        return result;
    }

--
Vlad Seryakov
571 262-8608 office
vl...@cr...
http://www.crystalballinc.com/vlad/
|
From: Bernd E. <eid...@we...> - 2006-07-13 07:02:14
|
Hi Vlad,

> Or you can try to replace ReturnCharData in return.c and see if this
> works (i did not test it):

yes, it seems to work:

HTTP/1.1 200 OK
MIME-Version: 1.0
Accept-Ranges: bytes
Date: Thu, 13 Jul 2006 06:45:55 GMT
Server: NaviServer/4.99.2
Content-Type: text/html; charset=iso8859-15
Transfer-encoding: chunked
Connection: close
(Length: not specified [text/html])

I think we should somehow solve this and compute a correct Content-Length, reconsidering also the note from Stephen concerning open bug nr. 1377059.

Bernd.
|
From: Stephen D. <sd...@gm...> - 2006-07-13 22:03:22
Attachments:
encoding-tests.patch
|
On 7/13/06, Bernd Eidenschink <eid...@we...> wrote:
>
> Hi Vlad,
>
> > Or you can try to replace ReturnCharData in return.c and see if this
> > works (i did not test it):
>
> yes, it seems to work:
>
> HTTP/1.1 200 OK
> MIME-Version: 1.0
> Accept-Ranges: bytes
> Date: Thu, 13 Jul 2006 06:45:55 GMT
> Server: NaviServer/4.99.2
> Content-Type: text/html; charset=iso8859-15
> Transfer-encoding: chunked
> Connection: close
> (Length: not specified [text/html])
>
> I think we should somehow solve this and compute a correct Content-Length,
> reconsidering also the note from Stephen concerning open bug nr. 1377059.
>

Yeah, I don't think chunked encoding is the real answer here...

From your explanations of encodings, it sounds like you've got a real handle on this! A lot of nice debugging and testing there too. If you turn that into some real tests and put them in tests/encoding.test, I'll give you a big kiss!

Re the open bug: I'm pretty sure removing the shortcut for utf-8 will fix it, but I'm having trouble writing tests to prove it!

Attached are some tests I was working on (ouch!) 4 months ago... Big kisses all round for anyone who can figure this out.
|
From: Bernd E. <eid...@we...> - 2006-08-08 08:57:20
|
Hi Stephen,

I asked my colleague Michael to apply your patch and start writing tests for the encoding. He wrote some, and it looks like some other things should be considered/fixed first to make them meaningful.

After that we can discuss a solution for the Content-Length/encoding problem which was introduced by the scatter/gather algorithms.

I'll ask Michael to join the list and summarize what he found in the code.

Bernd.

> From your explanations of encodings, it sounds like you've got a real
> handle on this! A lot of nice debugging and testing there too. If
> you turn that into some real tests and put them in tests/encoding.test,
> I'll give you a big kiss!
>
> Re the open bug: I'm pretty sure removing the shortcut for utf-8 will
> fix it, but I'm having trouble writing tests to prove it!
>
> Attached are some tests I was working on (ouch!) 4 months ago... Big
> kisses all round for anyone who can figure this out.
|