From: SourceForge.net <no...@so...> - 2006-02-21 12:09:32
|
Bugs item #1422272, was opened at 2006-02-02 08:37 Message generated for change (Comment added) made by gregwilkins You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=107322&aid=1422272&group_id=7322 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open >Resolution: Fixed Priority: 5 Submitted By: Harsh the barsh (oetzpold) Assigned to: Greg Wilkins (gregwilkins) Summary: Jetty6-Beta8: 30-second delay in connections Initial Comment: I have a testcase which starts a server with 3 WebAppContexts and then accesses them via HttpURLConnection. Every context only consists of a textfile containing only one letter. The test does the following: init WebAppContexts start server foreach WebAppContext do { open a HttpURLConnection read the response code read from input stream } The first time I open a HttpURLConnection, reading the response code takes 188 ms, the second time 30004 ms. I can reproduce this every time with java 1.5.0_06 on linux 2.6.15.1-amd64 and winxp-32bit. The attached archive contains the source code for the textcase, the 3 webcontexts used and a file called output.txt which is the console output on my system. During the 30 second delay, I sent a SIGQUIT to the java process to show where jetty hangs. The resulting thread dump is also contained in output.txt. This seems to be similar to #1415225 ---------------------------------------------------------------------- >Comment By: Greg Wilkins (gregwilkins) Date: 2006-02-21 12:09 Message: Logged In: YES user_id=44062 EXCELLENT!!!! thanks for the debugging and the patch. In fact the whole _hasContent boolean can be removed. I still am a little concerned about messages with content type but no content length, but I think this would only be an issue for parsing responses, not requests. thanks again... in SVN trunk now. ---------------------------------------------------------------------- Comment By: Christopher Taylor (c_s_taylor) Date: 2006-02-21 10:25 Message: Logged In: YES user_id=1428360 Couldn't attach the patch file to this bug, so I submitted it as a patch ---------------------------------------------------------------------- Comment By: Christopher Taylor (c_s_taylor) Date: 2006-02-21 10:22 Message: Logged In: YES user_id=1428360 Found it. The problem is *not* in the connectors, but in the org.mortbay.jetty.HttpParser. In the duplicate bug that I had submitted, look at what Ethereal is showing for the header: ... Content-type: application/x-www-form-urlencoded\r\n \r\n Looking at HttpParser, line 405, then line 442 HttpParser assumes that if we have a Content-Type header, we will have a Content-Length header as well, but we don't in this case. Now, I'm not an HTTP expert, and I couldn't find a clear answer in the HTTP 1.1 spec if it is valid to send a Content-Type without a Content-Length (I believe it is, in case of a FORM GET), but if Sun's code for HttpURLConnection is *not* wrong, then the solution is attached as HttpParser.patch ---------------------------------------------------------------------- Comment By: Christopher Taylor (c_s_taylor) Date: 2006-02-21 09:46 Message: Logged In: YES user_id=1428360 Okay, the easiest way to find this bug in the Jetty 6 code is to switch the connector to SocketConnector, then place a breakpoint in StreamEndPoint::fill(Buffer) on the _in.read() call. That's where the problem is (off by 1?) In my marked-duplicate bug report with the 3.jpg and 4.jpg examples, we get two incoming HTTP requests on the same socket. fill is called three times, and on the third time it hangs on the read call because the underlying socket has no more data (a call to _in.available() returns 0). The work-around that fixes StreamEndPoint is modifying fill to look like: int count = _in.available(); if ( count > 0 ) { int n=_in.read(bytes,buffer.putIndex(),space); if (n>=0) { buffer.setPutIndex(buffer.putIndex()+n); return n; } } I will take a look at the NIO version next. I don't think this is the correct solution to the bug, because I don't know why fill is getting called a third time. Stopping that third call would be the best solution IMHO. ---------------------------------------------------------------------- Comment By: Christopher Taylor (c_s_taylor) Date: 2006-02-10 05:47 Message: Logged In: YES user_id=1428360 Okay, I modified AbstractConnector::configure's calls to setTcpNoDelay(false) and setSoLinger(false,0), and the hanging issue is still there, so does that mean the Nagle theory is not correct? I'll spend some more time on the weekend looking this over. Perhaps there's another off-by-one error that's causing the amount of data sent by Jetty not to match the number of bytes reported in Content-Length? ---------------------------------------------------------------------- Comment By: Christopher Taylor (c_s_taylor) Date: 2006-02-10 05:28 Message: Logged In: YES user_id=1428360 Is there anything I can do to help out? I have plenty of Windows boxes here that I can work on. BTW: I noticed the delay is directly related to the setMaxIdleTime on the SelectChannelConnector. Increase the delay, and the pause increases. If TcpNoDelay is an issue, probably the easiet way to test that theory would be changing AbstractConnector::configure(Socket)'s call to setTcpNoDelay and see what happens If you're on Linux, could you post your Kernel .config so I can see what options you're using in the ipv4 section? ---------------------------------------------------------------------- Comment By: Nigel Canonizado (nigelc) Date: 2006-02-09 02:58 Message: Logged In: YES user_id=1435794 I tried the same test on Windows XP and I also experienced the 30 seconds delay on the second response. Here is the result. :INFO: NO JSP Support for /b, did not find org.apache.jasper.servlet.JspServlet :INFO: NO JSP Support for /c, did not find org.apache.jasper.servlet.JspServlet :INFO: NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet :INFO: Started SelectChannelConnector @ 0.0.0.0:8080 started server... ..a opening http://localhost:8080/a.txt calling HttpURLConnection.getResponseCode()... 200 getResponseCode() took 10 ms get input stream reading... read 1 bytes ..b opening http://localhost:8080/b/b.txt calling HttpURLConnection.getResponseCode()... 200 getResponseCode() took 30023 ms [surefire] Running org.mortbay.jetty.Test30Seconds [surefire] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 31.405 sec <<<<<< << FAILURE !! ---------------------------------------------------------------------- Comment By: Harsh the barsh (oetzpold) Date: 2006-02-08 21:34 Message: Logged In: YES user_id=1331644 During the delay, other processes (for example lynx) can connect to the server without suffering from a delay. I found 2 ways to avoid the delay: - wait (Thread.sleep) at least 5 seconds between the URLConnections - call URLConnection.disconnect() after reading But why does leaving one connection open prevent another connection from being established? ---------------------------------------------------------------------- Comment By: Harsh the barsh (oetzpold) Date: 2006-02-08 20:36 Message: Logged In: YES user_id=1331644 > good news is that I have fixed the ordering issue > (a,b,c works as well as c,b,a) now (in SVN). Great, thanks ! > Bad news is that the delay does not happen on my > development system (linux) Strange... Additional to the two systems I mentioned above, I just tested it on two more boxes 1. linux 2.4.20 x86 sun-jdk1.5.0_05 2. solaris 9 sparc sun-jdk1.5.0_05 On both systems, the 30 second delay happened, too. I also tried a new svn checkout. The ordering issue is indeed fixed, but not the 30 seconds :-( What linux kernel version are you using? ---------------------------------------------------------------------- Comment By: Greg Wilkins (gregwilkins) Date: 2006-02-08 17:18 Message: Logged In: YES user_id=44062 Thanks for the good test harness etc. good news is that I have fixed the ordering issue (a,b,c works as well as c,b,a) now (in SVN). Bad news is that the delay does not happen on my development system (linux) and I don't have access to a good windows dev env. I have had some other reports of this issue, so I am sure there is an issue. But the thread dump that you have indicates nothing that helps either. The issue is almost certainly something to do with the setup of the TCP layer. The naggle algorithm is preventing the close from being seen before a timeout. I will look for a windows resource to investigate some more. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=107322&aid=1422272&group_id=7322 |