From: Craig F. <cr...@wi...> - 2002-08-06 05:28:08
Okay, I've had a couple more evenings to sit down and play with apt-proxy-v2. I figured out a couple of things (such as the existence of /etc/init.d/apt-proxy-v2 (-; ), and I have also come across a few oddities. Some of these I think I mentioned in my last email to the list, but I'll elaborate more here:

1) apt-proxy-v2 does not seem to serve files from its own cache to clients. This happens for Packages.gz and Release files as well as packages.

The attached file (getpackages) contains the log messages for the second of two "apt-get update"'s. The second "apt-get update" was executed immediately after the twistd process had gone idle from the first one, and min_refresh_delay is set to 1d in /etc/apt-proxy/apt-proxy-v2.conf. Therefore, apt-proxy-v2 should have served up Packages.gz and Release from its cache, but instead it downloaded them again from the backend and sent them to the client (again).

Lines which may be relevant or important include:

Line 10 (note the two very different dates):
06/08/2002 00:14 [AptProxy,0,127.0.0.1] [debug:9]If-Modified-Since: Fri, 19 Jul 2002 19:03:33 GMT

Lines 15-20:
06/08/2002 00:14 [AptProxy,0,127.0.0.1] [debug:9]CHECKING_CACHED
06/08/2002 00:14 [-] [verify:9]Process Status: -1
06/08/2002 00:14 [-] [verify:9]unknown file: not verified
06/08/2002 00:14 [-]
06/08/2002 00:14 [-] [verify:9]verication failed
06/08/2002 00:14 [-] [debug:9]NOT_CACHED

These messages were for the retrieval of Packages.gz. Similar messages exist for Release as well. Any idea of where to start looking to debug this one?

2) apt-proxy-v2 gives 403 Forbidden responses for certain files. For instance, trying to do an "apt-get install gs" always yields:

Err http://localhost woody/main gs 6.53-3
  403 Forbidden
Failed to fetch http://localhost:8000/debian/pool/main/g/gs/gs_6.53-3_i386.deb  403 Forbidden

This behavior is the same for both sid and woody on multiple backend servers. The file in question does exist and is accessible via HTTP on the backend servers.
The apt-proxy-v2 log shows the following:

06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]Connection: keep-alive
06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]User-Agent: Debian APT-HTTP/1.3
06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]
06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]/../ in simplified uri
06/08/2002 00:39 [AptProxy,2,127.0.0.1] (harmless warning): discarding zero-length data for request <GET /debian/pool/main/g/gs/gs_6.53-3_i386.deb HTTP/1.1>
06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]Client connection closed

Note that the "simplified uri" and "harmless warning" lines do not occur in the logs for successful transfers.

3) apt-proxy-v2 fails to start upon installation. I didn't save the actual error message, but I seem to remember that there was a permissions error relating to the log file. I also remember that the aptproxy user was created after that error. Perhaps this order is reversed? At any rate, doing an "/etc/init.d/apt-proxy-v2 start" after the install starts it up normally. I will try to take the time to do a complete uninstall and reinstall sometime soon to get the details of all of this.

4) twistd eats a lot of CPU cycles for a rather long time after downloads are complete and the client has disconnected. What is it doing?

5) I see mention of memory leaks in the TODO file, so I assume everyone knows about this, but: Memory usage by twistd gradually grows over time as apt-proxy-v2 serves more and more requests. My system finally OOM'ed and the vm killed twistd after a couple of "apt-get install -d kde"'s on my firewall machine (which does not have any of X installed).

6) Does apt-proxy-v2 really NEED to be a daemon that runs constantly (as opposed to a daemon controlled by inetd)? I understand that there is maintenance to be done on the cache, but it would seem that all of that could be handled at the end of requests and/or by cron scripts.
Likewise, I understand that there may be a slight performance degradation if the process needs to be started by inetd, but for small sites, having the large daemon in memory all the time is overkill. Perhaps there could be an option to either run apt-proxy-v2 as a daemon or from inetd (as is done in the samba and apache packages). Running apt-proxy-v2 from inetd would also help lessen the effects of the apparent memory leak in apt-proxy-v2 since a new daemon would be started with each request and ended afterwards. This is similar in principle to the way apache kills off child processes after they have served so many pages in order to keep memory leaks under control. What am I missing here that necessitates having twistd running all the time?

--
Craig Foster
cr...@wi...
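The behavior expected in item 1 can be made concrete with a minimal Python sketch. It is not code from apt-proxy-v2: `can_serve_from_cache`, `parse_delay`, `MUTABLE_NAMES`, and every delay suffix other than `d` are assumptions made for illustration. The only value taken from the message is min_refresh_delay set to 1d.

```python
import posixpath
import time

# Hypothetical names throughout; this is not code from apt-proxy-v2.
# Index files whose content changes under the same name (assumed list).
MUTABLE_NAMES = {"Packages", "Packages.gz", "Release", "Sources", "Sources.gz"}

# Only the "d" suffix appears in the message; the others are assumed.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_delay(text):
    """Convert a delay such as '1d' or '3600' (plain seconds) to seconds."""
    text = text.strip()
    if text[-1:] in UNIT_SECONDS:
        return int(text[:-1]) * UNIT_SECONDS[text[-1]]
    return int(text)


def can_serve_from_cache(path, last_refresh, min_refresh_delay, now=None):
    """Return True if a cached copy may be served without asking the backend.

    path: request path of a file that is already in the cache.
    last_refresh: Unix time at which the cached copy was last fetched.
    min_refresh_delay: delay string as written in the config, e.g. '1d'.
    """
    if now is None:
        now = time.time()
    if posixpath.basename(path) not in MUTABLE_NAMES:
        # Versioned files such as .deb packages never change under one name.
        return True
    # Mutable index files are reused only while the delay has not elapsed.
    return now - last_refresh < parse_delay(min_refresh_delay)
```

Under these assumptions, a second "apt-get update" issued a minute after the first would be answered from the cache for Packages.gz and Release, which is what the report expected to see.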
From: Manuel E. S. <ra...@bi...> - 2002-08-08 11:39:01
On Tue, Aug 06, 2002 at 05:27:57AM -0000, Craig Foster wrote:
> Okay, I've had a couple more evenings to sit down and play with apt-proxy-v2.
> I figured out a couple of things (such as the existence
> of /etc/init.d/apt-proxy-v2 (-; ), and I have also come across a few
> oddities. Some of these I think I mentioned in my last email to the list,
> but I'll elaborate more here:
>
> 1) apt-proxy-v2 does not seem to serve files from its own cache to clients.
> This happens for Packages.gz and Release files as well as packages.

Those files are mutable (meaning that a file with the same name may change over time; other, versioned files like .deb's don't change) and receive different treatment.

> Lines 15-20:
> 06/08/2002 00:14 [AptProxy,0,127.0.0.1] [debug:9]CHECKING_CACHED
> 06/08/2002 00:14 [-] [verify:9]Process Status: -1
> 06/08/2002 00:14 [-] [verify:9]unknown file: not verified
> 06/08/2002 00:14 [-]
> 06/08/2002 00:14 [-] [verify:9]verication failed
> 06/08/2002 00:14 [-] [debug:9]NOT_CACHED

Integrity verification seems to have failed, which makes the cached file irrelevant. But the file is considered unknown by the file verifier, so the validation should not fail. There is something wrong with the FileVerifier class.

> 2) apt-proxy-v2 gives 403 Forbidden responses for certain files. For
> instance, trying to do an "apt-get install gs" always yields:
>
> Err http://localhost woody/main gs 6.53-3
>   403 Forbidden
> Failed to fetch http://localhost:8000/debian/pool/main/g/gs/gs_6.53-3_i386.deb  403 Forbidden
>
> This behavior is the same for both sid and woody on multiple backend
> servers. The file in question does exist and is accessible via HTTP on the
> backend servers.
> The apt-proxy-v2 log shows the following:
>
> 06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]Connection: keep-alive
> 06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]User-Agent: Debian APT-HTTP/1.3
> 06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]
> 06/08/2002 00:39 [AptProxy,2,127.0.0.1] [debug:9]/../ in simplified uri

It is refusing to serve the file for security reasons: after trying to simplify all '..' occurrences, some were left, and that is a security problem. simplify_path and a complicated uri are probably to blame.

> 4) twistd eats a lot of CPU cycles for a rather long time after downloads
> are complete and the client has disconnected. What is it doing?

It is generating Packages.gz from Packages or vice versa, depending on the backend involved. It shouldn't happen when downloading .deb files.

> 5) I see mention of memory leaks in the TODO file, so I assume everyone
> knows about this, but: Memory usage by twistd gradually grows over time as
> apt-proxy-v2 serves more and more requests. My system finally OOM'ed and the
> vm killed twistd after a couple of "apt-get install -d kde"'s on my firewall
> machine (which does not have any of X installed).

I haven't been very careful with memory leaks, mainly because I don't really know how to control them in Python. This confirms my fears :(

> 6) Does apt-proxy-v2 really NEED to be a daemon that runs constantly (as
> opposed to a daemon controlled by inetd)? I understand that there is
> maintenance to be done on the cache, but it would seem that all of that could
> be handled at the end of requests and/or by cron scripts. Likewise, I
> understand that there may be a slight performance degradation if the process
> needs to be started by inetd, but for small sites, having the large daemon
> in memory all the time is overkill. Perhaps there could be an option to
> either run apt-proxy-v2 as a daemon or from inetd (as is done in the samba
> and apache packages).
> Running apt-proxy-v2 from inetd would also help
> lessen the effects of the apparent memory leak in apt-proxy-v2 since a new
> daemon would be started with each request and ended afterwards. This is
> similar in principle to the way apache kills off child processes after they
> have served so many pages in order to keep memory leaks under control. What
> am I missing here that necessitates having twistd running all the time?

The inetd issue, afair, was a limitation of twisted: apt-proxy-v2 is based on it, and twisted didn't support inetd-based daemons when development started.

apt-proxy-v2 is not so big when you start it; the big problem is the memory leaks.

The daemon does a very lightweight cache walk to make sure that there are no files there which it is not keeping track of, and that would have locking issues if done outside of the main daemon. There are other locking issues which get quite simplified by running a permanent daemon.

That said, it would be nice to have a cut-down inetd mode, but it will probably not happen soon.

And from your previous email:

apt-proxy-v2 should happily work with an apt-proxy-v1 cache directory, and I believe that apt-proxy-v1 will do the same with an apt-proxy-v2 cache directory, but Chris should confirm the second.

What will fail in strange ways is running both at the same time on the same cache directory, because they don't do locking with one another and will certainly step on each other's toes.

Hope that helps

ranty

PS: I am not willing to work on apt-proxy-v2 right now because working remotely was not taking me anywhere. I will be back with my development machine in mid-August. You are very welcome to try to debug apt-proxy-v2; just keep in mind that I will be more helpful when I come back with my development machine/permanent internet access.
--
--- Manuel Estrada Sainz <ra...@de...>
                         <ra...@bi...>
                         <ra...@us...>
------------------------ <man...@hi...> -------------------
Let us have the serenity to accept the things we cannot change, courage to
change the things we can, and wisdom to know the difference.
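The 403 in item 2 is described above as the result of a check on the simplified request path. As a rough illustration only, the Python sketch below resolves '..' one segment at a time and refuses any path that would climb above the root. It is a hypothetical stand-in and not apt-proxy-v2's actual simplify_path, whose code does not appear in this thread; `status_for` is an invented helper. Note that it rejects during simplification, whereas the reply describes simplifying first and then checking for leftover '..'.

```python
def simplify_path(path):
    """Collapse '.', '..' and repeated slashes in an absolute URI path.

    Returns the simplified path, or None when a '..' segment would
    climb above the root (the case a proxy should refuse to serve).
    Hypothetical sketch, not the apt-proxy-v2 implementation.
    """
    parts = []
    for segment in path.split("/"):
        if segment in ("", "."):
            # Empty segments come from leading or repeated slashes.
            continue
        if segment == "..":
            if not parts:
                return None  # nothing left to pop: escapes the root
            parts.pop()
        else:
            parts.append(segment)
    return "/" + "/".join(parts)


def status_for(path):
    """Map a request path to an HTTP status: 403 if unsafe, else 200."""
    return 403 if simplify_path(path) is None else 200
```

With this segment-based approach, /debian/pool/main/g/gs/gs_6.53-3_i386.deb contains no '..' segment and comes back unchanged, so it would not be refused.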