Thread: [Rabbit-proxy-users] Temporary Files
From: Jason F. <xen...@gm...> - 2007-03-12 20:51:00
Greetings,

I'm curious about the temporary cache files. As I understood it, rabbit would automatically clean out unnecessary cache files and strive to attain a specified cache size. However, it seems that my installation is not following these directives at all and regularly fills the entire hard drive, resulting in a manual clean.

I have three instances of rabbit running and the current cache sizes are 1.5G, 1.1G, and 750M ... Is this normal? The cache drive is 5 Gig currently.

My config (with respect to the cache) looks like this:

# The time in hours to cache files, unless specified otherwise (in the
# http header that is).
cachetime=6

# The maximal size of the proxy in MB.
# The cache sweeps at regular intervalls and if the cache is bigger
# some stuff is cleaned out.
maxsize=1000

# The time the cleaner sleeps between cleanups.
# time is in seconds.
cleanloop=60

Thanks,

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
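P.S. In case it matters, I'm measuring those sizes with plain du on each cache directory, something like this (the paths here are just placeholders for my three cache dirs):

du -sh /path/to/cache-1 /path/to/cache-2 /path/to/cache-3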
From: Robert O. <ro...@kh...> - 2007-03-12 23:47:13
Hello!

Jason Frisvold wrote:
> I'm curious about the temporary cache files. As I understood it,
> rabbit would automatically clean out unnecessary cache files and strive
> to attain a specified cache size.

That is the basic idea, yes.

> However, it seems that my
> installation is not following these directives at all and regularly
> fills the entire hard drive, resulting in a manual clean.

Odd, how far off is it?

Hmmm, when I think about it, rabbit probably only counts the resource data, not the http headers. That may cause the real size to be quite a few times larger than the resource size, especially if you get lots of really small resources. I will think about the correct way to handle this; probably want to add the header sizes to the cache size. One day I will try to make the header handling better as well...

> I have three instances of rabbit running and the current cache sizes
> are 1.5G, 1.1G, and 750M ... Is this normal? The cache drive is 5 Gig
> currently.

What happens if you lower the cache sizes? Will it stay within them? How much data does the cache status page say that rabbit holds?

Also: what file system do you use? Will it waste space for lots of small files (since rabbit's cache usually is just that)?

/robo
From: Jason F. <xen...@gm...> - 2007-03-13 15:51:57
Hi Robert,

On 3/12/07, Robert Olofsson <ro...@kh...> wrote:
> Odd, how far off is it?

Seems to be as high as 50% ... I haven't been watching it extremely closely to this point, but I'm checking more often now..

> Hmmm, when I think about it, rabbit probably only counts the
> resource data, not the http headers. That may cause the real size to be
> quite a few times larger than the resource size, especially if you
> get lots of really small resources.

Could there really be 500 Megs of headers in 1.5 gig of cache data? That seems like a lot...

> What happens if you lower the cache sizes? Will it stay within them?

I've dropped the cache size to 500 Meg. I'll monitor it over the next few days..

> How much data does the cache status page say that rabbit holds?

I emptied the cache prior to your email. I'll check it out if it exceeds the max again... Incidentally, how do I access the status page? I'm getting a 407, Proxy Authentication Required error...

> Also: what file system do you use? Will it waste space for lots of small
> files (since rabbit's cache usually is just that)?

I'm currently using an ext3 system.

> /robo

Thanks!

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
From: Jason F. <xen...@gm...> - 2007-03-19 23:19:22
On 3/13/07, Jason Frisvold <xen...@gm...> wrote:
> Hi Robert,
>
> Seems to be as high as 50% ... I haven't been watching it extremely
> closely to this point, but I'm checking more often now..

Ok, hitting my alert system now. One cache is 1.5 Gig, one is 1.2 Gig and the last is 445 Meg. Rabbit is set to a max of 500 meg.

Question: Does Rabbit lose association with the cache if it's reset? For example, if rabbit crashes, does it "forget" the previous cache files?

> > How much data does the cache status page say that rabbit holds?
>
> Incidentally, how do I access the status page? I'm getting a 407,
> Proxy Authentication Required error...

Still not sure how to do this.. Can you point me in the right direction?

Thanks,

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
From: Robert O. <ro...@kh...> - 2007-03-20 06:18:20
Jason Frisvold wrote:
> Question: Does Rabbit lose association with the cache if it's reset?
> For example, if rabbit crashes, does it "forget" the previous cache
> files?

Rabbit writes out an index file every now and then, and reads that index file when it starts up, so the cache info should be reused.

Still, your sizes seem high. As I told you, the key/hook files are not part of the cache size (and they ought to be). I will try to fix the broken accounting; I will hopefully have time to do that in the next few days.

/robo
From: Jason F. <xen...@gm...> - 2007-03-20 13:40:22
On 3/20/07, Robert Olofsson <ro...@kh...> wrote:
> Rabbit writes out an index file every now and then, and reads that
> index file when it starts up, so the cache info should be reused.

I figured as much, just checking ... :)

> As I told you, the key/hook files are not part of the cache size
> (and they ought to be).

Is there any way to get a listing of just those files and check the size on that? ... I *think* I accomplished this. Tried the following sequence:

ls -laR | egrep "(hook|key)" > /tmp/list.txt
TOTAL=0
cat /tmp/list.txt | while read a b c d e f ; do let TOTAL=$TOTAL+$e ; echo "$TOTAL" ; done

I came up with a grand total of about 32 Meg of hook and key files. Subtract that from the total size of 1.6 Gig and there's still over 1 gig of unaccounted-for data...

To verify my results, I took just the cache files, minus the hook and key files, and tried the same test:

ls -laR | egrep -v "(hook|key)" > /tmp/list2.txt
TOTAL=0
cat /tmp/list2.txt | grep ^- | while read a b c d e f ; do let TOTAL=$TOTAL+$e ; echo "$TOTAL" ; done

Here I got a grand total of about 899 meg. Added to the 32 meg above, that's about 930 meg, well below the 1.6 gig that du reports. If I use --apparent-size on my du command, I get 928 meg. The error is probably time dependent, as this is a live cache, so files were created and deleted as I performed these tests.

I can believe, I suppose, that there's 500 meg of wasted space due to block size. Still, that's about 400 meg more than there should be if rabbit is cleaning at 500 meg. So it would appear, at least to me, that rabbit isn't cleaning out when it should? Is there a way to "force" a cleanup, or see when the cleanup occurs?

> I will try to fix the broken accounting; I will hopefully have time
> to do that in the next few days.

Ok, sounds good. I'll keep hand-cleaning the cache till then...

BTW, I've tried, and failed, to get into CacheStatus. According to what I've read, I need to have a user defined in the users file and my IP not blocked in the access file. Both of these hold true, but when I try to log in to the cache:

http://user:pass@mycache:8081/CacheStatus

it fails with a proxy authentication error. Any idea what's going on here?

> /robo

Thanks!

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
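P.S. For next time, something along these lines would probably be simpler than parsing ls output (this assumes GNU find and that the files really do end in .hook/.key -- untested, adjust the path):

find /path/to/cache -type f \( -name '*.hook' -o -name '*.key' \) -printf '%s\n' | awk '{ t += $1 } END { print t, "bytes of hook/key files" }'
find /path/to/cache -type f ! -name '*.hook' ! -name '*.key' -printf '%s\n' | awk '{ t += $1 } END { print t, "bytes of cached data" }'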
From: Robert O. <ro...@kh...> - 2007-03-20 20:44:48
Hello!

I have updated rabbit so that it should count the size of the .hook and .key files as well.

You will find the fix in 3.9-pre1, download it from:
http://www.khelekore.org/rabbit/index.shtml

Please tell me how this works.

Jason: can you please check if the rcache/temp/ directory contains many _old_ files? And if so, what types the files are, something like "file /tmp/rcache/temp/*". I have seen rabbit leave image files there on a few occasions (usually named something like 3946.c). If you have active users the temp-dir will contain a few files, but the files ought to disappear quickly.

/robo
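P.S. To pick out just the old ones, something like this should do (untested, adjust the path and the age):

find /tmp/rcache/temp -type f -mmin +60 -exec ls -lh {} +
find /tmp/rcache/temp -type f -mmin +60 -exec file {} +

The first line lists anything that has been sitting in temp for more than an hour, the second shows what kind of data those files contain.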
From: Jason F. <xen...@gm...> - 2007-03-26 15:12:20
On 3/20/07, Robert Olofsson <ro...@kh...> wrote:
> Hello!
>
> I have updated rabbit so that it should count the size
> of the .hook and .key files as well.
>
> You will find the fix in 3.9-pre1, download it from:
> http://www.khelekore.org/rabbit/index.shtml
>
> Please tell me how this works.

General functionality seems to work well...

> Jason: can you please check if the rcache/temp/ directory contains many
> _old_ files? And if so, what types the files are, something like
> "file /tmp/rcache/temp/*". I have seen rabbit leave image files there on
> a few occasions (usually named something like 3946.c). If you have active
> users the temp-dir will contain a few files, but the files ought to
> disappear quickly.

Ok, current cache stats:

Cachedir: file:/tmp/squabbit-1/.
#Cached files: 38114.
current Size: 230 MB. (241681218 bytes)..
Max Size: 500 MB.
Cachetime: 6 hours.

du shows this:

On-disk - 1.2G
Apparent Size - 698M

That temp directory you mention is littered with files. du reports:

On-Disk - 547M
Apparent Size - 353M

Subtracting, shouldn't this mean that there's 345 meg of cached files? Rabbit only shows 230?

At any rate, it looks like those temp files are not being removed.. They date all the way back to the 20th... Any ideas?

> /robo

Thanks,

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
From: Holger K. <hol...@gm...> - 2007-03-19 23:58:15
Jason Frisvold wrote:
> On 3/13/07, Jason Frisvold <xen...@gm...> wrote:
>> Hi Robert,
>>
>> Seems to be as high as 50% ... I haven't been watching it extremely
>> closely to this point, but I'm checking more often now..
>
> Ok, hitting my alert system now. One cache is 1.5 Gig, one is 1.2 Gig
> and the last is 445 Meg. Rabbit is set to a max of 500 meg.

Could this be a problem between file size and used space in the filesystem due to cluster size?
From: Jason F. <xen...@gm...> - 2007-03-20 03:08:09
On 3/19/07, Holger Krull <hol...@gm...> wrote:
> > Ok, hitting my alert system now. One cache is 1.5 Gig, one is 1.2 Gig
> > and the last is 445 Meg. Rabbit is set to a max of 500 meg.
>
> Could this be a problem between file size and used space in the
> filesystem due to cluster size?

I suppose it's possible, but three times the max limit? I can't believe that ext3 is *that* inefficient..

Any idea how to determine the actual total file size as opposed to the size on disk?

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
From: Jason F. <xen...@gm...> - 2007-03-20 03:21:06
On 3/19/07, Jason Frisvold <xen...@gm...> wrote:
> I suppose it's possible, but three times the max limit? I can't
> believe that ext3 is *that* inefficient..
>
> Any idea how to determine the actual total file size as opposed to the
> size on disk?

dumpe2fs gave me the info.. looks like the filesystem uses 4k blocks. If I take a rough estimate of the number of files, about 266,000, and multiply that by 4k, I only get about 1 Gig of possible block overhead.. There's still 600 meg unaccounted for.. (I think that's a reasonable upper bound on on-disk vs actual size)

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
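P.S. A more direct check is probably to compare du with and without --apparent-size on the same directory (GNU du):

du -sm /path/to/cache
du -sm --apparent-size /path/to/cache

The difference between the two numbers is exactly the block-size overhead, with no need to guess from the file count.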
From: Robert O. <ro...@kh...> - 2007-03-26 21:48:30
Jason Frisvold wrote:
> General functionality seems to work well...

Except that it is still leaking..

> Cachedir: file:/tmp/squabbit-1/.
> #Cached files: 38114.

Can you check how many .key and .hook files you have in that dir?

> current Size: 230 MB. (241681218 bytes)..
> Max Size: 500 MB.
> ...
> On-disk - 1.2G
> Apparent Size - 698M

Hmmm, not good.

> That temp directory you mention is littered with files. du reports:

Basically it goes like this:

resource comes from real web server
resource is filtered
resource is written to client and cache temp

One of:
1) resource is completed correctly: cache temp is moved into the cache for real
2) resource is aborted or filtering fails: cache temp is removed

> Subtracting, shouldn't this mean that there's 345 meg of cached files?
> Rabbit only shows 230?

Correct.

> At any rate, it looks like those temp files are not being removed..
> They date all the way back to the 20th...

I would like to know what type the files in the cache are.

Also: have you managed to see rabbit's status pages?

/robo
From: Jason F. <xen...@gm...> - 2007-03-27 14:07:29
On 3/26/07, Robert Olofsson <ro...@kh...> wrote:
> Except that it is still leaking..

Well, yeah.. :) Though I'd rather have rabbit properly caching and compressing as it should and just have to "empty the rabbit cage" every once in a while...

> Can you check how many .key and .hook files you have in
> that dir?

Sure, I'll add that to the list of things I grab next time.. (Had to empty the cache to keep the system running)

> Hmmm, not good.

Could this be a power-of-two error? i.e., you're calculating using 1000 bytes per K instead of 1024? ... Hrm.. maybe not. That's only a difference of what, about 12 meg or so? ...

> Basically it goes like this:
> resource comes from real web server
> resource is filtered
> resource is written to client and cache temp

Rabbit writes the temp file, right? So it should never lose the handle on the file?

> I would like to know what type the files in the cache are.

I did happen to get nosy and look at this.. :) Looks like these are all JPEG files. I didn't load them into a viewer to see exactly what they were, but there were several thousand of these..

> Also: have you managed to see rabbit's status pages?

Yeppers.. I can access it now with no problem.. Was completely my fault the whole time.. Didn't quite "get" how to access it, though I have it now..

> /robo

As soon as I get more data (probably a day or two) I'll post it..

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
From: Jason F. <xen...@gm...> - 2007-04-10 11:32:58
On 3/27/07, Jason Frisvold <xen...@gm...> wrote:
> > Can you check how many .key and .hook files you have in
> > that dir?
>
> Sure, I'll add that to the list of things I grab next time.. (Had to
> empty the cache to keep the system running)

Cachedir: file:/tmp/squabbit-1/.
#Cached files: 42213.
current Size: 336 MB. (352902676 bytes)..
Max Size: 500 MB.
Cachetime: 6 hours.

[friz@rabbit squabbit-1]$ find . -print | egrep "(hook|key)" | wc -l
84426

Are you counting pairs?

du shows 1.4G

du --apparent-size shows 756M

temp has 642M in it. (405M apparent) There are files over a week old in there, some older. I checked a handful of these files and they're all JPEG image files.

So, at this point it looks like the temp directory is the problem here. First, these files are not included in the cache size, and second, these files are not being purged.

Does this help?

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
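P.S. 84426 is exactly twice 42213, so presumably that's one .hook and one .key per cached entry. Counting each type separately should confirm it (assuming those really are the suffixes):

find . -name '*.hook' | wc -l
find . -name '*.key' | wc -l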
From: Robert O. <ro...@kh...> - 2007-04-10 17:20:35
Jason Frisvold wrote:
> du shows 1.4G
>
> du --apparent-size shows 756M
>
> temp has 642M in it. (405M apparent) There are files over a week
> old in there, some older. I checked a handful of these files and
> they're all JPEG image files.

Ok, now that I have looked, I have actually seen one or two files in my temp folder that stay. In my case it has been image files (I think jpeg, but I am not sure). In the few cases I have seen, the image has only been partially downloaded.

Jason: Can you check if the images are fully downloaded?

> So, at this point it looks like the temp directory is the problem
> here. First, these files are not included in the cache size, and
> second, these files are not being purged.
> Does this help?

Yes, a lot. I have not had the time to look at this, but for the moment it looks like the image handler is failing to clean up from time to time. Since this is one of the more complicated handlers, that would not be very unexpected.

Thanks
/robo
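P.S. One way to check without opening every file: a complete JPEG ends with the FF D9 end-of-image marker, so something like this (untested, adjust the path) should flag the truncated ones:

for f in /tmp/rcache/temp/*; do
    tail -c 2 "$f" | od -An -tx1 | grep -q 'ff d9' || echo "possibly truncated: $f"
done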
From: Jason F. <xen...@gm...> - 2007-04-10 18:13:49
On 4/10/07, Robert Olofsson <ro...@kh...> wrote:
> Ok, now that I have looked, I have actually seen one or two files
> in my temp folder that stay.
> In my case it has been image files (I think jpeg, but I am not sure).
> In the few cases I have seen, the image has only been partially
> downloaded.
>
> Jason: Can you check if the images are fully downloaded?

It appears that all of the images (or, rather, the majority) are fully downloaded jpegs.. I'm not getting any errors when I try to view them, anyway..

> Yes, a lot.
> I have not had the time to look at this, but for the moment it looks
> like the image handler is failing to clean up from time to time.
> Since this is one of the more complicated handlers, that would not
> be very unexpected.

Ok. Now, how to fix it. I could write a script to clean up that temp directory manually. How long are those files good for? Are they "good" for as long as the cache files, or can they be erased shortly after creation? Is 24 hours enough time to let them hang around?

> Thanks
> /robo

Thanks,

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
From: Robert O. <ro...@kh...> - 2007-04-10 18:27:07
Jason Frisvold wrote:
> It appears that all of the images (or, rather, the majority) are fully
> downloaded jpegs.. I'm not getting any errors when I try to view
> them, anyway..

Ok, good to know. Not what I hoped for, but still good to know.

> Ok. Now, how to fix it. I could write a script to clean up that temp
> directory manually. How long are those files good for? Are they
> "good" for as long as the cache files, or can they be erased shortly
> after creation? Is 24 hours enough time to let them hang around?

As soon as the request is handled the cache entry is moved from temp to its right place. So I would say that files older than a few minutes are very suspect. On slow downloads you may have really large iso-images and such, but 1 hour or more is probably safe enough (depending on your users...).

Thanks
/robo
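P.S. In that case an hourly cron job along these lines ought to be enough (untested; adjust the path and the age to whatever feels safe for your users):

# clean out temp-cache files that have not been touched for over an hour
0 * * * * find /tmp/rcache/temp -type f -mmin +60 -delete

-mmin and -delete are GNU find options; with an older find you would need -exec rm {} + instead.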
From: Jason F. <xen...@gm...> - 2007-04-10 20:11:35
On 4/10/07, Robert Olofsson <ro...@kh...> wrote:
> As soon as the request is handled the cache entry is moved from
> temp to its right place. So I would say that files older than a few
> minutes are very suspect. On slow downloads you may have really large
> iso-images and such, but 1 hour or more is probably safe enough
> (depending on your users...).

So these become the extension-less files? It sounds like I can comfortably run a 24-hour cleanup script that reduces a lot of this.. With that running, I probably won't see my temp drive max out anymore.. I'll look into getting that into place.

I'm not sure if this helps at all, but I ran a quick shell script to check the type of each file in the temp directory. With very few exceptions (< 1%), every single file there is a JPEG image. The only other data types found were things like minix filesystems, x86 boot sectors, and sun disk labels... I found other data types like .GIF files in the cache directories, though. Perhaps that will help identify the cause of the problem?

> Thanks
> /robo

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com
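P.S. For the curious, the type breakdown came from something roughly like this (reconstructed from memory, so treat it as a sketch):

find /tmp/squabbit-1/temp -type f -exec file -b {} + | sort | uniq -c | sort -rn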
From: Jason F. <xen...@gm...> - 2007-04-22 23:48:55
On 4/10/07, Jason Frisvold <xen...@gm...> wrote:
> It sounds like I can comfortably run a 24-hour cleanup script that
> reduces a lot of this.. With that running, I probably won't see my
> temp drive max out anymore.. I'll look into getting that into place.

Well, the cleanup script seems to be working now.. And now the cache has exceeded its maximum ...

Cache status

Cachedir: file:/tmp/squabbit-1/.
#Cached files: 101394.
current Size: 619 MB. (650069237 bytes)..
Max Size: 500 MB.
Cachetime: 6 hours.

Just trying to keep you updated.. Lemme know if I can provide any additional information.

--
Jason 'XenoPhage' Frisvold
Xen...@gm...
http://blog.godshell.com