From: Jonathan K. <jk...@cs...> - 2005-01-31 22:14:04
On Mon, 31 Jan 2005, Chris Halls wrote:
> The ideas are good but they do ignore the two use cases I mentioned that were
> the primary reasons for the existence of the database (noatime mounts and
> unwanted updating of atime by other programs).

Then don't use atime at all. Use a served-time that only apt-proxy knows about. When it sends the file, it updates it. When it goes to clean up the cache, it checks for it, and fills it in with the mtime if it doesn't exist. If it can't get an mtime, it uses the current time.

> implementation than the concept itself. If the recycling actually worked
> properly then I'm not convinced there would still be a need to change the
> concept. Or do you think that even with the recycling mechanism working,

For whatever reason, recycling worked on one system and not on the other. If it had worked on both systems, I wouldn't have even noticed it. But I did notice it, and what I saw, I didn't like.

The way I understand it, the database needs to be updated so that the cleanup will work. This means you have three processes running:

A server that fetches files, stores files in the cache, and sends files to clients while updating each file's last access time in a database.

A cleaner that periodically removes old files from the cache by comparing the current time to each file's last access time as stored in the database.

Finally, a recycler that periodically looks through the cache and makes sure all files in the cache are in the database.

That gives you two processes that search through the cache and compare each file to the database. That's redundant. Furthermore, the recycler runs constantly, and if all goes well finds nothing the vast majority of the time, whereas the cleaner runs only periodically (or at least it should). Finally, the recycler runs excruciatingly slowly. It shouldn't take 16 hours to update that small a database, or any database for that matter.

Now what I REALLY don't like is that an arcane command is needed to update a database that already has an automatic process to keep it updated. That is beyond dumb. It implies the automatic process doesn't work, or at least not well enough to be trusted; and if that's the case, what's the point of having the automatic process in the first place?

Not only does apt-proxy-import redo the recycler's job, but it also tries to do the server's job by checking whether the file to be copied is actually new or not. And when it tries to do that job, it fails miserably, because for some reason it can't find backends that the server finds just fine. If apt-proxy-import were simply cp, then the recycler would eventually find the new file and update the database. When the file is requested, the server would check whether it's new or not, and do the right thing accordingly. apt-proxy-import is solving a problem that doesn't exist.

So yeah, I don't like apt-proxy-import at all.

> that the database concept is wrong?

I think the database is a necessary hack to get around not having atimes. Being a hack, if there is ever a chance to kill it, then it should be killed at the earliest opportune moment.

> (The idea was to add more information to those databases in the future,
> such as a better way ensuring that files in the cache are intact)

Shouldn't the server process handle that automatically? Doesn't the server process already handle that automatically? If it didn't download a complete file (which it can detect by comparing the number of bytes received with the number of bytes expected), then it should delete the partial file and try again.
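A minimal sketch in Python of the kind of check I mean; fetch_to_file, MAX_TRIES, and the rest are made-up names for illustration, not apt-proxy's actual fetch code (which is Twisted-based and asynchronous):

    import os

    MAX_TRIES = 3  # arbitrary; pick whatever n makes sense

    def fetch_with_retry(fetch_to_file, url, cache_path, expected_len):
        """Download url to cache_path, keeping the file only if it is complete.

        fetch_to_file(url, path) stands in for whatever actually does the
        transfer; assume it returns the number of bytes written.
        """
        for attempt in range(MAX_TRIES):
            received = fetch_to_file(url, cache_path)
            if received == expected_len:
                return True  # complete file: safe to keep in the cache
            # Short (or long) read: never leave a broken file in the cache.
            if os.path.exists(cache_path):
                os.unlink(cache_path)
        return False  # give up; the client can move on to its next file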
(This can trivially be extended to check that the server didn't just receive any old bytes, but the correct bytes.) After n tries, fail so the client can request the next file. Next time the client requests the file, it won't be in the database, so the process starts all over again. You should never cache a broken file.

If you're talking about files that are already on the disk getting damaged, I don't think that's going to happen very often, and if it does, the disk damage is probably far more extensive than just a couple of files limited to the apt-proxy cache. If the damage is extensive, that's a job for a disk recovery tool.

Are you proposing to compare a checksum every time a file is served? I think that's a lot of work for something that just isn't going to happen. If on the off chance a file in the cache did manage to become corrupted, the sysadmin would check the logs, see the error report (e.g. "unexpected end of file"), and simply delete the broken file from the cache, which would cause apt-proxy to get a new copy of the file. If that file is also corrupted, then something much more serious is wrong, and apt-proxy couldn't possibly fix it (i.e. the remote source file is corrupted, or the disk is dying).

So unless you can be more specific than "more information", I'm opposed to using the database as any more than a crutch to get around noatime.

--
Jonathan Koren                    World domination?  I'll leave that to the
jk...@cs...                       religious nuts and Republicans, thank you.
http://www.cs.siu.edu/~jkoren/        -- The Monarch, "Venture Brothers"