From: Jonathan K. <jk...@cs...> - 2005-01-31 22:14:04
On Mon, 31 Jan 2005, Chris Halls wrote:
> The ideas are good but they do ignore the two use cases I mentioned that were
> the primary reasons for the existence of the database (noatime mounts and
> unwanted updating of atime by other programs).

Then don't use atime at all. Use a served-time that only apt-proxy knows about. When it sends the file, it updates it. When it goes to clean up the cache, it checks for it, and fills it in with the mtime if it doesn't exist. If it can't get an mtime, it uses the current time.

> implementation than the concept itself. If the recycling actually worked
> properly then I'm not convinced there would still be a need to change the
> concept. Or do you think that even with the recycling mechanism working,

For whatever reason, recycling worked on one system and not on the other. If it had worked on both systems, I wouldn't have even noticed it. But I did notice it, and what I saw, I didn't like.

The way I understand it, the database needs to be updated so that the cleanup will work. This means you have three processes running:

A server that fetches files, stores files in the cache, and sends files to clients while updating each file's last access time in a database.

A cleaner that periodically removes old files from the cache by comparing the current time to each file's last access time as stored in the database.

Finally, a recycler that periodically looks through the cache and makes sure all files in the cache are in the database.

That gives you two processes that search through the cache and compare each file to the database. That's redundant. Furthermore, the recycler runs constantly, and if all goes well finds nothing the vast majority of the time, whereas the cleaner runs only periodically (or at least it should). Finally, the recycler runs excruciatingly slowly. It shouldn't take 16 hours to update that small a database, or any database for that matter.

Now what I REALLY don't like is that an arcane command is needed to update a database that already has an automatic process to keep it updated. That is beyond dumb. It implies the automatic process doesn't work, or at least not well enough to be trusted; and if that's the case, what's the point of having the automatic process in the first place?

Not only does apt-proxy-import redo the recycler's job, but it also tries to do the server's job by checking whether the file to be copied is actually new or not. And when it tries to do that job, it fails miserably, because for some reason it can't find backends that the server finds just fine. If apt-proxy-import were simply cp, then the recycler would eventually find the new file and update the database. When the file is requested, the server would check whether it's new or not, and do the right thing accordingly. apt-proxy-import is solving a problem that doesn't exist.

So yeah, I don't like apt-proxy-import at all.

> that the database concept is wrong?

I think the database is a necessary hack to get around not having atimes. Being a hack, if there is ever a chance to kill it, then it should be killed at the earliest opportune moment.

> (The idea was to add more information to those databases in the future,
> such as a better way ensuring that files in the cache are intact)

Shouldn't the server process handle that automatically? Doesn't the server process already handle that automatically? If it didn't download a complete file (which it can detect by comparing the number of bytes received with the number of bytes expected), then it should delete the partial file and try again.
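A minimal sketch in Python of the kind of check I mean; fetch_to_file, MAX_TRIES, and the rest are made-up names for illustration, not apt-proxy's actual fetch code (which is Twisted-based and asynchronous):

    import os

    MAX_TRIES = 3  # arbitrary; pick whatever n makes sense

    def fetch_with_retry(fetch_to_file, url, cache_path, expected_len):
        """Download url to cache_path, keeping the file only if it is complete.

        fetch_to_file(url, path) stands in for whatever actually does the
        transfer; assume it returns the number of bytes written.
        """
        for attempt in range(MAX_TRIES):
            received = fetch_to_file(url, cache_path)
            if received == expected_len:
                return True  # complete file: safe to keep in the cache
            # Short (or long) read: never leave a broken file in the cache.
            if os.path.exists(cache_path):
                os.unlink(cache_path)
        return False  # give up; the client can move on to its next file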
(This can trivially be extended to check that the server didn't just receive any old bytes, but the correct bytes.) After n tries, fail so the client can request the next file. Next time the client requests the file, it won't be in the database, so the process starts all over again. You should never cache a broken file.

If you're talking about files that are already on the disk getting damaged, I don't think that's going to happen very often, and if it does, the disk damage is probably far more extensive than just a couple of files limited to the apt-proxy cache. If the damage is extensive, that's a job for a disk recovery tool.

Are you proposing to compare a checksum every time a file is served? I think that's a lot of work for something that just isn't going to happen. If on the off chance a file in the cache did manage to become corrupted, the sysadmin would check the logs, see the error report (e.g. "unexpected end of file"), and simply delete the broken file from the cache, which would cause apt-proxy to get a new copy of the file. If that file is also corrupted, then something much more serious is wrong, and apt-proxy couldn't possibly fix it (i.e. the remote source file is corrupted, or the disk is dying).

So unless you can be more specific than "more information", I'm opposed to using the database as any more than a crutch to get around noatime.

--
Jonathan Koren                    World domination?  I'll leave that to the
jk...@cs...                       religious nuts and Republicans, thank you.
http://www.cs.siu.edu/~jkoren/        -- The Monarch, "Venture Brothers"