#16 Add caching of annotated files to speed up access

Currently all pages displayed by LXR are dynamically generated. However, most of the content displayed is relatively static. I propose that LXR create a cache of annotated files to speed up browsing. Here are some specifics:

1) We already have a fileid, so the cache should be organized by fileid with an appropriate directory fan-out. Since the fileid format is specific to the file backend, this might be a bit tricky; one possibility is to let the file backend do the fan-out through a dedicated method. In any case, this ensures that all releases using the same file version share the cache, while releases containing the same file at different versions get distinct cached contents.
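The fan-out scheme above might look something like the following sketch. Hashing the fileid sidesteps the backend-specific-format problem mentioned in the point: any opaque fileid string maps to a fixed two-level directory layout. The function name and parameters are illustrative, not part of LXR's actual API.

```python
import hashlib
import os

def cache_path(cache_root: str, fileid: str, fanout: int = 2) -> str:
    """Map an opaque fileid to a cache file path with a two-level fan-out.

    Hashing makes the layout independent of the backend's fileid format,
    so no backend-specific fan-out method is needed in this variant.
    """
    digest = hashlib.sha1(fileid.encode("utf-8")).hexdigest()
    # e.g. /cache/a9/99/a9993e... for fileid "abc"
    return os.path.join(cache_root, digest[:fanout],
                        digest[fanout:2 * fanout], digest)
```

Because the path depends only on the fileid, every release pointing at the same file version resolves to the same cache entry automatically.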

2) Only the annotated file itself should be cached, not the entire page (header/footer/etc.). The result is not a completely static page, but close: we generate the header, footer, etc. on each request and insert the contents of the cached file at the appropriate place. This way the cache stays valid even when the list of releases/architectures/etc. changes.

3) If the fileid is truly unique (which I believe it has to be for LXR to work properly) then we don't really need to worry about a stale cache: it can never be stale because a different file will have a different fileid. If, for some reason, that's not sufficient it's trivial enough to detect stale data using creation time on the cached file.

4) Cleaning the cache is a trickier problem. There are a number of options here. For simple disk space reclamation we can, for example, examine the time last accessed and clean up the cached files older than a certain date. A cleanup script could be run daily or weekly, with various options such as a maximum cache size, or a fixed "clean anything older than X days". An alternative to using access time (which can be problematic since backups can modify it, or maybe your filesystem is mounted noatime for efficiency reasons) is to use time last modified, and have LXR touch the file to update the modification time every time it gets the file from the cache.

Another option is simply not to clean the cache, since we don't normally clean out the database either. We should, however, add code to clean the cache whenever we DO clean out the DB: genxref already has options to remove releases etc.


  • Malcolm Box - 2009-04-07

    Or simply configure Squid in front of LXR and let it worry about it...

  • Paul D. Smith - 2009-04-07

    Will that really work? It's been a while since I've thought about this, and even longer since I've messed with Squid, but since LXR is a CGI interface, won't requests always go through the proxy to the backing server? If they don't, then how do normal dynamic CGI interfaces ever work with a caching front-end like Squid?

  • Malcolm Box - 2009-04-08

    It should work fine - caching behaviour is controlled by the Cache-* headers in the HTTP response, and, quite rightly, the headers LXR sends do nothing to prevent caching.

    I've not tried it, but I see no reason why it wouldn't work. For bonus points, LXR should probably provide better caching control in the headers to deal with reindexing rendering the cached copy out of date. However, in the normal case where files don't change their content, it will be fine.
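The "better caching control" suggested above could be as simple as emitting explicit caching headers from the CGI script. A sketch, under the assumption that the fileid is stable per file version: using it as an ETag means a reindex that produces a new fileid automatically invalidates the proxy's cached copy. This is an illustrative helper, not LXR's actual code.

```python
def cache_headers(fileid: str, max_age: int = 86400) -> dict:
    """Headers a CGI script could emit so a front-end like Squid caches pages.

    The fileid serves as the ETag (assumed unique per file version), so a
    reindex that changes the fileid also changes the validator.
    """
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": f'"{fileid}"',
    }
```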

  • Malcolm Box - 2009-04-08

    Yep - I've just tested this, and Squid successfully caches LXR output. I set it up as an explicit HTTP proxy (i.e. configured in the browser), but I'm sure it could be configured transparently for a site's users.
