Nischay, Markus and I were discussing how to implement caching for ask queries and inadvertently ended up discussing the whole query invalidation project again. Since this fits in with Nischay's project, and is something I also want to poke at because we'll have to implement something similar in Wikidata, I decided to write up my current thoughts on how to implement this.

I propose having a table "queries" where each row holds an identifier for a query (for instance a hash of the conditions, printouts and relevant params). Entries would be added on page save if they are not there yet. Each entry can contain the computed results for the query. There would also be a table to map each query to the pages on which it's used. The flow would look like this:

* People use a single query on multiple pages; the first usage inserts a new entry in queries with the freshly obtained result

* Successive usages just get the result from the cache in the query table

* When someone changes data, we figure out which queries can be affected and remove their cached results, and also invalidate the parser, HTML and any other caches of all pages containing any of the queries that had their cache removed

* On the next view of such a page, SMW finds an empty cache for the query and recomputes it
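
To make the flow above a bit more concrete, here is a rough sketch in Python (used purely for illustration, this is not SMW code). The two dicts stand in for the "queries" table and the query-to-pages mapping table. The names query_id, QueryCache, get and invalidate are made up for this example, and the set of affected query ids is simply passed in, since how to compute it is the open question.

```python
import hashlib
import json


def query_id(conditions, printouts, params):
    """Hypothetical identifier: a hash of conditions, printouts and params."""
    canonical = json.dumps(
        {"conditions": conditions, "printouts": printouts, "params": params},
        sort_keys=True,
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class QueryCache:
    def __init__(self, run_query):
        self.run_query = run_query  # callable that really evaluates a query
        self.results = {}           # stands in for "queries": id -> cached result
        self.pages = {}             # stands in for the mapping: id -> set of pages

    def get(self, page, conditions, printouts, params):
        """Return the cached result, computing and storing it on a miss."""
        qid = query_id(conditions, printouts, params)
        self.pages.setdefault(qid, set()).add(page)
        if qid not in self.results:
            self.results[qid] = self.run_query(conditions, printouts, params)
        return self.results[qid]

    def invalidate(self, affected_ids):
        """Drop cached results; return the pages whose own caches need purging."""
        to_purge = set()
        for qid in affected_ids:
            if qid in self.results:
                del self.results[qid]
                to_purge |= self.pages.get(qid, set())
        return to_purge
```

The pages returned by invalidate() are the ones whose parser, HTML and other caches would have to be invalidated. A query that had no cached result returns no pages, which matches "queries that had their cache removed".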

Note: we would not necessarily need to wait for people to view a page to have the caches rebuilt (both the query cache and the page-specific caches). We could create jobs to do this, so that on the next view of the page the result is there immediately. This only makes sense for wikis where most pages get visited often though, since otherwise you might be doing a lot of work for nothing.
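
A sketch of what such jobs could look like, again in Python and only as an illustration. The per-page view threshold (min_views) is my own assumption, one possible way of avoiding wasted work on rarely visited pages. schedule_rebuilds, run_jobs and rebuild_page are hypothetical names, and the deque stands in for a real job queue.

```python
from collections import deque


def schedule_rebuilds(purged_pages, view_counts, min_views, job_queue):
    """Queue a rebuild job for each purged page that is viewed often enough."""
    for page in sorted(purged_pages):
        if view_counts.get(page, 0) >= min_views:
            job_queue.append(page)


def run_jobs(job_queue, rebuild_page):
    """Run queued jobs in order; rebuild_page re-renders one page's caches."""
    done = []
    while job_queue:
        page = job_queue.popleft()
        rebuild_page(page)
        done.append(page)
    return done
```

Pages below the threshold are simply left alone and get rebuilt lazily on their next view, as in the flow described earlier.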

The only difficult problem left to solve here is how to best figure out which queries have changed (or could have changed), but this does not appear to affect the rest of the design.

Any objections to such an approach, or suggestions of any sort?


Jeroen De Dauw
Don't panic. Don't be evil.