Version 3 (modified by Christian Boos, 15 years ago)

he he, got a conflict while trying to save my version ;-) Let's save it, then look how close it is to yours…

The problem of cache invalidation

Problem

Trac uses various caches at the component level in order to speed up costly tasks. Examples include the recently added ticket fields cache (#6436), the InterMapTxt cache, the user permission cache and, the oldest of them, the Wiki page cache.

Those caches are held at the level of Component instances. For a given class, there's one such instance per environment in any given server process. The first thing to take into account here is that those caches must be safely read and modified by concurrent threads (in multi-threaded web front ends, that is). That's not a big deal, and I think it's already handled appropriately.

But since multiple processes can always access the underlying database concurrently, those caches must also be kept consistent and up to date across all processes involved. Otherwise, you might make a change by way of one request, and the next request (even the GET following a redirect after your POST!) might be handled by a different server process which has a different "view" of the application state, and you end up confused at best, filing a bug on t.e.o at worst ;-)

This doesn't even require a multi-process server setup, as all that is needed is e.g. a modification of the database done using trac-admin.

Current Situation

So the current solution to the above problem is to use a global reset mechanism, which not only invalidates the caches, but simply "throws away" all the Component instances of the environment that has been globally reset. That reset happens by way of a simulated change to the TracIni file, triggered by a call to self.config.touch() from a component instance. The next time an environment instance is retrieved, the old environment instance is found to be out of date and a new one is created (see trac.env.open_environment). Consequently, new Component instances are created as well, and the caches are repopulated as needed.
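
To make the "all or nothing" nature of this reset concrete, here is a toy model of it. This is not Trac's actual code: ToyEnvironment, open_toy_environment and touch are hypothetical names, and the staleness check is assumed to be a plain comparison of file modification times.

```python
import os


class ToyEnvironment:
    """Hypothetical stand-in for an environment and its components."""

    def __init__(self, config_path):
        self.config_path = config_path
        # Remember which version of the configuration file we were built from.
        self.config_mtime = os.path.getmtime(config_path)
        # Every component-level cache lives (and dies) with this instance.
        self.caches = {}


_env = None  # one environment instance per process in this toy model


def open_toy_environment(config_path):
    """Return the current environment, rebuilding it if the config changed."""
    global _env
    if _env is None or os.path.getmtime(config_path) > _env.config_mtime:
        # Out of date: drop the old instance and all of its caches at once.
        _env = ToyEnvironment(config_path)
    return _env


def touch(config_path):
    """Simulate a configuration change by bumping the modification time."""
    st = os.stat(config_path)
    os.utime(config_path, (st.st_atime, st.st_mtime + 1))
```

Whatever cache triggered the touch, every other cache held by the old instance is lost with it, which is the second drawback listed below.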

Pros:

  • it works well ;-)

Cons:

  • it's a bit costly - though I've no numbers on that, it's easy to imagine that if this full reset happens too frequently, the benefits of the caches will simply disappear. In the past, when the reset rate was abnormally high due to some bug, the performance impact was very perceptible.
  • it's all or nothing - the more we rely on this mechanism for different caches, the more we'll aggravate the above situation. Ideally, invalidating one cache should not force all the other caches to be reset.

Idea 1: The CacheManager

This idea introduces a centralized cache manager component that manages cached data, retrieval from the database and invalidation of cached data. The assumption is that it doesn't make sense to retrieve data to be cached from the database more than once per HTTP request.

Every cache is identified by a unique identifier string, for example 'ticket.fields' and 'wiki.InterMapTxt'. It has an associated retrieval function that populates the cache if required, and a generation number that starts at 0 and is incremented at every cache invalidation.

A new table in the database stores the cache identifiers, along with the current generation number and possibly the time of the last invalidation (for timed cache invalidation). The schema would be something like:

Table('cache', key='id')[
    Column('id'),
    Column('generation', type='int'),
    Column('time', type='int'),
]

So how is the cache used?

  • HTTP request: At the beginning of every HTTP request, the complete cache table is read into memory. This provides the CacheManager with the current state of the database data. Timed invalidation could also be done at this point, by dropping cached data that is too old.
  • Retrieval of cached data: The CacheManager can be queried for a reference to a cache. At this point, it checks if the generation number of the cached data matches the number read at the start of the HTTP request. If it does, the cached data is simply returned. Otherwise, the cached data is discarded, the retrieval function is called to populate the cache with fresh data, and the data is returned.
  • Invalidation of cached data: Invalidation of cached data is done explicitly after updating the database by incrementing the generation number for the cache in the cache table, in the same transaction as the data update, and invalidating the currently cached data in the CacheManager.
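
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: it uses a plain sqlite3 connection instead of Trac's database layer, and the method names (register, begin_request, get, invalidate) are made up for the example, not an existing API. Timed invalidation is left out.

```python
import sqlite3


class CacheManager:
    """Toy cache manager keyed by generation numbers stored in the db."""

    def __init__(self, db):
        self.db = db
        self.retrievers = {}   # cache id -> function(db) returning fresh data
        self.cached = {}       # cache id -> (generation, data)
        self.generations = {}  # snapshot of the cache table for this request

    def register(self, cache_id, retriever):
        self.retrievers[cache_id] = retriever

    def begin_request(self):
        # One query per request: read the complete cache table into memory.
        rows = self.db.execute("SELECT id, generation FROM cache")
        self.generations = dict(rows)

    def get(self, cache_id):
        # A cache that was never invalidated has no row yet: generation 0.
        current = self.generations.get(cache_id, 0)
        entry = self.cached.get(cache_id)
        if entry is not None and entry[0] == current:
            return entry[1]
        # Stale or missing: repopulate with the retrieval function.
        data = self.retrievers[cache_id](self.db)
        self.cached[cache_id] = (current, data)
        return data

    def invalidate(self, cache_id, cursor):
        """Bump the generation using the cursor of the data update."""
        cursor.execute(
            "UPDATE cache SET generation = generation + 1 WHERE id = ?",
            (cache_id,))
        if cursor.rowcount == 0:
            cursor.execute(
                "INSERT INTO cache (id, generation) VALUES (?, 1)",
                (cache_id,))
        # Also drop our own in-memory copy.
        self.cached.pop(cache_id, None)
```

In this sketch, a get() that follows invalidate() within the same request reloads the data but records it under the old generation, so it is reloaded once more on the next request. That costs an extra query but never serves stale data.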

Pros:

  • Caches are managed in a single place, and the cache logic is implemented once and for all. This should avoid bugs due to re-implementing cache logic for every individual cache.
  • Cached data is consistent for the duration of an HTTP request.
  • Caches can be made fine-grained. For example, it may be possible to use separate caches for the values of every ticket field (not sure we want that, though). Invalidation is fine-grained as well.

Cons:

  • One additional database query per HTTP request. I don't know how much impact this can have, but I would expect this to be negligible, as the cache table should never grow past a few dozen rows.
  • Caches must be invalidated explicitly. The same drawback applies to the current situation, so nothing is lost there.

Open questions:

  • This strategy should work well in a multi-process scenario. In a multi-thread scenario, proper locking must ensure that cached data is not modified during a request. It may be possible to use thread-local storage to ensure that a single request has a consistent view of the cache, even if a second thread invalidates the cache.
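
One way the thread-local idea could look, as a sketch only: each thread takes a shallow snapshot of the shared cache references at the start of its request, and other threads replace cached objects instead of mutating them in place. RequestView and its methods are hypothetical names.

```python
import threading


class RequestView:
    """Gives each thread a stable view of the caches for one request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._shared = {}                # cache id -> data, seen by all threads
        self._local = threading.local()  # per-thread snapshot

    def begin_request(self):
        # Copy the references under the lock; the data itself is not copied.
        with self._lock:
            self._local.view = dict(self._shared)

    def get(self, cache_id):
        # Reads go to the snapshot, so they ignore later replacements.
        return self._local.view.get(cache_id)

    def replace(self, cache_id, data):
        # Writers must install a new object, never mutate the old one.
        with self._lock:
            self._shared[cache_id] = data
```

This only holds as long as cached objects are treated as immutable once published. A request thread that calls get() before begin_request() would fail, which a real implementation would have to handle.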

Comments and improvements are welcome. If this approach sounds reasonable, I'd like to do a prototype implementation and apply it to a few locations (the wiki page cache and ticket fields).

Idea 2: Cache control

I'm currently thinking about the following solution.

Each time a cache needs to be invalidated (i.e. in the current situations where we call config.touch()), we would instead call env.cache_invalidate(cache_key), where cache_key is some unique key identifying that cache (e.g. "InterMapTxt" or "repository-reponame" for the MultiRepositoryCache). This call will atomically increment a generation value associated with the key in the db (that might be tricky - select for update for Pgsql, explicit transaction for Pysqlite). A simple create table cachecontrol (key text, generation int) should be enough.
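
For the SQLite case, the increment could be sketched like this, using the standard sqlite3 module directly. This is an assumption-laden illustration, not the attached patch: the function takes the connection as an argument instead of being a method of the environment, and the column name is quoted because key is a keyword in some databases.

```python
import sqlite3


def cache_invalidate(db, cache_key):
    """Increment the generation of cache_key, creating its row if needed."""
    # "with db" wraps both statements in one transaction: it commits on
    # success and rolls back if an exception is raised.
    with db:
        cur = db.execute(
            'UPDATE cachecontrol SET generation = generation + 1 '
            'WHERE "key" = ?', (cache_key,))
        if cur.rowcount == 0:
            # First invalidation of this cache: start at generation 1.
            db.execute(
                'INSERT INTO cachecontrol ("key", generation) VALUES (?, 1)',
                (cache_key,))
```

Doing the increment inside the UPDATE statement itself avoids a separate read followed by a write. Other backends may still need an explicit locking read, as noted above.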

Periodically, e.g. in open_environment, we would call env.cache_update(). That will do a select * from cachecontrol. The results are stored alongside the previously known latest values, so we can quickly see which caches need a refresh.

Whenever a Component has to fetch a value from the cache, it will first call env.cache_is_valid(cache_key). If the result is true, it can retrieve values from the cache. If not, the cache has to be updated first. Once the cache is refreshed, the component calls env.cache_validate(cache_key).
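
The bookkeeping behind these three calls could look roughly like the following. ToyEnv is a hypothetical stand-in for the environment, built on a plain sqlite3 connection; only the three method names come from the text above.

```python
import sqlite3


class ToyEnv:
    """Hypothetical environment holding the cache control state."""

    def __init__(self, db):
        self.db = db
        self._latest = {}     # generations last read from the db
        self._validated = {}  # generations our in-memory caches match

    def cache_update(self):
        # Called periodically: refresh our knowledge of all generations.
        rows = self.db.execute('SELECT "key", generation FROM cachecontrol')
        self._latest = dict(rows)

    def cache_is_valid(self, cache_key):
        # A cache that was never validated is reported as invalid; a key
        # without a row in the table counts as generation 0.
        return self._validated.get(cache_key) == self._latest.get(cache_key, 0)

    def cache_validate(self, cache_key):
        # Called by the component once it has refreshed its cache.
        self._validated[cache_key] = self._latest.get(cache_key, 0)
```

A component would then wrap its lookups in something like "if not env.cache_is_valid(key): reload, then env.cache_validate(key)". If the generation changes between cache_update() and the reload, the cache is simply refreshed once more after the next cache_update().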

Example: InterMapTxt cache

For convenience, if a Component only manages one cache (the common case), it can pass self instead of a string key and its class name will be used.
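
A minimal sketch of that convenience, assuming a hypothetical helper (cache_key_of is not part of the patch) that the env.cache_* methods would call on their argument:

```python
def cache_key_of(key_or_component):
    """Return the cache key for a string key or a component instance."""
    if isinstance(key_or_component, str):
        # An explicit key is used unchanged.
        return key_or_component
    # Otherwise fall back to the class name of the component.
    return type(key_or_component).__name__
```

With the bare class name as key, two components with the same class name in different modules would share a generation counter, so a real implementation might prefer a module-qualified name.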

Only the code changes for trac/env.py and trac/wiki/interwiki.py are roughly implemented (i.e. not tested yet - just to illustrate the above).

See attachment:cache_control-r7933.diff
