CacheInvalidation

Timestamp:: Mar 6, 2009, 6:52:43 PM (15 years ago)
Author:: Remy Blank
Comment:: Added a first idea: the CacheManager.

TracDev/Proposals/CacheInvalidation

-              v1
+              v2
+== Idea 1: The `CacheManager` ==
+This idea introduces a centralized cache manager component that manages cached data, retrieval from the database and invalidation of cached data. The assumption is that it doesn't make sense to retrieve data to be cached from the database more than once per HTTP request.
+Every cache is identified by a unique identifier string, for example `'ticket.fields'` and `'wiki.InterMapTxt'`. It has an associated retrieval function that populates the cache if required, and a generation number that starts at 0 and is incremented at every cache invalidation.
+A new table in the database stores the cache identifiers, along with the current generation number and possibly the time of the last invalidation (for timed cache invalidation). The schema would be something like:
+{{{
+Table('cache', key='id')[
+    Column('id'),
+    Column('generation', type='int'),
+    Column('time', type='int'),
+]
+}}}
+So how is the cache used?
+ * '''HTTP request''': At the beginning of every HTTP request, the complete `cache` table is read into memory. This provides the `CacheManager` with the current state of the database data. Timed invalidation could also be done at this point, by dropping cached data that is too old.
+ * '''Retrieval of cached data''': The `CacheManager` can be queried for a reference to a cache. At this point, it checks if the generation number of the cached data matches the number read at the start of the HTTP request. If it does, the cached data is simply returned. Otherwise, the cached data is discarded, the retrieval function is called to populate the cache with fresh data, and the data is returned.
+ * '''Invalidation of cached data''': Invalidation is done explicitly after updating the database. The generation number for the cache is incremented in the `cache` table, in the same transaction as the data update, and the data currently cached in the `CacheManager` is discarded.
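The three steps above can be sketched as follows. This is only a minimal illustration of the idea, not an existing Trac API: the class layout, the method names `register`, `begin_request`, `get` and `invalidate`, and the use of a plain `sqlite3` connection are all assumptions made for the example.

```python
import sqlite3


class CacheManager:
    """Hypothetical sketch of the proposed component (not Trac API)."""

    def __init__(self, db):
        self.db = db            # DB-API connection (sqlite3 assumed here)
        self.retrievers = {}    # cache id -> retrieval function
        self.cached = {}        # cache id -> (generation, data)
        self.generations = {}   # snapshot of the `cache` table for this request

    def register(self, id, retrieve):
        """Associate a retrieval function with a cache identifier."""
        self.retrievers[id] = retrieve

    def begin_request(self):
        """Read the complete `cache` table once per HTTP request."""
        rows = self.db.execute("SELECT id, generation FROM cache")
        self.generations = dict(rows)

    def get(self, id):
        """Return cached data, repopulating it if its generation is stale."""
        # A cache that was never invalidated has no row: generation 0.
        current = self.generations.get(id, 0)
        entry = self.cached.get(id)
        if entry is not None and entry[0] == current:
            return entry[1]
        data = self.retrievers[id](self.db)
        self.cached[id] = (current, data)
        return data

    def invalidate(self, id):
        """Bump the generation; call inside the transaction updating the data."""
        cursor = self.db.execute(
            "UPDATE cache SET generation = generation + 1 WHERE id = ?", (id,))
        if cursor.rowcount == 0:
            # First invalidation of this cache: create its row.
            self.db.execute(
                "INSERT INTO cache (id, generation, time) VALUES (?, 1, 0)",
                (id,))
        # Drop the local copy so the next get() in this process reloads it.
        self.cached.pop(id, None)
```

In this sketch another process keeps returning its old data until its next `begin_request()`, which is what makes cached data consistent for the duration of a request. The `time` column is written but not used; timed invalidation is left out.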
+Pros:
+ * Caches are managed in a single place, and the cache logic is implemented once and for all. This should avoid bugs due to re-implementing cache logic for every individual cache.
+ * Cached data is consistent for the duration of an HTTP request.
+ * Caches can be made fine-grained. For example, it may be possible to use separate caches for the values of every ticket field (not sure we want that, though). Invalidation is fine-grained as well.
+Cons:
+ * One additional database query per HTTP request. I don't know how much impact this can have, but I would expect this to be negligible, as the `cache` table should never grow past a few dozen rows.
+ * Caches must be invalidated explicitly. The same drawback applies to the current situation, so nothing is lost there.
+Open questions:
+ * This strategy should work well in a multi-process scenario. In a multi-thread scenario, proper locking must ensure that cached data is not modified during a request. It may be possible to use thread-local storage to ensure that a single request has a consistent view of the cache, even if a second thread invalidates the cache.
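One possible shape for the thread-local variant is sketched below. It is an untested assumption about how this could work, not a worked-out design: the class name `RequestSnapshot` and its methods are hypothetical, and the generation number is passed in by the caller instead of being read from the `cache` table.

```python
import threading


class RequestSnapshot:
    """Hypothetical sketch: shared cache with a per-thread, per-request view."""

    def __init__(self):
        self._shared = {}                 # id -> (generation, data), all threads
        self._lock = threading.Lock()     # protects self._shared
        self._local = threading.local()   # holds each thread's request view

    def begin_request(self):
        """Start a new request in the calling thread with an empty view."""
        self._local.view = {}

    def get(self, id, generation, retrieve):
        """Return data for `id`, pinned for the rest of this thread's request."""
        view = self._local.view
        if id in view:
            return view[id]
        with self._lock:
            entry = self._shared.get(id)
            if entry is None or entry[0] != generation:
                # Retrieval runs under the lock for simplicity.
                entry = (generation, retrieve())
                self._shared[id] = entry
        view[id] = entry[1]
        return entry[1]

    def invalidate(self, id):
        """Drop the shared copy; views of running requests are untouched."""
        with self._lock:
            self._shared.pop(id, None)
```

A request that has already fetched a cache keeps its own reference until its thread calls `begin_request()` again, so an invalidation in a second thread does not change data mid-request. This relies on cached objects being treated as read-only once they are handed out.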
+Comments and improvements are welcome. If this approach sounds reasonable, I'd like to do a prototype implementation and apply it to a few locations (the wiki page cache and ticket fields).