Edgewall Software

Changes between Version 22 and Version 23 of TracDev/Proposals/CacheInvalidation


Ignore:
Timestamp:
May 9, 2015, 11:53:37 PM (4 years ago)
Author:
figaro
Comment:

Cosmetic changes, added link to previous proposal

Legend:

Unmodified
Added
Removed
Modified
  • TracDev/Proposals/CacheInvalidation

    v22 v23  
    11[[PageOutline(2-4)]]
    2 = The problem of cache invalidation =
    3 
    4 // This was the design discussion behind the `trac.cache` module ([source:trunk/trac/cache.py trunk], [source:branches/1.0-stable/trac/cache.py 1.0.x]). See also apidoc:api/trac_cache. //
    5 
    6 == Problem ==
     2
     3= The problem of cache invalidation
     4
     5This was the design discussion behind the `trac.cache` module ([source:trunk/trac/cache.py trunk], [source:branches/1.0-stable/trac/cache.py 1.0.x]). See also apidoc:api/trac_cache.
     6
     7== Problem
     8
    79Trac uses various caches at the component level, in order to speed-up costly tasks. Some examples are the recent addition of ticket fields cache (#6436), others are the InterMapTxt cache, the user permission cache, the oldest example being the Wiki page cache.
    810
    911Those cache are held at the level of `Component` instances. For a given class, there's one such instance per environment in any given server process. The first thing to take into account here is that those caches must be safely accessed and modified when accessed by concurrent threads (in multi-threaded web front ends, that is). That's not a big deal, and I think it's already handled appropriately.
    1012
    11 But due to the always possible concurrent access at the underlying database level by multiple processes, there's also a need to maintain a consistency and up-to-date status of those caches across all processes involved. Otherwise, you might do a change by the way of one request and the next request (even the GET following a redirect after your POST!) might be handled by a different server process which has a different "view" of the application state, and you end up confused at best, filing a bug on t.e.o at worst ;-)
     13But due to the always possible concurrent access at the underlying database level by multiple processes, there's also a need to maintain a consistency and up-to-date status of those caches across all processes involved. Otherwise, you might do a change by the way of one request and the next request (even the GET following a redirect after your POST!) might be handled by a different server process which has a different "view" of the application state, and you end up confused at best, filing a bug on t.e.o at worst.
    1214
    1315This doesn't even have to imply a multi-process server setup, as all what is needed is e.g. a modification of the database done using trac-admin.
    1416
    15 == Current Situation ==
     17This proposal was previously logged as wiki:TracDev/Proposals/Journaling.
     18
     19== Current Situation
     20
    1621So the current solution to the above problem is to use some kind of global reset mechanism, which will not only invalidate the caches, but simply "throw away" all the `Component` instances of the environment that has been globally reset. That reset happens by the way of a simulated change on the TracIni file, triggered by a call to `self.config.touch()` from a component instance. The next time an environment instance is retrieved, the old environment instance is found to be out of date and a new one will be created (see `trac.env.open_environment`). Consequently, new Component instances will be created as well, and the caches will be repopulated as needed.
    1722
    1823Pros:
    19  - it works well ;-)
     24 - it works well
    2025
    2126Cons:
     
    2328 - it's all or nothing - the more we rely on this mechanism for different caches, the more we'll aggravate the above situation. Ideally, invalidating one cache should not force all the other caches to be reset.
    2429
    25 
    26 == Final implementation ==
     30== Final implementation
     31
    2732The final implementation, committed in [8071], is a refinement of the [#CacheManager CacheManager] idea.
    2833The documentation for this feature (starting with the content of this section) now lives in TracDev/CacheManager.
    29 
    3034
    3135 * '''Creating a cached attribute''' is done by defining a retrieval function and decorating it with the '''`@cached_value`''' decorator. For example, for the wiki page names:
     
    6064 * The `cache` table is read the first time a cached attribute is accessed during a request. This avoids slowing down requests that don't touch cached attributes, like requests for static content for example.
    6165
    62 
    6366-----
    6467''The sections below are kept as documentation of the implementation process.''
    6568
    6669== Idea 1: The `CacheManager` == #CacheManager
     70
    6771This idea introduces a centralized cache manager component that manages cached data, retrieval from the database and invalidation of cached data. The assumption is that it doesn't make sense to retrieve data to be cached from the database more than once per HTTP request.
    6872
     
    100104Comments and improvements are welcome. If this approach sounds reasonable, I'd like to do a prototype implementation and apply it to a few locations (the wiki page cache and ticket fields).
    101105
    102 === Prototype implementation ===
     106=== Prototype implementation
     107
    103108A prototype implementation of this approach is available in [attachment:cache-manager-r7941.patch]. It implements the cache manager and has been applied to the wiki page, InterMapTxt and ticket field caches.
    104109
    105110 * Creating a cached attribute is extremely simple: declare a retrieval function with the desired name of the attribute and apply the `@cached_value` or `@cached` decorator. For example, for the wiki pages:
    106 {{{
    107 #!python
     111{{{#!python
    108112@cached_value('wiki.WikiSystem.pages')
    109113def pages(self, db):
     
    115119
    116120 * Invalidating a cached attribute is done by deleting the attribute:
    117 {{{
    118 #!python
     121{{{#!python
    119122def wiki_page_added(self, page):
    120123    if not self.has_page(page.name):
     
    129132
    130133 * To test the patch, the table `cache` must be created by hand in the database, as follows (for SQLite):
    131 {{{
    132 #!sql
     134{{{#!sql
    133135CREATE TABLE cache (
    134     key text PRIMARY KEY,
    135     generation int
     136    key TEXT PRIMARY KEY,
     137    generation INT
    136138);
    137139}}}
     
    139141 * Two new `trac-admin` commands allow listing the `cache` table and invalidating one or more caches. They are intended for debugging purposes, and should probably be removed if the proposal is applied to trunk.
    140142
    141 === Discussion ===
     143=== Discussion
     144
    142145'''Comments and testing are very welcome'''. The implementation is quite complete, except for the missing database upgrade module. I have only tested with several concurrent `tracd` instances so far.
    143146
    144 ==== cboos: feedback ====
     147==== cboos: feedback
     148
    145149It's really nice, I like it a lot!
    146150
     
    154158I've also added more documentation to the decorators and changed the order of the definitions in a top-down way (first the decorators, then the descriptors, ending with the proxy class), as I think it's easier to understand that way.
    155159
    156 ==== rblank: ====
     160==== rblank:
     161
    157162Thanks for the improvement ideas, I'll integrate them shortly.
    158163 - I'm not sure the DB race condition you describe can actually happen. At least with SQLite, issuing a `SELECT` sets the lock state to `SHARED`, which disallows writes, so it should not be possible to increase the generation between the data retrieval and fetching the generation. I don't know how this behaves with other databases, though. Maybe it's just safer to fetch the generation first.
     
    162167 - If the keys are auto-generated, the decorators don't need any arguments anymore. This allows simplifying them even more by dropping the `cached()` and `cached_value()` functions, and calling the descriptor classes `cached` and `cached_value` directly.
    163168
    164 ==== cboos: ====
     169==== cboos:
     170
    165171 - SELECT statements don't start a transaction in PySqlite, as opposed to the other DML statements. So in my understanding, each retrieval is "atomic" and I think there can indeed be a race condition between the SELECT(s) done for data retrieval and the SELECT for fetching the generation.
    166172   As this picked my curiosity, I tried to see how multiple SELECTs could be done within a single transaction, and this is indeed possible, but a bit heavyweight: see e.g. pysqlite:IdRange, look for `def get_id_range`. So I think it's better to simply cope with the race.
     
    170176 - great, I didn't know one can do that
    171177
    172 ==== cboos: !CacheManager used in non-Component classes ====
     178==== cboos: !CacheManager used in non-Component classes
     179
    173180When thinking about how to use the !CacheManager for getting the `youngest_rev` information from !CachedRepository, I saw two additional problems:
    174181 - we can't use the decorators here, as the !CachedRepository is not a Component (and shouldn't be, as we may have many instances per environment)
     
    177184
    178185So we need to access the !CacheManager directly, using something like:
    179 {{{
    180 #!python
     186{{{#!python
    181187    @property
    182188    def metadata(self):
     
    188194Do you see a better way to do it?
    189195
    190 ==== rblank: ====
     196==== rblank:
     197
    191198Yes, by instantiating `CacheProxy` in the constructor and storing it as an instance attribute. This gives it the same interface as if `@cached` was used.
    192 {{{
    193 #!python
     199{{{#!python
    194200    self.metadata = CacheProxy('CachedRepository.metadata:' + self.name, self.get_metadata, env)
    195201}}}
    196202This does indeed require `env`, and changing that will make the `CachedRepository` unit tests a bit more complicated :-/
    197203
    198 ==== rblank: Update with feedback ====
     204==== rblank: Update with feedback
     205
    199206The attachment:cache-manager-r7989.patch is an updated patch which should take into account all corner cases described above. Cache keys are now auto-generated from the module, class and attribute name. I have also added the database upgrade code, so the `db_version` is now 22.
    200207
    201208Are there any other issues that should be considered? If not, the next step would be to plan the integration into trunk. Are there any special considerations when upgrading the database version? What else (other than committing) must be done?
    202209
    203 ==== cboos: feedback ====
     210==== cboos: feedback
     211
    204212 - Last round of feedback:
    205213   - The API documentation should also mention that `cached` and `cached_value`
     
    225233 - We could also think about adding some tests for this, though that might be more involved.
    226234
    227 ==== rblank: ====
     235==== rblank:
     236
    228237 - Replies to last round of feedback:
    229238   - Will do.
    230239   - That's what I tried first, but the error due to the `INSERT` rolled back the whole transaction. I'll have to find a way to do this in a single statement.
    231240
    232 ==== cboos: ====
     241==== cboos:
     242
    233243Hm right, that can be problematic. So what about this:
    234 {{{
    235 #!python
     244{{{#!python
    236245    cursor.execute("SELECT generation FROM cache WHERE key=%s",
    237246                   (key,))
     
    247256                           "WHERE key=%s", (key,))
    248257}}}
    249 If we were in a transaction, then I suppose the SELECT/INSERT
    250 sequence can't fail. Conversely, if it fails, then we were
    251 ''not'' in a transaction, and we can follow-up with an UPDATE
    252 to recover from the failed INSERT.
    253 
    254 ==== rblank: Alternative for atomic UPSERT ====
     258
     259If we were in a transaction, then I suppose the SELECT/INSERT sequence can't fail. Conversely, if it fails, then we were ''not'' in a transaction, and we can follow-up with an UPDATE to recover from the failed INSERT.
     260
     261==== rblank: Alternative for atomic UPSERT
     262
    255263That could work, yes. How about this:
    256 {{{
    257 #!python
     264{{{#!python
    258265    cursor.execute("UPDATE cache SET generation=generation+1 "
    259266                   "WHERE key=%s", (key,))
     
    269276transaction if no rows are altered. We may have to use a dummy row that is always updated in addition to the desired row.
    270277
    271 ==== cboos: ====
     278==== cboos:
     279
    272280Looks great! I don't think we can have something any simpler, in particular the UPDATE doesn't seem to return the number of modified rows for all backends (at least, that doesn't seem to be possible with SQLite3 and Pysqlite).
    273281
    274 ==== rblank: Updated patch ====
     282==== rblank: Updated patch
     283
    275284The attachment:cache-manager-r7992.patch improves the docstrings for the decorators and the proxy, and makes invalidation atomic. I'll now ask for feedback on trac-dev.
    276285
    277 
    278 == Idea 2: Cache control ==
     286== Idea 2: Cache control
    279287
    280288I'm currently thinking about the following solution.
    281289
    282 Each time a cache needs to be invalidated (i.e. in the current situations where we call `config.touch()`), we would instead call `env.cache_invalidate(cache_key)`, where `cache_key` is
    283 some unique key identifying that cache (e.g. "InterMapTxt" or "repository-''reponame''" for the MultipleRepositorySupport/Cache). This call will atomically increment some ''generation'' value associated to the key, in the db (that might be tricky - select for update for Pgsql, explicit transaction for Pysqlite). A simple `create table cachecontrol (key text, generation int)` should be enough.
     290Each time a cache needs to be invalidated (i.e. in the current situations where we call `config.touch()`), we would instead call `env.cache_invalidate(cache_key)`, where `cache_key` is some unique key identifying that cache (e.g. "InterMapTxt" or "repository-''reponame''" for the MultipleRepositorySupport/Cache). This call will atomically increment some ''generation'' value associated to the key, in the db (that might be tricky - select for update for Pgsql, explicit transaction for Pysqlite). A simple `create table cachecontrol (key text, generation int)` should be enough.
    284291
    285292At periodic times, e.g. in `open_environment`, we would call `env.cache_update()`. That will do a `select * from cachecontrol`. The results are stored besides the previously known latest values, therefore we can quickly see which caches need a refresh.
     
    287294Whenever a Component has to fetch a value from the cache, it will first call `env.cache_is_valid(cache_key)`. If the result is true, it can retrieve values from the cache. If not, the cache has to be updated first. Once the cache is refreshed, the component calls `env.cache_validate(cache_key)`.
    288295
    289 
    290 === Example: InterMapTxt cache ===
     296=== Example: InterMapTxt cache
     297
    291298For convenience, if a Component only manages one cache (the common case), it can pass `self` instead of a string key and its class name will be used.
    292299
    293 Only the code changes for trac/env.py and trac/wiki/interwiki.py are roughly implemented (i.e. not tested yet - just to illustrate the above).
     300Only the code changes for trac/env.py and trac/wiki/interwiki.py are roughly implemented. Not tested yet and just to illustrate the above.
    294301
    295302See attachment:cache_control-r7933.diff.
    296303For testing, manual creation of a cache_control table is needed:
    297 {{{
     304{{{#!sql
    298305CREATE TABLE cache_control( key text primary key, generation int);
    299306}}}
     
    307314That concludes my initial approach to the problem. Now let's take into account what was proposed in idea 1...
    308315
    309 === Discussion ===
     316=== Discussion
     317
    310318While the two approaches are quite close in spirit, there are a few differences.
    311319
     
    319327
    320328''Indeed, the basic idea is the same. My goal was to push as much of the logic into the `CacheManager` as possible, so that cache users would only have two functionalities: get the data (this could even be hidden by using a `property`-like descriptor) and invalidate the cache. There should be no need for cache users to "remember to first check if the cache is valid, then ...": this logic is common to all cache users, and can be integrated into the cache manager.''
    321 
    322 ''I'll get started on a prototype implementation as well, to see how simple I can make it for the cache user.''