Edgewall Software
Modify

Opened 5 months ago

Closed 5 months ago

Last modified 5 months ago

#13634 closed defect (fixed)

cache invalidation for Environment::get_known_users can cause DB temp space exhaustion with MySQL backend

Reported by: Klaus Espenlaub <klaus.espenlaub@…> Owned by: Jun Omae
Priority: normal Milestone: 1.6.1
Component: database backend Version: 1.4.3
Severity: normal Keywords: mysql temporary table known_users disk full
Cc: klaus.espenlaub@… Branch:
Release Notes:

Use sorted() instead of ORDER BY session.sid in order to avoid filesort caused by indexing only a prefix of column values on MySQL.

API Changes:
Internal Changes:

Description

On larger Trac deployments (ours has several backend servers, each with multiple worker processes) with frequent accesses the cache invalidation logic essentially implements a trigger for a "thundering herd": when the generation counter is increased then all worker processes will rather quickly run the query in Environment::_known_users. Looks all normal and harmless on first sight, but:

MySQL creates a temporary table each for such SELECT DISTINCT ... ORDER BY ... queries. What makes it worse is that the query has columns of type mediumtext and this forces the temporary table to go straight to disk. As long as the DB server CPU, IO and memory load is low this doesn't cause noticeable issues. With our approx. 200000 known users results in a temp table of about 8MB (taking about 2 seconds for the query which isn't too annoying because almost always it is cached). If this gets multiplied by ~35 (the number of Trac WSGI workers in our setup for a busy situation) then it still works just fine, just causes some slowdown due to IO. When there is other activity (such as viewing source with 'blame' etc.) then the refreshing of the known users can get to DB connections timing out.

That's the "point of no return": the query wasn't finished yet but the worker will issue another one. This growing avalanche of queries (all limited by IO performance and needing 8MB of temp table space each) will exhaust the disk space of the DB server at the maximum IO performance, using all CPU and memory resources on the way too. No wonder with about 1500 such queries running simultaneously at the end (they build up over 45 to 60 minutes in our case).

Once the disk space is exhausted the MySQL server goes into "offline mode", needing a server restart (until that happens Trac is obviously down). After the restart everything is perfectly working again. Temporary tables are automatically removed, therefore the disk usage is back to normal.

Proposed solution: remove the DISTINCT qualifier from this query. Due to the design of the session and session_attribute tables I can't see how it could ever produce non-unique results.

Reference: MySQL 8.0 documentation about internal use of temporary tables

Attachments (4)

explain_analyze_select_distinct_order_by.txt (7.0 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 5 months ago.
EXPLAIN ANALYZE of the statement
explain_select_distinct_order_by.txt (2.2 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 5 months ago.
EXPLAIN of the statement
explain_analyze_select_order_by.txt (5.6 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 5 months ago.
EXPLAIN ANALYZE of the statement without DISTINCT
explain_select_order_by.txt (2.1 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 5 months ago.
EXPLAIN of the statement without DISTINCT

Download all attachments as: .zip

Change History (15)

comment:1 by Jun Omae, 5 months ago

Component: generaldatabase backend
Owner: set to Jun Omae
Status: newassigned

Please post results of explain analyze with the following queries on your database.

EXPLAIN ANALYZE
SELECT DISTINCT s.sid, n.value, e.value
FROM session AS s
 LEFT JOIN session_attribute AS n ON (n.sid=s.sid
  AND n.authenticated=1 AND n.name = 'name')
 LEFT JOIN session_attribute AS e ON (e.sid=s.sid
  AND e.authenticated=1 AND e.name = 'email')
WHERE s.authenticated=1 ORDER BY s.sid
\G
EXPLAIN ANALYZE
SELECT s.sid, n.value, e.value
FROM session AS s
 LEFT JOIN session_attribute AS n ON (n.sid=s.sid
  AND n.authenticated=1 AND n.name = 'name')
 LEFT JOIN session_attribute AS e ON (e.sid=s.sid
  AND e.authenticated=1 AND e.name = 'email')
WHERE s.authenticated=1 ORDER BY s.sid
\G

by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

EXPLAIN ANALYZE of the statement

by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

EXPLAIN of the statement

by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

EXPLAIN ANALYZE of the statement without DISTINCT

by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

Attachment: explain_select_order_by.txt added

EXPLAIN of the statement without DISTINCT

comment:2 by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

Attached the requested information plus a bit more variation. This is with MySQL Database Service in the Oracle Cloud (not that I have any reason to believe that it would be significantly different to running MySQL Community Server yourself), which reports itself as 8.0.32-u3-cloud.

comment:3 by Jun Omae, 5 months ago

Milestone: 1.6.1

Thanks for the quick response. Your proposed fix seems to be good.

comment:4 by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

With one big catch: I thought that this change will get our Trac deployment stable again. Which looked good for a while (made the change in our production environment over a day ago), but it's not solving the issue fully. The rate of temp disk table creation has decreased but not as much as I would've expected.

comment:5 by Jun Omae, 5 months ago

Could you please check performance schema and share the queries which are using tmp disk table if the performance schema is enabled in your database?

SELECT schema_name, digest_text, count_star AS n,
       sum_created_tmp_disk_tables AS tmp_disk_tables,
       sum_created_tmp_tables AS tmp_tables
  FROM performance_schema.events_statements_summary_by_digest
 WHERE sum_created_tmp_disk_tables > 0
\G

comment:6 by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

Ouch… guess we both saw the "using filesort" and didn't immediate realize what it means: the sorting is done on disk because it doesn't fit into the default sort buffer size of 256K. Thinking about what Trac really needs I decided to remove the ORDER by s.sid part of the query. Checked all uses of Environment::get_known_users and found nothing which depended on it being sorted.

Bumping the sort buffer size is in principle also possible but would need a bit more effort (reconfiguring the DB server in the cloud, …). Something for when things are stable again and I can access this stuff myself.

The remaining queries needing temp disk tables are very rarely used (less than once every 5 minutes). The most frequent one in our setup is the one with digest

SELECT DISTINCTROW `rev` , ( CASE WHEN PATH IN (...) THEN PATH WHEN PATH LIKE ? ESCAPE ? THEN ? END ) AS PATH FROM `node_change` WHERE `repos` = ? AND `rev` >= ? AND `rev` <= ? AND ( PATH IN (...) OR PATH LIKE ? ESCAPE ? )

which appears to be the one in CachedRepository::_get_changed_revs. No immediate idea how it's triggered (couldn't spot any obvious callers) but it's the only candidate.

comment:7 by Jun Omae, 5 months ago

Removing the ORDER BY s.sid leads incompatibility with docstring of Environment.get_known_users. The docstring describes that ordered alpha-numerically by username at source:/tags/trac-1.6/trac/env.py@:708#L701.

Instead, we could use sorted() / list.sort() like the following:

  • trac/env.py

    diff --git a/trac/env.py b/trac/env.py
    index 043ce7f5b..a76525e19 100644
    a b class Environment(Component, ComponentManager):  
    715715
    716716    @cached
    717717    def _known_users(self):
    718         return self.db_query("""
    719                 SELECT DISTINCT s.sid, n.value, e.value
     718        # Use sorted() instead of "ORDER BY s.sid" in order to avoid filesort
     719        # caused by indexing only a prefix of column values on MySQL.
     720        users = self.db_query("""
     721                SELECT s.sid, n.value, e.value
    720722                FROM session AS s
    721723                 LEFT JOIN session_attribute AS n ON (n.sid=s.sid
    722724                  AND n.authenticated=1 AND n.name = 'name')
    723725                 LEFT JOIN session_attribute AS e ON (e.sid=s.sid
    724726                  AND e.authenticated=1 AND e.name = 'email')
    725                 WHERE s.authenticated=1 ORDER BY s.sid
     727                WHERE s.authenticated=1
    726728        """)
     729        return sorted(users, key=lambda u: u[0])
    727730
    728731    @cached
    729732    def _known_users_dict(self):
  • trac/web/session.py

    diff --git a/trac/web/session.py b/trac/web/session.py
    index 14dbde704..64b2d6466 100644
    a b class SessionAdmin(Component):  
    463463        sids = {self._split_sid(sid)
    464464                for sid in sids
    465465                if sid not in ('anonymous', 'authenticated', '*')}
     466        # Use sort() instead of "ORDER BY s.sid, s.authenticated" in order to
     467        # avoid filesort caused by indexing only a prefix of column values on
     468        # MySQL.
    466469        rows = self.env.db_query("""
    467             SELECT DISTINCT s.sid, s.authenticated, s.last_visit,
    468                             n.value, e.value, h.value
     470            SELECT s.sid, s.authenticated, s.last_visit, n.value, e.value,
     471                   h.value
    469472            FROM session AS s
    470473              LEFT JOIN session_attribute AS n
    471474                ON (n.sid=s.sid AND n.authenticated=s.authenticated
    class SessionAdmin(Component):  
    476479              LEFT JOIN session_attribute AS h
    477480                ON (h.sid=s.sid AND h.authenticated=s.authenticated
    478481                    AND h.name='default_handler')
    479             ORDER BY s.sid, s.authenticated
    480482            """)
     483        rows.sort(key=lambda u: (u[0], u[1]))
    481484        for sid, authenticated, last_visit, name, email, handler in rows:
    482485            if all_anon and not authenticated or all_auth and authenticated \
    483486                    or (sid, authenticated) in sids:

comment:8 by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

Yes, might be the clean solution (normally I'd be all for letting the DB do the sorting, but if that ends up "on disk" it won't be fast when the filesystem cache can't cover it all). I went for the easy solution because my priority is getting our Trac environment stable again. So far it's running for 3 days. Next week

We had MySQL in offline state on average every 2 days for the last 2 weeks. The start of those outages correlate somewhat but not really well with switching from mod_wsgi to gunicorn for running Trac - that successfully addressed the WSGI/Apache worker thread depletion issue which increasingly haunted us in the last 10 years in two installs (in the older we fully automated killing suspect workers to keep things going). I suspect that it's simply the slowly growing user count which pushed us into this "DB explosion" situation related to temp table usage in combination with high load spikes.

comment:9 by Klaus Espenlaub <klaus.espenlaub@…>, 5 months ago

Oops, sent before finishing my edit. "Next week… we can hopefully declare it stable again".

comment:10 by Jun Omae, 5 months ago

Release Notes: modified (diff)
Resolution: fixed
Status: assignedclosed

Pushed the changes in [17760] and merged in [17761].

in reply to:  description comment:11 by Jun Omae, 5 months ago

With our approx. 200000 known users results …

If session and session_attribute tables have too many records, I'd suggest disabling purging old anonymous sessions to set 0 to [trac] anonymous_session_lifetime option, and purging the anonymous sessions in periodic job. See [trac] anonymous_session_lifetime and #12953.

Last edited 5 months ago by Jun Omae (previous) (diff)

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Jun Omae.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Jun Omae to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.