Context Navigation

Modify ↓

#13634 closed defect (fixed)

cache invalidation for Environment::get_known_users can cause DB temp space exhaustion with MySQL backend

Reported by:	Klaus Espenlaub <klaus.espenlaub@…>	Owned by:	Jun Omae
Priority:	normal	Milestone:	1.6.1
Component:	database backend	Version:	1.4.3
Severity:	normal	Keywords:	mysql temporary table known_users disk full
Cc:	klaus.espenlaub@…	Branch:
Release Notes:	Use `sorted()` instead of `ORDER BY session.sid` in order to avoid filesort caused by indexing only a prefix of column values on MySQL.
API Changes:
Internal Changes:

Description

On larger Trac deployments (ours has several backend servers, each with multiple worker processes) with frequent accesses the cache invalidation logic essentially implements a trigger for a "thundering herd": when the generation counter is increased then all worker processes will rather quickly run the query in Environment::_known_users. Looks all normal and harmless on first sight, but:

MySQL creates a temporary table each for such SELECT DISTINCT ... ORDER BY ... queries. What makes it worse is that the query has columns of type mediumtext and this forces the temporary table to go straight to disk. As long as the DB server CPU, IO and memory load is low this doesn't cause noticeable issues. With our approx. 200000 known users results in a temp table of about 8MB (taking about 2 seconds for the query which isn't too annoying because almost always it is cached). If this gets multiplied by ~35 (the number of Trac WSGI workers in our setup for a busy situation) then it still works just fine, just causes some slowdown due to IO. When there is other activity (such as viewing source with 'blame' etc.) then the refreshing of the known users can get to DB connections timing out.

That's the "point of no return": the query wasn't finished yet but the worker will issue another one. This growing avalanche of queries (all limited by IO performance and needing 8MB of temp table space each) will exhaust the disk space of the DB server at the maximum IO performance, using all CPU and memory resources on the way too. No wonder with about 1500 such queries running simultaneously at the end (they build up over 45 to 60 minutes in our case).

Once the disk space is exhausted the MySQL server goes into "offline mode", needing a server restart (until that happens Trac is obviously down). After the restart everything is perfectly working again. Temporary tables are automatically removed, therefore the disk usage is back to normal.

Proposed solution: remove the DISTINCT qualifier from this query. Due to the design of the session and session_attribute tables I can't see how it could ever produce non-unique results.

Reference: MySQL 8.0 documentation about internal use of temporary tables

Attachments (4)

explain_analyze_select_distinct_order_by.txt (7.0 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 23 months ago.: EXPLAIN ANALYZE of the statement
explain_select_distinct_order_by.txt (2.2 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 23 months ago.: EXPLAIN of the statement
explain_analyze_select_order_by.txt (5.6 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 23 months ago.: EXPLAIN ANALYZE of the statement without DISTINCT
explain_select_order_by.txt (2.1 KB ) - added by Klaus Espenlaub <klaus.espenlaub@…> 23 months ago.: EXPLAIN of the statement without DISTINCT

Download all attachments as: .zip

Change History (15)

comment:1 by Jun Omae, 23 months ago

Component:	general → database backend
Owner:	set to Jun Omae
Status:	new → assigned

Please post results of explain analyze with the following queries on your database.

EXPLAIN ANALYZE
SELECT DISTINCT s.sid, n.value, e.value
FROM session AS s
 LEFT JOIN session_attribute AS n ON (n.sid=s.sid
  AND n.authenticated=1 AND n.name = 'name')
 LEFT JOIN session_attribute AS e ON (e.sid=s.sid
  AND e.authenticated=1 AND e.name = 'email')
WHERE s.authenticated=1 ORDER BY s.sid
\G

EXPLAIN ANALYZE
SELECT s.sid, n.value, e.value
FROM session AS s
 LEFT JOIN session_attribute AS n ON (n.sid=s.sid
  AND n.authenticated=1 AND n.name = 'name')
 LEFT JOIN session_attribute AS e ON (e.sid=s.sid
  AND e.authenticated=1 AND e.name = 'email')
WHERE s.authenticated=1 ORDER BY s.sid
\G

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Attachment:	explain_analyze_select_distinct_order_by.txt added

EXPLAIN ANALYZE of the statement

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Attachment:	explain_select_distinct_order_by.txt added

EXPLAIN of the statement

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Attachment:	explain_analyze_select_order_by.txt added

EXPLAIN ANALYZE of the statement without DISTINCT

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Attachment:	explain_select_order_by.txt added

EXPLAIN of the statement without DISTINCT

comment:2 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Attached the requested information plus a bit more variation. This is with MySQL Database Service in the Oracle Cloud (not that I have any reason to believe that it would be significantly different to running MySQL Community Server yourself), which reports itself as 8.0.32-u3-cloud.

comment:3 by Jun Omae, 23 months ago

Milestone:	→ 1.6.1

Thanks for the quick response. Your proposed fix seems to be good.

comment:4 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

With one big catch: I thought that this change will get our Trac deployment stable again. Which looked good for a while (made the change in our production environment over a day ago), but it's not solving the issue fully. The rate of temp disk table creation has decreased but not as much as I would've expected.

comment:5 by Jun Omae, 23 months ago

Could you please check performance schema and share the queries which are using tmp disk table if the performance schema is enabled in your database?

SELECT schema_name, digest_text, count_star AS n,
       sum_created_tmp_disk_tables AS tmp_disk_tables,
       sum_created_tmp_tables AS tmp_tables
  FROM performance_schema.events_statements_summary_by_digest
 WHERE sum_created_tmp_disk_tables > 0
\G

comment:6 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Ouch… guess we both saw the "using filesort" and didn't immediate realize what it means: the sorting is done on disk because it doesn't fit into the default sort buffer size of 256K. Thinking about what Trac really needs I decided to remove the ORDER by s.sid part of the query. Checked all uses of Environment::get_known_users and found nothing which depended on it being sorted.

Bumping the sort buffer size is in principle also possible but would need a bit more effort (reconfiguring the DB server in the cloud, …). Something for when things are stable again and I can access this stuff myself.

The remaining queries needing temp disk tables are very rarely used (less than once every 5 minutes). The most frequent one in our setup is the one with digest

SELECT DISTINCTROW `rev` , ( CASE WHEN PATH IN (...) THEN PATH WHEN PATH LIKE ? ESCAPE ? THEN ? END ) AS PATH FROM `node_change` WHERE `repos` = ? AND `rev` >= ? AND `rev` <= ? AND ( PATH IN (...) OR PATH LIKE ? ESCAPE ? )

which appears to be the one in CachedRepository::_get_changed_revs. No immediate idea how it's triggered (couldn't spot any obvious callers) but it's the only candidate.

comment:7 by Jun Omae, 23 months ago

Removing the ORDER BY s.sid leads incompatibility with docstring of Environment.get_known_users. The docstring describes that ordered alpha-numerically by username at source:/tags/trac-1.6/trac/env.py@:708#L701 .

Instead, we could use sorted() / list.sort() like the following:

trac/env.py

diff --git a/trac/env.py b/trac/env.py
index 043ce7f5b..a76525e19 100644

                class Environment(Component, ComponentManager):
     @cached
     def _known_users(self):
+        return self.db_query("""
+                SELECT DISTINCT s.sid, n.value, e.value
+        # Use sorted() instead of "ORDER BY s.sid" in order to avoid filesort
+        # caused by indexing only a prefix of column values on MySQL.
+        users = self.db_query("""
+                SELECT s.sid, n.value, e.value
                 FROM session AS s
                  LEFT JOIN session_attribute AS n ON (n.sid=s.sid
                   AND n.authenticated=1 AND n.name = 'name')
                  LEFT JOIN session_attribute AS e ON (e.sid=s.sid
                   AND e.authenticated=1 AND e.name = 'email')
                 WHERE s.authenticated=1 ORDER BY s.sid
+                WHERE s.authenticated=1
         """)
+        return sorted(users, key=lambda u: u[0])
     @cached
     def _known_users_dict(self):

trac/web/session.py

diff --git a/trac/web/session.py b/trac/web/session.py
index 14dbde704..64b2d6466 100644

                class SessionAdmin(Component):
         sids = {self._split_sid(sid)
                 for sid in sids
                 if sid not in ('anonymous', 'authenticated', '*')}
+        # Use sort() instead of "ORDER BY s.sid, s.authenticated" in order to
+        # avoid filesort caused by indexing only a prefix of column values on
+        # MySQL.
         rows = self.env.db_query("""
             SELECT DISTINCT s.sid, s.authenticated, s.last_visit,
                             n.value, e.value, h.value
+            SELECT s.sid, s.authenticated, s.last_visit, n.value, e.value,
+                   h.value
             FROM session AS s
               LEFT JOIN session_attribute AS n
                 ON (n.sid=s.sid AND n.authenticated=s.authenticated
-…
+               class SessionAdmin(Component):
               LEFT JOIN session_attribute AS h
                 ON (h.sid=s.sid AND h.authenticated=s.authenticated
                     AND h.name='default_handler')
-            ORDER BY s.sid, s.authenticated
             """)
+        rows.sort(key=lambda u: (u[0], u[1]))
         for sid, authenticated, last_visit, name, email, handler in rows:
             if all_anon and not authenticated or all_auth and authenticated \
                     or (sid, authenticated) in sids:

comment:8 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Yes, might be the clean solution (normally I'd be all for letting the DB do the sorting, but if that ends up "on disk" it won't be fast when the filesystem cache can't cover it all). I went for the easy solution because my priority is getting our Trac environment stable again. So far it's running for 3 days. Next week

We had MySQL in offline state on average every 2 days for the last 2 weeks. The start of those outages correlate somewhat but not really well with switching from mod_wsgi to gunicorn for running Trac - that successfully addressed the WSGI/Apache worker thread depletion issue which increasingly haunted us in the last 10 years in two installs (in the older we fully automated killing suspect workers to keep things going). I suspect that it's simply the slowly growing user count which pushed us into this "DB explosion" situation related to temp table usage in combination with high load spikes.

comment:9 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

Oops, sent before finishing my edit. "Next week… we can hopefully declare it stable again".

comment:10 by Jun Omae, 23 months ago

Release Notes:	modified (diff)
Resolution:	→ fixed
Status:	assigned → closed

Pushed the changes in [17760] and merged in [17761].

in reply to: description comment:11 by Jun Omae, 23 months ago

With our approx. 200000 known users results …

If session and session_attribute tables have too many records, I'd suggest disabling purging old anonymous sessions to set 0 to [trac] anonymous_session_lifetime option, and purging the anonymous sessions in periodic job. See [trac] anonymous_session_lifetime and #12953.

Last edited 23 months ago by Jun Omae (previous) (diff)

Modify Ticket

Change Properties

Summary:
Description:	On larger Trac deployments (ours has several backend servers, each with multiple worker processes) with frequent accesses the cache invalidation logic essentially implements a trigger for a "thundering herd": when the generation counter is increased then all worker processes will rather quickly run the query in Environment::_known_users. Looks all normal and harmless on first sight, but: MySQL creates a temporary table each for such `SELECT DISTINCT ... ORDER BY ...` queries. What makes it worse is that the query has columns of type mediumtext and this forces the temporary table to go straight to disk. As long as the DB server CPU, IO and memory load is low this doesn't cause noticeable issues. With our approx. 200000 known users results in a temp table of about 8MB (taking about 2 seconds for the query which isn't too annoying because almost always it is cached). If this gets multiplied by ~35 (the number of Trac WSGI workers in our setup for a busy situation) then it still works just fine, just causes some slowdown due to IO. When there is other activity (such as viewing source with 'blame' etc.) then the refreshing of the known users can get to DB connections timing out. That's the "point of no return": the query wasn't finished yet but the worker will issue another one. This growing avalanche of queries (all limited by IO performance and needing 8MB of temp table space each) will exhaust the disk space of the DB server at the maximum IO performance, using all CPU and memory resources on the way too. No wonder with about 1500 such queries running simultaneously at the end (they build up over 45 to 60 minutes in our case). Once the disk space is exhausted the MySQL server goes into "offline mode", needing a server restart (until that happens Trac is obviously down). After the restart everything is perfectly working again. Temporary tables are automatically removed, therefore the disk usage is back to normal. Proposed solution: remove the `DISTINCT` qualifier from this query. Due to the design of the `session` and `session_attribute` tables I can't see how it could ever produce non-unique results. Reference: [https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html MySQL 8.0 documentation about internal use of temporary tables] You may use WikiFormatting here.
Type:		Priority:
Milestone:		Component:
Version:		Severity:
Keywords:		Cc:	Set your email in Preferences
Branch:
Release Notes:	Use `sorted()` instead of `ORDER BY session.sid` in order to avoid filesort caused by indexing only a prefix of column values on MySQL.
API Changes:
Internal Changes:

Action

leave as closed The owner will remain Jun Omae.

reopen The resolution will be deleted. Next status will be 'reopened'.

change ownership to The owner will be changed from Jun Omae to the specified user.

Add Comment

Your email or username:

E-mail address and name can be saved in the Preferences .

You may use WikiFormatting here.

Attachments ↑ Description ↑

Note: See TracTickets for help on using tickets.

Download in other formats:

Context Navigation

#13634 closed defect (fixed)

cache invalidation for Environment::get_known_users can cause DB temp space exhaustion with MySQL backend

Description

Attachments (4)

Change History (15)

comment:1 by Jun Omae, 23 months ago

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

comment:2 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

comment:3 by Jun Omae, 23 months ago

comment:4 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

comment:5 by Jun Omae, 23 months ago

comment:6 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

comment:7 by Jun Omae, 23 months ago

trac/env.py

trac/web/session.py

comment:8 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

comment:9 by Klaus Espenlaub <klaus.espenlaub@…>, 23 months ago

comment:10 by Jun Omae, 23 months ago

in reply to: description comment:11 by Jun Omae, 23 months ago

Modify Ticket

Add Comment

by anonymous

Download in other formats: