Edgewall Software
Modify

Opened 17 years ago

Last modified 6 months ago

#4245 new defect

[PATCH] Tickets are slow with large amount of users and permissions

Reported by: ants.aasma@… Owned by:
Priority: high Milestone: next-stable-1.6.x
Component: general Version: devel
Severity: major Keywords: patch performance slow restrict_owner get_users_with_permission
Cc: zach@…, leho@…, ismael@…, mmitar@…, osimons, peter@…, verm@…, dsd@…, Ryan J Ollos Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Ryan J Ollos)

TicketSystem.get_ticket_fields() implements an extremely inefficient algorithm to find users that have a certain permission (O(n2users * perms_per_useravg)). I have added a more efficient function to PermissionSystem that enables querying users having a specified permission directly. It correctly resolves group memberships and hierarchical permissions.

Attachments (8)

trac.get_users_with_permission.trunk.patch (4.3 KB ) - added by ants.aasma@… 17 years ago.
Patch
trac.get_users_with_permission.v2.patch (4.0 KB ) - added by ants.aasma@… 17 years ago.
Second version of the patch. Using the env.get_known_users() interface to get users.
get_users_with_permission.groupprovider_fix.patch (2.5 KB ) - added by ants.aasma@… 17 years ago.
Adds support for the implicit group providers
groupprovider_fix.diff_against_prev_impl.patch (1.4 KB ) - added by ants.aasma@… 17 years ago.
Diff against the previous implementation to highlight the changes.
perm-addcache.patch (4.2 KB ) - added by Chris Everest <ceverest@…> 14 years ago.
Add caching logic to perms.py
T4245-CachePermissionsTable.patch (2.2 KB ) - added by Peter Suter 12 years ago.
Use @cached decorator on get_all_permissions()
t4245-trac_cached_perm_rows-012-r10790.diff (1.9 KB ) - added by osimons 12 years ago.
My original caching patch - suspiciously similar, but uses a copy of the list instead.
profile.txt (14.0 KB ) - added by Peter Suter 12 years ago.

Download all attachments as: .zip

Change History (54)

by ants.aasma@…, 17 years ago

Patch

comment:1 by Christopher Lenz, 17 years ago

Milestone: 0.11
Owner: changed from Jonas Borgström to Christopher Lenz

I'll look into this tomorrow. Candidate for being backported to 0.10-stable.

by ants.aasma@…, 17 years ago

Second version of the patch. Using the env.get_known_users() interface to get users.

comment:2 by Christopher Lenz, 17 years ago

Resolution: fixed
Status: newclosed

Slightly modified patch applied in [4625].

comment:3 by Eli Carter, 17 years ago

Milestone: 0.110.11.1
Resolution: fixed
Status: closedreopened

This change did not account for the magic 'authenticated' group, causing the regression reported in #4630. See [5849]. We went back to a very inefficient solution, so we need to improve that.

by ants.aasma@…, 17 years ago

Adds support for the implicit group providers

by ants.aasma@…, 17 years ago

Diff against the previous implementation to highlight the changes.

comment:4 by ants.aasma@…, 17 years ago

Attached a fix that accounts for the group providers. This would be a lot more efficient, if the group providers had an get_group_members method.

This isn't tested, because I don't have a test enviroment set up nor the time to do so. But I figured I'd get the code out there if anyone has.

comment:5 by ants.aasma@…, 17 years ago

Already spotted an oops, line 168 in new should read:

                        implicit_groups[groupname] = set([username])

(missing list brackets from set construction)

comment:6 by Christian Boos, 15 years ago

Keywords: restrict_owner get_users_with_permission added
Priority: normalhigh
Severity: normalmajor

#8034 / #8212 are recent duplicates. Especially #8212 somewhat proving the "back to a very inefficient solution" part of comment:3

I see there is a patch here that has been overlooked (attachment:get_users_with_permission.groupprovider_fix.patch).

comment:7 by Christian Boos, 15 years ago

The above patch fails on the regression test for #4630:

FAIL: Test for regression of http://trac.edgewall.org/ticket/4630 b
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Workspace\src\trac\repos\0.11-stable\trac\ticket\tests\functional.py", line 941, in runTest
    self.assertEqual(users, ['admin', 'user'])
AssertionError: [u'a', u'i', u'd', u'admin', u'user', u'm', u'n'] != ['admin', 'user']

comment:8 by zach@…, 15 years ago

Cc: zach@… added

comment:9 by fabien, 15 years ago

Hi, I also have the same problem with a setup using 1000 users. It's slow when "restrict_owner = true"…

I modified eventually_restrict_owner to not check for permissions (not needed for my config), and now it's much faster, but not as fast as without restrict_owner… Maybe it would help to cache users and permissions ?

comment:10 by Chris Everest <ceverest@…>, 14 years ago

In regard to this ticket I am also seeing significant performance decreases using restrict_owner in combination with the PrivateTicket plugin and a large user base. The main cause of the performance hit is in perms.py.

IPermissionStore.get_user_permissions() is calling 'SELECT username,action FROM permission' against all users in the system. This has caused 1000's of database queries for each page request to display a single ticket.

I have added two levels of caching to reduce this performance issue and provided a patch. Pulling from existing cache logic in DefaultPermissionPolicy.check_permission() I created new cache logic in PermissionSystem.get_users_with_permission() and IPermissionStore.get_user_permissions().

The cache in IPermissionStore.get_user_permissions() is the more effective change as it limits the query 'SELECT username,action FROM permission' to one execution and uses the cached results for each additional check of that routine, per page request.

The cache in PermissionSystem.get_users_with_permission() is not activated as often, but will certianly avoid redundant calls to its subroutines/methods.

In each case, the cache variables are very small and should only survive a single page request, thus uncompromising any security or permissions.

by Chris Everest <ceverest@…>, 14 years ago

Attachment: perm-addcache.patch added

Add caching logic to perms.py

comment:11 by lkraav <leho@…>, 14 years ago

Cc: leho@… added

comment:12 by Remy Blank, 14 years ago

#9184 was closed as a duplicate, and contains a patch to increase the permission cache duration.

comment:13 by Ismael de Esteban <ismael@…>, 14 years ago

#9184 Makes a big improvement when you have many users and restrict_owner option.

We can consider to include that since it has make a big improvement in the instance I use.

comment:14 by Ismael de Esteban <ismael@…>, 14 years ago

Cc: ismael@… added

comment:15 by Mitar, 13 years ago

Cc: mmitar@… added

comment:16 by anonymous, 13 years ago

any solution to this huge problem?

comment:17 by Christian Boos, 13 years ago

No.

(answer designed to be as useful as the question…)

For the longer run, see TracDev/Proposals/EvenFinerGrainedPermissions#Performance.

comment:18 by osimons, 13 years ago

Cc: osimons added

Helping out with a user on IRC, we've done some profiling of requests and looking into what actually takes time on a project with 1800 authenticated users (using the DefaultPermissionStore). In summary - all times should be seen as relative of course:

  • ~2.5 seconds = restrict owner false
  • ~63 seconds = restrict owner true
  • ~14 seconds = restrict owner true, with caching full permissions table by a simple patch that just stores the table after first read and reuses it for any subsequent use (not for production use, but a test to see how much the SQL impacts times in a single process setting)
  • ~40 seconds = restrict owner true, using @cached decorator on the get_all_permissions() output (+ invalidate code when granting or revoking permissions). This obviously re-introduces som SQL calls to handle cache status.

Having 1800 users requires creating 1800 permission caches and checking the permission against each one. It it bound to take time. 40 seconds is of course better than 63 seconds, but still too much work involved. 14 seconds would only be attainable if we used something like self.config.touch() to invalidate the cached permissions table across processes…

Further it may well be more useful to cache the actual list made by eventually_restrict_owner() and store it for each ticket. Then we'd need an IPermissionChangeListener interface so that others can listen in on changes and invalidate caches (#3302).

comment:19 by peter@…, 13 years ago

Cc: peter@… added

Said user on IRC.

in reply to:  18 comment:20 by osimons, 13 years ago

Replying to osimons:

Further it may well be more useful to cache the actual list made by eventually_restrict_owner() and store it for each ticket. Then we'd need an IPermissionChangeListener interface so that others can listen in on changes and invalidate caches (#3302).

No, that won't be good. Users can be added and removed from that list without an actual permission change - new users could register, sessions purged, or a security policy could change which users are approved. Anything really. So, bad idea.

comment:21 by verm, 12 years ago

Cc: verm@… added

I hit this problem tonight.. I spent a couple of hours tracing it back using a 1GB tracefile from python -m trace -t tracd.. It was running 13,000 queries per ticket. This bumped it from taking ~14 seconds per ticket to instantaneous. I'm also running the FlexibleAssignTo plugin which requires restrict_owner to be true in order to work.

If there's a way to make this plugin work without restrict_owner being set then it'll be a great workaround.

comment:22 by verm, 12 years ago

To be clear, I'm not asking for any support for the FlexibleAssignTo plugin I added it in case anyone else was interested in poking around.

comment:23 by Ryan J Ollos <ryano@…>, 12 years ago

Cc: ryano@… added

comment:24 by Remy Blank, 12 years ago

Cc: dsd@… added

#10647 was closed as a duplicate.

comment:25 by Christian Boos, 12 years ago

Keywords: performance added

by Peter Suter, 12 years ago

Use @cached decorator on get_all_permissions()

in reply to:  18 ; comment:26 by Peter Suter, 12 years ago

Replying to osimons:

  • ~2.5 seconds = restrict owner false
  • ~63 seconds = restrict owner true
  • ~40 seconds = restrict owner true, using @cached decorator

I think attachment:T4245-CachePermissionsTable.patch implements the same idea, right?

I tested this on a synthetic Trac environment with 2500 authenticated users, populated in cmd.exe by running this for 15 minutes or so:

for /L %i in (1,1,2500) do (
    trac-admin tracenv session add user%i
    trac-admin tracenv permission add user%i TICKET_MODIFY
)
  • ~1 second = restrict owner false
  • ~22 seconds = restrict owner true
  • ~3 seconds = restrict owner true, using @cached decorator patch

Any idea why this quite a bit more effective in this synthetic case than in your (presumably real world) scenario?

in reply to:  26 ; comment:27 by osimons, 12 years ago

Replying to psuter:

Replying to osimons:

  • ~2.5 seconds = restrict owner false
  • ~63 seconds = restrict owner true
  • ~40 seconds = restrict owner true, using @cached decorator

I think attachment:T4245-CachePermissionsTable.patch implements the same idea, right?

Looks like it. Basically just caching the "SELECT username, action FROM permission" db query as your patch implements.

  • ~1 second = restrict owner false
  • ~22 seconds = restrict owner true
  • ~3 seconds = restrict owner true, using @cached decorator patch

Any idea why this quite a bit more effective in this synthetic case than in your (presumably real world) scenario?

No idea. I can't find the details of the exchange anymore, and have no real knowledge of how the users system was configured, the plugins added, or the data it contained. Your known data is a better benchmark as I see it.

in reply to:  27 comment:28 by peter@…, 12 years ago

Replying to osimons:

Replying to psuter:

Any idea why this quite a bit more effective in this synthetic case than in your (presumably real world) scenario?

No idea. I can't find the details of the exchange anymore, and have no real knowledge of how the users system was configured, the plugins added, or the data it contained. Your known data is a better benchmark as I see it.

Our exchange was on IRC, I still have the same system running, and I can do more testing if seriously required, although if I'm going to do Trac work right now I'd prefer to focus on the Git plugin. Thanks!

comment:29 by Peter Suter, 12 years ago

Hearing about the effect of the patch on real data might be helpful.

Unless anyone can see a problem with the patch, we could also just apply it. I think it will help, it's just not clear how much.

by osimons, 12 years ago

My original caching patch - suspiciously similar, but uses a copy of the list instead.

comment:30 by osimons, 12 years ago

I found my original patch, it is pretty much like yours - only it uses a copy of the result instead of modifying it directly. I suppose that is a precaution that I need not have taken, and may also account for the relative difference in speed between the result sets?

See attachment:t4245-trac_cached_perm_rows-012-r10790.diff

Version 0, edited 12 years ago by osimons (next)

comment:31 by Peter Suter, 12 years ago

Hm, one crucial thing that seems missing there is a change to the repeated query in DefaultPermissionStore.get_user_permissions(). Replace that with something like rows = self.get_all_permissions() and it performs just about the same in my test. (~3 seconds) Without that I don't see an improvement at all. (~22 seconds)

(Tested on trunk.)

comment:32 by osimons, 12 years ago

Goodie. Then I see no reason not to commit what you have?

comment:33 by Peter Suter, 12 years ago

Committed in [11032].

comment:34 by Remy Blank, 12 years ago

Not sure if this would gain much, but you could even simplify the query:

    return [(username, action) for username, action in  
            self.env.db_query("SELECT username, action FROM permission")]

by substituting:

    return self.env.db_query("SELECT username, action FROM permission")

ConnectionWrapper.execute() (which is called by env.db_query()) already returns a list of all rows.

comment:35 by Peter Suter, 12 years ago

attachment:profile.txt contains some a request profiling results. If I'm reading this right:

  1. restrict_owner = true, no caching: ~70 seconds
  2. restrict_owner = true, caching: ~2.2 seconds
  3. restrict_owner = false: ~1 second

In case 1 time is mostly spent querying SQL in get_user_permissions, as we already knew. In both case 2 and 3 most time seems to be spent in Genshi. I'm not sure where the additional ~1.2 seconds in case 2 now come from. Possibly mostly from Genshi processing the 2500 user drop-down options?

Replying to rblank:

Not sure if this would gain much, but you could even simplify the query:

It makes virtually no difference (as this is cached?) but simplifying code is always good. :)

comment:36 by osimons, 12 years ago

Out of curiosity, could you check the status of the Genshi speedups on your test setup? How do the numbers compare depending on speedups availability? Of primary interest is the ~2.2 seconds test, and as you say I also have a feeling much of the difference is added Genshi work.

See comment:7:ticket:10635 - and perhaps we'll see a 10% improvement here too?

by Peter Suter, 12 years ago

Attachment: profile.txt added

in reply to:  36 comment:37 by Peter Suter, 12 years ago

So far I was using Genshi 0.7dev-r1162 (without speedups). I appended a test run with speedups to attachment:profile.txt. At ~2.1 seconds it's only a ~5% improvement.

Last edited 12 years ago by Peter Suter (previous) (diff)

comment:38 by andrew.hardy@…, 11 years ago

Was this problem ever resolved? I seem to have a similar problem with Trac 0.12.

comment:40 by Ryan J Ollos, 10 years ago

Cc: Ryan J Ollos added; ryano@… removed

comment:41 by Ryan J Ollos, 9 years ago

Milestone: next-minor-0.12.xnext-stable-1.0.x

comment:42 by Ryan J Ollos, 9 years ago

Owner: Christopher Lenz removed
Status: reopenednew

comment:43 by figaro, 8 years ago

Keywords: patch added

comment:44 by Ryan J Ollos, 7 years ago

Milestone: next-stable-1.0.xnext-stable-1.2.x

Moved ticket assigned to next-stable-1.0.x since maintenance of 1.0.x is coming to a close. Please move the ticket back if it's critical to fix on 1.0.x.

comment:45 by Ryan J Ollos, 6 years ago

Description: modified (diff)

comment:46 by Ryan J Ollos, 4 years ago

Milestone: next-stable-1.2.xnext-stable-1.4.x

comment:47 by Ryan J Ollos, 6 months ago

Milestone: next-stable-1.4.xnext-stable-1.6.x

Milestone renamed

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.
The ticket will be disowned.
as The resolution will be set. Next status will be 'closed'.
The owner will be changed from (none) to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.