#5668 closed defect (fixed)
PermissionPolicy implementation detail looks wrong
Reported by: | Owned by: | osimons | |
---|---|---|---|
Priority: | highest | Milestone: | 0.11 |
Component: | general | Version: | devel |
Severity: | normal | Keywords: | permission |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
See source:trunk/trac/perm.py@5808#L446
In case of hash collision (yes I realize it's unlikely) couldn't this return wrong results?
Attachments (2)
Change History (16)
comment:1 by , 17 years ago
Description: | modified (diff) |
---|---|
Keywords: | permission added |
Milestone: | → 0.11.1 |
Summary: | browser:trunk/trac/perm.py#L446 looks wrong → PermissionPolicy implementation detail looks wrong |
comment:2 by , 17 years ago
Priority: | normal → highest |
---|
comment:3 by , 17 years ago
I suppose a string representation of the resource is as distinct as it gets? Seems to work fine.
-
trac/perm.py
521 521 return self._has_permission(action, resource) 522 522 523 523 def _has_permission(self, action, resource): 524 key = (self.username, hash(resource), action) 524 resource_key = resource and "%s:%s:%s" % (resource.realm, 525 resource.id, resource.version) or None 526 key = (self.username, resource_key, action) 525 527 try: 526 528 return self._cache[key] 527 529 except KeyError:
follow-up: 5 comment:4 by , 17 years ago
Not to put too fine a point on it, but "Seems to work fine" isn't very compelling… especially in this case because someone surely said the same for the original (wrong) code.
follow-ups: 6 13 comment:5 by , 17 years ago
Replying to Dave Abrahams <dave@boost-consulting.com>:
Not to put too fine a point on it, but "Seems to work fine" isn't very compelling… especially in this case because someone surely said the same for the original (wrong) code.
Thanks for your words of encouragement.
It was intended as a start of a discussion on alternatives to hashing, or perhaps of what to hash and what to have as full text.
I'm not worried about the distinctiveness of it as it will be as distinct as it gets (permission-side), but more the impact of it on memory use. A repos with many files, deep hierarchies and lots of revisions is going to take up a large chunk of memory by storing the full (unicode) path for each and every revision of a each and every file path as Trac gets used and queried. That could mean each resource in the cache consuming perhaps 250 bytes on average compared to the 2-4 bytes of the integer hash. On my limited test setup I can't test the consequences of this in full - it pretty much works as before. However, testing this by querying a full list of files for each revision on a t.e.o copy may find this unusable.
The full representation is the most distinct way I can see, and of course it works. That does not mean it is the right way - or necessarily that hashing is 'wrong' as you say. Perhaps we should store the realm as string, and just hash the id and version? This needs to be discussed, and there are trade-offs involved. Certainly as high memory usage is alsready a problem.
follow-up: 7 comment:6 by , 17 years ago
Replying to osimons:
The full representation is the most distinct way I can see, and of course it works.
Why not keep a reference to the Resource object in the value, for disambiguation needs? They are probably already referenced from somewhere else and the lifespan of this cache is anyway limited (one request). There would be little additional memory cost - the hash will be kept as the key, the value would be the object plus the decision.
Only in case of collision (quite unlikely, but this is the whole point of the ticket), the value would be a list of (resource, decision)
pairs.
by , 17 years ago
Attachment: | t5668-r6465-perm-cache-key.a.diff added |
---|
Patch saving the resource and double-checking before using the cached decision.
comment:7 by , 17 years ago
Replying to cboos:
Why not keep a reference to the Resource object in the value, for disambiguation needs?
OK. If memory usage is not a significant issue that is the best way. And, it would also likely be a less expensive operation and generally work better as the resource evolves over time. I've attached a patch for review: attachment:t5668-r6465-perm-cache-key.a.diff
As hashing and equality testing is of such great importance for perm checking, the patch also includes test cases for __eq__()
and __hash__()
for the Resource
class. If anything related to this changes later, it should be detected by the tests.
comment:8 by , 17 years ago
Milestone: | 0.11.1 → 0.11 |
---|
Looks good. Invalidating the cache in case of collision is actually even simpler than what I proposed. Still, I think the patch can be improved by avoiding the exception handler, something along the lines of:
... cached = self._cache.get(key) if cached: cache_decision, cache_resource = cached if resource == cache_resource: return cache_decision ...
I think we can target 0.11 for this one.
comment:9 by , 17 years ago
Oh, btw. for the unit tests, I would be surprised if the hash values themselves are portable…
by , 17 years ago
Attachment: | t5668-r6465-perm-cache-key.b.diff added |
---|
Updated patch to not use exception handling + dropping hash testing.
comment:10 by , 17 years ago
New patch: attachment:t5668-r6465-perm-cache-key.b.diff
It does look cleaner without exceptions. And, dropping the test cases on hash creation - it does indeed vary across installations. Now I know ;-)
follow-up: 12 comment:11 by , 17 years ago
Owner: | changed from | to
---|
Yep, new patch looks perfect (and worksforme), please commit.
comment:12 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
follow-up: 14 comment:13 by , 17 years ago
Replying to osimons:
Replying to Dave Abrahams <dave@boost-consulting.com>:
Not to put too fine a point on it, but "Seems to work fine" isn't very compelling… especially in this case because someone surely said the same for the original (wrong) code.
Thanks for your words of encouragement.
Is that the tangy snap of sarcasm I smell?
It was intended as a start of a discussion on alternatives to hashing, or perhaps of what to hash and what to have as full text.
I'm not worried about the distinctiveness of it as it will be as distinct as it gets (permission-side), but more the impact of it on memory use. A repos with many files, deep hierarchies and lots of revisions is going to take up a large chunk of memory by storing the full (unicode) path for each and every revision of a each and every file path as Trac gets used and queried. That could mean each resource in the cache consuming perhaps 250 bytes on average compared to the 2-4 bytes of the integer hash. On my limited test setup I can't test the consequences of this in full - it pretty much works as before. However, testing this by querying a full list of files for each revision on a t.e.o copy may find this unusable.
The full representation is the most distinct way I can see, and of course it works. That does not mean it is the right way - or necessarily that hashing is 'wrong' as you say.
Just to be clear: I never claimed that hashing is wrong. I said the original code was wrong, and it was, because hash collisions were not accounted for.
comment:14 by , 17 years ago
Replying to Dave Abrahams <dave@boost-consulting.com>:
Just to be clear: I never claimed that hashing is wrong. I said the original code was wrong, and it was, because hash collisions were not accounted for.
Sure. I learned a lot along the way here as I dug into it, so thanks for pointing out the issue and bearing with me.
Right, the cache should contain a copy of the full key for disambiguation.