Edgewall Software
Modify

Opened 18 years ago

Closed 17 years ago

Last modified 17 years ago

#5668 closed defect (fixed)

PermissionPolicy implementation detail looks wrong

Reported by: Dave Abrahams <dave@…> Owned by: osimons
Priority: highest Milestone: 0.11
Component: general Version: devel
Severity: normal Keywords: permission
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christian Boos)

See source:trunk/trac/perm.py@5808#L446

In case of hash collision (yes I realize it's unlikely) couldn't this return wrong results?

Attachments (2)

t5668-r6465-perm-cache-key.a.diff (3.3 KB ) - added by osimons 17 years ago.
Patch saving the resource and double-checking before using the cached decision.
t5668-r6465-perm-cache-key.b.diff (3.1 KB ) - added by osimons 17 years ago.
Updated patch to not use exception handling + dropping hash testing.

Download all attachments as: .zip

Change History (16)

comment:1 by Christian Boos, 18 years ago

Description: modified (diff)
Keywords: permission added
Milestone: 0.11.1
Summary: browser:trunk/trac/perm.py#L446 looks wrongPermissionPolicy implementation detail looks wrong

Right, the cache should contain a copy of the full key for disambiguation.

comment:2 by Christian Boos, 17 years ago

Priority: normalhighest

comment:3 by osimons, 17 years ago

I suppose a string representation of the resource is as distinct as it gets? Seems to work fine.

  • trac/perm.py

     
    521521        return self._has_permission(action, resource)
    522522
    523523    def _has_permission(self, action, resource):
    524         key = (self.username, hash(resource), action)
     524        resource_key = resource and "%s:%s:%s" % (resource.realm,
     525                            resource.id, resource.version) or None
     526        key = (self.username, resource_key, action)
    525527        try:
    526528            return self._cache[key]
    527529        except KeyError:

comment:4 by Dave Abrahams <dave@…>, 17 years ago

Not to put too fine a point on it, but "Seems to work fine" isn't very compelling… especially in this case because someone surely said the same for the original (wrong) code.

in reply to:  4 ; comment:5 by osimons, 17 years ago

Replying to Dave Abrahams <dave@boost-consulting.com>:

Not to put too fine a point on it, but "Seems to work fine" isn't very compelling… especially in this case because someone surely said the same for the original (wrong) code.

Thanks for your words of encouragement.

It was intended as a start of a discussion on alternatives to hashing, or perhaps of what to hash and what to have as full text.

I'm not worried about the distinctiveness of it as it will be as distinct as it gets (permission-side), but more the impact of it on memory use. A repos with many files, deep hierarchies and lots of revisions is going to take up a large chunk of memory by storing the full (unicode) path for each and every revision of a each and every file path as Trac gets used and queried. That could mean each resource in the cache consuming perhaps 250 bytes on average compared to the 2-4 bytes of the integer hash. On my limited test setup I can't test the consequences of this in full - it pretty much works as before. However, testing this by querying a full list of files for each revision on a t.e.o copy may find this unusable.

The full representation is the most distinct way I can see, and of course it works. That does not mean it is the right way - or necessarily that hashing is 'wrong' as you say. Perhaps we should store the realm as string, and just hash the id and version? This needs to be discussed, and there are trade-offs involved. Certainly as high memory usage is alsready a problem.

in reply to:  5 ; comment:6 by Christian Boos, 17 years ago

Replying to osimons:

The full representation is the most distinct way I can see, and of course it works.

Why not keep a reference to the Resource object in the value, for disambiguation needs? They are probably already referenced from somewhere else and the lifespan of this cache is anyway limited (one request). There would be little additional memory cost - the hash will be kept as the key, the value would be the object plus the decision.

Only in case of collision (quite unlikely, but this is the whole point of the ticket), the value would be a list of (resource, decision) pairs.

by osimons, 17 years ago

Patch saving the resource and double-checking before using the cached decision.

in reply to:  6 comment:7 by osimons, 17 years ago

Replying to cboos:

Why not keep a reference to the Resource object in the value, for disambiguation needs?

OK. If memory usage is not a significant issue that is the best way. And, it would also likely be a less expensive operation and generally work better as the resource evolves over time. I've attached a patch for review: attachment:t5668-r6465-perm-cache-key.a.diff

As hashing and equality testing is of such great importance for perm checking, the patch also includes test cases for __eq__() and __hash__() for the Resource class. If anything related to this changes later, it should be detected by the tests.

comment:8 by Christian Boos, 17 years ago

Milestone: 0.11.10.11

Looks good. Invalidating the cache in case of collision is actually even simpler than what I proposed. Still, I think the patch can be improved by avoiding the exception handler, something along the lines of:

    ...
    cached = self._cache.get(key)
    if cached:
        cache_decision, cache_resource = cached
        if resource == cache_resource:
            return cache_decision
    ...

I think we can target 0.11 for this one.

comment:9 by Christian Boos, 17 years ago

Oh, btw. for the unit tests, I would be surprised if the hash values themselves are portable…

by osimons, 17 years ago

Updated patch to not use exception handling + dropping hash testing.

comment:10 by osimons, 17 years ago

New patch: attachment:t5668-r6465-perm-cache-key.b.diff

It does look cleaner without exceptions. And, dropping the test cases on hash creation - it does indeed vary across installations. Now I know ;-)

comment:11 by Christian Boos, 17 years ago

Owner: changed from aat to osimons

Yep, new patch looks perfect (and worksforme), please commit.

in reply to:  11 comment:12 by osimons, 17 years ago

Resolution: fixed
Status: newclosed

Replying to cboos:

Yep, new patch looks perfect (and worksforme), please commit.

Committed as [6482]. Closing.

in reply to:  5 ; comment:13 by Dave Abrahams <dave@…>, 17 years ago

Replying to osimons:

Replying to Dave Abrahams <dave@boost-consulting.com>:

Not to put too fine a point on it, but "Seems to work fine" isn't very compelling… especially in this case because someone surely said the same for the original (wrong) code.

Thanks for your words of encouragement.

Is that the tangy snap of sarcasm I smell?

It was intended as a start of a discussion on alternatives to hashing, or perhaps of what to hash and what to have as full text.

I'm not worried about the distinctiveness of it as it will be as distinct as it gets (permission-side), but more the impact of it on memory use. A repos with many files, deep hierarchies and lots of revisions is going to take up a large chunk of memory by storing the full (unicode) path for each and every revision of a each and every file path as Trac gets used and queried. That could mean each resource in the cache consuming perhaps 250 bytes on average compared to the 2-4 bytes of the integer hash. On my limited test setup I can't test the consequences of this in full - it pretty much works as before. However, testing this by querying a full list of files for each revision on a t.e.o copy may find this unusable.

The full representation is the most distinct way I can see, and of course it works. That does not mean it is the right way - or necessarily that hashing is 'wrong' as you say.

Just to be clear: I never claimed that hashing is wrong. I said the original code was wrong, and it was, because hash collisions were not accounted for.

in reply to:  13 comment:14 by osimons, 17 years ago

Replying to Dave Abrahams <dave@boost-consulting.com>:

Just to be clear: I never claimed that hashing is wrong. I said the original code was wrong, and it was, because hash collisions were not accounted for.

Sure. I learned a lot along the way here as I dug into it, so thanks for pointing out the issue and bearing with me.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain osimons.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from osimons to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.