Edgewall Software

Trac Context Refactoring

Just Brainstorming…

Goals

These are the original goals of the introduction of the wiki rendering context, and later the global Context class:

  • Fix for rendering relative links in wiki text when that wiki text is rendered in a different context (e.g. in the timeline)
  • Enable fine-grained permission checking (i.e. per object instead of just per-subsystem)

The outreach of the Context class went far beyond that, however: it's now used in many places to:

  • generate links to resources
  • get display names for a resource
  • actually access the underlying resource (i.e. the object representing the database record)
  • pass the "context" object around when the “req” and model object are needed, rather than passing those two directly
  • and more?

An important goal of this proposal is to limit the scope and responsibility of the context/resource infrastructure, or at least to define clearer responsibilities. In this respect, one problem is that Context objects (may) contain request objects, but many APIs that would need to get passed a context should not depend on being called from an HTTP request. For example, a permission policy could currently base its decision on the path of the request, some cookie, or other information from a request, but it really should only be looking at the user and the resource, because otherwise the policy could not be applied when the API is being used from a simple script. That is impossible to enforce with a Context class that doesn't have very clear boundaries.

Towards Resource Descriptors

The purpose of a resource descriptor is to identify a resource in Trac (realm+identifier). Resource descriptors are used for checking permissions at the object level, and for communicating which resource some wiki text belongs to.

A resource descriptor is simply a list of tuples, for example:

[('wiki', ('WikiStart', 1))]
[('wiki', ('WikiStart', 1)), ('attachment', 'README.txt')]
[('ticket', 42), ('attachment', 'patch.diff')]

The list represents the containment hierarchy of a resource: for example, the attachment in the second list belongs to the “WikiStart” wiki page.

The individual tuples in a descriptor list are of the form (realm, identifier), where “realm” is a string, and “identifier” is any serializable and hashable object, such as a string or tuple. For example, wiki pages require both a page name and a version number to be uniquely identified, whereas attachments only require the filename.

We're using a simple data structure here because:

  1. it is sufficient, and
  2. we expect that we'll need to be creating lots of those descriptors, so keeping object instantiation and management overhead to a minimum is desirable.

Resource descriptors are needed whenever some code wants to check the permissions on a specific resource, or the resource contains wiki text that is supposed to be rendered. They can be constructed manually, but we should probably also have convenience functions, for example a .descriptor property on model objects that would return the corresponding descriptor.

Permission System

From looking at the current code, the most common uses of req.perm are without any resource descriptor at all, or with a pre-built descriptor. Still, to make life easier it would be preferable if the end user didn't need to fully specify a resource descriptor. Something like this might suffice:

  1. req.perm(realm, id=None)
  2. If realm is a list, treat it as a fully specified resource descriptor.
  3. Otherwise treat (realm, id) as the descriptor.

The existing permission code won't be that difficult to convert. The real work will be in locating where the existing "resource descriptor" context objects are being created and converting those.

Wiki Rendering Context

Why is this needed? From WikiContext#FixedIssues:

Thanks to Wiki rendering contexts, the following issues could be solved:

  • relative attachments and comments TracLinks now always refer to the correct context resource, irrelevant from where they are displayed (ticket query with description #3711, headings #3842, timeline, etc.)
  • relative links (i.e. see this kind of TracLinks) are now always referring to the correct resource (#4144)

Those two are of course related. I believe this can be implemented by simply passing a resource descriptor to the wiki_to_html() function, and storing it as a .resource attribute on the Formatter object. The link resolvers can then look at the resource descriptor and fixup the generating link accordingly.

In the current code, the Context class is also responsible for generating links to a resource. As far as I can tell, this is a separate enhancement that is only needed for the attachments module and the newly added “generic reports” feature. See the next section for an approach for this.

The other thing that a context does is to determine whether links should be absolute (include the scheme and hostname) or relative to the server root. This had previously been an attribute on the wiki formatter object, and I think we should be able to simply move it back there.

Context Factories and Subclasses

These are used in the current context code. What do they do?

  • return the model object for a context
  • construct a URL to the resource identified by the context
  • provide appropriate “name”, “shortname”, and “summary” properties

Now, what is that actually used for?

It seems like the primary reason these exist is the attachments module. So here's another unrelated feature enabled by Context: attachment containers can be any resource, as far as I can tell. And things like the attachment templates rendering the name and URL of the parent resource is based on the context object. If Context is to go away, we'd need something else to replace this.

Maybe something like this:

class IResourceManager(Interface):

    def get_resource_realm():
        """Return the string identifying the realm of resources handled by this
        component."""

    def get_resource_url(descriptor, href, **kwargs):
        """Return the URL for the requested resource."""

    def resolve_resource(descriptor):
        """Return an object representing the content of the requested resource.
       
        The returned object needs to have at least the following properties
        and functions:
        * `__unicode()__`: should return a compact representation of the object,
          such as "#123" for the ticket with the ID 123
        * `display_name`: a more verbose string representation of the object,
          for example "Ticket #123" for the ticket with the ID 123
        """

class TicketSystem(Component):
    # Imagine the existing stuff mixed in here ;)
    implements(IResourceManager)

    def get_resource_realm(self):
        return 'ticket'

    def get_resource_href(self, descriptor, href, **kwargs):
        return href.ticket(descriptor[-1][1], **kwargs)

    def resolve_resource(self, descriptor):
        return Ticket(self.env, tkt_id=descriptor[-1][1])

class Environment(Component):
    # Imagine the existing stuff mixed in here ;)

    resource_managers = ExtensionPoint(IResourceManager)

    def __init__(self):
        for manager in self.resource_managers:
            self._managers[manager.get_resource_realm()] = manager

    def get_resource_manager(self, descriptor):
        for realm, identifier in reversed(self.managers):
            manager = self._managers.get(realm)
            if manager:
                return manager

Note: This mirrors the generality/abstractness of the current context code—a different approach would be to make this specific to attachments, and add an IAttachmentContainer extension point, or something like that.

Feedback

osimons

The work so far looks great to me. Looks nice, simple and purposeful. My issue with context deals with 'security', and here are some notes i have made:

  1. Having a context without req object is great - it means no one can write a security policy based on req.path, on some secret query parameter or whatever else available through req - security becomes strictly a function of the user and the resource, and not how it is accessed. That is great.
  2. Here is the continuation of that: Why should features (plugins inside and outside Trac) always have to check security? My basic idea is to make security implicit when accessed through the context as a best practice. If you don't have …_VIEW, well you just don't get the resource or raise PermissionError when trying to access it (this is a bit harder to enforce with write, though - I have to leave that solution to others :-) see the comment I wrote replying to aat -Christian Boos 9/24/07 4:11 PM). Using the resource descriptor as default access would mean security was included for free. No assert's, req.perm code - but possibly a few more try: except: or other general ways of letting the receipient know about security enforcement.
    • The simple use case: Run a query and you only get tickets listed that you have access to. For the resources you do not have access to, there is simply no resource to display. This will be the case with the current system. The query and report modules just haven't been converted yet. -Alec 9/12/07 2:42 PM
    • Another better example; I have added some patches to TOC macro that uses Context to fetch and list things. This plugin currently ignores security totally - it will happily render whatever pages I ask it to. I know security needs to go into the plugin, and as a developer I then have to come up with how I want to deal with security - grafting it on as I see fit. Should I say: Well, you are allowed to view this page so you should see all content on it (including full toc)? Should I check each page in the TOC and exclude some? Should I list page names but not headings inside the documents? This should not be my decision to make.
    • A ticket today can be accessed through navigation/request to ticket, queries, macros, links and countless other ways. In each and every instance I need to remember to check permissions. Always. And in the same manner. Some sort of custom ticket box macro that gets this wrong will spill all the beans.
  3. Having a light-weight resource descriptor is a bit of an illusion if for each access on that resource I have to check permissions. That makes the sum of accessing/checking a resource larger - even though the descriptor itself might be small and cheap to make. I have yet to think of a single use case where we would/should access the resource outside the context of security in 'common' feature code? Rendering links (css) in formatter is a good example of this. Every single link on a wiki page is checked for permissions. Anyone have a use case?
  4. If you want to go outside best practice, there is obviously ways to access and change any resource we'd like - using the various Model objects, or even database for that matter. But, outside the 'context' all security bets are off - you need to check this yourself.
  5. It would set us on a path to remove the req object for Formatter and possibly other entities - perhaps one day being able to render wiki formatting outside a request context again :-) I chatted to cmlenz, so obviously many ways of doing this - so kind of just mentioned as a side effect of removing current req.perm checks everywhere.
  6. Essence of security I think is contrary to Python dogma: implicit is better than explicit :-)

Thanks all for putting in time and effort on this! — simon

aat

This is more historical than anything, but I largely agree. In my ideal fantasy land Trac would have three layers internally:

  1. The underlying data model (model.py) for raw data access.
  2. An internal API for accessing the data while applying permissions, as well as other more complex manipulations of internal resources. Kind of like how WikiSystem and TicketSystem exist today, but better defined and more stable, and applied across all Trac modules.
  3. Then finally the API consumer, which would include the Trac web interface as well as third party plugins.

But, that being said, I think these changes and those you suggest, are well out of scope for the context refactoring. The new permission system, and context itself, hasn't changed the model of how the permission system is used, and these changes would be much wider reaching than just fixing context.

Alternatively, 1. and 2. could be merged, i.e. the model could take care to do the appropriate permission checks. That sounds pretty simple, like doing a perm.require('TICKET_MODIFY') in the Ticket.save_changes() method, for example. The perm (a PermissionCache object) could easily be passed on to the model, in the IResourceManager.get_model(resource) call (see my proposal below). -Christian Boos 9/24/07 4:05 PM

cmlenz

I agree with aat here: those are all valid concerns, but outside the scope of this proposal.

But I do think that the point concerning not making the request object available for permission checks is quite important for this refactoring. Permission policies should be based solely on resources and the user, not any aspects of the current HTTP request such as cookies, HTTP vs HTTPS, etc.

cboos

  • in Goals, the encapsulation of "everything needed" (env, req, db) in one object was also one of the goals. I'm OK now to drop completely this idea.
  • in Towards Resource Descriptors, only the identification aspect of resource is addressed. So the list of (realm,id,version) tuple is at most a Resource Identifier, and can't really be called a Resource Descriptor, as it can't describe usefully a resource. I don't think there would be any noticeable performance impact for using objects for that, as opposed to having to revert to the actual data model for representation needs (as suggested in the example TicketSystem.resolve_resource). Still, if this is a concern, I agree we could delegate the dynamic description of identifiers to Components.
    Also, the manipulation of those list-based resource identifiers would be quite cumbersome (see the same resolve_resource example) and even more if one checks for the presence or absence of version information. It would be much easier/cleaner to use r.realm, r.id and r.version in the code.
    Finally, another important reason why it's preferable to use an object rather than a list of tuple is that the notion of "identity" will evolve. Currently we have realm+id (+ eventually revision), but in the future we'll probably add the project information as well. It will be far easier to deal with the project information by adding r.project calls where relevant rather than changing all the code accessing the tuples.
  • Context Factories and Subclasses:
    We need some kind of dynamic description of resources, for all the parts that are generic about resources in Trac: the attachments, the history and diffs of resource content (#2945), the generic reports, etc.
    Those things have been built some time after the WikiContext integration, so it's true that they are separate enhancements. But I think that they are very worthwhile ones that have to be kept and probably expanded upon (generic comments, content change annotation, etc.)
    If there's really a need to decouple the way resources can be identified from the way resources can be described, then we could eventually have simple Resource objects and "rendering" methods from a manager component like the one suggested. That would actually make some parts of the implementation simpler, as we could create very simple Resource objects which could delegate their dynamic aspects to their managing component.
  • Wiki Rendering Context:
    We can't simply move all the responsibilities of the rendering context to the Formatter class, as Formatter objects are only present when explicitly rendering Wiki text, not generally available when rendering "content", in the mimeview module (the "content" which is rendered is coming from resources and that content may be wiki content). So we need something (a RenderingContext class?) that would be made available to the content renderers, and from there to the formatter, and from the formatter conveyed to the wiki processors. For example, the "blame" kind of annotation, now available for repository source file, later for other resources (#5821) is done at the mimeview layer, and doesn't depend on the wiki layer, so using a Formatter here is not appropriate. Another example of the use of a context which knowns about the "parent" context is to provide a way to guard against recursive content display. This is not yet that common for now, as the possibility to embed one content into another are limited so far (the TicketQuery macro in table mode could lead to such a situation, though). So the RenderingContext is really something that belongs to the trac.mimeview layer and could be used in further developments to refactor/simplify that API as well (see #3332).
  • Permission System:
    • About osimons points 2. and 3.: I think this is an interesting proposal. In order to get a resource descriptor at all, the accessor should at least have read access to it. I can't think of a reason when one should be able to manipulate a resource without having at least read permission. If that's really needed, that should still be possible, but it's all but the most common case. So the point would be to find a nice and simple way to create resource identifiers/descriptors for authorized resources.
      The PermissionCache seems to be the best place to stick this convenient resource creation mechanism, for a couple of reasons:
      • there's already a PermissionCache is available in most place of the code where we already use it (req.perm),
      • it's convenient to create when there's no req available (PermissionCache(env, username))
      • we're already using it for enforcing read permissions (if 'WIKI_VIEW' in req.perm: ...), even for fine-grained permissions (if 'WIKI_VIEW' in req.perm('wiki', 'WikiStart'): ...)
    • In general, I'm fully for ditching out the req out of the picture, of both the resource descriptors (of course) and the rendering contexts. As osimons said, it would be really nice to be able to use the rendering stuff of Trac in an off-line context. We don't necessarily need the req in all of the rendering chain, but we do need some of the information so far only available from the req, most notably the user information (req.authname). Other important things that are currently stored in the Request is the Chrome data (the parts of the template data consumed by layout.html). Overall, the current code base relies too much on the req object in the rendering process, so removing the request from the rendering should be left for later iterations. OTOH, the request is now removed from the resource and permission sides completely. See osimons point 1. and cmlenz first comment and paragraph in "Goals".

Now, the above ideas have been confronted to some actual coding and I'm getting to the point of having a working prototype (wiki and attachment module mostly done, others on the way). I think I've addressed cmlenz concerns and beyond this, I've taken osimons points further and actually try to use the permission system as the primary way to access the resources.

Warning: work in progress -Christian Boos 2007-09-27

The Resource class for resource identification

A Resource object is used to uniquely identify some resource in a Trac environment.
The identification is based on realm, id and eventually version information. Resource objects are used throughout Trac in order to conveniently identify and manipulate resources, in a light-weight and convenient way. As fine-grained security and permission checks are becoming the rule in Trac, one shouldn't be able to access a resource without the appropriate credentials.
Therefore, resources have to be created by the permission system. More precisely, the PermissionCache object is used for creating Resource objects. In turn, each Resource object let you access the permission cache, by the way of a perm property. That way:

  • permission checks can be conveniently performed on a Resource (keeping the existing PermissionCache API).
  • access to new resources can be performed starting from an existing resource, either by going through the resource's permission cache, or by using some convenience methods for specialized situations (other resources from the same realm, child resources).

The syntax for creating resource objects can be exactly the one used up to now for checking fine grained permissions:

page = req.perm('wiki', 'WikiStart')

By default, the version information is set to None, specifying that we're interested in the latest version of the resource.
In case there's no read permission for that resource, None is returned instead of an object.

Alternatively, the equivalent to req.perm.require('wiki', 'WikiStart') can be achieved using:

page = req.perm.assert_resource('wiki', 'WikiStart')

Here, in case there's no read permission, a PermissionError exception is raised. See osimons point 2.

Note that if for some reason there's really a need to create a Resource object without the proper credentials, it's of course possible to do so, using directly the Resource constructor - simply it's not the "advised" way.

In addition to the above, it should also be possible to duplicate resource objects using convenient ways (e.g. page(version=req.args.get('version')) in order to get a specific version of an already available resource). When doing this, the permissions are checked again for the newly created resource, and None is returned if there's no read permission. The alternative copy() method can be used if one would like to trigger a PermissionError instead (or should that be assert_copy() to mirror assert_resource()?).

Resource can also be retrieved without specifying an explicit id. In that case, the resource can be used to refer to the realm as a whole.

wiki_realm = req.perm('wiki')

Beyond checks for read access which are done upon resource creation, any other permission check can be done afterwards on the resource, simply by using the perm property (a familiar PermissionCache object):

    page.perm.require('WIKI_CREATE')
    if 'WIKI_ADMIN' in page.perm:
        continue

(Note: maybe we could even not expose the perm property, and write the above like that:)

    page.require('WIKI_CREATE')
    if 'WIKI_ADMIN' in page:
        continue

Another situation where it's much more convenient to create a resource from an existing resource is when one wants to create a secondary resource, i.e. a resource which only exists within the scope of another resource, like an attachment:

    attachment = parent.child('attachment', filename)

(Note: this will probably soon become child()/assert_child() for symmetry with the other methods)

After a few iterations of the prototype implementation, I finally came to something very similar to the IResourceManager interface described in the Context Factories and Subclasses section above, mainly for efficiency concerns and simplicity of the implementation. But for convenience purpose, the Resource class has still the name, shortname and summary properties, as well as a model property and an url(href) method. Those properties and method simply delegate the actual work to the resource's manager which is the Component implementing the IResourceManager interface which has claimed to manage the resource's realm (see get_resource_realms() below):

    class IResourceManager(Interface):

        def get_resource_realms():
            """Generate realm strings identifying the realm of resources handled
            by this component."""

        def get_model(resource):
            """Return an object representing the content of the requested resource.
            """

        def get_resource_url(resource, href, **kwargs):
            """Return the URL for the requested resource."""

        def get_resource_description(resource, format=None):
            """Return a representation of the resource, according to the `format`.

            For example, the ticket with the ID 123 is represented like `'#123'`
            for the `'compact'` format, `'Ticket #123'` for the default `None`
            format.
            With the `'summary'`format, more details about the resource will be
            given, at the expense of a lookup to the resource's model.
            """

osimons

Now, I'm not even going to pretend to have a clear understanding of how the base components/objects/factories needs to be implemented, but I have experimented by writing some dummy-pseudo-code to get a feel for the usage of the various implementations, and making some thoughts as to how to interface with the resource system - I'll just jot down some notes again; ignore, use or abuse as you like. I do not pretend to have the required skills to actually design and implement this - do tell me to be quiet if I only make noise :-)

  1. I experimented with a very simple design using the 'cache' terminology instead of factories - to me that made more readable code, but I think it is very much the same as what cboos is working on. The idea with using a cache was that it could actually also cache the result of resolving descriptors for the user session. Might be a performance enhancement, might not. Anyway, based on much the same idea as PermissionCache that gets 'built-into' some sort of ResourceCache - along the lines of cboos' example code above that I like. And, like PermissionCache, it does not have to be an actual cache.
    rc = ResourceCache(env, req.authname)
    r = rc([('wiki', ('WikiStart', 1))])   # or however we end up calling descriptors
    # r is then some resource-manager-thingy - one level above the actual resource (model) like the IResourceManager
    
    Resources are actually quite cheap to build, so I don't think we need a special cache for them. OTOH, the permission checks are expensive, but we already have the PermissionCache for caching those. -Christian Boos 9/24/07 3:08 PM
  2. I then started trying to use the different variations on notation for descriptors, and have trouble being really comfortable with w any. Could be me, of course. Then I picked up on where cboos mentioned the Href class, which I think was a great idea: Why can't we use a path as a descriptor / identifier? That way it can be re-used both by request handlers that already do this parsing, and as Trac is a web application at the core, using the path to identify unique resources makes sense (to me at least). Just look at the location bar in the browser to find the name and syntax of the resource. They are easy on the eyes, easy to write, and we already have lots of code for resolving and working with paths. Not all paths are resources, but all resources are paths.
    r = rc('wiki/WikiStart?version=23') # The full descriptor as string, path can be key in a cache if we want to
    r = rc.wiki('WikiStart') # same principle as Href as cboos suggested.
    r = rc(href.wiki('WikiStart')) # or even using an Href object directly
    
    A child-parent path like r = rc('attachment/ticket/2048/2048.patch') could then be resolved by the attachment resolver looking up ticket 2048 in the ResourceCache, and returning that resource-manager-thingy with itself attached as a child (or the other way around). Keeping the idea from the original Context class, any resource-manager-thingy should also be able to easily render the path to itself for usage by others (for path-based security policy or whatever). It is then the resposibility of the object 'lowest' in the chain to render the path - for the ticket example, it would have to be handed over to the lowest child (attachment) to render the path/descriptor for the full chain (or at least to merge and return the results).

The problem I have with this idea is that paths are not as good for resource identification as you'd expect. First, you'd need to "parse" such a path each time you need to access a specific part of the information (realm, id or version); second that parsing is not as easy as one might think, as the "/" can be either part of the identifier or a separator - we already deal with those things in the match_request methods, and that's painful enough for not having to spread that everywhere… -Christian Boos 9/24/07 3:10 PM

Last modified 10 years ago Last modified on Sep 27, 2007, 11:04:01 AM

Attachments (6)

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.