= Trac Context Refactoring = '''Just Brainstorming…'''[[br]] == Goals == These are the original goals of the introduction of the wiki rendering context, and later the global Context class: * Fix for rendering relative links in wiki text when that wiki text is rendered in a different context (e.g. in the timeline) * Enable fine-grained permission checking (i.e. per object instead of just per-subsystem) The outreach of the Context class went far beyond that, however: it's now used in many places to: * generate links to resources * get display names for a resource * actually access the underlying resource (i.e. the object representing the database record) * pass the "context" object around when the “req” and model object are needed, rather than passing those two directly * and more? An important goal of this proposal is to limit the scope and responsibility of the context/resource infrastructure, or at least to define clearer responsibilities. In this respect, one problem is that Context objects (may) contain request objects, but many APIs that would need to get passed a context should not depend on being called from an HTTP request. For example, a permission policy could currently base its decision on the path of the request, some cookie, or other information from a request, but it really should only be looking at the user and the resource, because otherwise the policy could not be applied when the API is being used from a simple script. That is impossible to enforce with a Context class that doesn't have very clear boundaries. == Towards Resource Descriptors == The purpose of a resource descriptor is to identify a resource in Trac (realm+identifier). Resource descriptors are used for checking permissions at the object level, and for communicating which resource some wiki text belongs to. A resource descriptor is simply a list of tuples, for example: {{{ #!python [('wiki', ('WikiStart', 1))] [('wiki', ('WikiStart', 1)), ('attachment', 'README.txt')] [('ticket', 42), ('attachment', 'patch.diff')] }}} The list represents the containment hierarchy of a resource: for example, the attachment in the second list belongs to the “WikiStart” wiki page. The individual tuples in a descriptor list are of the form (realm, identifier), where “realm” is a string, and “identifier” is any serializable and hashable object, such as a string or tuple. For example, wiki pages require both a page name and a version number to be uniquely identified, whereas attachments only require the filename. We're using a simple data structure here because: 1. it is sufficient, and 2. we expect that we'll need to be creating lots of those descriptors, so keeping object instantiation and management overhead to a minimum is desirable. Resource descriptors are needed whenever some code wants to check the permissions on a specific resource, or the resource contains wiki text that is supposed to be rendered. They can be constructed manually, but we should probably also have convenience functions, for example a .descriptor property on model objects that would return the corresponding descriptor. == Permission System == From looking at the current code, the most common uses of req.perm are without any resource descriptor at all, or with a pre-built descriptor. Still, to make life easier it would be preferable if the end user didn't need to fully specify a resource descriptor. Something like this might suffice: 1. req.perm(realm, id=None) 2. If realm is a list, treat it as a fully specified resource descriptor. 3. Otherwise treat (realm, id) as the descriptor. The existing permission code won't be that difficult to convert. The real work will be in locating where the existing "resource descriptor" context objects are being created and converting those. == Wiki Rendering Context == Why is this needed? From WikiContext#FixedIssues: Thanks to Wiki rendering contexts, the following issues could be solved: * relative attachments and comments TracLinks now always refer to the correct context resource, irrelevant from where they are displayed (ticket query with description #3711, headings #3842, timeline, etc.) * relative links (i.e. [#anchor see this] kind of TracLinks) are now always referring to the correct resource (#4144) Those two are of course related. I believe this can be implemented by simply passing a resource descriptor to the wiki_to_html() function, and storing it as a .resource attribute on the Formatter object. The link resolvers can then look at the resource descriptor and fixup the generating link accordingly. In the current code, the Context class is also responsible for generating links to a resource. As far as I can tell, this is a separate enhancement that is only needed for the attachments module and the newly added “generic reports” feature. See the next section for an approach for this. The other thing that a context does is to determine whether links should be absolute (include the scheme and hostname) or relative to the server root. This had previously been an attribute on the wiki formatter object, and I think we should be able to simply move it back there. == Context Factories and Subclasses == These are used in the current context code. What do they do? * return the model object for a context * construct a URL to the resource identified by the context * provide appropriate “name”, “shortname”, and “summary” properties Now, what is that actually used for? It seems like the primary reason these exist is the attachments module. So here's another unrelated feature enabled by Context: attachment containers can be any resource, as far as I can tell. And things like the attachment templates rendering the name and URL of the parent resource is based on the context object. If Context is to go away, we'd need something else to replace this. Maybe something like this: {{{ #!python class IResourceManager(Interface): def get_resource_realm(): """Return the string identifying the realm of resources handled by this component.""" def get_resource_url(descriptor, href, **kwargs): """Return the URL for the requested resource.""" def resolve_resource(descriptor): """Return an object representing the content of the requested resource. The returned object needs to have at least the following properties and functions: * `__unicode()__`: should return a compact representation of the object, such as "#123" for the ticket with the ID 123 * `display_name`: a more verbose string representation of the object, for example "Ticket #123" for the ticket with the ID 123 """ class TicketSystem(Component): # Imagine the existing stuff mixed in here ;) implements(IResourceManager) def get_resource_realm(self): return 'ticket' def get_resource_href(self, descriptor, href, **kwargs): return href.ticket(descriptor[-1][1], **kwargs) def resolve_resource(self, descriptor): return Ticket(self.env, tkt_id=descriptor[-1][1]) class Environment(Component): # Imagine the existing stuff mixed in here ;) resource_managers = ExtensionPoint(IResourceManager) def __init__(self): for manager in self.resource_managers: self._managers[manager.get_resource_realm()] = manager def get_resource_manager(self, descriptor): for realm, identifier in reversed(self.managers): manager = self._managers.get(realm) if manager: return manager }}} '''Note:''' ''This mirrors the generality/abstractness of the current context code—a different approach would be to make this specific to attachments, and add an IAttachmentContainer extension point, or something like that.'' = Feedback = === osimons === The work so far looks great to me. Looks nice, simple and purposeful. My issue with context deals with 'security', and here are some notes i have made: 1. Having a context without req object is great - it means no one can write a security policy based on req.path, on some secret query parameter or whatever else available through req - security becomes strictly a function of the user and the resource, and not how it is accessed. That is great. 2. Here is the continuation of that: Why should features (plugins inside and outside Trac) always have to check security? My basic idea is to make security implicit when accessed through the context as a best practice. If you don't have ..._VIEW, well you just don't get the resource or raise `PermissionError` when trying to access it (this is a bit harder to enforce with write, though - I have to leave that solution to others :-) ''see the comment I wrote replying to aat -Christian Boos 9/24/07 4:11 PM''). Using the resource descriptor as default access would mean security was included for free. No assert's, req.perm code - but possibly a few more try: except: or other general ways of letting the receipient know about security enforcement. * The simple use case: Run a query and you only get tickets listed that you have access to. For the resources you do not have access to, there is simply no resource to display. ''This will be the case with the current system. The query and report modules just haven't been converted yet. -Alec 9/12/07 2:42 PM'' * Another better example; I have added some patches to TOC macro that uses Context to fetch and list things. This plugin currently ignores security totally - it will happily render whatever pages I ask it to. I know security needs to go into the plugin, and as a developer I then have to come up with how I want to deal with security - grafting it on as I see fit. Should I say: Well, you are allowed to view this page so you should see all content on it (including full toc)? Should I check each page in the TOC and exclude some? Should I list page names but not headings inside the documents? This should not be my decision to make. * A ticket today can be accessed through navigation/request to ticket, queries, macros, links and countless other ways. In each and every instance I need to remember to check permissions. Always. And in the same manner. Some sort of custom ticket box macro that gets this wrong will spill all the beans. 3. Having a light-weight resource descriptor is a bit of an illusion if for each access on that resource I have to check permissions. That makes the sum of accessing/checking a resource larger - even though the descriptor itself might be small and cheap to make. I have yet to think of a single use case where we would/should access the resource outside the context of security in 'common' feature code? Rendering links (css) in formatter is a good example of this. Every single link on a wiki page is checked for permissions. Anyone have a use case? 4. If you want to go outside best practice, there is obviously ways to access and change any resource we'd like - using the various Model objects, or even database for that matter. But, outside the 'context' all security bets are off - you need to check this yourself. 5. It would set us on a path to remove the req object for Formatter and possibly other entities - perhaps one day being able to render wiki formatting outside a request context again :-) I chatted to cmlenz, so obviously many ways of doing this - so kind of just mentioned as a side effect of removing current req.perm checks everywhere. 6. Essence of security I think is contrary to Python dogma: implicit is better than explicit :-) Thanks all for putting in time and effort on this! -- simon === aat === This is more historical than anything, but I largely agree. In my ideal fantasy land Trac would have three layers internally: 1. The underlying data model (model.py) for raw data access. 2. An internal API for accessing the data while applying permissions, as well as other more complex manipulations of internal resources. Kind of like how `WikiSystem` and `TicketSystem` exist today, but better defined and more stable, and applied across all Trac modules. 3. Then finally the API consumer, which would include the Trac web interface as well as third party plugins. But, that being said, I think these changes and those you suggest, are well out of scope for the context refactoring. The new permission system, and context itself, hasn't changed the model of how the permission system is used, and these changes would be much wider reaching than just fixing context. ''Alternatively, 1. and 2. could be merged, i.e. the model could take care to do the appropriate permission checks. That sounds pretty simple, like doing a perm.require('TICKET_MODIFY') in the Ticket.save_changes() method, for example. The perm (a `PermissionCache` object) could easily be passed on to the model, in the IResourceManager.get_model(resource) call (see my proposal below). -Christian Boos 9/24/07 4:05 PM'' === cmlenz === I agree with aat here: those are all valid concerns, but outside the scope of this proposal. But I do think that the point concerning not making the request object available for permission checks is quite important for this refactoring. Permission policies should be based solely on resources and the user, not any aspects of the current HTTP request such as cookies, HTTP vs HTTPS, etc. === cboos === * in '''Goals''', the encapsulation of "everything needed" (env, req, db) in one object was also one of the goals. I'm OK now to drop completely this idea. * in '''Towards Resource Descriptors''', only the identification aspect of resource is addressed. So the list of (realm,id,version) tuple is at most a Resource Identifier, and can't really be called a Resource Descriptor, as it can't describe usefully a resource. I don't think there would be any noticeable performance impact for using objects for that, as opposed to having to revert to the actual data model for representation needs (as suggested in the example `TicketSystem.resolve_resource`). Still, if this is a concern, I agree we could delegate the dynamic description of identifiers to Components.[[br]]Also, the manipulation of those list-based resource identifiers would be quite cumbersome (see the same resolve_resource example) and even more if one checks for the presence or absence of version information. It would be much easier/cleaner to use r.realm, r.id and r.version in the code.[[br]]Finally, another important reason why it's preferable to use an object rather than a list of tuple is that the notion of "identity" will evolve. Currently we have realm+id (+ eventually revision), but in the future we'll probably add the project information as well. It will be far easier to deal with the project information by adding r.project calls where relevant rather than changing all the code accessing the tuples. * '''Context Factories and Subclasses:'''[[br]]We need some kind of dynamic description of resources, for all the parts that are generic about resources in Trac: the attachments, the history and diffs of resource content (#2945), the generic reports, etc.[[br]]Those things have been built some time after the `WikiContext` integration, so it's true that they are separate enhancements. But I think that they are very worthwhile ones that have to be kept and probably expanded upon (generic comments, content change annotation, etc.)[[br]]If there's really a need to decouple the way resources can be identified from the way resources can be described, then we could eventually have simple Resource objects and "rendering" methods from a manager component like the one suggested. That would actually make some parts of the implementation simpler, as we could create very simple Resource objects which could delegate their dynamic aspects to their managing component. * '''Wiki Rendering Context:'''[[br]]We can't simply move all the responsibilities of the rendering context to the Formatter class, as Formatter objects are only present when explicitly rendering Wiki text, not generally available when rendering "content", in the mimeview module (the "content" which is rendered is coming from resources and that content may be wiki content). So we need something (a `RenderingContext` class?) that would be made available to the content renderers, and from there to the formatter, and from the formatter conveyed to the wiki processors. For example, the "blame" kind of annotation, now available for repository source file, later for other resources (#5821) is done at the mimeview layer, and doesn't depend on the wiki layer, so using a Formatter here is not appropriate. Another example of the use of a context which knowns about the "parent" context is to provide a way to guard against recursive content display. This is not yet that common for now, as the possibility to embed one content into another are limited so far (the !TicketQuery macro in table mode could lead to such a situation, though). So the `RenderingContext` is really something that belongs to the `trac.mimeview` layer and could be used in further developments to refactor/simplify that API as well (see #3332). * '''Permission System:''' if the permission system is not to be bound to the req, then I see no good reason why we use req.perm as its main entry point (other than backward compatibility issues). The rendering context seems to be a possible place to store this information (who is accessing the content?). Alternatively, we could introduce an User object, for now as simple as a wrapper for the authname and a place to store the authentification cache. * About osimons points 2. and 3.: I think this is an interesting proposal. In order to get a resource descriptor at all, the accessor should at least have read access to it. This could for example be enforced by requiring a mandatory user argument when creating a resource descriptor: `ResourceManager(self.env, user).ticket(42)`.[[br]]Also, regardless from how the `PermissionCache` is obtained (currently req.perm), I think using it as the principal way to produce the resource identifiers is a good idea. * In general, I'm fully for ditching out the req out of the picture, of both the resource descriptors (of course) and the rendering contexts. As osimons said, it would be really nice to be able to use the rendering stuff of Trac in an off-line context. We don't necessarily need the req in all of the rendering chain, but we do need some of the information so far only available from the req, most notably the user information (req.authname). Other important things that are currently stored in the Request is the Chrome data (the parts of the template data consumed by layout.html). Now, the above ideas have been confronted to some actual coding and I'm getting to the point of having a working prototype. Several details are still in need of additional work (especially the rendering context par), but I think I'm getting there.[[br]] I think I've addressed cmlenz concerns and beyond this, I've taken osimons points further and actually try to use the permission system as the primary way to access the resources. '''Warning:''' ''work in progress -Christian Boos 9/24/07 4:31 PM'' === The `Resource` class for resource identification === A Resource object is used to uniquely identify some resource in a Trac environment.[[br]] The identification is based on realm, id and eventually version information. `Resource` objects are used throughout Trac in order to conveniently identify and manipulate resources, in a light-weight and convenient way. As fine-grained security and permission checks are becoming the rule in Trac, one shouldn't be able to access a resource without the appropriate credentials.[[br]] Therefore, resources have to be created by the permission system. More precisely, the `PermissionCache` object is used for creating `Resource` objects. In turn, each Resource object let you access the permission cache, by the way of a perm propery. That way: * permission checks can be conveniently performed on a Resource (keeping the existing `PermissionCache` API). * access to new resources can be performed starting from an existing resource, either by going through the resource's permission cache, or by using some convenience methods for specialized situations (other resources from the same realm, child resources). Let's illustrate the usage of resource objects. In the general case, all what is needed is the realm and id information: {{{ #!python page = req.perm.resource('wiki', 'WikiStart') }}} By default, the version information is set to None, specifying that we're interested in the latest version of the resource.[[br]] `Resource` can also be retrieved without specifying an explicit id. In that case, the resource can be used to refer to the realm as a whole. {{{ #!python wiki = req.perm.resource('wiki') }}} Note that from this point, the Request object is put out of the picture for permission checks and is not otherwise used by resources.[[br]] (See osimons point 1. and cmlenz first comment and paragraph in "Goals") If the req.perm credentials don't provide the _VIEW permission for the resource, it is simply not possible to build a `Resource` instance for the specified resource and the call will fail, triggering a `PermissionError`. (See osimons point 2.)[[br]] There's an alternate method, which can be used when one does not want to get a `PermissionError`, which will simply return `None` if there's no permission to access the resource (the "get" is reminiscent of the dictionary lookup get). {{{ #!python wiki = req.perm.get_resource('wiki') if wiki: # do something with it }}} Beyond checks for read access which are done upon resource creation, any other permission check can be done afterwards on the resource, simply by using the perm property (a familiar `PermissionCache` object): {{{ #!python page.perm.require('WIKI_CREATE') if 'WIKI_ADMIN' in page.perm: continue }}} Any kind of primary resources could be created using a `PermissionCache`, obtained either from `req.perm` or `.perm`, but in some cases there are even more convenient ways to create a new resource.[[br]] For example, one can create a resource by copying an existing one. This is convenient when one has only to change the id or the version of an existing resource identifier. So when staying within the same realm, resources can be created using the copy() method: {{{ #!python versioned_page = page.copy(version=version) }}} Like for resource(), when dealing with a large number of resources one could use an alternative "get" form, in order to get the authorized resources without putting exceptions into play: {{{ #!python pages = [] for name in WikiSystem(env).get_all_pages(prefix): page = wiki.get_copy(id=name) if page: pages.append(page) }}} The `get_copy(id)` method will return `None` if there's no access right to resource with the given id.[[br]] Another situation where it's much more convenient to create a resource from an existing resource is when one wants to create a secondary resource, i.e. a resource which only exists within the scope of another resource, like an attachment: {{{ #!python attachment = parent.get_child('attachment', filename) }}} After a few iterations of the prototype implementation, I finally came to something very similar to the `IResourceManager` interface from '''Context Factories and Subclasses''' chapter above, mainly for efficiency concerns and simplicity of the implementation. But for convenience purpose, the `Resource` class has still the `name`, `shortname` and `summary` properties, as well as a `model` property and an `url(href)` method. Those properties and method simply delegate the actual work to the resource's manager (i.e. the Component implementing the `IResourceManager ` interface): {{{ #!python class IResourceManager(Interface): def get_resource_realms(): """Generate realm strings identifying the realm of resources handled by this component.""" def get_model(resource): """Return an object representing the content of the requested resource. """ def get_resource_url(resource, href, **kwargs): """Return the URL for the requested resource.""" def get_resource_description(resource, format=None): """Return a representation of the resource, according to the `format`. For example, the ticket with the ID 123 is represented like `'#123'` for the `'compact'` format, `'Ticket #123'` for the default `None` format. With the `'summary'`format, more details about the resource will be given, at the expense of a lookup to the resource's model. """ }}} === osimons === Now, I'm not even going to pretend to have a clear understanding of how the base components/objects/factories needs to be implemented, but I have experimented by writing some dummy-pseudo-code to get a feel for the usage of the various implementations, and making some thoughts as to how to interface with the resource system - I'll just jot down some notes again; ignore, use or abuse as you like. I do not pretend to have the required skills to actually design and implement this - do tell me to be quiet if I only make noise :-) 1. I experimented with a very simple design using the 'cache' terminology instead of factories - to me that made more readable code, but I think it is very much the same as what cboos is working on. The idea with using a cache was that it could actually also cache the result of resolving descriptors for the user session. Might be a performance enhancement, might not. Anyway, based on much the same idea as `PermissionCache` that gets 'built-into' some sort of `ResourceCache` - along the lines of cboos' example code above that I like. And, like `PermissionCache`, it does not have to be an actual cache. {{{ #!python rc = ResourceCache(env, req.authname) r = rc([('wiki', ('WikiStart', 1))]) # or however we end up calling descriptors # r is then some resource-manager-thingy - one level above the actual resource (model) like the IResourceManager }}} ''Resources are actually quite cheap to build, so I don't think we need a special cache for them. OTOH, the permission checks are expensive, but we already have the `PermissionCache` for caching those. -Christian Boos 9/24/07 3:08 PM'' 2. I then started trying to use the different variations on notation for descriptors, and have trouble being really comfortable with w any. Could be me, of course. Then I picked up on where cboos mentioned the Href class, which I think was a great idea: Why can't we use a path as a descriptor / identifier? That way it can be re-used both by request handlers that already do this parsing, and as Trac is a web application at the core, using the path to identify unique resources makes sense (to me at least). Just look at the location bar in the browser to find the name and syntax of the resource. They are easy on the eyes, easy to write, and we already have lots of code for resolving and working with paths. Not all paths are resources, but all resources are paths. {{{ #!python r = rc('wiki/WikiStart?version=23') # The full descriptor as string, path can be key in a cache if we want to r = rc.wiki('WikiStart') # same principle as Href as cboos suggested. r = rc(href.wiki('WikiStart')) # or even using an Href object directly }}} A child-parent path like `r = rc('attachment/ticket/2048/2048.patch')` could then be resolved by the attachment resolver looking up ticket 2048 in the `ResourceCache`, and returning that resource-manager-thingy with itself attached as a child (or the other way around). Keeping the idea from the original Context class, any resource-manager-thingy should also be able to easily render the path to itself for usage by others (for path-based security policy or whatever). It is then the resposibility of the object 'lowest' in the chain to render the path - for the ticket example, it would have to be handed over to the lowest child (attachment) to render the path/descriptor for the full chain (or at least to merge and return the results). ''The problem I have with this idea is that paths are not as good for resource identification as you'd expect. First, you'd need to "parse" such a path each time you need to access a specific part of the information (realm, id or version); second that parsing is not as easy as one might think, as the "/" can be either part of the identifier or a separator - we already deal with those things in the match_request methods, and that's painful enough for not having to spread that everywhere... -Christian Boos 9/24/07 3:10 PM''