
Opened 14 years ago

Last modified 14 months ago

#9936 new enhancement

CSS and JS files should be versioned

Reported by: peter.westwood@… Owned by:
Priority: high Milestone: next-major-releases
Component: web frontend Version: 0.12.1
Severity: normal Keywords: htdocs css js
Cc: peter.westwood@…, osimons, Jun Omae, leho@…, trac-tickets@…, xin20111119nix@…, massimo.b@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

When we recently upgraded a Trac install, a number of users saw strange behaviour on the Custom Query page.

It turned out that this was due to JS caching.

Ideally, Trac should use version numbering in the CSS and JS URIs so that the cache is busted on upgrades.

Attachments (6)

9936-versioned-css-js-r10548.patch (16.1 KB) - added by Remy Blank 14 years ago.
Version URLs for CSS and JS files.
t9936-etag_files-r10542_012.diff (4.7 KB) - added by osimons 14 years ago.
Alternative patch using headers only.
9936-static-fingerprint-r10741.patch (10.4 KB) - added by Remy Blank 13 years ago.
Implemented fingerprinting for static resources.
9936-revert-r10960.patch (11.2 KB) - added by Remy Blank 13 years ago.
Revert [10769].
0001-chrome-resource-fingerprinting-disabled-by-default.patch (4.4 KB) - added by Christian Boos 13 years ago.
Applies on r10960 (i.e. still on top of r10769)
0001-chrome-resource-fingerprinting-disabled-by-default.2.patch (11.9 KB) - added by Christian Boos 13 years ago.
Same as the other, but will apply once r10769 gets reverted… (i.e. it re-adds the parts I wanted to keep)


Change History (98)

comment:1 by Remy Blank, 14 years ago

Milestone: 0.12.2 → 0.13

Yes, we have discussed that a few times. Is it necessary to actually change the file name part of the URLs to static content, or is it enough to append a query string? In the latter case, we could append a query argument to every URL to static content, with a value containing the Trac version, or the timestamp of the last upgrade, or something like that.

This is material for 0.13.
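
For illustration, a minimal sketch of the query-string idea using Trac's own Href helper (import path as in 0.12; the values shown are made up):

from trac.web.href import Href

chrome = Href('/trac/chrome')
# Appending a version value busts browser caches whenever it changes:
print(chrome('common/css/trac.css', v='0.12.1'))
# -> /trac/chrome/common/css/trac.css?v=0.12.1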

comment:2 by anonymous, 14 years ago

You just need to increment a query string argument and that should be enough.

In WordPress we use a scheme based on the date the file changes so as to work for people who have installs which follow development closely.

Adding the version number would still be great though :-)

in reply to:  2 comment:3 by Remy Blank, 14 years ago

Replying to anonymous:

In WordPress we use a scheme based on the date the file changes so as to work for people who have installs which follow development closely.

I thought about this, too, but it might not work when using trac-admin $ENV deploy and having the web server serve static content, as Trac doesn't know where those files are.

comment:4 by Carsten Klein <carsten.klein@…>, 14 years ago

Would it not suffice to include an ETag with a given resource? This could be, for example, the MD5 hash of the resource data. Including this with the response will, AFAIK, invalidate the client's cached copy for that URL, replacing it with the new data.

See http://en.wikipedia.org/wiki/HTTP_ETag for more information on this.

I strongly vote against a …/trac.css?version=1…N approach…
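
For reference, a minimal sketch of the ETag idea suggested above (hypothetical helper names; Trac's actual revalidation lives in Request.check_modified):

import hashlib

def content_etag(data):
    # Strong ETag derived from the resource bytes, e.g. an MD5 hash
    return '"%s"' % hashlib.md5(data).hexdigest()

def not_modified(if_none_match, data):
    # A browser revalidates with If-None-Match; on a match the server
    # answers 304 Not Modified instead of resending the body
    return if_none_match == content_etag(data)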

comment:5 by Carsten Klein <carsten.klein@…>, 14 years ago

See also Request#check_modified for this.

in reply to:  4 ; comment:6 by Christian Boos, 14 years ago

Replying to Carsten Klein <carsten.klein@…>:

Would it not suffice to include an ETag with a given resource?

We're talking about a general solution here, not just when Trac is serving those resources through the Chrome component.

by Remy Blank, 14 years ago

Version URLs for CSS and JS files.

comment:7 by Remy Blank, 14 years ago

Owner: set to Remy Blank
Priority: normal → high
Release Notes: modified (diff)

9936-versioned-css-js-r10548.patch adds a query string of the form ?v={trac_version} to all CSS <link> and JavaScript <script> URLs. The interesting part is in chrome.py; the rest just transforms most <link> and <script> tags embedded in templates into the corresponding calls to add_stylesheet() and add_script(), respectively.

The interesting thing with using the Trac version in the query string is that:

  • Users who install from a release package (source or binary) have a version 0.x.y, and the query string changes with every release.
  • Users who install from SVN have a version 0.x.ydev-r1234, and therefore the query string also changes with every update, even of a single revision.

Both Firefox and Chrome correctly cache the files, and re-fetch them when the version changes. Opera and IE8 display correctly, but I haven't been able to check their caching behavior. I'm pretty confident that it works correctly, though, as WordPress uses the same scheme.

Feedback and testing welcome.

in reply to:  7 comment:8 by Christian Boos, 14 years ago

Keywords: htdocs added

Replying to rblank:

… though, as WordPress uses the same scheme.

Wikipedia also.

Feedback and testing welcome.

Looks good! Applied on demo-0.13.

comment:9 by osimons, 14 years ago

Hold your horses, please… Any solution that doesn't account for plugins is a non-starter as far as I'm concerned. Upgrading plugins can of course happen without upgrading Trac…

Using a value based on the timestamp of each file resource seems like an easy and general solution to me, but of course having to stat each file each time it is added means quite a bit of overhead. That can be eased by caching timestamps as we go, but then we'd really need to think about a cache invalidation strategy too, so that we don't arrive at the point where the only way of updating static resources is to restart the process(es).
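
A minimal sketch of that timestamp idea with the naive stat cache (hypothetical helper; note it never invalidates, which is exactly the concern raised above):

import os

_mtime_cache = {}

def resource_version(path):
    # Cache-busting value derived from the file's mtime; the stat()
    # result is cached, so a later change on disk is not picked up
    # until the process restarts -- the invalidation problem above
    if path not in _mtime_cache:
        _mtime_cache[path] = int(os.stat(path).st_mtime)
    return _mtime_cache[path]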

comment:10 by osimons, 14 years ago

Cc: osimons added

in reply to:  7 ; comment:11 by Jun Omae, 14 years ago

Cc: Jun Omae added

Replying to rblank:

Feedback and testing welcome.

I need the solution using the timestamp of each file. Trac plugin developers probably need it too. I always have to say "Please force a reload" to reporters each time I change my plugin.

I think that add_stylesheet should append v=VERSION only when the filename parameter has the common/ prefix.

  • trac/web/chrome.py

    diff --git a/trac/web/chrome.py b/trac/web/chrome.py
    index e05c988..6d04068 100644
    --- a/trac/web/chrome.py
    +++ b/trac/web/chrome.py
    @@ -137,13 +137,16 @@ def add_stylesheet(req, filename, mimetype='text/css', media=None):
         """
         if filename.startswith('http://') or filename.startswith('https://'):
             href = filename
    -    elif filename.startswith('common/') and 'htdocs_location' in req.chrome:
    -        href = Href(req.chrome['htdocs_location'])(filename[7:], v=VERSION)
    +    elif filename.startswith('common/'):
    +        if 'htdocs_location' in req.chrome:
    +            href = Href(req.chrome['htdocs_location'])(filename[7:], v=VERSION)
    +        else:
    +            href = req.href.chrome(filename, v=VERSION)
         else:
             href = req.href
             if not filename.startswith('/'):
                 href = href.chrome
    -        href = href(filename, v=VERSION)
    +        href = href(filename)
         add_link(req, 'stylesheet', href, mimetype=mimetype, media=media)
    
     def add_script(req, filename, mimetype='text/javascript', charset='utf-8'):
    @@ -159,13 +162,16 @@ def add_script(req, filename, mimetype='text/javascript', charset='utf-8'):
    
         if filename.startswith('http://') or filename.startswith('https://'):
             href = filename
    -    elif filename.startswith('common/') and 'htdocs_location' in req.chrome:
    -        href = Href(req.chrome['htdocs_location'])(filename[7:], v=VERSION)
    +    elif filename.startswith('common/'):
    +        if 'htdocs_location' in req.chrome:
    +            href = Href(req.chrome['htdocs_location'])(filename[7:], v=VERSION)
    +        else:
    +            href = req.href.chrome(filename, v=VERSION)
         else:
             href = req.href
             if not filename.startswith('/'):
                 href = href.chrome
    -        href = href(filename, v=VERSION)
    +        href = href(filename)
         script = {'href': href, 'type': mimetype, 'charset': charset}
    
         req.chrome.setdefault('scripts', []).append(script)

(untested)

comment:12 by Christian Boos, 14 years ago

Plugins may adopt the same strategy as Trac, and produce URLs with their own version number.

We can indeed rethink the way we handle all these add_script/add_… functions and make them instead methods of some class which can be given a base version. Plugins would use their own instance with their own version (current functions remain unchanged for backward compatibility).

The proposal concerning the timestamp was already addressed by Remy in comment:3: Trac doesn't know where those files are (they may even be located on another machine).

in reply to:  9 ; comment:13 by Remy Blank, 14 years ago

Replying to osimons:

Hold your horses, please… Any solution that doesn't account for plugins is a non-starter as far as I'm concerned. Upgrading plugins can of course happen without upgrading Trac…

I know I'm guilty of asking for feedback, but somehow I would appreciate if it was a bit more constructive:

  • The proposed solution, while not taking plugins into account, is already much better than the previous situation. It doesn't change any public interface, has close to zero overhead, and solves the issue at least for core, which is already a big part. As such, it is a clear net gain. Calling it a non-starter feels… inappropriate.
  • I'm of course open to improvement suggestions, in particular, how this could be extended to plugins. While the timestamp approach sounds nice and general, my feeling is that it will quickly become very complicated. Remember that finding the path to e.g. a .css requires going through the whole ITemplateProvider machinery, including looking into .egg files.

So at this point, I have a simple solution implemented that solves a good portion of the issue, and an idea to compare against that feels overly complicated. How about extending the simple solution by adding a version= argument to add_stylesheet() and add_script() that defaults to VERSION, and through which plugin authors could pass the version string of the plugin? Sure, it requires some work by plugin authors, and yes, it makes it more difficult to have the same code run on 0.11, 0.12 and 0.13. It's still better than the current state.
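
A rough sketch of what that could look like (hypothetical signature, mirroring the add_stylesheet() logic from the patch above; not a committed API):

from trac import __version__ as VERSION
from trac.web.chrome import add_link

def add_stylesheet(req, filename, mimetype='text/css', media=None,
                   version=VERSION):
    # A plugin author would pass the plugin's own version string here
    if filename.startswith('http://') or filename.startswith('https://'):
        href = filename
    else:
        href = req.href
        if not filename.startswith('/'):
            href = href.chrome
        href = href(filename, v=version)
    add_link(req, 'stylesheet', href, mimetype=mimetype, media=media)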

There's another unsolved issue: images. Unfortunately, the URLs for those are often placed inside .css files, so it's not possible to dynamically add a query string there (unless we start generating CSS dynamically; I remember a discussion about that). Also, images tend to change much less frequently than CSS and JS, so I'm not too concerned if the solution doesn't take images into account.

(Mid-air collision with more feedback, I'll address those separately.)

in reply to:  11 comment:14 by Remy Blank, 14 years ago

Replying to jomae:

I need the solution using the timestamp of each file.

Can you be more specific as to why you need that (in particular, the for each file part, as opposed to a per-plugin version)?

I think that add_stylesheet should append v=VERSION when the prefix of filename parameter is common/.

I'm not sure I understand the reason for the patch you posted. Do you mean that I add the ?v=... to too many items, i.e. that it shouldn't be added if the file isn't in /chrome?

comment:15 by osimons, 14 years ago

Providing 'inheritable class frameworks' that plugins can use is a sure way of guaranteeing that they will be handled incorrectly - and painful to support and revise both in Trac and in plugins. Grafting a 'version' onto links manually just does not feel right, and we should not go that way.

Any installation that serves resources outside Trac, links to external resources, or serves Trac-deployed resources as static files, should also take care of their own content-delivery and make sure the files are cached and expired as they see fit. I don't see how that affects Trac itself?

Any file that is added and scoped for lookup inside Trac (i.e. relative files) is a file that we can find and check the timestamp of. Whether myplugin.js is deployed and actually served outside Trac is of no importance - we stat the original that we have.

Going that way, using an ETag based on file stats would be just as useful and much more general - we ETag the files we serve from Trac when requested, and ignore the rest. Users that serve static files outside Trac are left to expire them as they see fit, using their own web server skills, content-delivery networks or whatever.

I'd say that using an ETag is more useful than the proposed version idea.
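
A sketch of such a stat-based ETag (hypothetical helper; the inode/size/mtime triple gives a strong per-file validator without hashing any content):

import os

def stat_etag(path):
    # Strong ETag built from file metadata only, so serving a file
    # never requires reading it just to compute the validator
    st = os.stat(path)
    return '"%d-%d-%d"' % (st.st_ino, st.st_size, int(st.st_mtime))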

in reply to:  15 comment:16 by Remy Blank, 14 years ago

Replying to osimons:

Providing 'inheritable class frameworks' that plugins can use is a sure way of guaranteeing that they will be handled incorrectly - and painful to support and revise both in Trac and in plugins. Grafting a 'version' onto links manually just does not feel right, and we should not go that way.

"Doesn't feel right" is a pretty weak argument. Anything more concrete? I can't help noticing that some major sites are using this technique, so discarding it solely based on a feeling… doesn't feel right either :)

I'd say that using an ETag is more useful than the proposed version idea.

That may well be the case. Then again, for it to be really useful, somebody has to implement it.

/me is waiting for a patch to compare against.

in reply to:  13 comment:17 by osimons, 14 years ago

Replying to rblank:

I know I'm guilty of asking for feedback, but somehow I would appreciate if it was a bit more constructive:

Oh, so feedback needs to be constructive too now?! Heh, I do intend to be and sorry if it does not always come across that way.

  • The proposed solution, while not taking plugins into account, is already much better than the previous situation. It doesn't change any public interface, has close to zero overhead, and solves the issue at least for core, which is already a big part. As such, it is a clear net gain. Calling it a non-starter feels… inappropriate.

See, that is where we sort of disagree - and to a large extent it has to do with how we define 'core'. You define it as 'whatever ships as Trac', while I define it as the sub-set of services that all modules use, whether they are shipped with Trac (like wiki, ticket etc.) or provided as plugins. Having to tweak and add cruft to individual modules to compensate for missing or inefficient core APIs is what I want to avoid.

There's another unsolved issue, the images. Unfortunately, the URLs for those are often placed inside .css files, so it's not possible to dynamically add a query string there (except if we start generating CSS dynamically, I remember a discussion about that). Also, images tend to change much less frequently than CSS and JS, so I'm not too concerned if the solution doesn't take images into account.

Etag.

in reply to:  15 ; comment:18 by Christian Boos, 14 years ago

Replying to osimons:

Any installation that serves resources outside Trac, links to external resources, or serves Trac-deployed resources as static files, should also take care of their own content-delivery and make sure the files are cached and expired as they see fit. I don't see how that affects Trac itself?

The problem we are trying to address here is circumventing the overly aggressive caching done by web browsers, which sometimes decide to ignore caching headers. For example, on t.e.o we serve the static resources via lighttpd, which properly sets the timestamps, ETag and what not. Yet this is not enough. There seems to be an emerging technique of appending fake parameters to such resource URLs as a way to force those browsers to do the right thing. Like it or not (my first reaction was not to find it particularly elegant either), this seems to be becoming common practice, and if it solves a real problem, I don't see why we should not use it.

Any file that is added and scoped for lookup inside Trac (ie. relative files) is a file that we can find and check the timestamp of. If myplugin.js is deployed and actually served outside Trac is of no importance - we stat the original that we have.

stat-ing files like that is likely to induce excessive overhead, or complexity if we want to do it in a smart way, as you originally mentioned. And for no real benefit either: how often do you hack the resources of a production system "live"?

Going that way, using an Etag based on file stats would be just as useful and much more general - we Etag the files we serve from Trac when requested, and ignore the rest. Users that serve static files outside Trac are left to expire them as they see fit using own web server skills, content-delivery networks or whatever.

I'd say that using Etag is more useful than the proposed version idea.

We indeed don't add ETags in /chrome for the resource files Trac serves itself; this could be done. But it is a separate issue (i.e. at the risk of repeating myself, even if we added proper ETags, browsers are likely to cache those files over-aggressively, JavaScript in particular).

comment:19 by osimons, 14 years ago

All that would be needed is to add ETag support to Request.send_file() and tune Request.check_modified(). That should take care of chrome, attachments and whatever else serves files via Trac. Or am I missing anything? I've used ETags, but never really implemented them myself.

As for aggressive caching, that sounds like it is somewhat based on hearsay and occasional bad experiences. As with edgewall.org, I too serve all my static resources outside Trac, and I never have a problem with static resources not getting updated when using ETag and modified information.

My experience is that browsers generally handle this OK (when done correctly server-side), but that may of course not be the case for the proxy servers that organisations often employ - they tend to be much more aggressive. However, using strong ETag validation, as we could for static files, may well improve on the weak validation ('W/…') that is currently supported in Request.check_modified().

in reply to:  19 ; comment:20 by Remy Blank, 14 years ago

Replying to osimons:

All that would be needed is to add Etag support to Request.send_file() + tune Request.check_modified(). That should take care of chrome, attachments and whatever else serves files via Trac. Or am I missing anything?

I was going to say that this wouldn't work with .egg plugins, but in fact it will, because static resources are extracted as files anyway. So this may indeed be sufficient. Care to provide a patch that we could try?

in reply to:  19 comment:21 by Christian Boos, 14 years ago

Replying to osimons:

As for agressive caching, that sounds like it is somewhat based on hearsay and occasional bad experiences.

This is the heart of the problem. After seeing that WordPress and Wikipedia had both adopted this strategy, I thought this was most probably a real issue. But to be sure, I did some tests myself with demo-0.13, and the first browsers I tested immediately picked up the changes I made to the trac.js file served by lighttpd, which uses a strong ETag.

As with the edgewall.org, I too serve all my static resources outside Trac and never have a problem with static resources not getting updated using Etag and modified information.

My experiments seemed to confirm this… until I tested IE9 and Opera 11 ;-) (could someone else repeat the tests with IE8?) These browsers don't care at all about the ETag and Last-Modified headers, for .js or .css files, until you press <F5> (reload).

So while Chrome, Safari and FF (at least recent versions of those) seem to behave sensibly, this is a real problem for IE and Opera. Still a sizeable part of the Internet audience, I suppose.

Last edited 14 years ago by Christian Boos

by osimons, 14 years ago

Alternative patch using headers only.

in reply to:  20 ; comment:22 by osimons, 14 years ago

Replying to rblank:

Care to provide a patch that we could try?

Sure, something like attachment:t9936-etag_files-r10542_012.diff was what I had in mind. It actually reduces the LOC count by 4 :-) and seems to work, in my testing at least. How it fares in the big world I don't know, but if anything, I'm quite sure this strategy doesn't hurt and can only improve things.

Last edited 14 years ago by osimons

in reply to:  22 comment:23 by Christian Boos, 14 years ago

Replying to osimons:

Replying to rblank:

Care to provide a patch that we could try?

Sure, something like attachment:t9936-etag_files-r10542_012.diff was what I had in mind.

Looks good! Nitpick: I'd prefer weak=True instead of strong_validation=False though (immediately makes it obvious what 'W' stands for).

Though as I explained in comment:18 and confirmed in comment:21, this addresses a different issue than what this ticket was initially about.

Last edited 14 years ago by Christian Boos

comment:24 by osimons, 14 years ago

Opera 11 works nicely for me; it returns 304, as it should, for every subsequent file request. Are you using non-standard cache settings? As for IE8, I have stopped trying to understand how its developer tools work, so I cannot for the world figure out whether images respond with 200 or 304…

I don't agree that my patch addresses a different issue, but it does of course provide a different solution for improving things with regards to caching.

in reply to:  24 comment:25 by Christian Boos, 14 years ago

Replying to osimons:

Opera 11 works nicely for me, it returns 304 as it should for every subsequent file requests.

Ok, but does it also reload the resource when you modify it?

As for IE8 I have stopped trying to understand how its developer tools work

You don't need the developer tools: go to a Trac instance (e.g. WikiStart), then modify trac.js on the server in a visible way (e.g. add "Hello" before ¶ in addAnchor), and then go to a different wiki page (or the same one, but by following a link, not by pressing reload): if you see "Hello", then the new trac.js was reloaded. If it only appears after an explicit reload, then you are seeing the browser caching effect we're talking about.

I don't agree that my patch addresses a different issue

Sorry, it's not really a matter of opinion here. The issue is different: we're talking about browsers not respecting the ETag hints when they are present, so adding ETags in places where they were not already present is indeed something different.

comment:26 by Christian Boos, 14 years ago

Some further thoughts on the topic, after reading http://code.google.com/speed/page-speed/docs/caching.html. For static resources, we could implement more aggressive caching if we would set the Expires or Cache-Control: max-age at a date far in the future, yet control the refresh when needed with the version parameter. In that page, they refer to this technique as "URL fingerprinting".
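
A sketch of that policy (it is only safe because the fingerprinted URL itself changes whenever the content does):

import time
from email.utils import formatdate

def far_future_headers():
    # Cache for one year; clients re-fetch only when the
    # fingerprinted URL changes
    return [('Expires', formatdate(time.time() + 365 * 86400, usegmt=True)),
            ('Cache-Control', 'public, max-age=31536000')]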

in reply to:  22 ; comment:27 by Remy Blank, 14 years ago

Replying to osimons:

Sure, something like attachment:t9936-etag_files-r10542_012.diff was what I had in mind.

Looks good indeed, and should probably be applied (I do agree with Christian about weak vs. strong_validation, though). I even wonder why this wasn't done in the first place. Then again, we did check If-Modified-Since, and that should already have been sufficient for .css and .js files, so this issue really is about browsers.

Another question: my patch changes most <script> tags into calls to add_script() (and <link> into add_stylesheet()). I assume this was historical, and the add_*() functions are the preferred method now, right?

in reply to:  27 comment:28 by Christian Boos, 14 years ago

Replying to rblank:

Replying to osimons:

Sure, something like attachment:t9936-etag_files-r10542_012.diff was what I had in mind.

Looks good indeed, and should probably be applied

The caveat with ETags based on file timestamps for static resources is that they become ineffective, and even detrimental to performance, when multiple hosts are used to serve the static content, as the ETag will differ depending on which host happens to serve the static request (Yahoo's performance tip about ETags). Probably not a common case in Trac deployments, but worth noting.

Another question: my patch changes most <script> tags into calls to add_script() (and <link> into add_stylesheet()). I assume this was historical, and the add_*() functions are the preferred method now, right?

Seems so, yes. However, for some templates (e.g. source:trunk/trac/templates/diff_view.html) which are used in several places, this means we will have to remember to add the same set of .css and .js files each time (OK, here only diff.css, but you get the idea).

in reply to:  26 comment:29 by Christian Boos, 14 years ago

Actually, it seems that the wide use of fingerprinting techniques is mostly for this reason: it makes possible a "never expires" policy, which drastically reduces the network load (Yahoo's performance tip about Expires), rather than circumventing buggy browser behavior (although it fixes that problem too, of course).

Last edited 14 years ago by Christian Boos

comment:30 by osimons, 14 years ago

I see that fingerprinting is a recommended technique, but only when used as part of the filename or path of the file. Many proxies and browsers are naturally restrictive about caching URLs with query strings. So, contrary to the intention, the worst case could be that adding ?v=NNNN actually increases both network load and request count by forcing a fresh copy each time.

in reply to:  30 comment:31 by Remy Blank, 14 years ago

Replying to osimons:

I see that fingerprinting is a recommended technique, but only when used as part of the filename or path of the file. Many proxies and browsers are naturally restrictive about caching URLs with query strings. So, contrary to the intention, the worst case could be that adding ?v=NNNN actually increases both network load and request count by forcing a fresh copy each time.

According to Google's paper, proxies wouldn't cache a Trac site anyway, as we don't send Cache-Control: public headers. But we can also find a way to move the version into the path, if required. Placing it in the query string just makes for a simple implementation.

in reply to:  30 comment:32 by Christian Boos, 14 years ago

Replying to osimons:

I see that fingerprinting is a recommended technique, but only when used as part of the filename or path of the file.

We could do that as well, though this would greatly complicate deployment.

Many proxies and browsers are naturally restrictive about caching URLs with query strings. So, contrary to the intention, the worst case could be that adding ?v=NNNN actually increases both network load and request count by forcing a fresh copy each time.

While it's probably true that this will disable caching by proxies ("Don't include a query string in the URL for static resources"), I suppose it will already be a huge saving to rely only on browser caches. We should nevertheless verify this with all the major browsers.

I'm confident it's not a problem in practice, otherwise heavily optimized sites like Wikipedia wouldn't do it.

comment:33 by Remy Blank, 14 years ago

There's one thing I still don't understand. We already process the If-Modified-Since header and return a 304 if it matches. What does the ETag bring (for static resources) that the If-Modified-Since header doesn't? And if the If-Modified-Since header isn't enough to avoid issues with CSS and JS caching, will an ETag really help (in the patch, it changes at exactly the same time as If-Modified-Since stops matching)?

comment:34 by Christian Boos, 14 years ago

There might be browsers which don't react appropriately to Last-Modified: but handle ETag: better… according to Google's page-speed doc:

Last-Modified is a "weak" caching header in that the browser applies a heuristic to determine whether to fetch the item from cache or not. (The heuristics are different among different browsers.)

Note that it seems we currently process If-Modified-Since: but only send a 304 back, never a Last-Modified: header. Maybe this is also a glitch, and one that will be fixed by osimons' patch.

But at the risk of restarting the whole thread, let me reiterate that this is not the main issue, as the problem of browsers unduly caching resources persists, for at least IE and Opera, even when all those headers are set properly (for example when lighttpd or Apache serves the files directly because htdocs_location is set).

comment:35 by Remy Blank, 14 years ago

So… What's the currently-favored solution? I would really like to avoid having to explain to too many people how they should do Shift+Reload after upgrading to every new version.

comment:36 by Christian Boos, 14 years ago

I'd say we should first focus on the initial issue (see comment:21 for the wrap-up) and, for now, leave alone or split into a separate ticket the possible changes (fixes or optimizations) for the ETag and other headers, which are a separate concern. If someone still isn't convinced that it's a separate issue, then please re-read comment:18 and comment:25 and try to convince me that it's the same issue.

As for the technical solution, I still think we need something along the lines of comment:12.

In trac/web/chrome.py:

class HeaderBuilder(object):
    def __init__(self, version):
        self.version = version
    def add_link(self, req, ...):
        ...
    def add_stylesheet(self, req, ...):
        ...
...
tracheader = HeaderBuilder(version=trac.version)

# Backward compatibility API
add_link = tracheader.add_link
add_stylesheet = tracheader.add_stylesheet

In module code (e.g. trac/wiki/web_ui.py):

from trac.web.chrome import tracheader
...
    tracheader.add_stylesheet(req, 'common/css/diff.css')
    tracheader.add_script(req, 'common/js/diff.js')

For a plugin, it would be trivial to do something similar:

plugheader = HeaderBuilder(version=plug.VERSION)
...
    plugheader.add_stylesheet(req, 'plug.css')

comment:37 by Remy Blank, 13 years ago

I would like to move forward with this, and implement URL fingerprinting, with the version in the path and not as a query argument (as suggested in the various resources listed above), together with a far-future expiry date. This will somewhat complicate serving static resources through the web server, but an AliasMatch directive (or similar) should solve this issue.

If we place the version at the beginning of the path (e.g. /chrome/0.13/common/js/jquery.js), this should even work with images referenced from Trac's .css files, as the latter use relative paths. The only issue would be referencing images provided by Trac from .css files provided by plugins. Any good ideas for this case?

comment:38 by osimons, 13 years ago

I've got a few comments about this, and I'm somewhat unsure about the consequences. I maintain a lot of my own code, and many plugins, and they all reuse Trac static resources in various ways. Not to mention the fact that I rewrite URLs for thousands of static locations to be served outside of Trac.

  1. Embedding the version in the path will be messy for anyone who deploys static resources, particularly as some locations will have versioning, some plugins may not, and for some locations like /site/ versioning will make no sense at all. Apache rewrite magic should still prevail - there are ways, it's just much more complicated.
  2. The referencing between static files that you mention is another complex issue, particularly plugins referencing Trac resources, which is the most common need. Like my Blog plugin having to get wiki.css styles, or code.css if needed; I also reuse the RSS icons and so on. Theme engines, extensions and general theme plugins will also have to handle more variation - extending both Trac and common plugins.
  3. Every change in static resources won't necessarily be reflected in a version change. If you run Trac from a Git or Mercurial mirror (or svn export), the version may be reported as 0.13dev for close to 2 years.
  4. Just like I turn off the web server signature, I'm not keen to advertise the versions of Trac and plugins that I run. Primarily for security, but also because it isn't something anyone with an interest in my projects needs to care about.
  5. Applications such as news readers (RSS) store the markup, and 'starred' or 'saved' items may be called upon to render themselves again to the user after versions have been upgraded. That won't work so well when none of the external resources can be found.

And more? Most likely. How could it be improved? Not sure, but I'd very much prefer it if the versioning was optional - like a trac.ini config value to enable it for Trac and plugins. When we don't control all the code, there may just be new holes and issues opened up that are beyond our ability to fix.

[chrome]
versioned_resources = false

in reply to:  38 comment:39 by Remy Blank, 13 years ago

Replying to osimons:

I've got a few comments about this, and somewhat unsure about the consequences.

Yes, me too…

  1. Embedding version in the path will be messy for anyone that deploys static resources. Particularly as some will have versioning, and some plugins may not, and for some locations like /site/ versioning will make no sense at all. Apache rewrite magic should still prevail - there are ways, it's just much more complicated.

Unfortunately, it's the only reliable way. Using the query string prevents caching, so it's definitely not a good solution.

  2. The referencing between static files that you mention is another complex issue, particularly plugins referencing Trac resources, which is the most common need.

I know, that's how I became aware of the issue. I hadn't thought about theme engines, so it's even worse than I thought.

  3. Every change in static resources won't necessarily be reflected in a version change. If you run Trac from a Git or Mercurial mirror (or svn export), the version may be reported as 0.13dev for close to 2 years.

That's true. OTOH, people installing from a SVN checkout at least get a correct version.

We could also try to find another version-specific tag. For example, a single, global version number kept in the database, and incremented every time Trac or a plugin is updated. Or increment it with a trac-admin command, and mention it on TracUpgrade.

  4. Just like I turn off the web server signature, I'm not keen to advertise the versions of Trac and plugins that I run. Primarily for security, but also because it isn't something anyone with an interest in my projects needs to care about.

Good point. We could replace the version number with a hash of the version, but it wouldn't really be more secure, as you would just have to compare against the hashes of a few known versions. We could instead hash the version number with a secret key (specified in trac.ini, and auto-generated at install time).

  5. Applications such as news readers (RSS) store the markup, and 'starred' or 'saved' items may be called upon to render themselves again to the user after versions have been upgraded. That won't work so well when none of the external resources can be found.

That's a bit of an edge case, and it doesn't really work today, either. Though the resources are found, after upgrading they aren't the right ones.

But we could also simply ignore the version component when serving static resources. So you could ask for /chrome/1.2.3/common/js/jquery.js and you would still get the resource. The important part is that the links to the resources change.

And more? Most likely. How could it be improved? Not sure, but I'd very much prefer it if the versioning was optional

It's possible that trying to keep as much as possible of the current mechanism makes the problem harder than it is. Maybe we should re-think the serving of static resources from the beginning, and try to re-design it with versioning from the ground up.

Then again, I have the feeling that using a single, global version number, and having it auto-increment whenever the version of any component changes (checked at environment startup), could be doable. The links to static resources could then have the following form:

/chrome/123/common/css/trac.css
/chrome/123/tracfullblog/css/fullblog.css

This allows relative links from plugin resources to reference core resources without having to specify the version number. Trac should serve the same static resources for any version number. That is, all resources matching the following pattern should serve the same file:

/chrome/([0-9]+/)?common/css/trac.css

This would even be backward-compatible. I'm going to experiment with that idea and report back.
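
A sketch of that optional-version matching (the pattern above, exercised in Python):

import re

# The version segment is optional and otherwise ignored when serving,
# so pre-upgrade links keep working
CHROME_PATH = re.compile(r'/chrome/(?:[0-9]+/)?(?P<prefix>[^/]+)/(?P<filename>.+)')

m = CHROME_PATH.match('/chrome/123/common/css/trac.css')
assert (m.group('prefix'), m.group('filename')) == ('common', 'css/trac.css')
assert CHROME_PATH.match('/chrome/common/css/trac.css').group('prefix') == 'common'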

by Remy Blank, 13 years ago

Implemented fingerprinting for static resources.

comment:40 by Remy Blank, 13 years ago

9936-static-fingerprint-r10741.patch implements fingerprinting on static resources:

  • A hash of static resources is calculated by taking the SHA1 of the concatenation of the content of all available static resources, and taking the first 8 characters of the hex digest.
  • Static resources are served at URLs matching the following pattern:
    /chrome/(![0-9a-f]+/)?(?P<prefix>[^/]+)/+(?P<filename>.+)
    
    If the fingerprint is present and matches the current hash, an Expires: header is sent with a date one year in the future. Otherwise, no Expires: header is sent. Other than that, the fingerprint is ignored.
  • Links to static resources are generated with the same pattern as above, with the hash determined at environment load time.
  • If [chrome] htdocs_location is set, no fingerprint is added to links using it.

This has some nice properties:

  • Fingerprinting is global and works for all resources below /chrome (core, plugins, site).
  • The previous URLs to static resources remain active, so the mechanism is backward-compatible.
  • CSS files from plugins can reference static resources from core, by using relative paths.
  • If the content of any file in the static resources changes (core, plugin, site), or if a file is added or removed, the fingerprint changes. This may generate more reloads than strictly necessary, for example when disabling or activating a plugin, but such events should be relatively rare.
  • For the new fingerprint to be calculated, Trac must be restarted. This is good practice anyway after an update (to core or a plugin), so it shouldn't be a limitation.
  • Serving static resources from the web server is still reasonably simple, by using an AliasMatch directive or similar, and simply ignoring the fingerprint.

AFAICT, this implementation addresses all of the concerns in comment:38. Also, it would be very simple to add an option to disable fingerprinting altogether, if desired. I may have missed a few implications, but on the whole, I'm quite happy with it. Thoughts?
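
For reference, a condensed sketch of the hash calculation described above (the actual code is in the attached patch; collecting the htdocs directories via ITemplateProvider is elided here):

import hashlib
import os

def static_hash(htdocs_dirs):
    # SHA1 over the concatenated content of all static resources,
    # walked in a stable order; the fingerprint is the first 8 hex
    # digits of the digest
    sha = hashlib.sha1()
    for base in htdocs_dirs:
        for root, dirs, files in os.walk(base):
            dirs.sort()
            for name in sorted(files):
                with open(os.path.join(root, name), 'rb') as f:
                    sha.update(f.read())
    return sha.hexdigest()[:8]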

comment:41 by Christian Boos, 13 years ago

Looks good indeed! My only concern so far is the time it will take to compute the static_hash… consider the TH:WikiExtrasPlugin for example and its collection of >3000 icons ;-)

Maybe we should only use stat information (file mtime and size), and forget about the pathological case of a change of content without mtime and size modifications.

comment:42 by Remy Blank, 13 years ago

Release Notes: modified (diff)

Fingerprinting committed in [10769]. It can be controlled with the [trac] fingerprint_resources option:

  • content: Calculate the fingerprint from the resource contents.
  • meta: Calculate the fingerprint from the resource metadata (size and mtime).
  • disable: Disable fingerprinting.

I have measured the time to calculate the fingerprint with the th:WikiExtrasPlugin installed, which is a bit of a worst case. On my machine, it takes 1.9 seconds when using the content, and 1.4 seconds when using the metadata. As this is only apparent during environment startup, I hesitate to even keep the "meta" option.

I will update the documentation with the new functionality, especially the part about serving static resources from the web server.

in reply to:  42 ; comment:43 by Christian Boos, 13 years ago

Replying to rblank:

Fingerprinting committed in [10769]. It can be controlled with the [trac] fingerprint_resources option:

  • content: Calculate the fingerprint from the resource contents.
  • meta: Calculate the fingerprint from the resource metadata (size and mtime).
  • disable: Disable fingerprinting.

Looks great! Tested with IE, and it seems to work fine. The only caveat is that you shouldn't modify the deployed files (or not only those) but the source files, as the server has of course no way of reading the deployed files.

I have measured the time to calculate the fingerprint with the th:WikiExtrasPlugin installed, which is a bit of a worst case. On my machine, it takes 1.9 seconds when using the content, and 1.4 seconds when using the metadata. As this is only apparent during environment startup, I hesitate to even keep the "meta" option.

I would have expected a bigger difference, but I think it's already noticeable. I also get 1.97s (content) vs. 1.27s (meta), which is an appreciable difference. In a server setup where processes are recycled pretty regularly, this difference matters and will be noticeable. And maybe we'll find ways to improve the speed of meta even further.

We could also add other methods:

  • "servertime" would generate the hash from the server start time
  • "deploy" would compute the hash by content at deploy time and store it in the .ini file

I will update the documentation with the new functionality, especially the part about serving static resources from the web server.

In my setup (Apache), I had to make the following change, from:

Alias /trac/bct/chrome /packages/trac/virtualenv-0.13/share/htdocs

to:

AliasMatch /trac/bct/chrome/!([^/]*)/(.*) /packages/trac/virtualenv-0.13/share/htdocs/$2
Alias /trac/bct/chrome /packages/trac/virtualenv-0.13/share/htdocs

The second line is still needed for links generated via href.chrome(...).

We could think about adding "overrides" for Hrefs (e.g. h.chrome = Href(h.chrome(Chrome(env).static_hash)) for some "key" h values, like env.(abs_)href and req.(abs_)href). That way we'll cover all the resources, and the first line should then be enough.


An alternative idea, which would enable us to keep the previous Alias mapping and only that, is to complement the "deploy" idea described above with a step which would add a symbolic link !<static_hash> -> . to the deploy directory. OK, this would only work on Linux or on recent Windows systems (via NTFS symbolic links or junction points), but it works well. There would be nothing to change while upgrading to 0.13, and it's also optimal in terms of performance, as the hash is computed only once at deploy time. The only downside is that a deploy would be required after each software update of Trac or a plugin (but isn't that the case already anyway?).

Last edited 13 years ago by Christian Boos

in reply to:  43 ; comment:44 by Remy Blank, 13 years ago

Replying to cboos:

I would have expected a bigger difference, but I think it's already noticeable. I also get 1.97s (content) vs. 1.27s (meta), which is appreciable enough.

Note that this is with a hot cache, where all the data is already in memory. I expect the difference to be much bigger if we have to hit the disk to read the content.

  • "servertime" would generate the hash from the server start time
  • "deploy" would compute the hash by content at deploy time and store it in the .ini file

Does "deploy" also handle static files from plugins? If that's the case, then yes, this would be a nice idea.

The second line is still needed for links generated via href.chrome(...).

Right, I would have forgotten that. Thanks for the notice.

We could think about adding "overrides" for Hrefs (e.g. h.chrome = Href(h.chrome(Chrome(env).static_hash)) for some "key" h values, like env.(abs_)href and req.(abs_)href). That way we'll cover all the resources, and the first line should then be enough.

That sounds… complicated.

The only downside is that a deploy would be required after each software update of Trac or a plugin (but isn't that the case already anyway?).

I serve static resources directly from /usr/lib/python2.7/site-packages/trac/htdocs, so I don't need the deploy step.

in reply to:  44 ; comment:45 by Christian Boos, 13 years ago

Replying to rblank:

  • "deploy" would compute the hash by content at deploy time and store it in the .ini file

Does "deploy" also handle static files from plugins? If that's the case, then yes, this would be a nice idea.

Yes, that's the case.

The only downside is that a deploy would be required after each software update of Trac or a plugin (but isn't that the case already anyway?).

I serve static resources directly from /usr/lib/python2.7/site-packages/trac/htdocs, so I don't need the deploy step.

But that only works for Trac's own resources, not for those of the plugins, unless you have similar mappings for each plugin.

Other unusual deploy scenarios like the one we do here on t.e.o (use a single "fake" environment for deploy and reuse the deployed resources from different real environments) would also need to be adapted.

But in the common case, where there's a deploy done for each Trac environment, this approach looks optimal to me.

We could think about adding "overrides" for Hrefs (e.g. h.chrome = Href(h.chrome(Chrome(env).static_hash)) for some "key" h values, like env.(abs_)href and req.(abs_)href). That way we'll cover all the resources, and the first line should then be enough.

That sounds… complicated.

Ok, maybe an actual patch will look less complex ;-) I'll try to come up with one.

in reply to:  45 comment:46 by Remy Blank, 13 years ago

Replying to cboos:

but that only works for Trac's own resources, not for those of the plugins, unless you have similar mappings for each plugin.

I thought there was an issue with deploy that prevented serving static resources of plugins from the web server, so I didn't even try. Or was that [trac] htdocs_location? Right, it was the latter (#9683).

comment:47 by Christian Boos, 13 years ago

Note that the configuration for Lighttpd (in combination with uwsgi) was slightly more annoying:

  • first, alias.url doesn't support regexps (it's like Alias in Apache), so you'd have to combine it with mod_rewrite's url.rewrite-once
  • second, with mod_uwsgi (i.e. what we use here on edgewall.org) it seems that alias.url doesn't work at all…

The solution I came up with was to use the url.redirect directive, e.g.

url.redirect = (
  "^/chrome/!([^/]*)/(.*)" => "http://www.edgewall.org/chrome13/$2"
)

Not sure if that's optimal, but it works.

comment:48 by Remy Blank, 13 years ago

Of course, the redirect is quite bad, as it adds a round-trip for every static resource. You could use the actual hash value in the alias.url, which would be much better, but you would have to remember to update the value on every update. This is probably acceptable when following "official" releases, but not when deploying from SVN.

in reply to:  48 ; comment:49 by Christian Boos, 13 years ago

Replying to rblank:

Of course, the redirect is quite bad, as it adds a round-trip for every static resource. You could use the actual hash value in the alias.url, which would be much better,

As I said, alias.url doesn't seem to work here, probably a limitation of mod_uwsgi.

in reply to:  49 comment:50 by Christian Boos, 13 years ago

Replying to cboos:

Replying to rblank:

Of course, the redirect is quite bad, as it adds a round-trip for every static resource. You could use the actual hash value in the alias.url, which would be much better,

As I said, alias.url doesn't seem to work here, probably a limitation of mod_uwsgi.

I'll try that recipe this evening…

comment:51 by Remy Blank, 13 years ago

See #10306 for an idea how to adapt trac-admin $ENV deploy to simplify the web server configuration.

comment:52 by osimons, 13 years ago

I'm still somewhat unsure about the practical consequences of this change. It will still be some time before I migrate my production setup to 0.13, but thankfully the feature can be disabled… :-)

The way I read the code, it makes a hash from all enabled template-provider components. So, when I have projects A, B and C that all run from the same globally installed Trac and plugin code, they will get different fingerprints depending on which plugins are enabled in each environment? Enable an already-installed plugin, and the reloaded environment will have a new hash?

Further, as long as 'site' is part of the template providers, the existence of project-specific logos and other static files will in practice ensure that no two projects ever have the same hash?

If so, including the fingerprint in the file system becomes a really bad idea.

I'm quite sure the fingerprinting part works fine, but I somehow feel that the deploy & runtime config side is somewhat 'sub-optimal'… Time will tell, I suppose.

in reply to:  52 ; comment:53 by Remy Blank, 13 years ago

Replying to osimons:

The way I read the code, it makes a hash from all enabled template-provider components. So, when I have projects A, B and C that all run from the same globally installed Trac and plugin code, they will get different fingerprints depending on which plugins are enabled in each environment? Enable an already-installed plugin, and the reloaded environment will have a new hash?

That is correct.

Further, as long as 'site' is part of the template providers, the existence of project-specific logos and other static files will in practice ensure that no two projects ever have the same hash?

That would be the case, yes. And it's correct, if you think about it: when you edit a per-site CSS file, or change a logo, you want clients to pick up the change without having to do a forced reload. But I see how this could be a nuisance in the mass-hosting scenario.

I'm quite sure the fingerprinting part works fine, but I somehow feel that the deploy & runtime config side is somewhat 'sub-optimal'… Time will tell, I suppose.

Better ideas welcome, of course.

in reply to:  53 ; comment:54 by osimons, 13 years ago

Replying to rblank:

Better ideas welcome, of course.

I think I can get most things to work by just matching the pattern and ignoring the actual hash in the web server, but just don't make it even more complicated by writing the hash to the file system as you recently suggested (ref. #10306).

in reply to:  54 comment:55 by Remy Blank, 13 years ago

Replying to osimons:

but just don't make it even more complicated by writing the hash to the file system as you recently suggested (ref. #10306).

That's a shame, because it makes configuration much simpler in other cases. Maybe we should make that explicit with a command-line switch in trac-admin $ENV deploy, instead of only relying on the fingerprinting configuration.

comment:56 by Christian Boos, 13 years ago

I think the symbolic link approach is also interesting:

  • on deploy to /deploytarget, we copy everything there and compute the hash (e.g. 123added)
    • we create a symlink: /deploytarget/htdocs/!123added -> .
    • we create a /deploytarget/htdocs/LATEST file with the content 123added
  • only a simple HTTP server mapping of /chrome to /deploytarget/htdocs is needed, and it will work for accessing a given resource either through links generated by add_stylesheet() (fingerprint-aware) or href.chrome() (not fingerprint-aware); it will also keep working for links still using a previously generated hash (symlinks are cheap, so you can keep many more entries than you could with the full copies suggested in #10306)
  • the Trac environment gets to know about the hash by reading the LATEST file ([trac] fingerprint_resources would need to point to that path, i.e. /deploytarget/htdocs/LATEST)

I think such a deploy scheme has the advantage of being fully backward-compatible (the actual resources are still located directly below /deploytarget/htdocs), and as for the fingerprinting support, it allows for single-environment as well as multiple-environments-in-a-single-deploy-location scenarios.

comment:57 by Remy Blank, 13 years ago

Last time I checked, symlinking on Windows was flaky at best (i.e. had strange side effects). But on *nix, this may be the best solution.

What's the use of writing the hash to the LATEST file? Speeding up the loading of the environment?

We could also automate the deployment of static resources. If the deployment location was available e.g. in [trac] static_resources, we could check the deployed files against the ones in Trac and the plugins at environment startup, and update the deployed resources if necessary.

comment:58 by Peter Suter, 13 years ago

Sorry, I haven't really followed this too closely, but I just realized this hash calculation causes the recent tracd startup delay. Here it is quite noticeable as it takes over 10 seconds.

Apparently this is mainly because I left [inherit] htdocs_dir = empty, which os.walk in static_hash interprets as the current directory.

Is it always a configuration error to leave these *_dir settings empty? I assumed these were only required in more advanced setups.

in reply to:  56 ; comment:59 by Christian Boos, 13 years ago

Follow-up to comment:56:

  • on deploy to /deploytarget, …

Or better: the computation of the hash, the creation of the !hash symlink and the creation/update of the LATEST file (or better, simply call it FINGERPRINT) could be done with a separate command:

trac-admin $ENV fingerprint /deploytarget

This could be used in simple scripts like:

trac-admin $ENV <<EOF
deploy $target
fingerprint $target
config set trac use_fingerprint $target/htdocs/FINGERPRINT
EOF

Externalizing the generation of the fingerprint value has other benefits:

  • we speed up the startup time, as this is typically a performance sensitive period, where you can easily have race issues (as you typically have many requests waiting for the server to become "reactive")
  • you could imagine completely different ways to generate the fingerprint:
    • using find+md5sum, or md5deep, to generate a list of hashes and doing an md5sum on that list
    • using git or Mercurial to snapshot deployments and using the tip's hash as the fingerprint
    • etc.; even an unsophisticated "generation" file would work (after each deployment or change in the htdocs, simply increment the number in the FINGERPRINT file)

All we would have to implement in Trac is some code to efficiently get the latest FINGERPRINT value, without having to constantly read or even stat that file…
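
One possible shape for that last piece (a hypothetical helper; it compromises by re-checking the file's mtime at most every ttl seconds rather than on every request):

import os
import time

class FingerprintFile(object):
    def __init__(self, path, ttl=30):
        self.path, self.ttl = path, ttl
        self._value, self._mtime, self._checked = None, None, 0.0

    def get(self):
        # Serve the cached value; stat() the file only occasionally,
        # and re-read it only when its mtime has actually changed
        now = time.time()
        if now - self._checked >= self.ttl:
            self._checked = now
            mtime = os.stat(self.path).st_mtime
            if mtime != self._mtime:
                self._mtime = mtime
                with open(self.path) as f:
                    self._value = f.read().strip()
        return self._value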

in reply to:  59 ; comment:60 by Remy Blank, 13 years ago

Replying to cboos:

  • we speed up the startup time, as this is typically a performance sensitive period, where you can easily have race issues (as you typically have many requests waiting for the server to become "reactive")

If there are race conditions, we should fix them. They shouldn't be an "excuse" for optimization. We could also simply put the fingerprint into trac.ini and update it there.

  • you could imagine completely different ways to generate the fingerprint:

Isn't that slightly over-engineered? We don't really care about the fingerprint itself, as long as it changes when the content changes.

in reply to:  60 comment:61 by Christian Boos, 13 years ago

Replying to rblank:

Replying to cboos:

  • we speed up the startup time, as this is typically a performance sensitive period, where you can easily have race issues (as you typically have many requests waiting for the server to become "reactive")

If there are race conditions, we should fix them. They shouldn't be an "excuse" for optimization.

I don't think that race is easily fixable, as it's inherent to the situation: when you have a startup procedure P that takes a time T, then either:

  1. you implement some kind of mutual exclusion so that P is executed only once; in this case you have one client request performing P, and all the others waiting for a time T
  2. you allow for concurrent runs of P; this is not necessarily a problem if done carefully (think about what we do for initializing the Components, r10295), but it's a waste of resources and possibly leads to an n*T delay (or even much longer if the load accumulates, just like what we had in the past with #9111)

One possible way to work around the problem would be a variation on 1., where the other clients would just use the previous value of the fingerprint (or a random one). But still, the overhead of checking for the file lock is not negligible either…

For these reasons I'd much prefer that we do an offline generation of the fingerprint.
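
To make the trade-off concrete, here is a rough sketch of the variation on 1. described above, where only one process recomputes the fingerprint while the others fall back to the previous cached value; all names are hypothetical and first-run handling is omitted:

import errno
import os

def get_fingerprint(cache_file, lock_file, recompute):
    try:
        # Non-blocking "lock": only one process gets to create the file.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            # Another process is recomputing: reuse the previous value.
            with open(cache_file) as f:
                return f.read().strip()
        raise
    try:
        value = recompute()
        with open(cache_file, 'w') as f:
            f.write(value)
        return value
    finally:
        os.close(fd)
        os.unlink(lock_file)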

We could also simply put the fingerprint into trac.ini and update it there.

That would be a simple alternative, yes (and finishing #7378 would help to raise the acceptance of an automatic modification of the trac.ini…).

  • you could imagine completely different ways to generate the fingerprint:

Isn't that slightly over-engineered? We don't really care about the fingerprint itself, as long as it changes when the content changes.

Just tossing out ideas ;-) But yes, in the end having trac-admin $ENV deploy just update the fingerprint for $ENV in its trac.ini file is certainly the least troublesome solution. We would just need to find a way to propagate that fingerprint to other environments, in case multiple environments share the same deployment target. This is where having a "FINGERPRINT" file would be slightly more convenient, as the other environments would automatically react to a change.

comment:62 by anonymous, 13 years ago

If .htaccess is a viable solution, the folks at html5boilerplate came up with a rule that matches, say, "style.1234567.css" and rewrites it internally to "style.css", while leaving requests for plain "style.css" alone (backwards compatible). This way, the actual fingerprint is irrelevant (it can be anything and doesn't impact the code), and the browser cache can be busted easily by including it.
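
For reference, the html5boilerplate rule being described is along these lines (Apache mod_rewrite; the exact extension list varies between versions, so treat this as a sketch):

<IfModule mod_rewrite.c>
    RewriteEngine On
    # Strip the embedded fingerprint: style.1234567.css -> style.css.
    # Requests for plain style.css are left alone.
    RewriteRule ^(.+)\.(\d+)\.(css|js)$ $1.$3 [L]
</IfModule>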

comment:63 by Remy Blank, 13 years ago

#10315 requests better documentation for setting up the web server to serve static content.

comment:64 by lkraav <leho@…>, 13 years ago

Cc: leho@… added

comment:65 by trac-tickets@…, 13 years ago

Cc: trac-tickets@… added

comment:66 by xin20111119nix@…, 13 years ago

Cc: xin20111119nix@… added

in reply to:  6 comment:67 by Carsten Klein <carsten.klein@…>, 13 years ago

Replying to cboos:

Replying to Carsten Klein <carsten.klein@…>:

Would it not suffice to include an ETAG with a given resource?

We're talking about a general solution here, not just when Trac is serving those resources through the Chrome component.

ETag is a fundamental part of the HTTP standard. Apache, when used for serving static content, or better yet nginx, will generate those ETag headers automatically.

Why reinvent the wheel and fall back on old workarounds that the world has already moved past, given the availability of and widespread support for ETags?

in reply to:  18 comment:68 by Carsten Klein <carsten.klein@…>, 13 years ago

Replying to cboos:

The problem we are trying to address here is to circumvent the overly aggressive caching done by web browsers, which sometimes decide to ignore caching headers. For example, on t.e.o we serve the static resources via lighttpd, which properly sets the timestamps, ETag and what not. Yet this is not enough. There seems to be an emerging technique of appending fake parameters to such resource URLs as a way to force those browsers to do the right thing. Like it or not (my first reaction was not to find that particularly elegant either), this seems to be becoming common practice, and if it solves a real problem I don't see why we should not use it.

The "emerging technique" is actually quite old and was a work around in these times due to missing ETAG support in both the available servers and the user agents. If you find user agents still not behaving well then do not use them. It is also the responsibility of the server, i.e. waiting for 100 CONTINUE requests from the client after sending out the initial set of headers. Perhaps lighthttp does not send these as required? Some user agents will also first HEAD a given resource prior to GET. Perhaps you are using a user agent that will simply GET, with lighthttp servicing the whole file instead of waiting for the client to 100 CONTINUE after the headers and so on…

Believe me, adding a randomized part to the filename was a workaround in ancient times and should no longer be necessary, except for obfuscating the URLs.

comment:69 by Christian Boos, 13 years ago

Please re-read comment:21. In short, we also use ETag, and it works, except for some versions of IE9 and Opera that I tested at the time.

But I have come to think that we shouldn't care about these browsers. The approach implemented in r10769 proved to be more painful than it was worth, in terms of setting things up correctly on the server side. Also, the current technique for getting the hash at startup is way too slow (even the meta one). I'd strongly advocate using disabled as the default.

Last edited 13 years ago by Christian Boos (previous) (diff)

comment:70 by Remy Blank, 13 years ago

Some anecdotal evidence:

  • With every release, especially major releases, we get a number of reports of "the page not looking right". This is almost always solved by using Shift+Reload to force a reload of CSS and JS.
  • While developing, and switching between my 0.12-stable and trunk workspaces, I see this issue so regularly that I now automatically hit Shift+Reload instead of Reload. I'm using recent browsers, and all of them exhibit this behavior (albeit not always consistently).
  • The technique is recommended by Google: http://code.google.com/speed/page-speed/docs/caching.html
  • A few other references to URL fingerprinting:

While none of this is authoritative, I do feel that ETag doesn't solve the issue at hand, and something needs to be done about it.

As for calculating the hash, note that the time is only spent at process startup and not on every request, so it's relatively infrequent (unless you use CGI, but then you have more pressing issues). We have measured the impact already (see comment:42 and comment:43). I had even tested with the content technique and lots of files (with the WikiExtrasPlugin installed, i.e. >7000 files) and even then startup time was ok-ish (around 3 seconds IIRC).

Disabling fingerprinting by default will have exactly no impact at all, as we will continue getting reports from users and tell them to ask their admin to enable fingerprinting. So if you feel strongly about it, we should rather revert [10769] completely (and no, I don't feel strongly about it, so that outcome would be perfectly ok to me: tried something, didn't meet expectations, reverted).

in reply to:  70 comment:71 by Christian Boos, 13 years ago

Replying to rblank:

As for calculating the hash, note that the time is only spent at process startup and not on every request, so it's relatively infrequent (unless you use CGI, but then you have more pressing issues).

No, unfortunately it's not only CGI. In a lot of setups, you have some kind of automatic server process "recycling" (e.g. Apache's MaxRequestsPerChild != 0, uwsgi killing processes after a timeout, etc.). The net effect is that you'll hit a restarting server at random times…

We have measured the impact already (see comment:42 and comment:43). I had even tested with the content technique and lots of files (with the WikiExtrasPlugin installed, i.e. >7000 files) and even then startup time was ok-ish (around 3 seconds IIRC).

… and then 3 seconds is way too much (yes, TH:WikiExtrasPlugin is a must have ;-) ).

Disabling fingerprinting by default will have exactly no impact at all, as we will continue getting reports from users and tell them to ask their admin to enable fingerprinting. So if you feel strongly about it, we should rather revert [10769] completely (and no, I don't feel strongly about it, so that outcome would be perfectly ok to me: tried something, didn't meet expectations, reverted).

Well, I do feel strongly about the hit at startup time, for the reasons explained above. Also, for Trac development it's quite painful: I could disable it, but as it is useful, I keep it enabled and each time I restart the server I have to swear ;-)

There are also the concerns about concurrent startup I explained in comment:61 (notably point 2., the performance impact when you have lots of server processes each trying to get the fingerprint).

So from my point of view, the ideal solution would be to have fingerprinting with an alternative fast method (at deploy time, or using a FINGERPRINT file - the problem is that I don't see how to make that a default, since it probably involves some kind of administrative action).

Maybe meta as the default and a file method (with a fingerprint_file option) would be a good compromise: "tolerably" correct and slow default, a way to make it exact (content method) but even slower, and a way to make it fast and correct (file) at the price of some extra setup.

comment:72 by osimons, 13 years ago

Note also that the hash calculation is performed not just once per process, but once per project per process. I usually run 5-8 workers on a server (that will also recycle + increase/decrease depending on traffic), and each worker will serve the thousands of projects I'm hosting. So startup time is by no means insignificant.

The other big problem for me is the totally unusable deployment functionality for such a hosting scenario. With each project in theory having a unique fingerprint (due to differences in logo and the mix of plugins enabled), deploying anything to the filesystem that includes the hash or per-project information will just be a no-go for me. Add a new logo to a project (I have an "upload logo" admin feature too), and the fingerprint becomes invalid. Littering the file system with symbolic links will just add complexity.

I will set this to disabled regardless, so personally it is no big deal. However, it will be no fun to support this feature on IRC and mailing lists when we already get a significant number of questions related to the current 0.11/0.12 'deploy' command that leaves people rather bewildered.

I think the feature is over-engineered, but you all knew my opinion on that already, I guess… ;-) It creates more problems than it solves.

comment:73 by Remy Blank, 13 years ago

The net effect is that you'll hit at random times a restarting server …

Yes, so about once an hour you'll wait for 1.9 additional seconds for your reply. Unless you use several processes, then you'll only wait for the usual 5 seconds.

Ok, so I think we are in agreement. Let's revert [10769] and find a better solution.

I don't know about over-engineered; I still haven't seen a simpler solution that solves the initial issue. The problem statement is simple: ensure that resource files that change on the server are reloaded in a timely manner in the browser. When do we know that a file has changed? When its content changes. How can we force the browser to reload a resource? By changing its URL (and no, ETag unfortunately doesn't fit the bill). So let's add a hash over all the resource content to the URL. That's pretty much the simplest solution that I can imagine. We could have calculated per-resource hashes, but I'm not sure this would have been simpler (see the sketch below). Requiring the deploy step does start to seem complicated even to me, so here I agree, but that's what we have now.
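
For comparison, the per-resource variant would look roughly like this (hypothetical names, not Trac code; each file is hashed once per process and re-hashed only when its mtime changes):

import hashlib
import os

_href_cache = {}

def resource_href(htdocs_dir, path):
    # Build a per-file fingerprinted URL such as /chrome/!d3adb33f/common/trac.css
    full = os.path.join(htdocs_dir, path)
    key = (full, os.stat(full).st_mtime)
    if key not in _href_cache:
        with open(full, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()[:8]
        _href_cache[key] = '/chrome/!%s/%s' % (digest, path)
    return _href_cache[key]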

Now, if by over-engineered you mean "anything that is different from what we had before", then yes, it's over-engineered.

in reply to:  73 comment:74 by Christian Boos, 13 years ago

Replying to rblank:

The net effect is that you'll hit at random times a restarting server …

Yes, so about once an hour you'll wait for 1.9 additional seconds for your reply. Unless you use several processes, then you'll only wait for the usual 5 seconds.

People only take notice when it's slow. OK, I know we currently have even more serious performance issues, but let's not add more of them.

Ok, so I think we are in agreement. Let's revert [10769] and find a better solution.

I don't think we should necessarily throw everything away, it's only the hash generation that is problematic. If we find a way to make it correct and fast as a default, all the better. But at worst, even if we have a default of disabled for general use (a.k.a. the status quo), it would be useful to have a kind of developer mode (e.g. my file suggestion, or even a tracini value that makes it take the hash of the timestamp of trac.ini, say).

Now, if by over-engineered you mean "anything that is different from what we had before", then yes, it's over-engineered.

Then by this definition, I'm all for over-engineering ;-)

comment:75 by osimons, 13 years ago

I don't think 'developer mode' is a particularly good argument for this infrastructure. Firefox Web Developer toolbar (or any other alternative) → Disable Cache. Done.

The 'over-engineering' remark was not just about 'different', obviously. No doubt it is theoretically sound, but the complexity of its deployment and usage just doesn't balance out in real usage scenarios. If I restart Apache and it spawns a minimum of 5 workers, each loading 2500 projects, and each project scan takes 2 seconds, then 5*2500*2 = 25000 seconds are spent before all projects can be accessed in all workers. And even doing trac-admin $ENV deploy in any form for 2500 projects, to have project-specific trac.ini files updated, is also highly unreasonable. Throw plugins into the mix, plus project resources being added to each project's htdocs directory in running projects, and the feature is just not manageable for the average admin.

Now, I may be on the extreme end of usage, but a few hundred projects is not uncommon for internal corporate usage or class / assignment-based projects in educational institutions.

in reply to:  75 comment:76 by Christian Boos, 13 years ago

Replying to osimons:

I don't think 'developer mode' is a particularly good argument for this infrastructure. Firefox Web Developer toolbar (or any other alternative) → Disable Cache. Done.

When I hack the CSS, I usually test against the whole "menagerie" (IE, Opera, FF, Chrome, Safari) each with its own way to disable the cache… but OK, I'll concede it was not a very strong argument ;-)

The point is that there are setups in which using a fingerprint would be useful and doable (like the setup we have here on edgewall), so I'm a bit reluctant to remove that possibility altogether. It's just that I don't think it's appropriate to go with the content or meta strategy by default.

Also, that way, if one day you come up with a way to generate a hash that would be appropriate in your large scale setup, then you could also benefit from fingerprinting (and surely that will be a plus as you wouldn't have to tell your 2500*n users to press shift+reload ;-) ).

For example, wouldn't that tracini mode also work in your case? Changing the logo involves a trac.ini save, adding new plugins implies enabling components, and for a major Trac version upgrade, you could run find . -name trac.ini -exec touch {} \;. Ok, the downside would be extra server load whenever trac.ini is modified for other reasons.

comment:77 by Remy Blank, 13 years ago

I still think restarting from a clean slate is the best option:

  • The hash calculation is an issue.
  • The static resource deployment is an issue.
  • The web server configuration is an issue, when it is configured to serve static resources.
  • The configuration of the functionality in Trac is too complex (frankly, anything other than enabled / disabled is too complicated IMO; I agreed to have meta only because it was the only way to get some exposure for the functionality).

There's not much left of [10769] if you remove all that. Let's just start again from the problem statement and work from there.

comment:78 by osimons, 13 years ago

The way this is heading, I'm thinking we should split this out into an API - something like IStaticResourceLinker to allow replacing the simple default (no-fingerprint) implementation of link generation, or IStaticResourceLinkManipulator to allow plugins to modify the simple default link suggestion for all or just a subset of links.

Then reuse the fingerprint code as a basis for an external plugin or an optional tracopt plugin.

Redoing it as an API would also make it easier to integrate Trac into other frameworks and applications - ask for common/trac.css and the plugin could change it to /static/django-site.css or whatever.
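
One possible shape for such an interface, purely as a sketch of the idea above (it was never added to Trac; the interface and method names are assumptions):

from trac.core import Interface

class IStaticResourceLinker(Interface):
    """Extension point for replacing or rewriting links to static
    resources (hypothetical)."""

    def get_static_resource_link(req, path, default_href):
        """Return the URL to use for the static resource `path`, e.g.
        rewriting 'common/trac.css' to a fingerprinted or externally
        hosted URL, or `None` to keep `default_href` unchanged.
        """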

comment:79 by Christian Boos, 13 years ago

I have nothing against trying a new approach, though I don't have much to propose myself.

In the meantime, I suggest the following simple improvement over the existing:

  • trac/web/chrome.py

    @@ -344,7 +344,8 @@
             """Automatically reload template files after modification.""")
     
         fingerprint_resources = ChoiceOption('trac', 'fingerprint_resources',
    -                                         ['content', 'meta', 'disabled'],
    +                                         ['disabled', 'content', 'meta',
    +                                          'tracini'],
             """Control the fingerprinting of static resources.
     
             URLs to static resources below `/chrome` have the form
    @@ -353,6 +354,11 @@
             ''metadata'' (size and mtime, for "meta"). This allows aggressive
             caching of static resources on the browser, while still ensuring that
             they are reloaded when they change.
    +
    +        Alternatively, a less precise but much faster mode called
    +        "tracini" can be used, which takes the timestamp of the
    +        trac.ini for generating the hash. One must therefore ensure
    +        that this file gets touched at appropriate time.
     
             Setting this option to "disabled" disables fingerprinting, and
             reverts the URLs to static resources to `/chrome/.*`.
    @@ -632,5 +638,9 @@
                 def update(path):
                     st = os.stat(path)
                     hash.update(str(st.st_size) + str(st.st_mtime))
    +        elif self.fingerprint_resources == 'tracini':
    +            hash = sha1()
    +            hash.update(str(self.config.lastmtime))
    +            return '!' + hash.hexdigest()[:8]
             else:
                 return None
  • no performance impact by default, status quo
  • opt-in for the meta/content fingerprinting
  • and a fast mode for me ;-) (and t.e.o)

by Remy Blank, 13 years ago

Attachment: 9936-revert-r10960.patch added

Revert [10769].

comment:80 by Remy Blank, 13 years ago

Here's my suggestion: 9936-revert-r10960.patch. Let's get this over with and get things moving again. The initial patch can still be re-used partially for another approach.

Ok to apply?

comment:81 by Christian Boos, 13 years ago

Well, I don't really see why you insist on going back to the status quo on trunk before coming up with the "new solution". Applying my patch from comment:79 doesn't prevent anyone from trying another approach, and in the meantime we have a working solution (probably imperfect, but that's also what we had for months on trunk). Going back to pre-r10769 won't motivate me to come up with another solution; like I already said, at this point I have no better idea. With the upcoming changes from #10012, it would really be useful to have a working solution on t.e.o.

in reply to:  81 comment:82 by Remy Blank, 13 years ago

Replying to cboos:

Well, I don't really see why you insist on going back to the status quo on trunk before coming up with the "new solution". Applying my patch from comment:79 doesn't prevent someone to try another approach, and in the meantime we have a working solution (probably imperfect, but that's also what we had for months on trunk).

For several reasons: the approach clearly is a dead-end (confirmed by the discussion here), and it introduces unnecessary complexity upon which a new approach would have to build. I would even go as far as to say that any new approach would be influenced by the old one in that case, and I feel this shouldn't be. And even if a new approach did want to build on the old one, nothing prevents applying the patch as a first step.

I also don't expect rapid progress on this, due to our being out of ideas, so I feel that we shouldn't leave a half-baked idea that nobody is happy about in trunk. It makes life more difficult for everyone installing from trunk, and has been lingering there for too long already (my fault, of course, but I'm trying to get things straight again).

comment:83 by Christian Boos, 13 years ago

Ok, I understand the concerns now. So let's revert this to allow for other approaches.

On my side, I'll propose a patch which builds on r10769 and keeps the static_hash part (which I'm happy with), and reworks the hash generation to something more flexible yet simpler.

Note that the method and rate of change of the fingerprinting make us move within a two-axis plane: one axis I'd call "false negatives" (how frequently browsers fail to reload when they should have) and the other "false positives" (how frequently we trigger a reload when it was not needed).

  • With the status quo (0.12 or no fingerprinting), we have of course no false positives, but we have the maximal value possible for the false negatives (i.e. the current problem of undue caching)
  • Ideally we should have neither false positive, nor false negatives; that could be reachable:
    • if browsers would react on ETag like we hoped they would…
    • if we had a per-file fingerprinting based on content (might be too expensive? see also comment:9 and comment:3)
  • The 'content' method was a way to get no false negatives, but at the expense of a prohibitive cost at start-up, and also had a high false positive rate (a change to any resource would trigger the reload of all the others)
  • The other methods I proposed all have in common an even higher false positive rate (each is also a global change which would trigger the reload of all resources, and that change can even have nothing to do with a resource change), combined with a risk of false negatives which depends entirely on the procedure used…

As bad as this last option sounds, I nevertheless like it because if done properly as part of the deploy procedure, it can be very effective and fast. It's clearly an "advanced configuration" though, and as fingerprinting often implies adding some extra mappings in the web server configuration, this shouldn't be enabled by default.

So my next patch will introduce a PathOption to replace the ChoiceOption, and it will allow one to specify:

  • a file like "trac.ini", or some global path; we'll then use the timestamp of that file as the trigger; if we also modify deploy to always update the timestamp of the htdocs folder, using the path to the latter would be perfect for deploy-based use cases
  • or a hex value (used directly as the hash, allowing for the use cases described in comment:59),
  • or some other value (anything not a file, not a hex number and not empty) to create a hash based on the current time at first call (i.e. once per environment reload cycle; high false positive rate, moderate false negative risk in case of "live" changes to some resources without a server restart, but otherwise a no-brainer)
  • if left empty, no fingerprinting happens (the default)
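
A rough, untested sketch of that dispatch, assuming the option value has already been read (the '!' prefix matches the existing URL form; in practice the result would be computed once and cached per environment reload):

import os
import time
from hashlib import sha1

def resolve_fingerprint(value):
    if not value:
        return None                                 # empty: disabled
    if os.path.isfile(value):
        mtime = os.stat(value).st_mtime             # a file: use its timestamp
        return '!' + sha1(str(mtime).encode()).hexdigest()[:8]
    try:
        int(value, 16)                              # a hex value: use directly
        return '!' + value
    except ValueError:
        # anything else: hash the current time, once per reload cycle
        return '!' + sha1(str(time.time()).encode()).hexdigest()[:8]
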
Last edited 13 years ago by Christian Boos (previous) (diff)

by Christian Boos, 13 years ago

Applies on r10960 (i.e. still on top of r10769)

by Christian Boos, 13 years ago

Same as the other, but will apply once r10769 gets reverted… (i.e. it re-adds the parts I wanted to keep)

comment:84 by Remy Blank, 13 years ago

Owner: changed from Remy Blank to Christian Boos
Release Notes: modified (diff)

[10769] reverted in [10966]. Over to you, Christian ;)

comment:85 by Christian Boos, 13 years ago

Thanks! Let's see where this heads to.

Note that another point in favor of the "explicit" setting of the fingerprint, versus generating it from the 'content', is that the resources scanned for generating the hash are not necessarily those served (comment:3).

Now that I'm satisfied with the speed issue, the last remaining problems for me are:

  • the few URLs generated directly using href.chrome() which don't contain the fingerprint
  • the form of the URLs, which requires a server-side mapping; are we really sure we can't simply use a query parameter? At least for WordPress, the usage of ?ver=... seems to be common, and browsers do cache resources having such URLs

If we manage to make it possible to override href.chrome() behavior (was that what comment:78 suggested?), I think we could solve those two remaining issues.
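
If the query parameter form turns out to be cacheable enough, such an override could be as small as this (hypothetical wrapper around href.chrome()):

def versioned_chrome_href(href, fingerprint, *path):
    # e.g. versioned_chrome_href(req.href, 'd3adb33f', 'common', 'trac.css')
    #      -> '/chrome/common/trac.css?v=d3adb33f'
    url = href.chrome(*path)
    return '%s?v=%s' % (url, fingerprint) if fingerprint else url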

comment:86 by j.beilicke@…, 13 years ago

Cc: j.beilicke@… added

comment:87 by Remy Blank, 12 years ago

Milestone: 1.01.0-triage

Preparing for 1.0.

comment:88 by Christian Boos, 12 years ago

Milestone: next-stable-1.0.xnext-major-releases
Type: defectenhancement

Go back to it at a later stage.

comment:89 by Christian Boos, 12 years ago

Keywords: css js added

#10797 was closed as duplicate (well, it actually shows how people could be bitten by this).

comment:90 by Ryan J Ollos, 10 years ago

Owner: Christian Boos removed

comment:91 by Jan Beilicke <j.beilicke@…>, 9 years ago

Cc: j.beilicke@… removed

comment:92 by Massimo <massimo.b@…>, 6 years ago

Cc: massimo.b@… added
