= Analysis of Performance of Trac 0.11.5 =

''initial content from a mail authored by Shane Caraveo on trac-dev (googlegroups:trac-dev:a48473cd1dcfbd7c)''

I've recently reduced per-request time for *simple* template generation
from 150ms to 45ms on my laptop (dual-core 1.6GHz Core Duo, 2GB RAM,
tested with ab against a simple template). I also reduced page load
times (reported via Google Page Load/Firebug) to around a second (from
2-8 seconds in some cases). These timings include no network overhead
since I'm testing on the local machine. Times are all taken on my
laptop with my typical environment/applications open (lots of crap).

This was all done with Trac 0.11.5.

Even with the gains I still get 100% CPU spikes on every request, but
at least they are of shorter duration. I'm still digging further into
that. I have also done a bit of work on overall browser page load.

Here's my current brain dump. If some of my conclusions are silly,
please let me know why.


== General ==

In general there are only a couple of big wins. For me they were
removing gc.collect (see trac.main) and the Timing and Estimation
plugin. Everything else was small potatoes in comparison (10ms here,
5ms there), but together those added up to a good 40-50ms per request.
Think of it this way: at 100% CPU and 50ms/request you are limited to a
maximum of 20 requests/second/CPU. Every ms counts if we want decent
throughput. I'd like to get under 30ms.

=== How I timed ===

My test setup is Apache prefork with WSGI in daemon mode, one process,
and mod_deflate enabled. My laptop is pretty loaded, with very little
free memory, but generally enough that Trac shouldn't be swapping.
PostgreSQL is the database, also running on the laptop.

For timing template generation I used ApacheBench and tossed out
"warmup" requests, sometimes testing with keep-alive (which gives
slightly better req/s):

{{{
ab [-k] -c 1 -n 10 url
}}}

With zero network latency, the timing is essentially how much time Trac
takes to generate the content. I tested against a simple template
generated from a small plugin that just renders a template file (so as
to involve as little other processing as possible, such as querying the
ticket table or permissions), and against WikiStart.

| 54 | For "page load" I used Google Page Load extension for firebug and |
| 55 | firefox 3.5. I also tested with Safari 4 which has similar |
| 56 | functionality now, it's much faster than firefox :( |
| 57 | |
I tested with plugins enabled since I wanted to also discover whether
any plugins were causing me grief; some were.

I tested against empty projects, so table indexing and table size are
not involved. In some instances I also tested against a small-sized
project, which turned up issues in a couple of plugins (e.g. Timing and
Estimation).

=== Profiling issues ===

It seems to me that cProfile adds time from generators to the function
that uses them, masking where the real time is spent. That makes
profiling Genshi terribly difficult. I also tried out dtrace; it gives
the same rough data as cProfile. Knowing a tiny bit about the Python
engine (having written a C-level debugger extension), I figure it has
something to do with how generators work in the Python core.

A couple of areas where cProfile claimed a lot of exclusive time in
Genshi I rewrote in C. This didn't change my performance at all, but it
did shift timing to other areas, which helped me find a couple of the
actual time sinks in Genshi (see below).

== For general performance ==

Use either a multithreaded Apache if you use mod_python, or WSGI in
multithreaded daemon mode. It won't remove high-CPU slow requests, but
it does lower memory use to some extent. While that didn't improve
per-request performance, I got higher concurrent throughput with that
setup. Apache prefork with mod_python, or WSGI in embedded mode, will
chew up memory, and due to high CPU usage, processes will fight each
other more for resources.
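
For reference, a mod_wsgi daemon-mode setup along those lines looks
roughly like this in the Apache configuration (the paths and numbers
here are illustrative, not recommendations):

{{{
# One daemon process, many threads; maximum-requests bounds memory
# growth by recycling the process periodically (see gc.collect below).
WSGIDaemonProcess trac threads=25 maximum-requests=1000
WSGIProcessGroup trac
WSGIScriptAlias /trac /var/www/trac/trac.wsgi
}}}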

=== customizations and plugins ===

The single largest gain I got with plugins was by removing the Timing
and Estimation plugin (primarily seen on the ticket reports page, where
it does another SQL query for each ticket, doubling the request time).
Its stream filters slow lots of stuff down. Removing other plugins with
stream filters, or reworking them to avoid stream filters where
possible, is a win. In some instances the TracHours plugin is a better
solution if you don't need all the Timing and Estimation features.

Get rid of stuff like the Google Analytics plugin and just add the code
to a site or theme template; you'll get better performance. It's not
that it's a bad plugin, but it does something simple that adds *Genshi*
overhead.

Examine any custom templates for extraneous py:match use and get rid of
it. I had one match in the site template that was removing something
from an admin panel, which I had forgotten about; big mistake, and it
cost on the performance of *every* page. I patched the template
directly instead and saved myself 10ms per request. This kind of thing
doesn't show up in the profiler; it was some debug dumps I inserted
deep into the Genshi bowels that showed me the light.

Make sure all static resources for all plugins are exported. trac-admin
deploy will only deploy static resources for the plugins enabled in
that project.
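
For example (the paths here are illustrative):

{{{
trac-admin /path/to/projenv deploy /var/www/trac-static
}}}

Since deploy only covers the plugins enabled in that environment, run
it again after enabling new plugins.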

=== javascript ===

This is mostly about page load times.

Minify all the JavaScript output by trac-admin deploy, and make sure it
gets cache headers via the Apache configuration.
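
As a sketch of the minification step, here is one way to do it with the
third-party jsmin module (the deploy path is illustrative, and any
minifier will do):

{{{
import os
from jsmin import jsmin  # third-party module; any JS minifier works

# Wherever trac-admin deploy put the static resources.
deploy_dir = '/var/www/trac-static/htdocs'

for root, dirs, files in os.walk(deploy_dir):
    for name in files:
        if name.endswith('.js'):
            path = os.path.join(root, name)
            source = open(path).read()
            open(path, 'w').write(jsmin(source))
}}}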

Upgrade to jQuery 1.3; it's much faster. Trac trunk has done this, and
there is a diff somewhere that shows what kind of changes have to be
made. You'd have to examine any plugins for similar jQuery updates that
need to be done.

If you use Firebug, be sure you have disabled it (or at least disable
the console tab) for Trac pages; the console somehow interrupts and
adds a couple of seconds of delay in the browser.

=== css ===

I haven't gotten to this yet, but there is a lot of slow CSS there.
Removing all the CSS as a test, my page load in the browser was 200ms
faster. Google Page Load is good at reporting which CSS *might* be
slow, and it reports lots of inefficient CSS in Trac.

== trac issues ==

=== gc.collect ===

The next big item was removing gc.collect. If I recall correctly,
gc.collect took at least 50ms of the request time, sometimes upwards of
100ms. It also hides other bugs (ticket:8443 for example: the postgres
idle bug came back without gc.collect, and I've got a patch in that
bug). gc.collect could be done in a background thread from tracd, but I
wouldn't suggest background threads for mod_python prefork. With WSGI
daemon mode, just limit the number of requests before restart to
something you think is fine and test memory consumption. I think the
use of gc.collect should be seriously reconsidered.
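
If you want to see what the call costs on your own setup, it is easy to
time in isolation; note that the numbers only mean something inside a
loaded process (e.g. dropped into the request handling code), since a
fresh interpreter has almost nothing to collect:

{{{
import gc, time

t0 = time.time()
gc.collect()
print("gc.collect() took %.1f ms" % ((time.time() - t0) * 1000.0))
}}}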

I have a sneaking suspicion (unproven) that people who use mod_python
and claim that turning off keep-alive and/or mod_deflate helps are
really having problems due to gc.collect. As I understand Apache
filters (e.g. mod_deflate), they won't finish up the request until the
mod_python handler has returned. So the extra time in gc.collect delays
the request being returned, which delays mod_deflate finishing. It also
delays the start of the next request over a persistent socket
connection (i.e. keep-alive).

=== filters ===

Going through all the match_request implementations helps: remove
permission checking (put it in process_request instead), make sure
regexes are precompiled, and generally simplify things. There are a
number of these in Trac core, but plugins are the greater abusers in
this area. Also examine any IRequestFilter use and simplify it.
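
As an illustration, a lean handler looks something like this (a minimal
sketch; the path, permission, and template names are made up):

{{{
import re
from trac.core import Component, implements
from trac.web.api import IRequestHandler

# Compiled once at import time, not on every request.
PATH_RE = re.compile(r'^/mypath(?:/|$)')

class MyHandler(Component):
    implements(IRequestHandler)

    def match_request(self, req):
        # Cheap test only: no permission checks, no database access.
        return PATH_RE.match(req.path_info) is not None

    def process_request(self, req):
        # Permission checking belongs here, where it only runs for
        # requests this handler actually matched.
        req.perm.require('WIKI_VIEW')
        return 'mytemplate.html', {}, None
}}}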

=== trac.config ===

Other than Genshi, profiling showed trac.config to be the single
largest time sink in simple template generation. Profiling showed me
that the get method of the Configuration class
(trac.config.Configuration) was slow. I added caching there for a few
extra milliseconds of boost. I'm also looking at removing the
auto-reload when the ini file changes, maybe using Spread or memcached
to create reload notifications, to get rid of the file stats, but
timing doesn't show this to be a large issue on my laptop.
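
The caching was roughly of this shape (an illustrative monkeypatch, not
the actual change; a real version must also invalidate the cache
whenever the ini file is reparsed):

{{{
from trac.config import Configuration

_original_get = Configuration.get
_get_cache = {}

def _cached_get(self, section, name, default=''):
    key = (id(self), section, name, default)
    if key not in _get_cache:
        _get_cache[key] = _original_get(self, section, name, default)
    return _get_cache[key]

Configuration.get = _cached_get
}}}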

=== repository ===

While I still want to remove the repository sync on every request (to
get rid of system calls/file stats), I have been unable to show that
performance changes much when I test with it removed. There are still
bigger fish to fry.

=== database pool ===

Watching the PostgreSQL log file, I can tell that a lot of cursors get
created without actually being used to do a query. This shows up
because psycopg2 executes a BEGIN when a cursor is created. I haven't
yet looked into where that is happening, but it's extra work the system
is doing for nothing.
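
One way to hunt for the call sites is to wrap the pooled connection and
dump a stack trace whenever a cursor is created (a debugging sketch,
not production code):

{{{
import traceback

class TracingConnection(object):
    """Wraps a DB connection and reports every cursor() call."""

    def __init__(self, conn):
        self._conn = conn

    def cursor(self):
        traceback.print_stack()  # who asked for this cursor?
        return self._conn.cursor()

    def __getattr__(self, name):
        return getattr(self._conn, name)
}}}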

=== wiki ===

The current wiki parser design is slow, running the same large regex
over each line. I think a general redesign to use re.finditer, rather
than line split then re.match, would help improve performance here.
However, post-caching the content would be better. I'm experimenting
with partial caching of the wiki content and have reduced my request
timing on WikiStart from 120ms to 75ms while still getting correct
content. The patch I have doesn't cache macros unless the macro
arguments contain 'cache' (good for the page outline macro, which by
the way seems to re-parse the entire page, macros included). There may
be other issues with the approach I have taken; I haven't tested it
much. Once I get further, I might actually pre-process when the wiki
page is saved and stash the result into the database, avoiding the
parse in most requests.
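
Stripped of the macro handling, the core idea looks something like this
(a hedged sketch; the actual patch is partial and macro-aware):

{{{
import hashlib

_wiki_cache = {}

def render_cached(text, render):
    # `render` is whatever callable does the real wiki-to-HTML parse;
    # the cache is keyed on a digest of the source text, so unchanged
    # pages skip the parse entirely.
    key = hashlib.md5(text.encode('utf-8')).hexdigest()
    if key not in _wiki_cache:
        _wiki_cache[key] = render(text)
    return _wiki_cache[key]
}}}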

== Genshi ==

Use Genshi trunk; there is some performance gain there.

With Genshi, one big win is turning off the whitespace filter; I think
I got around 10ms back on a small page, and the gain would be larger on
larger pages. I did that by patching trac.web.chrome where there is a
call to stream.render, adding strip_whitespace=False. While this
slightly increases the delivered file size (and adds a lot of
whitespace to the source view), that can be countered by mod_deflate.
The namespace flattener is another slow filter, but it's necessary.
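
The patch amounts to something like this at the stream.render call in
trac.web.chrome (a sketch; the surrounding code differs between Trac
versions):

{{{
# Was (roughly): data = stream.render('xhtml', doctype=doctype)
# Passing strip_whitespace=False through to the serializer skips
# Genshi's WhitespaceFilter; mod_deflate absorbs the size increase.
data = stream.render('xhtml', doctype=doctype, strip_whitespace=False)
}}}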

Another win with Genshi is patching genshi.output.encode and getting rid
of the iteration that calls _encode a bazillion times. Instead, just do:

{{{
out.write(_encode(u''.join(list(iterator))))
}}}

This is much faster.

I should note that I also tried changing Trac so that the output went
straight through to the WSGI layer (rather than being buffered in a
cStringIO object). While this improved the initial wait time
dramatically, it actually slowed down the overall request time. I
believe this slowdown was also due to the _encode calls inside the
iteration over the stream. I'm still going to experiment with this
more at some point.

Overall, I like Genshi; now that I kind of get how it works, it has a
certain elegance. But I am left unconvinced that its design and
flexibility outweigh the performance issues and the added difficulty in
figuring them out (refer to profiling issues at the top). One way I'm
thinking of working around this is to beef up the XML-RPC interfaces
and do everything with Ajax.

== Misc ==

=== plugins ===

I don't think most plugin authors know where they will generate
performance penalties. It's hard to know all those details without
spending a lot of time on it.

Generally, pay attention to any plugins that use stream filters or
request filters. These are called for every request, and more often
than not they do something time-intensive. Additionally, IMO, a request
handler's match_request should never do anything beyond
req.path_info.startswith('/mypath'). Many match_request implementations
do a lot more, and that gets executed on every request.

=== wysiwyg plugin ===

This adds a large JS file to *every* request whether useful or not. I
removed the large script from the request filter, then changed the load
script to check whether there is a textarea element in the DOM and load
the larger script only if necessary.
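
In jQuery terms, the check amounts to something like this (a sketch;
the script URL is made up and the real plugin's file layout differs):

{{{
jQuery(document).ready(function($) {
  // Only fetch the heavy editor script on pages that can use it.
  if ($('textarea').length) {
    $.getScript('/chrome/wysiwyg/wysiwyg.js');
  }
});
}}}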