= Analysis of Performance of Trac 0.11.5 =

''initial content from a mail authored by Shane Caraveo on trac-dev (googlegroups:trac-dev:a48473cd1dcfbd7c)''

I've recently reduced per-request time for *simple* template generation from 150 ms to 45 ms on my laptop (dual-core 1.6 GHz Core Duo, 2 GB RAM, tested with ab against a simple template).  I also reduced page load times (reported via Google Page Load/Firebug) to around a second (from 2-8 seconds in some cases).  These timings are without any network overhead, since I'm testing on the local machine.  Times are all taken on my laptop with my typical environment/applications open (lots of crap).

This was all done with 0.11.5.

Even with the gains I still get 100% CPU spikes on every request, but at least they're shorter in duration.  Still digging further on that.  I've also done a bit of work on overall browser page load.

Here's my current brain dump.  If some of my conclusions are silly, please let me know why.

== General ==

In general there are only a couple of big wins.  For me these were removing gc.collect (see trac.main) and the Timing and Estimation plugin.  Everything else was small potatoes in comparison (10 ms here, 5 ms there), but together they added up to a good 40-50 ms per request.  Think of it this way: at 100% CPU, 50 ms/request limits you to a maximum of 20 requests/second per CPU.  Every ms counts if we want decent throughput.  I'd like to get under 30 ms.

=== How I timed ===

My test setup is Apache prefork with mod_wsgi in daemon mode, one process, mod_deflate enabled.  My laptop is pretty loaded, with very little free memory, but generally enough that Trac shouldn't be swapping.  PostgreSQL is the database, also running on the laptop.

For timing template generation I used ApacheBench and tossed out "warmup" requests, sometimes testing with keep-alive (getting slightly better req/s):

{{{
ab [-k] -c 1 -n 10 url
}}}

With zero network latency, the timing is essentially how much time Trac takes to generate the content.  I tested against a simple template generated from a small plugin that just renders a template file (to involve as little other processing as possible, such as querying the ticket table or checking permissions), and against WikiStart.

For "page load" I used the Google Page Load extension for Firebug and Firefox 3.5.  I also tested with Safari 4, which has similar functionality now; it's much faster than Firefox  :(

I tested with plugins enabled since I wanted to also discover whether any plugins were causing me grief; some were.

I tested against empty projects, so table indexing and table size are not factors.  I also tested against a small project in some instances, which turned up issues in a couple of plugins (e.g. Timing and Estimation).

=== Profiling issues ===

It seems to me that cProfile charges time spent in generators to the function that consumes them, masking where the real time is spent.  That makes profiling Genshi terribly difficult.  I also tried dtrace; it gives the same rough data as cProfile.  Knowing a tiny bit about the Python engine (having written a C-level debugger extension), I figure it has something to do with how generators work in the Python core.

A couple of areas in Genshi where cProfile claimed a lot of exclusive time I rewrote in C.  This didn't change my performance at all, but it did shift timing to other areas, which helped me find a couple of the actual time sinks in Genshi (see below).

== For general performance ==

Use either a multithreaded Apache if you use mod_python, or mod_wsgi in multithreaded daemon mode.  It won't remove high-CPU slow requests, but it does lower memory use to some extent.  While that didn't improve per-request performance, I got higher concurrent throughput with that setup.  Apache prefork with mod_python, or mod_wsgi in embedded mode, will chew up memory, and due to high CPU usage, processes will fight each other more for resources.

=== customizations and plugins ===

The single largest gain I got with plugins was from removing the Timing and Estimation plugin (primarily seen on the ticket reports page, where it does another SQL query for each ticket, doubling the request time).  Its stream filters slow lots of stuff down.  Removing other plugins with stream filters, or reworking them to not use stream filters where possible, is a win (a sketch of what such a filter looks like follows below).  In some instances the TracHours plugin is a better solution if you don't need all the Timing and Estimation features.
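
For reference, this is roughly the shape of a stream filter in 0.11; the component and the transformation here are made up for illustration, but the interface is Trac's ITemplateStreamFilter.  filter_stream runs on every template render, and each Transformer pass re-walks the whole Genshi event stream:

{{{
from genshi.builder import tag
from genshi.filters.transform import Transformer

from trac.core import Component, implements
from trac.web.api import ITemplateStreamFilter

class ExampleStreamFilter(Component):
    """Hypothetical plugin: injects one element into every page."""
    implements(ITemplateStreamFilter)

    def filter_stream(self, req, method, filename, stream, data):
        # Called once per rendered template; the Transformer below
        # re-walks the entire event stream just to append one node.
        return stream | Transformer('//div[@id="footer"]').append(
            tag.p('injected by plugin'))
}}}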

Get rid of stuff like the Google Analytics plugin and just add the code to a site or theme template; you'll get better performance.  It's not that it's a bad plugin, but it does something simple that adds *Genshi* overhead.

Examine any custom templates for extraneous py:match use and get rid of it.  I had one match in the site template that removed something from an admin panel, which I had forgotten about; a big mistake that cost performance on *every* page.  I patched the template directly instead and saved myself 10 ms per request.  This kind of thing doesn't show up in the profiler; it was some debug dumps I inserted deep into the bowels of Genshi that showed me the light.

Make sure all static resources for all plugins are exported.  `trac-admin <env> deploy` will only deploy static resources for the plugins enabled in that project.

=== javascript ===

This is mostly about page load times.

Minify all the JavaScript output by `trac-admin deploy`, and make sure it gets cache headers via the Apache configuration.

Upgrade to jQuery 1.3; it's much faster.  Trac trunk has done this, and there is a diff somewhere that shows what kind of changes have to be made.  You'd have to examine any plugins for similar jQuery updates.

If you use Firebug, be sure you have it disabled (or at least disable the Console tab) for Trac pages; the console somehow interferes and adds a couple of seconds of delay in the browser.

=== css ===

I haven't gotten to this yet, but there is a lot of slow CSS there.  Removing all the CSS as a test, my page load in the browser was 200 ms faster.  Google Page Load is good at reporting which CSS *might* be slow, and it reports lots of inefficient CSS in Trac.

== trac issues ==

=== gc.collect ===

The next big item was removing gc.collect.  If I recall correctly, gc.collect took at least 50 ms of the request time, sometimes upwards of 100 ms.  It also hides other bugs (ticket:8443 for example: the postgres idle bug came back without gc.collect, and I've got a patch in that ticket).  gc.collect could be run in a background thread from tracd (see the sketch below), but I wouldn't suggest background threads for mod_python prefork.  With wsgi daemon mode, just limit the number of requests before a restart to something you think is fine and test memory consumption.  I think the use of gc.collect should be seriously reconsidered.
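
A minimal sketch of the background-thread idea, assuming a single long-lived process such as tracd (the interval is arbitrary; this is not a tested patch):

{{{
import gc
import threading
import time

def _collect_periodically(interval=60.0):
    # Collect on a timer instead of once per request, so the cost
    # is not added to any single request's latency.
    while True:
        time.sleep(interval)
        gc.collect()

# Start once at process startup, not per request.
_gc_thread = threading.Thread(target=_collect_periodically)
_gc_thread.setDaemon(True)
_gc_thread.start()
}}}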

I have a sneaking suspicion (unproven) that people who use mod_python and report that turning off keep-alive and/or mod_deflate helps are actually having problems due to gc.collect.  As I understand Apache filters (e.g. mod_deflate), they won't finish up the request until the mod_python handler has returned.  So the extra time spent in gc.collect delays the request being returned, which delays mod_deflate finishing.  It also delays the start of the next request over a persistent socket connection (i.e. keep-alive).

=== filters ===

Going through all the match_request implementations, removing permission checking (put it in process_request instead), making sure regexes are precompiled, and generally simplifying things helps.  There are a number of these in the Trac core, but plugins are the greater abusers in this area.  Also examine any IRequestFilter use and simplify; a sketch of a lean handler follows.
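
A minimal sketch of what that looks like, assuming Trac 0.11's IRequestHandler interface (the component, path and template names are hypothetical):

{{{
import re

from trac.core import Component, implements
from trac.web.api import IRequestHandler

# Compiled once at import time, not on every request.
_PATH_RE = re.compile(r'/mypath(?:/.*)?$')

class MyPathHandler(Component):
    """Hypothetical handler that keeps match_request cheap."""
    implements(IRequestHandler)

    def match_request(self, req):
        # Cheap test only: no database access, no permission checks.
        return _PATH_RE.match(req.path_info) is not None

    def process_request(self, req):
        # The permission check belongs here, where it only runs for
        # requests this handler actually serves.
        req.perm.require('WIKI_VIEW')
        return 'mypath.html', {}, None
}}}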

=== trac.config ===

Other than Genshi, profiling showed trac.config to be the single largest time sink in simple template generation.  Profiling showed me that the get method of the Configuration class (trac.config.Configuration) was slow.  I added caching there for a boost of a few extra milliseconds (a sketch of the idea is below).  I'm also looking at removing the auto-reload when the ini file changes, maybe using Spread or memcached for reload notifications, to get rid of the file stats, but timing doesn't show this to be a large issue on my laptop.
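
A rough sketch of the kind of caching meant here, assuming lookups are memoized and the cache is flushed whenever the ini file is reparsed (this is illustrative, not the actual patch):

{{{
from ConfigParser import ConfigParser

class CachedOptions(object):
    """Illustrative memoization of option lookups."""

    def __init__(self, parser):
        self._parser = parser   # a ConfigParser holding trac.ini
        self._cache = {}

    def get(self, section, name, default=''):
        key = (section, name, default)
        if key not in self._cache:
            if (self._parser.has_section(section)
                    and self._parser.has_option(section, name)):
                value = self._parser.get(section, name)
            else:
                value = default
            self._cache[key] = value
        return self._cache[key]

    def invalidate(self):
        # Must be called whenever trac.ini is reparsed, or stale
        # values will be served.
        self._cache.clear()
}}}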

=== repository ===

While I still want to remove the repository sync on every request (to get rid of system calls/file stats), I have been unable to show that performance changes much when I test with it removed.  There are still bigger fish to fry.

=== database pool ===

Watching the postgres log file, I can tell that a lot of cursors get created without actually being used to do a query.  This shows up because psycopg2 issues a BEGIN when a cursor is created.  I haven't yet looked into where that is happening, but it's extra work the system is doing for nothing.

=== wiki ===

The current wiki parser design is slow, doing the same large regex over each line.  I think a general redesign to use re.finditer, rather than splitting into lines and then running re.match on each, would help improve performance here; the sketch below contrasts the two approaches.
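
A toy illustration of the two scanning strategies (the pattern is a stand-in, far smaller than the wiki parser's real alternation):

{{{
import re

# Stand-in token pattern; the real wiki regex is much larger.
TOKEN_RE = re.compile(r"'''|''|\[\[[^\]]+\]\]")

def scan_per_line(text):
    # Roughly the current style: split first, then re-run the big
    # regex against every individual line.
    tokens = []
    for line in text.splitlines():
        for match in TOKEN_RE.finditer(line):
            tokens.append(match.group())
    return tokens

def scan_whole_buffer(text):
    # Proposed style: a single pass over the whole page buffer.
    return [m.group() for m in TOKEN_RE.finditer(text)]
}}}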

However, post-caching the content would be better.  I'm experimenting with partial caching of the wiki content and have reduced my request timing on WikiStart from 120 ms to 75 ms while still getting correct content.  The patch I have doesn't cache macros unless the macro arguments have 'cache' in them (good for the page outline macro, which by the way seems to re-parse the entire page, macros included).  There may be other issues with the approach I have taken; I haven't tested it much.  Once I get further, I might actually pre-process when the wiki page is saved and stash the result in the database, avoiding the parse on most requests.

== Genshi ==

Use Genshi trunk; there is some performance gain there.

With Genshi, one big win is turning off the whitespace filter; I think I got around 10 ms on a small page, and the gain would be larger on larger pages.  I did that by patching trac.web.chrome where there is a call to stream.render, adding strip_whitespace=False (sketched below).  While this slightly increases the size of the files delivered (and adds a lot of whitespace to the source view), that can be countered by mod_deflate.  The namespace flattener is another slow filter, but it's necessary.
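
A sketch of the kind of one-line change meant here, assuming the render call in trac.web.chrome looks roughly like this (argument order and surrounding code simplified):

{{{
# Before: the default runs Genshi's whitespace filter on every render.
data = stream.render('xhtml', doctype=doctype)

# After: skip the whitespace filter and let mod_deflate absorb the
# slightly larger output.
data = stream.render('xhtml', doctype=doctype, strip_whitespace=False)
}}}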

Another win with Genshi is patching genshi.output.encode and getting rid of the iteration that calls _encode a bazillion times.  Instead, just do:

{{{
out.write(_encode(u''.join(list(iterator))))
}}}

This is much faster.
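
In context, the patched function would look something like this (a sketch against the Genshi of that era; the encoding error-handling details are omitted, and note the out is None branch already joins the chunks):

{{{
def encode(iterator, method='xml', encoding='utf-8', out=None):
    """Encode serializer output into a string (patched sketch)."""
    if encoding is not None:
        _encode = lambda string: string.encode(encoding, 'xmlcharrefreplace')
    else:
        _encode = lambda string: string
    if out is None:
        return _encode(u''.join(list(iterator)))
    # Patched part: one join, one encode, one write, instead of
    # calling _encode and out.write once per chunk of the stream.
    out.write(_encode(u''.join(list(iterator))))
}}}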

I should note that I also tried changing Trac so that the output went straight through to the wsgi layer (rather than buffering it in a cStringIO object).  While this improved the initial wait time dramatically, it actually slowed down the overall request time.  I believe this slowdown was also due to the _encode calls inside the iteration over the stream.  I'm still going to experiment with this more at some point.

Overall, I like Genshi; now that I kind of get how it works it has a certain elegance, but I am left unconvinced that its design and flexibility outweigh the performance issues and the added difficulty in figuring them out (refer to the profiling issues at the top).  One way I'm thinking of working around this is to beef up the xmlrpc interfaces and do everything with ajax.

== Misc ==

=== plugins ===

I don't think most plugin authors know where they will generate performance penalties.  It's hard to know all those details without spending a lot of time on it.

Generally, pay attention to any plugins that use stream filters or request filters.  These are called for every request and, more often than not, do something time-intensive.  Additionally, IMO, a request handler's match_request should never do anything beyond req.path_info.startswith('/mypath') (see the sketch in the ''filters'' section above).  Many match_request implementations do a lot more, which gets executed on every request.

=== wysiwyg plugin ===

This plugin adds a large js file to *every* request, whether useful or not.  I removed the large script from the request filter, then changed the loader script to check whether there is a textarea element in the DOM, and only then load the larger script if necessary.