Opened 16 years ago
Last modified 10 years ago
#8507 new enhancement
Developer WSGI Profiling and Debugging
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | normal | Milestone: | next-major-releases |
Component: | general | Version: | none |
Severity: | normal | Keywords: | development |
Cc: | exarkun@…, Ryan J Ollos | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
I have a wsgi setup for development that includes the ability to use a couple different methods of profiling as well as debugging using DBGP (a debugger protocol for dynamic langauges, used in Komodo, Eclipse DLTK, and a bunch of other editors/ides). I hope this makes it easy for others who are interested in profiling and/or debugging trac. A couple small changes would make it useful for any wsgi application.
Attachments (4)
Change History (25)
by , 16 years ago
by , 16 years ago
Attachment: | tracdev.py added |
---|
tracdev.py contains all the guts for profilers and debuggers
comment:1 by , 16 years ago
Note: This also includes an example of doing the gc.collect from a background thread. Patch trac.main to remove the call to gc.collect, and turn on the background thread via the apache conf. Your per-request throughput will improve.
comment:3 by , 16 years ago
Keywords: | development added |
---|---|
Milestone: | → 0.12 |
I think the GcCollect
should be integrated in trac.web.main
so that we can get rid of the explicit collect after each request. The GcCollect
object should be instantiated only when env['wsgi.run_once'] == False
, of course.
The trac.wsgi itself could go somewhere in contrib, e.g. contrib/tracdev.wsgi?
comment:4 by , 15 years ago
Just found out about http://dev.pocoo.org/projects/flickzeug, which has profiling and leak detection support.
comment:5 by , 15 years ago
I'm currently trying to fit per-environment profiling into Trac itself, independently of the frontend, based on cProfile
. It's almost working, but still needs some tweaks. Patch coming shortly.
by , 15 years ago
Attachment: | 8507-pluggable-profiling-r8466.patch added |
---|
Pluggable profiling infrastructure
comment:6 by , 15 years ago
The patch above introduces an IProfiler
interface and an implementation using cProfile
. It allows profiling complete requests as well as isolated Component
methods, and stores its data in one file per process per environment reload.
A few comments:
- This is a proof of concept, or rather, a second sample implementation. In particular, the display of profiling data is currently hardcoded.
- Profiling should work for all frontends, but I have only tested
tracd
so far. In particular, thecProfile
profiler implementation should be thread-safe.
- The
IProfiler
should be generic enough to be usable for other profiler backends, in particular the ones that Shane has used.
- Request profiling doesn't include environment creation. I have done this on purpose, as that operation should be relatively rare and therefore is usually not interesting.
- On my machine, the test suite has taken a 10% performance hit (with profiling disabled!). I suspect that this is due to the many environment creations, but I will have to check that. It could also be due to reading
profile_main
on every request. I'll try to replace the option with a flag in the WSGI environment and see how it behaves.
To profile complete requests, set [profiling] profile_main = true
in trac.ini
and watch the $ENV/profiling
folder for profiling data. To display the data, use trac-admin $ENV profile view <path> [...]
, it will aggregate the data from several files if desired.
To profile isolated Component
methods, just decorate them with trac.profile.profile
. The decorator is reentrant.
Here's some sample data from a few renderings of the front page of my sandbox:
825709 function calls (757124 primitive calls) in 16.402 CPU seconds Ordered by: internal time List reduced from 1799 to 30 due to restriction <30> ncalls tottime percall cumtime percall filename:lineno(function) 75 4.722 0.063 5.024 0.067 {gc.collect} 730 4.674 0.006 4.674 0.006 {built-in method acquire} 3230 0.718 0.000 1.491 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:234(items) 248580 0.664 0.000 0.664 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:211(get) 274 0.591 0.002 0.591 0.002 {method 'write' of 'file' objects} 40 0.457 0.011 0.457 0.011 {method 'sendall' of '_socket.socket' objects} 10170/405 0.218 0.000 1.903 0.005 /home/joe/src/genshi/advanced-i18n/genshi/filters/i18n.py:605(__call__) 269 0.144 0.001 5.388 0.020 /usr/lib/python2.5/logging/__init__.py:1132(callHandlers) 3230 0.122 0.000 0.122 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:224(keys) 21615/2960 0.111 0.000 3.467 0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:523(_flatten) 79 0.106 0.001 0.106 0.001 /home/joe/src/trac/trunk/trac/db/sqlite_backend.py:49(_rollback_on_error) 269 0.100 0.000 0.172 0.001 /usr/lib/python2.5/logging/__init__.py:606(release) 75 0.077 0.001 11.285 0.150 /home/joe/src/trac/trunk/trac/web/main.py:143(dispatch) 3185/308 0.076 0.000 0.191 0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/astutil.py:80(visit) 15676/3065 0.076 0.000 4.114 0.001 /home/joe/src/genshi/advanced-i18n/genshi/core.py:268(_ensure) 1201 0.068 0.000 0.068 0.000 /usr/lib/python2.5/threading.py:44(_note) 25690 0.067 0.000 0.067 0.000 {isinstance} 42806/42402 0.063 0.000 0.064 0.000 {len} 3406 0.059 0.000 0.113 0.000 /home/joe/src/genshi/advanced-i18n/genshi/core.py:689(__new__) 38185 0.056 0.000 0.056 0.000 {method 'append' of 'list' objects} 260/255 0.052 0.000 0.393 0.002 /home/joe/src/trac/trunk/trac/web/api.py:195(__getattr__) 2625/2120 0.049 0.000 4.156 0.002 /home/joe/src/genshi/advanced-i18n/genshi/output.py:682(__call__) 901/69 0.049 0.000 0.112 0.002 /usr/lib/python2.5/sre_compile.py:38(_compile) 550/71 0.047 0.000 0.166 0.002 /usr/lib/python2.5/sre_parse.py:385(_parse) 720 0.045 0.000 0.051 0.000 /home/joe/src/trac/trunk/trac/config.py:523(compare) 1470 0.042 0.000 0.044 0.000 /usr/lib/python2.5/threading.py:733(currentThread) 637 0.040 0.000 0.040 0.000 {compile} 8580/2880 0.039 0.000 3.506 0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/markup.py:307(_match) 3230 0.039 0.000 1.536 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:182(__len__) 798/339 0.038 0.000 0.161 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/astutil.py:580(visit_Call)
This confirms Shane's findings about gc.collect()
. But it's also interesting to note that we seem to have lots of contention on locks, as a lot of time is spent in acquire()
. This could be due to logging (my level is DEBUG), or it may even be a measurement artifact due to multi-threading.
My current questions:
- Does it make sense to make profiling pluggable? Is the
IProfiler
interface adequate?
- Should request profiling be enabled with an option in
trac.ini
or with anenv['trac.profile'] option? This could be set e.g. in
tracdwith a
—profileswitch. I tend to prefer
trac.ini` as it's easier to explain to a user.
comment:7 by , 15 years ago
Here's some data with logging disabled:
1398735 function calls (1273754 primitive calls) in 20.844 CPU seconds Ordered by: internal time List reduced from 1521 to 30 due to restriction <30> ncalls tottime percall cumtime percall filename:lineno(function) 150 9.393 0.063 9.989 0.067 {gc.collect} 6460 1.367 0.000 2.862 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:234(items) 497160 1.294 0.000 1.294 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:211(get) 1443 0.882 0.001 0.882 0.001 {built-in method acquire} 80 0.668 0.008 0.668 0.008 {method 'sendall' of '_socket.socket' objects} 6508/6507 0.391 0.000 0.485 0.000 /home/joe/src/trac/trunk/trac/core.py:194(__getitem__) 20340/810 0.318 0.000 3.512 0.004 /home/joe/src/genshi/advanced-i18n/genshi/filters/i18n.py:605(__call__) 147 0.234 0.002 0.234 0.002 /home/joe/src/trac/trunk/trac/db/sqlite_backend.py:49(_rollback_on_error) 6460 0.225 0.000 0.225 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:224(keys) 43230/5920 0.219 0.000 5.880 0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:523(_flatten) 5621 0.157 0.000 0.245 0.000 /home/joe/src/genshi/advanced-i18n/genshi/core.py:689(__new__) 30241/5905 0.107 0.000 6.891 0.001 /home/joe/src/genshi/advanced-i18n/genshi/core.py:268(_ensure) 43174 0.100 0.000 0.100 0.000 {isinstance} 5250/4240 0.093 0.000 6.924 0.002 /home/joe/src/genshi/advanced-i18n/genshi/output.py:682(__call__) 10589 0.089 0.000 0.151 0.000 extensions/pyexpat.c:853(Default) 45628/45421 0.084 0.000 0.085 0.000 {len} 2529 0.083 0.000 0.289 0.000 /home/joe/src/trac/trunk/trac/config.py:296(get) 5144/4030 0.083 0.000 2.257 0.001 {built-in method sub} 6460 0.075 0.000 2.947 0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:182(__len__) 637 0.074 0.000 0.074 0.000 {compile} 1844 0.073 0.000 0.073 0.000 /usr/lib/python2.5/ConfigParser.py:559(_interpolate) 45624 0.069 0.000 0.069 0.000 {method 'append' of 'list' objects} 6 0.067 0.011 0.233 0.039 {built-in method ParseFile} 1904 0.066 0.000 0.366 0.000 /home/joe/src/trac/trunk/trac/config.py:435(__get__) 4540/4120 0.063 0.000 7.060 0.002 /home/joe/src/genshi/advanced-i18n/genshi/output.py:298(__call__) 17160/5760 0.063 0.000 5.942 0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/markup.py:307(_match) 161/12 0.060 0.000 0.072 0.006 /home/joe/src/genshi/advanced-i18n/genshi/template/markup.py:114(_extract_directives) 3185/308 0.059 0.000 0.208 0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/astutil.py:80(visit) 6890 0.056 0.000 0.056 0.000 {method 'join' of 'unicode' objects} 150 0.055 0.000 0.158 0.001 /home/joe/src/trac/trunk/trac/versioncontrol/api.py:82(pre_process_request)
So lock contention is not as bad as I thought, but still significant. Other interesting points:
ComponentManager.__getitem__()
PyFormatCursor._rollback_on_error()
I'll try to see if I can get info about the callers. This should tell us which locks have contention.
comment:8 by , 15 years ago
Cc: | added |
---|
follow-up: 10 comment:9 by , 15 years ago
thanks a lot for this! would profiling be also usable for plugins?
comment:10 by , 15 years ago
Replying to rupert.thurner:
thanks a lot for this! would profiling be also usable for plugins?
Full-request profiling will obviously include data from calls located in plugins. For plugin development, the @profile
decorator should allow profiling specific parts of a plugin. So I guess the answer is yes.
comment:11 by , 15 years ago
Milestone: | 0.12 → next-minor-0.12.x |
---|---|
Owner: | set to |
I'm not sure I will have time to work on this before 0.12.
comment:12 by , 15 years ago
Yet another profiler for Python, http://code.google.com/p/yappi.
Seems to be a perfect match for our needs (see why yappi).
comment:13 by , 15 years ago
Milestone: | next-minor-0.12.x → next-major-0.1X |
---|
follow-ups: 15 16 comment:14 by , 13 years ago
Note: This also includes an example of doing the gc.collect from a background thread.
Do you realize that CPython's garbage collector is not concurrent? It doesn't matter what thread you call gc.collect
in, all threads are suspended while it is running.
Calling gc.collect
at all seems totally bogus to me. The garbage collector will kick in when allocations dictate it should. Applications that need it called more frequently should adjust the generation thresholds. Explicitly triggering a full collection is almost always wrong, and doing it once per request is totally wrong.
I just deleted the call in my trac deployment and saw the whole site go from unusably slow to painfully slow (while memory usage remained exactly where it has always been). I suggest upstream do the same.
comment:15 by , 13 years ago
Replying to exarkun:
Calling
gc.collect
at all seems totally bogus to me. The garbage collector will kick in when allocations dictate it should.
That's the theory, yes. In practice, Python will gladly eat all available memory if requests come in at a high enough frequency.
Applications that need it called more frequently should adjust the generation thresholds. Explicitly triggering a full collection is almost always wrong, and doing it once per request is totally wrong.
Python has evolved since we introduced that call, and it may be wrong now to do so, but it certainly wasn't at the time. It was a working solution to a real issue.
I just deleted the call in my trac deployment and saw the whole site go from unusably slow to painfully slow (while memory usage remained exactly where it has always been).
Maybe it also depends on the traffic hitting your site. Could you please provide some context info about your installation (OS, Python version, DB connector version, frontend, approximate queries per second)? That would give us a data point for the investigation.
comment:16 by , 13 years ago
Replying to exarkun:
… saw the whole site go from unusably slow to painfully slow
Hm, I have a hard time to guess what's preferable between the two ;-)
follow-up: 18 comment:17 by , 13 years ago
That's the theory, yes. In practice, Python will gladly eat all available memory if requests come in at a high enough frequency.
This explanation supposes that Python's garbage collector is time based. It is not. It is triggered based on how many allocations have happened. Higher request rate will just trigger the garbage collector sooner.
Maybe it also depends on the traffic hitting your site. Could you please provide some context info about your installation (OS, Python version, DB connector version, frontend, approximate queries per second)? That would give us a data point for the investigation.
I doubt this information is useful in any way, but here you go. Ubuntu Hardy, Python 2.5.2, psycopg2 2.0.13, Twisted Web, and the number of queries per second is "as many as possible", which usually ends up near 2 per second, because that's as fast as trac can serve them.
comment:18 by , 13 years ago
Replying to exarkun:
That's the theory, yes. In practice, Python will gladly eat all available memory if requests come in at a high enough frequency.
This explanation supposes that Python's garbage collector is time based. It is not. It is triggered based on how many allocations have happened. Higher request rate will just trigger the garbage collector sooner.
Well, we're speaking out of experience here. Please have a look at #6614 (in particular its comment:17 which explains why I introduced this explicit gc.collect()
). The ticket is long but instructive and reflects our "struggle" to keep things in control (somewhat).
Of course, it would be interesting to get some new numbers with a recent configuration (Trac 0.12.x, Genshi 0.6), with and without this explicit gc.
comment:19 by , 13 years ago
Cc: | added |
---|
Well, we're speaking out of experience here. Please have a look at #6614 (in particular its comment:17 which explains why I introduced this explicit gc.collect()). The ticket is long but instructive and reflects our "struggle" to keep things in control (somewhat).
It sounds like the gc.collect()
is helping memory usage because it triggers early collection of a reference cycle containing large objects (eg memory pools). A cheaper way to accomplish the same goal is to explicitly break the reference cycles. This will lead to them being collectable due to reference counting, which means they'll be freed as soon as they are no longer referenced. This is better than calling gc.collect()
because it doesn't incur the cost of traversing all objects in memory to find collectable cycles.
I don't claim a deep understanding of how the Pool objects are managed, but trac.versioncontrol.svn_fs.application_pool
and its handling strikes me as quite suspicious.
That aside, twistedmatrix.com's trac 0.11.6 deployment does not see the memory issues described on #6614, even now that the per-request gc.collect()
call has been removed. Memory usage on our instance (which is busy 24/7) stabilizes at around 300MB to 400MB (after about a week of operation).
comment:21 by , 10 years ago
Cc: | added; removed |
---|
comment:22 by , 10 years ago
Owner: | removed |
---|
trac.wsgi