Edgewall Software
Modify

Opened 8 years ago

Last modified 2 years ago

#8507 new enhancement

Developer WSGI Profiling and Debugging

Reported by: Shane Caraveo <shanec@…> Owned by:
Priority: normal Milestone: next-major-releases
Component: general Version: none
Severity: normal Keywords: development
Cc: exarkun@…, Ryan J Ollos
Release Notes:
API Changes:

Description

I have a wsgi setup for development that includes the ability to use a couple different methods of profiling as well as debugging using DBGP (a debugger protocol for dynamic langauges, used in Komodo, Eclipse DLTK, and a bunch of other editors/ides). I hope this makes it easy for others who are interested in profiling and/or debugging trac. A couple small changes would make it useful for any wsgi application.

Attachments (4)

trac.wsgi (3.7 KB ) - added by Shane Caraveo <shanec@…> 8 years ago.
trac.wsgi
tracdev.py (8.9 KB ) - added by Shane Caraveo <shanec@…> 8 years ago.
tracdev.py contains all the guts for profilers and debuggers
trac.2.wsgi (3.7 KB ) - added by Shane Caraveo <shanec@…> 8 years ago.
trac.wsgi v2
8507-pluggable-profiling-r8466.patch (7.7 KB ) - added by Remy Blank 8 years ago.
Pluggable profiling infrastructure

Download all attachments as: .zip

Change History (25)

Changed 8 years ago by Shane Caraveo <shanec@…>

Attachment: trac.wsgi added

trac.wsgi

Changed 8 years ago by Shane Caraveo <shanec@…>

Attachment: tracdev.py added

tracdev.py contains all the guts for profilers and debuggers

comment:1 Changed 8 years ago by Shane Caraveo <shanec@…>

Note: This also includes an example of doing the gc.collect from a background thread. Patch trac.main to remove the call to gc.collect, and turn on the background thread via the apache conf. Your per-request throughput will improve.

comment:2 Changed 8 years ago by Christian Boos

It would be nice to have this interfaced with tracd as well.

Changed 8 years ago by Shane Caraveo <shanec@…>

Attachment: trac.2.wsgi added

trac.wsgi v2

comment:3 Changed 8 years ago by Christian Boos

Keywords: development added
Milestone: 0.12

I think the GcCollect should be integrated in trac.web.main so that we can get rid of the explicit collect after each request. The GcCollect object should be instantiated only when env['wsgi.run_once'] == False, of course.

The trac.wsgi itself could go somewhere in contrib, e.g. contrib/tracdev.wsgi?

comment:4 Changed 8 years ago by Christian Boos

Just found out about http://dev.pocoo.org/projects/flickzeug, which has profiling and leak detection support.

comment:5 Changed 8 years ago by Remy Blank

I'm currently trying to fit per-environment profiling into Trac itself, independently of the frontend, based on cProfile. It's almost working, but still needs some tweaks. Patch coming shortly.

Changed 8 years ago by Remy Blank

Pluggable profiling infrastructure

comment:6 Changed 8 years ago by Remy Blank

The patch above introduces an IProfiler interface and an implementation using cProfile. It allows profiling complete requests as well as isolated Component methods, and stores its data in one file per process per environment reload.

A few comments:

  • This is a proof of concept, or rather, a second sample implementation. In particular, the display of profiling data is currently hardcoded.
  • Profiling should work for all frontends, but I have only tested tracd so far. In particular, the cProfile profiler implementation should be thread-safe.
  • The IProfiler should be generic enough to be usable for other profiler backends, in particular the ones that Shane has used.
  • Request profiling doesn't include environment creation. I have done this on purpose, as that operation should be relatively rare and therefore is usually not interesting.
  • On my machine, the test suite has taken a 10% performance hit (with profiling disabled!). I suspect that this is due to the many environment creations, but I will have to check that. It could also be due to reading profile_main on every request. I'll try to replace the option with a flag in the WSGI environment and see how it behaves.

To profile complete requests, set [profiling] profile_main = true in trac.ini and watch the $ENV/profiling folder for profiling data. To display the data, use trac-admin $ENV profile view <path> [...], it will aggregate the data from several files if desired.

To profile isolated Component methods, just decorate them with trac.profile.profile. The decorator is reentrant.

Here's some sample data from a few renderings of the front page of my sandbox:

         825709 function calls (757124 primitive calls) in 16.402 CPU seconds

   Ordered by: internal time
   List reduced from 1799 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       75    4.722    0.063    5.024    0.067 {gc.collect}
      730    4.674    0.006    4.674    0.006 {built-in method acquire}
     3230    0.718    0.000    1.491    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:234(items)
   248580    0.664    0.000    0.664    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:211(get)
      274    0.591    0.002    0.591    0.002 {method 'write' of 'file' objects}
       40    0.457    0.011    0.457    0.011 {method 'sendall' of '_socket.socket' objects}
10170/405    0.218    0.000    1.903    0.005 /home/joe/src/genshi/advanced-i18n/genshi/filters/i18n.py:605(__call__)
      269    0.144    0.001    5.388    0.020 /usr/lib/python2.5/logging/__init__.py:1132(callHandlers)
     3230    0.122    0.000    0.122    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:224(keys)
21615/2960    0.111    0.000    3.467    0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:523(_flatten)
       79    0.106    0.001    0.106    0.001 /home/joe/src/trac/trunk/trac/db/sqlite_backend.py:49(_rollback_on_error)
      269    0.100    0.000    0.172    0.001 /usr/lib/python2.5/logging/__init__.py:606(release)
       75    0.077    0.001   11.285    0.150 /home/joe/src/trac/trunk/trac/web/main.py:143(dispatch)
 3185/308    0.076    0.000    0.191    0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/astutil.py:80(visit)
15676/3065    0.076    0.000    4.114    0.001 /home/joe/src/genshi/advanced-i18n/genshi/core.py:268(_ensure)
     1201    0.068    0.000    0.068    0.000 /usr/lib/python2.5/threading.py:44(_note)
    25690    0.067    0.000    0.067    0.000 {isinstance}
42806/42402    0.063    0.000    0.064    0.000 {len}
     3406    0.059    0.000    0.113    0.000 /home/joe/src/genshi/advanced-i18n/genshi/core.py:689(__new__)
    38185    0.056    0.000    0.056    0.000 {method 'append' of 'list' objects}
  260/255    0.052    0.000    0.393    0.002 /home/joe/src/trac/trunk/trac/web/api.py:195(__getattr__)
2625/2120    0.049    0.000    4.156    0.002 /home/joe/src/genshi/advanced-i18n/genshi/output.py:682(__call__)
   901/69    0.049    0.000    0.112    0.002 /usr/lib/python2.5/sre_compile.py:38(_compile)
   550/71    0.047    0.000    0.166    0.002 /usr/lib/python2.5/sre_parse.py:385(_parse)
      720    0.045    0.000    0.051    0.000 /home/joe/src/trac/trunk/trac/config.py:523(compare)
     1470    0.042    0.000    0.044    0.000 /usr/lib/python2.5/threading.py:733(currentThread)
      637    0.040    0.000    0.040    0.000 {compile}
8580/2880    0.039    0.000    3.506    0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/markup.py:307(_match)
     3230    0.039    0.000    1.536    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:182(__len__)
  798/339    0.038    0.000    0.161    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/astutil.py:580(visit_Call)

This confirms Shane's findings about gc.collect(). But it's also interesting to note that we seem to have lots of contention on locks, as a lot of time is spent in acquire(). This could be due to logging (my level is DEBUG), or it may even be a measurement artifact due to multi-threading.

My current questions:

  • Does it make sense to make profiling pluggable? Is the IProfiler interface adequate?
  • Should request profiling be enabled with an option in trac.ini or with an env['trac.profile'] option? This could be set e.g. in tracd with a —profile switch. I tend to prefer trac.ini` as it's easier to explain to a user.

comment:7 Changed 8 years ago by Remy Blank

Here's some data with logging disabled:

         1398735 function calls (1273754 primitive calls) in 20.844 CPU seconds

   Ordered by: internal time
   List reduced from 1521 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      150    9.393    0.063    9.989    0.067 {gc.collect}
     6460    1.367    0.000    2.862    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:234(items)
   497160    1.294    0.000    1.294    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:211(get)
     1443    0.882    0.001    0.882    0.001 {built-in method acquire}
       80    0.668    0.008    0.668    0.008 {method 'sendall' of '_socket.socket' objects}
6508/6507    0.391    0.000    0.485    0.000 /home/joe/src/trac/trunk/trac/core.py:194(__getitem__)
20340/810    0.318    0.000    3.512    0.004 /home/joe/src/genshi/advanced-i18n/genshi/filters/i18n.py:605(__call__)
      147    0.234    0.002    0.234    0.002 /home/joe/src/trac/trunk/trac/db/sqlite_backend.py:49(_rollback_on_error)
     6460    0.225    0.000    0.225    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:224(keys)
43230/5920    0.219    0.000    5.880    0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:523(_flatten)
     5621    0.157    0.000    0.245    0.000 /home/joe/src/genshi/advanced-i18n/genshi/core.py:689(__new__)
30241/5905    0.107    0.000    6.891    0.001 /home/joe/src/genshi/advanced-i18n/genshi/core.py:268(_ensure)
    43174    0.100    0.000    0.100    0.000 {isinstance}
5250/4240    0.093    0.000    6.924    0.002 /home/joe/src/genshi/advanced-i18n/genshi/output.py:682(__call__)
    10589    0.089    0.000    0.151    0.000 extensions/pyexpat.c:853(Default)
45628/45421    0.084    0.000    0.085    0.000 {len}
     2529    0.083    0.000    0.289    0.000 /home/joe/src/trac/trunk/trac/config.py:296(get)
5144/4030    0.083    0.000    2.257    0.001 {built-in method sub}
     6460    0.075    0.000    2.947    0.000 /home/joe/src/genshi/advanced-i18n/genshi/template/base.py:182(__len__)
      637    0.074    0.000    0.074    0.000 {compile}
     1844    0.073    0.000    0.073    0.000 /usr/lib/python2.5/ConfigParser.py:559(_interpolate)
    45624    0.069    0.000    0.069    0.000 {method 'append' of 'list' objects}
        6    0.067    0.011    0.233    0.039 {built-in method ParseFile}
     1904    0.066    0.000    0.366    0.000 /home/joe/src/trac/trunk/trac/config.py:435(__get__)
4540/4120    0.063    0.000    7.060    0.002 /home/joe/src/genshi/advanced-i18n/genshi/output.py:298(__call__)
17160/5760    0.063    0.000    5.942    0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/markup.py:307(_match)
   161/12    0.060    0.000    0.072    0.006 /home/joe/src/genshi/advanced-i18n/genshi/template/markup.py:114(_extract_directives)
 3185/308    0.059    0.000    0.208    0.001 /home/joe/src/genshi/advanced-i18n/genshi/template/astutil.py:80(visit)
     6890    0.056    0.000    0.056    0.000 {method 'join' of 'unicode' objects}
      150    0.055    0.000    0.158    0.001 /home/joe/src/trac/trunk/trac/versioncontrol/api.py:82(pre_process_request)

So lock contention is not as bad as I thought, but still significant. Other interesting points:

  • ComponentManager.__getitem__()
  • PyFormatCursor._rollback_on_error()

I'll try to see if I can get info about the callers. This should tell us which locks have contention.

comment:8 Changed 8 years ago by Ryan Ollos <ryano@…>

Cc: ryano@… added

comment:9 Changed 8 years ago by rupert.thurner

thanks a lot for this! would profiling be also usable for plugins?

comment:10 in reply to:  9 Changed 8 years ago by Remy Blank

Replying to rupert.thurner:

thanks a lot for this! would profiling be also usable for plugins?

Full-request profiling will obviously include data from calls located in plugins. For plugin development, the @profile decorator should allow profiling specific parts of a plugin. So I guess the answer is yes.

comment:11 Changed 8 years ago by Remy Blank

Milestone: 0.12next-minor-0.12.x
Owner: set to Remy Blank

I'm not sure I will have time to work on this before 0.12.

comment:12 Changed 8 years ago by Christian Boos

Yet another profiler for Python, http://code.google.com/p/yappi.

Seems to be a perfect match for our needs (see why yappi).

comment:13 Changed 7 years ago by Christian Boos

Milestone: next-minor-0.12.xnext-major-0.1X

comment:14 Changed 5 years ago by exarkun

Note: This also includes an example of doing the gc.collect from a background thread.

Do you realize that CPython's garbage collector is not concurrent? It doesn't matter what thread you call gc.collect in, all threads are suspended while it is running.

Calling gc.collect at all seems totally bogus to me. The garbage collector will kick in when allocations dictate it should. Applications that need it called more frequently should adjust the generation thresholds. Explicitly triggering a full collection is almost always wrong, and doing it once per request is totally wrong.

I just deleted the call in my trac deployment and saw the whole site go from unusably slow to painfully slow (while memory usage remained exactly where it has always been). I suggest upstream do the same.

comment:15 in reply to:  14 Changed 5 years ago by Remy Blank

Replying to exarkun:

Calling gc.collect at all seems totally bogus to me. The garbage collector will kick in when allocations dictate it should.

That's the theory, yes. In practice, Python will gladly eat all available memory if requests come in at a high enough frequency.

Applications that need it called more frequently should adjust the generation thresholds. Explicitly triggering a full collection is almost always wrong, and doing it once per request is totally wrong.

Python has evolved since we introduced that call, and it may be wrong now to do so, but it certainly wasn't at the time. It was a working solution to a real issue.

I just deleted the call in my trac deployment and saw the whole site go from unusably slow to painfully slow (while memory usage remained exactly where it has always been).

Maybe it also depends on the traffic hitting your site. Could you please provide some context info about your installation (OS, Python version, DB connector version, frontend, approximate queries per second)? That would give us a data point for the investigation.

comment:16 in reply to:  14 Changed 5 years ago by Christian Boos

Replying to exarkun:

… saw the whole site go from unusably slow to painfully slow

Hm, I have a hard time to guess what's preferable between the two ;-)

comment:17 Changed 5 years ago by exarkun

That's the theory, yes. In practice, Python will gladly eat all available memory if requests come in at a high enough frequency.

This explanation supposes that Python's garbage collector is time based. It is not. It is triggered based on how many allocations have happened. Higher request rate will just trigger the garbage collector sooner.

Maybe it also depends on the traffic hitting your site. Could you please provide some context info about your installation (OS, Python version, DB connector version, frontend, approximate queries per second)? That would give us a data point for the investigation.

I doubt this information is useful in any way, but here you go. Ubuntu Hardy, Python 2.5.2, psycopg2 2.0.13, Twisted Web, and the number of queries per second is "as many as possible", which usually ends up near 2 per second, because that's as fast as trac can serve them.

comment:18 in reply to:  17 Changed 5 years ago by Christian Boos

Replying to exarkun:

That's the theory, yes. In practice, Python will gladly eat all available memory if requests come in at a high enough frequency.

This explanation supposes that Python's garbage collector is time based. It is not. It is triggered based on how many allocations have happened. Higher request rate will just trigger the garbage collector sooner.

Well, we're speaking out of experience here. Please have a look at #6614 (in particular its comment:17 which explains why I introduced this explicit gc.collect()). The ticket is long but instructive and reflects our "struggle" to keep things in control (somewhat).

Of course, it would be interesting to get some new numbers with a recent configuration (Trac 0.12.x, Genshi 0.6), with and without this explicit gc.

comment:19 Changed 5 years ago by exarkun@…

Cc: exarkun@… added

Well, we're speaking out of experience here. Please have a look at #6614 (in particular its comment:17 which explains why I introduced this explicit gc.collect()). The ticket is long but instructive and reflects our "struggle" to keep things in control (somewhat).

It sounds like the gc.collect() is helping memory usage because it triggers early collection of a reference cycle containing large objects (eg memory pools). A cheaper way to accomplish the same goal is to explicitly break the reference cycles. This will lead to them being collectable due to reference counting, which means they'll be freed as soon as they are no longer referenced. This is better than calling gc.collect() because it doesn't incur the cost of traversing all objects in memory to find collectable cycles.

I don't claim a deep understanding of how the Pool objects are managed, but trac.versioncontrol.svn_fs.application_pool and its handling strikes me as quite suspicious.

That aside, twistedmatrix.com's trac 0.11.6 deployment does not see the memory issues described on #6614, even now that the per-request gc.collect() call has been removed. Memory usage on our instance (which is busy 24/7) stabilizes at around 300MB to 400MB (after about a week of operation).

comment:21 Changed 3 years ago by Ryan J Ollos

Cc: Ryan J Ollos added; ryano@… removed

comment:22 Changed 2 years ago by Ryan J Ollos

Owner: Remy Blank deleted

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.
The ticket will be disowned. Next status will be 'new'.
as The resolution will be set. Next status will be 'closed'.
The owner will be changed from (none) to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .

 
Note: See TracTickets for help on using tickets.