
Opened 13 years ago

Closed 13 years ago

#9919 closed enhancement (wontfix)

Improve on logging

Reported by: Carsten Klein <carsten.klein@…>
Owned by:
Priority: normal
Milestone:
Component: general
Version: 0.13dev
Severity: normal
Keywords: logging
Cc:
Branch:
Release Notes:
API Changes:

This will require a wrapper around the standard logging.Logger, which is straightforward to implement and remains compatible with the original API.

Internal Changes:

Description (last modified by Christian Boos)

Currently, all calls to the logging facility happen regardless of whether logging is enabled at all, or enabled for a given log level.

This includes evaluating the call arguments: attribute access, calculations, plain string concatenation, and so on.

It would be more efficient to have a wrapper around the original logger that exposes a few guard methods, or properties if you will.

Say you want to log.debug() something. Instead of simply writing log.debug(inefficient_calculation_or_property_or_call_here), you would write

if log.isDebug():  # or log.is_debug
    log.debug(inefficient_calculation_or_property_or_call_here)

This would increase overall performance a bit, don't you think?
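For illustration, a minimal sketch of what such a wrapper could look like (LoggerWrapper and is_debug are placeholder names, not necessarily what the attached patches use):

import logging

class LoggerWrapper(object):
    """Thin wrapper that delegates to a standard logging.Logger."""

    def __init__(self, logger):
        self._logger = logger

    @property
    def is_debug(self):
        # cheap guard: answers "would a DEBUG record actually be emitted?"
        return self._logger.isEnabledFor(logging.DEBUG)

    def __getattr__(self, name):
        # everything else (debug, info, warning, ...) is delegated to the
        # wrapped logger, so the original API keeps working unchanged
        return getattr(self._logger, name)

Existing code would keep calling log.debug() and friends as before; only callers that want the guard would consult log.is_debug.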

Attachments (2)

log.patch (2.1 KB) - added by Carsten Klein <carsten.klein@…> 13 years ago.
Patch for making this possible
log.2.patch (2.5 KB) - added by Carsten Klein <carsten.klein@…> 13 years ago.
revised patch


Change History (17)

by Carsten Klein <carsten.klein@…>, 13 years ago

Attachment: log.patch added

Patch for making this possible

by Carsten Klein <carsten.klein@…>, 13 years ago

Attachment: log.2.patch added

revised patch

comment:1 by Carsten Klein <carsten.klein@…>, 13 years ago

@cached does not work with non-components.

See the revised patch.

comment:2 by Remy Blank, 13 years ago

Have you done any profiling that shows which logging statements are a bottleneck? I'd rather comment them out than make the code more complex.

comment:3 by Carsten Klein <carsten.klein@…>, 13 years ago

Remy: it is not about actual bottlenecks; it is about preventing unnecessary property access, string concatenation, and method calls, which, over time, add to the overall response time of the application, especially under high load.

As for the code complexity:

Instead of writing

self.log.debug('message %s %s %s', self.method(), self.prop + self.prop, i)

one would have to write

if self.log.is_debug:
  self.log.debug('message %s %s %s', self.method(), self.prop + self.prop, i)

This could be introduced gradually rather than all at once: identify, as you said, the most costly logging calls, and migrate them to the new scheme over time.

I for myself would definitely give it a try.

Some more detail:

Simply calling, for example, log.debug() will, inside the logging system, invoke at least two methods: isEnabledFor(DEBUG) and, within that, getEffectiveLevel(), which in turn walks up the logger hierarchy and accesses the level attribute of the logger and its parents.

As Trac does not install child loggers, usually only the ever-present RootLogger's level attribute is accessed, that attribute access being handled internally by the VM.

Say you have a simple debug log message, "debugging, it is me". This results in at least three method calls: debug(), isEnabledFor() (along with an access to the module-level DEBUG constant), and getEffectiveLevel(), which in turn accesses the level attribute of both the logger and its parent RootLogger, provided the logger itself has no level set.

If you need to concatenate multiple strings to actually produce the log message, that overhead is added to the call as well. The same goes for any class properties you access or any methods or functions you call…

You see, the number of processor cycles spent assembling the log message, only to then discover that the required level is not even enabled, is considerable.

Checking log.is_debug first, on the other hand, adds a simple guard to your code that costs only a few cycles.
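With nothing but the standard library, the same guard already looks like this (a sketch; the logger name and expensive_repr() are placeholders):

import logging

log = logging.getLogger('trac.example')   # placeholder logger name

def expensive_repr():
    # stand-in for costly argument evaluation (string building,
    # property access, method calls, ...)
    return 'some expensive state dump'

if log.isEnabledFor(logging.DEBUG):
    # the arguments are only evaluated when a DEBUG record would be emitted
    log.debug('state dump: %s', expensive_repr())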

A simple calculation might illustrate this:

Say a call to debug() alone costs you 10 cycles. The methods called from within debug() then also cost 10 cycles each, so we are up to 30 cycles. Accessing properties costs an additional 10 cycles per property accessed.

So this comes to 40..50 cycles per call to log.debug().

With the guarded approach you would only spend 20..30 cycles, and this could be made more efficient still, for example with decorators such as @cached.

Considering 10,000 users on the site and roughly 100+ log.debug() calls per request, times the number of cycles, you end up with (worst case) 10000 * 100 * 50 = 50,000,000 cycles for the existing approach and only (worst case) 10000 * 100 * 30 = 30,000,000 cycles for the guarded approach.

That amounts to a saving of roughly 40% for the new approach. Of course the number might be even higher if we further optimize the LoggerWrapper.

Of course, the actual cycle counts per call will vary; the real numbers may be considerably higher than assumed here, or lower with further optimization and/or caching in the VM, which I do not see in the Python world right now.

I know we are talking GHz here, but a system is not just a Trac/mod_python instance, so every cycle spared adds to the overall performance of the system, especially to the I/O subsystem.

And if we could reduce the cost of accessing the is_debug property to, say, 20 cycles, the benefit would be even greater, not counting the cycles saved by never assembling the parameters for each log.debug() call that the guard skips in the first place.

That would lead to an efficiency increase of approx. 60% over the existing approach, still without counting the overhead of assembling the method call parameters.

I reckon the overall benefit could be raised to approx. 60..80% per call to the logging facility, depending on how much time and work we invest in optimizing the LoggerWrapper, which should ultimately be only a small effort.
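Spelling the same arithmetic out (the cycle counts are the assumed figures from above, not measurements):

requests = 10000            # assumed number of requests
calls_per_request = 100     # assumed log.debug() calls per request
unguarded_cycles = 50       # assumed worst case per suppressed call
guarded_cycles = 30         # assumed worst case with a guard

old = requests * calls_per_request * unguarded_cycles    # 50,000,000
new = requests * calls_per_request * guarded_cycles      # 30,000,000
saving = 1.0 - float(new) / old                          # 0.4, i.e. ~40%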

comment:4 by Carsten Klein <carsten.klein@…>, 13 years ago

Note: the provided patch is in no way a final proposition; the interface is way too expressive and must be reduced. I for myself would go for the property-based approach, with lambdas calling the self.isEnabledFor(…) method instead of dedicated methods such as isDebug()…

in reply to:  3 ; comment:5 by Remy Blank, 13 years ago

Replying to Carsten Klein <carsten.klein@…>:

Remy: it is not about actual bottlenecks; it is about preventing unnecessary property access, string concatenation, and method calls, which, over time, add to the overall response time of the application, especially under high load.

I did understand that from your description. Modifying all log statements the way you suggest will make the code less readable, so my question still holds: how much performance do we gain in return? Making assumptions is fine, but measuring is better. Take the worst case of logging you can find in core, profile it (with logging set to INFO or WARN) and calculate its share of response time. Then, replace it with:

if False:
    self.log.debug(...)

and measure again. Then multiply the difference by about half the number of logging statements reached during a typical request, and you know how much you will gain.
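For instance, a rough measurement along those lines could be done with timeit (the logger name and the expensive() helper are made up for this sketch):

import logging
import timeit

logging.basicConfig(level=logging.WARNING)   # DEBUG is disabled
log = logging.getLogger('bench')             # throwaway logger for the test

def expensive():
    # stand-in for costly argument evaluation
    return ', '.join(str(i) for i in range(50))

def unguarded():
    log.debug('state: %s', expensive())

def guarded():
    if log.isEnabledFor(logging.DEBUG):
        log.debug('state: %s', expensive())

print(timeit.timeit(unguarded, number=100000))
print(timeit.timeit(guarded, number=100000))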

Yes, it's the "premature optimization is the root of all evil" thing again.

comment:6 by Christian Boos, 13 years ago

Ditto. Not to mention that lots of real bottlenecks have already been identified. If you want to help improve the performance, start by reading TracDev/Performance.

comment:7 by Christian Boos, 13 years ago

And more specifically about the impact of logging on performance, I suspect it has more to do with the logging lock that has to be acquired before emitting the log record, probably not with evaluating the arguments of the call. IOW, I think the performance impact of debug logging is only apparent at the DEBUG logging level.

in reply to:  5 comment:8 by Carsten Klein <carsten.klein@…>, 13 years ago

Replying to rblank:

Yes, it's the "premature optimization is the root of all evil" thing again.

Coming from the Java world: it is a common and well-accepted idiom there to first ask whether debug is enabled and only then assemble the call to the logger's debug method, so that it does not impact the performance of the normal code path when debugging is disabled.

As for your request to do some actual profiling, I will do that when I find the time and report my findings. I will also set up a clone on GitHub so that you can use the modified sources to do some profiling yourself.

comment:9 by Carsten Klein <carsten.klein@…>, 13 years ago

Ah, now I see what you actually meant: the logging facility is not called that often, so the number of logging calls is rather negligible compared to the other calls in the system.

Could it be that, due to performance issues, logging was stripped down to a minimum overall? Or was there never a need for Trac to be more chatty about its internal state at the DEBUG log level?

I for my part would like trac to be more chatty during debug…

Nevertheless I will profile the new approach against the old approach.

in reply to:  9 ; comment:10 by Christian Boos, 13 years ago

Description: modified (diff)

Replying to Carsten Klein <carsten.klein@…>:

I for my part would like trac to be more chatty during debug…

Then try [trac] debug_sql = true, in trac.ini ;-)
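That is, roughly, in trac.ini (the [logging] block is an assumption: the SQL statements only show up if the effective log level is DEBUG):

[trac]
debug_sql = true

[logging]
log_level = DEBUG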

in reply to:  9 ; comment:11 by Remy Blank, 13 years ago

Replying to Carsten Klein <carsten.klein@…>:

Nevertheless I will profile the new approach against the old approach.

Don't get me wrong: if you do find a significant overhead with the current logging, I'd be more than happy to apply patches improving the situation. My feeling, though, is that you might find at most one or two locations where logging really has an impact, and in that case, it might be simpler to optimize only those few locations (possibly by disabling those logging statements altogether) than to refactor the logging system.

in reply to:  10 comment:12 by Carsten Klein <carsten.klein@…>, 13 years ago

Replying to cboos:

Replying to Carsten Klein <carsten.klein@…>:

I for my part would like trac to be more chatty during debug…

Then try [trac] debug_sql = true, in trac.ini ;-)

:D

Since you happen to mention it: this is actually a good example of placing a guard around log.debug statements (cf. trac.db.util.IterableCursor).

But the way it is implemented cannot be reused in components and/or model classes (even though an environment, or its logger, is always available there), nor would it be feasible to introduce a debug_xxx flag for every plugin in the field.
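As a sketch of the kind of guard meant here (this paraphrases the idea behind debug_sql; it is not the actual trac.db.util code):

class CursorSketch(object):
    """Hypothetical cursor wrapper: logs SQL only when a logger was
    handed in at construction time, e.g. because debug_sql is enabled."""

    def __init__(self, cursor, log=None):
        self.cursor = cursor
        self.log = log            # None unless SQL debugging is enabled

    def execute(self, sql, args=None):
        if self.log:              # the guard: a plain attribute test
            self.log.debug('SQL: %r', sql)
        if args is None:
            return self.cursor.execute(sql)
        return self.cursor.execute(sql, args)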

in reply to:  11 comment:13 by Carsten Klein <carsten.klein@…>, 13 years ago

Replying to rblank:

Don't get me wrong: if you do find a significant overhead with the current logging, I'd be more than happy to apply patches improving the situation. My feeling, though, is that you might find at most one or two locations where logging really has an impact, and in that case, it might be simpler to optimize only those few locations (possibly by disabling those logging statements altogether) than to refactor the logging system.

Don't get me wrong either. Perhaps I have not made myself clear enough: as of now, component developers like me refrain from putting debug statements into the code because these statements cannot be guarded. The enhancement would be for current and future plugin/component developers who want to put more debug statements into their components than just:

try:
    do_something()
except Exception:
    log.debug('something failed')
    raise

Eventually, I would like to place a debug statement everywhere in my code where it would make sense, for example:

log.debug('entering method, args: %r', args)

# do something, modify internal state
log.debug('internal state modified: %r', state)

# do even more
log.debug('exiting method, state: %r', state)

Now, with the current approach this definitely would slow down overall execution of the method.

Of course, existing sources would not have to be modified to use the new LoggerWrapper; they would work just like before. Code using the new properties, however, could be more expressive in its overall logging behaviour without having to introduce something like the flag used in, for example, the db layer.
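With such a wrapper, the tracing example above could be guarded like this (a sketch; is_debug is the property proposed here, and the method and attribute names are made up):

def modify_state(self, new_value):
    if self.log.is_debug:
        self.log.debug('entering modify_state, args: %r', new_value)
    self.state = new_value
    if self.log.is_debug:
        self.log.debug('exiting modify_state, state: %r', self.state)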

comment:14 by Carsten Klein <carsten.klein@…>, 13 years ago

You can find a first prototype implementation of this here:

https://github.com/axnsoftware/trac-bugs-n-features/commits/trac-issue-9919

Feel free to test it and comment on it in this ticket.

comment:15 by Christian Boos, 13 years ago

Keywords: logging added
Resolution: wontfix
Status: new → closed

Well, the proposed changes can't provide any performance improvement in Trac; quite the contrary, they add a test that will be done anyway in the logging subsystem, not to mention the cost of going through that extra layer.

If you really want to write such conditional logging statements in your code, you can always add a helper function like:

import logging

def is_info(logger):
    return logger.isEnabledFor(logging.INFO)

so that you can write:

if is_info(self.log):
    self.log.info("what's the context? %r", locals())
