#9418 closed defect (fixed)
Creating instances of Component class is unsafe in multi-thread
Reported by: | Jun Omae | Owned by: | Jun Omae |
---|---|---|---|
Priority: | normal | Milestone: | 1.0 |
Component: | general | Version: | 0.12dev |
Severity: | normal | Keywords: | core component threadsafe |
Cc: | leho@…, felix.schwarz@… | Branch: | |
Release Notes: |
Various internal improvements to the way Components are initialized |
||
API Changes: |
Ensure that a |
||
Internal Changes: |
Description
The fetch/store operations are not atomic for ComponentManager.components
in trac/core.py:115-121,161.
The operations are unsafe in multi-thread.
Trac 0.12rc1 and 0.11.7 have the problem.
Test code
#! /usr/bin/python import sys import time from threading import Thread, Event from trac import __version__ as VERSION from trac.core import Component, ComponentManager class MTComponent(Component): def __init__(self): pass class TestThread(Thread): def __init__(self, manager, event): Thread.__init__(self) self.manager = manager self.event = event def run(self): manager = self.manager self.event.wait() obj = MTComponent(manager) self.id = id(obj) def main(): print 'Python %s' % sys.version print 'Trac %s' % VERSION for i in range(0, 100): manager = ComponentManager() event = Event() event.clear() threads = [] for j in range(0, 64): thr = TestThread(manager, event) threads.append(thr) thr.start() sys.setcheckinterval(1) event.set() for thr in threads: thr.join() ids = set(thr.id for thr in threads) if len(ids) != 1: print '#%d: Unsafe in multi-thread - %r' % (i, ids) main()
Result
Python 2.4.3 (#1, Sep 3 2009, 15:37:12) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] Trac 0.12rc1 #12: Unsafe in multi-thread - set([-1211591892, -1211592340]) #13: Unsafe in multi-thread - set([-1211591796, -1211592116]) #14: Unsafe in multi-thread - set([-1211592180, -1211591700, -1211591732]) #20: Unsafe in multi-thread - set([-1211591348, -1211591732]) #42: Unsafe in multi-thread - set([-1211591188, -1211592308, -1211591156]) #46: Unsafe in multi-thread - set([-1211590932, -1211592052]) #52: Unsafe in multi-thread - set([-1211590548, -1211590516]) #57: Unsafe in multi-thread - set([-1211590196, -1211590964]) #60: Unsafe in multi-thread - set([-1211590004, -1211592116]) #70: Unsafe in multi-thread - set([-1211589300, -1211589684, -1211589332]) #73: Unsafe in multi-thread - set([-1211589140, -1211590868]) #76: Unsafe in multi-thread - set([-1211588948, -1211591220]) #83: Unsafe in multi-thread - set([-1211555700, -1211591860]) #84: Unsafe in multi-thread - set([-1211555764, -1211555636]) #86: Unsafe in multi-thread - set([-1211555508, -1211555700]) #96: Unsafe in multi-thread - set([-1211590548, -1211588820])
Attachments (2)
Change History (24)
comment:1 by , 14 years ago
follow-up: 3 comment:2 by , 14 years ago
(Heh, mid-air collision :)
Yes, we know that. It has been like that since the component architecture was created, but has never been an issue in practice AFAIK. I'm not sure it's worth making them completely thread-safe, for the following reasons:
- Components usually do almost nothing in the constructor, if they even have one.
- Once a component has been instantiated, it remains registered with the environment, and won't be instantiated again. So this is only an issue on startup, and yes, at that point it is possible that a component is instantiated twice, and one is discarded.
- Components don't (or shouldn't) keep references to other components as attributes. They always retrieve components with
MyComponent(env)
, which ensures that once the startup phase is completed, always the same instance is returned. - Making the components thread-safe has a permanent performance impact (not only at startup) that should be measured, and brings its own potential issues with locking.
Suggesting "wontfix", but I guess I'll be preempted on that one ;)
About #8658, I don't understand how it's possible for methods to be called before __init__()
completes. Isn't the component be registered with the environment after __init__()
has finished running?
follow-up: 4 comment:3 by , 14 years ago
Milestone: | 0.12 |
---|
Replying to rblank:
(Heh, mid-air collision :)
Yes, we know that. It has been like that since the component architecture was created, but has never been an issue in practice AFAIK.
Well, #8658 is precisely such an example, and I've seen other cases in t.e.o logs.
About #8658, I don't understand how it's possible for methods to be called before
__init__()
completes. Isn't the component be registered with the environment after__init__()
has finished running?
Consider the replacement method for __init__
, installed by the ComponentMeta
metaclass of components:
def maybe_init(self, compmgr, init=init, cls=new_class): if cls not in compmgr.components: compmgr.components[cls] = self if init: try: init(self)
Imagine 2 threads calling MyComponent(mgr)
concurrently:
- thread A goes till the
compmgr.components[cls] = self
statement - (context switch)
- thread B now passes the
if cls not in compmgr.components
and ends up using the instance, uninitialized - (context switch)
- thread A will now call
init(self)
, a bit too late…
Changing the code without adding a lock only leads to other kind of similar bugs (e.g. init()
called twice), so we really need synchronization here. But as you already noted, this presents the risk of running into deadlocks if not done carefully. So I think we should keep this on the radar, but definitely not for 0.12 (due in the coming days).
follow-up: 6 comment:4 by , 14 years ago
Replying to cboos:
Changing the code without adding a lock only leads to other kind of similar bugs (e.g.
init()
called twice), so we really need synchronization here.
That was my suggestion: register with the component manager after calling init()
instead of before. We could call init()
twice, but on distinct component instances, both would be initialized correctly, one would be registered and the other would be dropped. This can already happen now, if a thread switch occurs between __new__()
and __init__()
. So we would keep only one pathological case instead of two.
comment:5 by , 14 years ago
Cc: | added |
---|
follow-up: 7 comment:6 by , 14 years ago
Milestone: | → 0.12.1 |
---|
Replying to rblank:
… We could call
init()
twice, but on distinct component instances, both would be initialized correctly… This can already happen now
You're right, this can already happen now, although even more rarely. So what about:
-
trac/core.py
99 99 return new_class 100 100 101 101 # Only override __init__ for Components not inheriting ComponentManager 102 if True not in [issubclass(x, ComponentManager) for x in bases]:102 if not any(issubclass(x, ComponentManager) for x in bases): 103 103 # Allow components to have a no-argument initializer so that 104 104 # they don't need to worry about accepting the component manager 105 105 # as argument and invoking the super-class initializer … … 113 113 break 114 114 def maybe_init(self, compmgr, init=init, cls=new_class): 115 115 if cls not in compmgr.components: 116 compmgr.components[cls] = self117 116 if init: 118 try: 119 init(self) 120 except: 121 del compmgr.components[cls] 122 raise 117 init(self) 118 compmgr.components[cls] = self 123 119 maybe_init._original = init 124 120 new_class.__init__ = maybe_init
(the first chunk is just a Python 2.3 → 2.4 clean-up).
But somehow I don't think it would be a good idea to apply before 0.12, on the ground that we can't be 100% sure this won't make things worse in some cases and as this bug has been there forever we can well wait for another release.
Fixing the issue fully would still be the goal of #8658, because Component
instances are often referred to as being unique in the scope of one ComponentManager
and I think it would be nice if we could guarantee that.
follow-up: 8 comment:7 by , 14 years ago
Replying to cboos:
You're right, this can already happen now, although even more rarely. So what about:
Yes, that's exactly what I was thinking. Don't forget to import any()
from trac.util.compat
for 2.4.
But somehow I don't think it would be a good idea to apply before 0.12, on the ground that we can't be 100% sure this won't make things worse in some cases and as this bug has been there forever we can well wait for another release.
Yes, I agree.
comment:8 by , 14 years ago
Keywords: | core component threadsafe added |
---|---|
Milestone: | 0.12.1 → next-major-0.1X |
Replying to rblank:
Replying to cboos:
You're right, this can already happen now, although even more rarely. So what about:
Yes, that's exactly what I was thinking. Don't forget to import
any()
fromtrac.util.compat
for 2.4
Oh well, I always forgot this is only coming with 2.5. I'll wait for that change then, no need to introduce a dependency on trac.util
just for that. I know people like to reuse trac.core
by copy/pasting it, let's not make that harder by adding an import
.
Regarding the breaking of work between this ticket #9418 and #8658, I think it makes more sense to associate my patch to #8658, as it will fix the error reported there, and keep this one for a possible future improvement making Component initialization thread-safe, which also matches the present summary.
by , 14 years ago
Attachment: | 9418-r9841.diff added |
---|
comment:9 by , 14 years ago
Understood that the issue is rarely and the performance impact.
Created 9418-r9841.diff. New ComponentMeta.__call__
method. The method locks for each component class and returns the same instance for each component class.
I maybe think that the patch escapes the performance impact (e.g. slow __init__
method, etc.).
follow-ups: 11 12 comment:10 by , 14 years ago
Interesting. You're effectively making sure __init__
is only called once by reimplementing __call__
; that's a great idea! But then, I wonder if it still make sense to keep the maybe_init
and associated logic, we can just call the normal __init__
, no?
Your approach with two levels of locks seems to minimize the risk of deadlock. The only situation I can think about for now would be two components for which the __init__
methods would try to instantiate the other component, something which is buggy anyway, of course.
We need to get a feel of the performance impact this has, and see if there are no locks by ways we didn't expect.
comment:11 by , 14 years ago
we can just call the normal
__init__
, no?
-
trac/core.py
diff --git a/trac/core.py b/trac/core.py
a b class ComponentMeta(type): 104 104 # Don't put the Component base class in the registry 105 105 return new_class 106 106 107 # Only override __init__ for Components not inheriting ComponentManager108 if True not in [issubclass(x, ComponentManager) for x in bases]:109 # Allow components to have a no-argument initializer so that110 # they don't need to worry about accepting the component manager111 # as argument and invoking the super-class initializer112 init = d.get('__init__')113 if not init:114 # Because we're replacing the initializer, we need to make sure115 # that any inherited initializers are also called.116 for init in [b.__init__._original for b in new_class.mro()117 if issubclass(b, Component)118 and '__init__' in b.__dict__]:119 break120 def maybe_init(self, compmgr, init=init, cls=new_class):121 if init:122 init(self)123 maybe_init._original = init124 new_class.__init__ = maybe_init125 126 107 if d.get('abstract'): 127 108 # Don't put abstract component classes in the registry 128 109 return new_class … … class ComponentMeta(type): 157 138 if self is not None: 158 139 return self 159 140 self = cls.__new__(cls, *args, **kwargs) 160 self.__init__(* args, **kwargs)141 self.__init__(*list(args)[1:], **kwargs) 161 142 compmgr.components[cls] = self 162 143 return self 163 144 finally:
Indeed, it seems to work fine.
comment:12 by , 14 years ago
Replying to cboos:
we can just call the normal
__init__
, no?
Thanks for commnet:11. Cause make test
didn't pass, I kept the maybe_init
method…
I revised the previous patch - 9416-r9841-without-lock.diff.
The revised patch …
- Remove
maybe_init
method that is complex and difficult. - Move
Component.__new__
toComponentMeta.__call__
. - fetch/store of
ComponentManager.components
are simple. - Wihtout lock
We need to get a feel of the performance impact this has, and see if there are no locks by ways we didn't expect.
I agreed.
by , 14 years ago
Attachment: | 9416-r9841-without-lock.diff added |
---|
comment:13 by , 14 years ago
Cc: | added |
---|
comment:14 by , 14 years ago
Milestone: | next-major-0.1X → 0.13 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
9416-r9841-without-lock.diff applied in r10295, thanks!
For now, I'd say the above fix will do, and I fear that adding some locking mechanism would be much more trouble than its worth. If anyone has a clean and robust solution though, let us know and reopen.
Note also that finally the simpler fix in #8658 didn't stay in 0.12-stable, so that branch will have to live with the bug.
comment:15 by , 14 years ago
Owner: | set to |
---|
follow-up: 17 comment:16 by , 14 years ago
Now that I have finally understood how this is working, I have a few questions. Why make a copy of args
just for the purpose of passing it to the __init__()
method?
self.__init__(*list(args)[1:], **kwargs)
Isn't args
always a tuple anyway? We could use the simpler:
self.__init__(*args[1:], **kwargs)
Or even not pass any arguments at all, as component constructors never have arguments. That is, change the signature of __call__()
to:
def __call__(self, compmgr=None):
Another question: when instantiating the component manager, it is created with:
self = super(Component, cls).__new__(cls)
Other components are created with:
self = cls.__new__(cls)
What's the reason for this asymmetry?
follow-up: 18 comment:17 by , 14 years ago
Replying to rblank:
Isn't
args
always a tuple anyway? We could use the simpler:self.__init__(*args[1:], **kwargs)
This would work as well, yes. I'm not sure why I came up with *list(...)
, maybe I was unsure of the *args[1:]
syntax and forgot to try out. Using the latter works fine indeed.
Or even not pass any arguments at all, as component constructors never have arguments. That is, change the signature of
__call__()
to:def __call__(self, compmgr=None):
I'm not sure we would gain anything by restricting the signature. Agreed, currently all the components are initialized with Comp(env)
, but keeping the generic way, this would simply allow one to pass whatever arguments to the Comp.__init__()
method, for the component instantiation.
Another question: when instantiating the component manager, it is created with:
self = super(Component, cls).__new__(cls)Other components are created with:
self = cls.__new__(cls)What's the reason for this asymmetry?
Jun would need to answer this one. Currently it doesn't make a difference one way or the other, as neither ComponentManager
nor Component
classes or its subclasses implement __new__
(any longer). Possibly he simplified only the second after removing Component.__new__
and forgot to do it for the first.
If you feel like those changes are worth doing, just say so, I have them pending and tested ;-)
comment:18 by , 14 years ago
Replying to cboos:
I'm not sure we would gain anything by restricting the signature. Agreed, currently all the components are initialized with
Comp(env)
, but keeping the generic way, this would simply allow one to pass whatever arguments to theComp.__init__()
method, for the component instantiation.
It would reduce confusion. Passing arguments to Comp.__init__()
isn't practical anyway, as you would have to pass them (always the same) every time you want to get a reference to the component, as you never know when it's going to be instantiated.
If you feel like those changes are worth doing, just say so, I have them pending and tested ;-)
If you thing the points are valid, then yes, please do :)
follow-ups: 20 21 comment:19 by , 14 years ago
Clean-up committed in r10298, along with some more comments.
As for parameters to __init__()
, I was thinking about situations where you have enough control… but it's true that we don't use this idiom here, and I don't know if there are plugins who take benefit of this (Agilo? Felix may want to chime in…).
I'd be OK to change this and enforce the __init__(self)
signature for non-ComponentManager
components.
comment:20 by , 14 years ago
Replying to cboos:
As for parameters to
__init__()
, I was thinking about situations where you have enough control… but it's true that we don't use this idiom here, and I don't know if there are plugins who take benefit of this (Agilo? Felix may want to chime in…).
I think we don't use parameters for init in Agilo. We try to adher to the Trac way if possible ;-)
comment:21 by , 14 years ago
… enforce the
__init__(self)
signature for non-ComponentManager
components.
Done in r10300.
Yes, known issue, the problem has always been there. This is a bit annoying as this can manifest itself in various ways, all resulting from the Component's
__init__
method not having been called. See #8658.I leave it up to you to close one or the other as duplicate (for example, if you have an upcoming fix, keep this one open).