Edgewall Software

Opened 14 years ago

Closed 13 years ago

Last modified 12 years ago

#9418 closed defect (fixed)

Creating instances of Component class is unsafe in multi-thread — at Version 22

Reported by: Jun Omae Owned by: Jun Omae
Priority: normal Milestone: 1.0
Component: general Version: 0.12dev
Severity: normal Keywords: core component threadsafe
Cc: leho@…, felix.schwarz@… Branch:
Release Notes:

Various internal improvements to the way Components are initialized

API Changes:

Ensure that a Component's __init__(self) method is always called before any other.

Internal Changes:

Description

The fetch/store operations are not atomic for ComponentManager.components in trac/core.py:115-121,161. The operations are unsafe in multi-thread.

Trac 0.12rc1 and 0.11.7 have the problem.

Test code

#! /usr/bin/python

import sys
import time
from threading import Thread, Event

from trac import __version__ as VERSION
from trac.core import Component, ComponentManager


class MTComponent(Component):
    def __init__(self):
        pass

class TestThread(Thread):
    def __init__(self, manager, event):
        Thread.__init__(self)
        self.manager = manager
        self.event = event

    def run(self):
        manager = self.manager
        self.event.wait()
        obj = MTComponent(manager)
        self.id = id(obj)

def main():
    print 'Python %s' % sys.version
    print 'Trac %s' % VERSION

    for i in range(0, 100):
        manager = ComponentManager()
        event = Event()
        event.clear()
        threads = []
        for j in range(0, 64):
            thr = TestThread(manager, event)
            threads.append(thr)
            thr.start()
        sys.setcheckinterval(1)
        event.set()
        for thr in threads:
            thr.join()
        ids = set(thr.id for thr in threads)
        if len(ids) != 1:
            print '#%d: Unsafe in multi-thread - %r' % (i, ids)

main()

Result

Python 2.4.3 (#1, Sep  3 2009, 15:37:12)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
Trac 0.12rc1
#12: Unsafe in multi-thread - set([-1211591892, -1211592340])
#13: Unsafe in multi-thread - set([-1211591796, -1211592116])
#14: Unsafe in multi-thread - set([-1211592180, -1211591700, -1211591732])
#20: Unsafe in multi-thread - set([-1211591348, -1211591732])
#42: Unsafe in multi-thread - set([-1211591188, -1211592308, -1211591156])
#46: Unsafe in multi-thread - set([-1211590932, -1211592052])
#52: Unsafe in multi-thread - set([-1211590548, -1211590516])
#57: Unsafe in multi-thread - set([-1211590196, -1211590964])
#60: Unsafe in multi-thread - set([-1211590004, -1211592116])
#70: Unsafe in multi-thread - set([-1211589300, -1211589684, -1211589332])
#73: Unsafe in multi-thread - set([-1211589140, -1211590868])
#76: Unsafe in multi-thread - set([-1211588948, -1211591220])
#83: Unsafe in multi-thread - set([-1211555700, -1211591860])
#84: Unsafe in multi-thread - set([-1211555764, -1211555636])
#86: Unsafe in multi-thread - set([-1211555508, -1211555700])
#96: Unsafe in multi-thread - set([-1211590548, -1211588820])

Change History (24)

comment:1 by Christian Boos, 14 years ago

Yes, known issue, the problem has always been there. This is a bit annoying as this can manifest itself in various ways, all resulting from the Component's __init__ method not having been called. See #8658.

I leave it up to you to close one or the other as duplicate (for example, if you have an upcoming fix, keep this one open).

comment:2 by Remy Blank, 14 years ago

(Heh, mid-air collision :)

Yes, we know that. It has been like that since the component architecture was created, but has never been an issue in practice AFAIK. I'm not sure it's worth making them completely thread-safe, for the following reasons:

  • Components usually do almost nothing in the constructor, if they even have one.
  • Once a component has been instantiated, it remains registered with the environment, and won't be instantiated again. So this is only an issue on startup, and yes, at that point it is possible that a component is instantiated twice, and one is discarded.
  • Components don't (or shouldn't) keep references to other components as attributes. They always retrieve components with MyComponent(env), which ensures that once the startup phase is completed, always the same instance is returned.
  • Making the components thread-safe has a permanent performance impact (not only at startup) that should be measured, and brings its own potential issues with locking.

Suggesting "wontfix", but I guess I'll be preempted on that one ;)

About #8658, I don't understand how it's possible for methods to be called before __init__() completes. Isn't the component be registered with the environment after __init__() has finished running?

in reply to:  2 ; comment:3 by Christian Boos, 14 years ago

Milestone: 0.12

Replying to rblank:

(Heh, mid-air collision :)

Yes, we know that. It has been like that since the component architecture was created, but has never been an issue in practice AFAIK.

Well, #8658 is precisely such an example, and I've seen other cases in t.e.o logs.

About #8658, I don't understand how it's possible for methods to be called before __init__() completes. Isn't the component be registered with the environment after __init__() has finished running?

Consider the replacement method for __init__, installed by the ComponentMeta metaclass of components:

           def maybe_init(self, compmgr, init=init, cls=new_class):
                if cls not in compmgr.components:
                    compmgr.components[cls] = self
                    if init:
                        try:
                            init(self)

Imagine 2 threads calling MyComponent(mgr) concurrently:

  1. thread A goes till the compmgr.components[cls] = self statement
  2. (context switch)
  3. thread B now passes the if cls not in compmgr.components and ends up using the instance, uninitialized
  4. (context switch)
  5. thread A will now call init(self), a bit too late…

Changing the code without adding a lock only leads to other kind of similar bugs (e.g. init() called twice), so we really need synchronization here. But as you already noted, this presents the risk of running into deadlocks if not done carefully. So I think we should keep this on the radar, but definitely not for 0.12 (due in the coming days).

in reply to:  3 ; comment:4 by Remy Blank, 14 years ago

Replying to cboos:

Changing the code without adding a lock only leads to other kind of similar bugs (e.g. init() called twice), so we really need synchronization here.

That was my suggestion: register with the component manager after calling init() instead of before. We could call init() twice, but on distinct component instances, both would be initialized correctly, one would be registered and the other would be dropped. This can already happen now, if a thread switch occurs between __new__() and __init__(). So we would keep only one pathological case instead of two.

comment:5 by lkraav <leho@…>, 14 years ago

Cc: leho@… added

in reply to:  4 ; comment:6 by Christian Boos, 14 years ago

Milestone: 0.12.1

Replying to rblank:

… We could call init() twice, but on distinct component instances, both would be initialized correctly… This can already happen now

You're right, this can already happen now, although even more rarely. So what about:

  • trac/core.py

     
    9999            return new_class
    100100
    101101        # Only override __init__ for Components not inheriting ComponentManager
    102         if True not in [issubclass(x, ComponentManager) for x in bases]:
     102        if not any(issubclass(x, ComponentManager) for x in bases):
    103103            # Allow components to have a no-argument initializer so that
    104104            # they don't need to worry about accepting the component manager
    105105            # as argument and invoking the super-class initializer
     
    113113                    break
    114114            def maybe_init(self, compmgr, init=init, cls=new_class):
    115115                if cls not in compmgr.components:
    116                     compmgr.components[cls] = self
    117116                    if init:
    118                         try:
    119                             init(self)
    120                         except:
    121                             del compmgr.components[cls]
    122                             raise
     117                        init(self)
     118                    compmgr.components[cls] = self
    123119            maybe_init._original = init
    124120            new_class.__init__ = maybe_init

(the first chunk is just a Python 2.3 → 2.4 clean-up).

But somehow I don't think it would be a good idea to apply before 0.12, on the ground that we can't be 100% sure this won't make things worse in some cases and as this bug has been there forever we can well wait for another release.

Fixing the issue fully would still be the goal of #8658, because Component instances are often referred to as being unique in the scope of one ComponentManager and I think it would be nice if we could guarantee that.

in reply to:  6 ; comment:7 by Remy Blank, 14 years ago

Replying to cboos:

You're right, this can already happen now, although even more rarely. So what about:

Yes, that's exactly what I was thinking. Don't forget to import any() from trac.util.compat for 2.4.

But somehow I don't think it would be a good idea to apply before 0.12, on the ground that we can't be 100% sure this won't make things worse in some cases and as this bug has been there forever we can well wait for another release.

Yes, I agree.

in reply to:  7 comment:8 by Christian Boos, 14 years ago

Keywords: core component threadsafe added
Milestone: 0.12.1next-major-0.1X

Replying to rblank:

Replying to cboos:

You're right, this can already happen now, although even more rarely. So what about:

Yes, that's exactly what I was thinking. Don't forget to import any() from trac.util.compat for 2.4

Oh well, I always forgot this is only coming with 2.5. I'll wait for that change then, no need to introduce a dependency on trac.util just for that. I know people like to reuse trac.core by copy/pasting it, let's not make that harder by adding an import.

Regarding the breaking of work between this ticket #9418 and #8658, I think it makes more sense to associate my patch to #8658, as it will fix the error reported there, and keep this one for a possible future improvement making Component initialization thread-safe, which also matches the present summary.

by Jun Omae, 14 years ago

Attachment: 9418-r9841.diff added

comment:9 by Jun Omae, 14 years ago

Understood that the issue is rarely and the performance impact.

Created 9418-r9841.diff. New ComponentMeta.__call__ method. The method locks for each component class and returns the same instance for each component class.

I maybe think that the patch escapes the performance impact (e.g. slow __init__ method, etc.).

comment:10 by Christian Boos, 14 years ago

Interesting. You're effectively making sure __init__ is only called once by reimplementing __call__; that's a great idea! But then, I wonder if it still make sense to keep the maybe_init and associated logic, we can just call the normal __init__, no?

Your approach with two levels of locks seems to minimize the risk of deadlock. The only situation I can think about for now would be two components for which the __init__ methods would try to instantiate the other component, something which is buggy anyway, of course.

We need to get a feel of the performance impact this has, and see if there are no locks by ways we didn't expect.

in reply to:  10 comment:11 by Christian Boos, 14 years ago

we can just call the normal __init__, no?

  • trac/core.py

    diff --git a/trac/core.py b/trac/core.py
    a b class ComponentMeta(type):  
    104104            # Don't put the Component base class in the registry
    105105            return new_class
    106106
    107         # Only override __init__ for Components not inheriting ComponentManager
    108         if True not in [issubclass(x, ComponentManager) for x in bases]:
    109             # Allow components to have a no-argument initializer so that
    110             # they don't need to worry about accepting the component manager
    111             # as argument and invoking the super-class initializer
    112             init = d.get('__init__')
    113             if not init:
    114                 # Because we're replacing the initializer, we need to make sure
    115                 # that any inherited initializers are also called.
    116                 for init in [b.__init__._original for b in new_class.mro()
    117                              if issubclass(b, Component)
    118                              and '__init__' in b.__dict__]:
    119                     break
    120             def maybe_init(self, compmgr, init=init, cls=new_class):
    121                 if init:
    122                     init(self)
    123             maybe_init._original = init
    124             new_class.__init__ = maybe_init
    125 
    126107        if d.get('abstract'):
    127108            # Don't put abstract component classes in the registry
    128109            return new_class
    class ComponentMeta(type):  
    157138            if self is not None:
    158139                return self
    159140            self = cls.__new__(cls, *args, **kwargs)
    160             self.__init__(*args, **kwargs)
     141            self.__init__(*list(args)[1:], **kwargs)
    161142            compmgr.components[cls] = self
    162143            return self
    163144        finally:

Indeed, it seems to work fine.

in reply to:  10 comment:12 by Jun Omae, 14 years ago

Replying to cboos:

we can just call the normal __init__, no?

Thanks for commnet:11. Cause make test didn't pass, I kept the maybe_init method… I revised the previous patch - 9416-r9841-without-lock.diff.

The revised patch …

  1. Remove maybe_init method that is complex and difficult.
  2. Move Component.__new__ to ComponentMeta.__call__.
  3. fetch/store of ComponentManager.components are simple.
  4. Wihtout lock

We need to get a feel of the performance impact this has, and see if there are no locks by ways we didn't expect.

I agreed.

by Jun Omae, 14 years ago

comment:13 by Felix Schwarz, 14 years ago

Cc: felix.schwarz@… added

comment:14 by Christian Boos, 13 years ago

Milestone: next-major-0.1X0.13
Resolution: fixed
Status: newclosed

9416-r9841-without-lock.diff applied in r10295, thanks!

For now, I'd say the above fix will do, and I fear that adding some locking mechanism would be much more trouble than its worth. If anyone has a clean and robust solution though, let us know and reopen.

Note also that finally the simpler fix in #8658 didn't stay in 0.12-stable, so that branch will have to live with the bug.

comment:15 by Christian Boos, 13 years ago

Owner: set to Jun Omae

comment:16 by Remy Blank, 13 years ago

Now that I have finally understood how this is working, I have a few questions. Why make a copy of args just for the purpose of passing it to the __init__() method?

self.__init__(*list(args)[1:], **kwargs)

Isn't args always a tuple anyway? We could use the simpler:

self.__init__(*args[1:], **kwargs)

Or even not pass any arguments at all, as component constructors never have arguments. That is, change the signature of __call__() to:

def __call__(self, compmgr=None):

Another question: when instantiating the component manager, it is created with:

self = super(Component, cls).__new__(cls)

Other components are created with:

self = cls.__new__(cls)

What's the reason for this asymmetry?

in reply to:  16 ; comment:17 by Christian Boos, 13 years ago

Replying to rblank:

Isn't args always a tuple anyway? We could use the simpler:

self.__init__(*args[1:], **kwargs)

This would work as well, yes. I'm not sure why I came up with *list(...), maybe I was unsure of the *args[1:] syntax and forgot to try out. Using the latter works fine indeed.

Or even not pass any arguments at all, as component constructors never have arguments. That is, change the signature of __call__() to:

def __call__(self, compmgr=None):

I'm not sure we would gain anything by restricting the signature. Agreed, currently all the components are initialized with Comp(env), but keeping the generic way, this would simply allow one to pass whatever arguments to the Comp.__init__() method, for the component instantiation.

Another question: when instantiating the component manager, it is created with:

self = super(Component, cls).__new__(cls)

Other components are created with:

self = cls.__new__(cls)

What's the reason for this asymmetry?

Jun would need to answer this one. Currently it doesn't make a difference one way or the other, as neither ComponentManager nor Component classes or its subclasses implement __new__ (any longer). Possibly he simplified only the second after removing Component.__new__ and forgot to do it for the first.

If you feel like those changes are worth doing, just say so, I have them pending and tested ;-)

in reply to:  17 comment:18 by Remy Blank, 13 years ago

Replying to cboos:

I'm not sure we would gain anything by restricting the signature. Agreed, currently all the components are initialized with Comp(env), but keeping the generic way, this would simply allow one to pass whatever arguments to the Comp.__init__() method, for the component instantiation.

It would reduce confusion. Passing arguments to Comp.__init__() isn't practical anyway, as you would have to pass them (always the same) every time you want to get a reference to the component, as you never know when it's going to be instantiated.

If you feel like those changes are worth doing, just say so, I have them pending and tested ;-)

If you thing the points are valid, then yes, please do :)

comment:19 by Christian Boos, 13 years ago

Clean-up committed in r10298, along with some more comments.

As for parameters to __init__(), I was thinking about situations where you have enough control… but it's true that we don't use this idiom here, and I don't know if there are plugins who take benefit of this (Agilo? Felix may want to chime in…).

I'd be OK to change this and enforce the __init__(self) signature for non-ComponentManager components.

in reply to:  19 comment:20 by Felix Schwarz, 13 years ago

Replying to cboos:

As for parameters to __init__(), I was thinking about situations where you have enough control… but it's true that we don't use this idiom here, and I don't know if there are plugins who take benefit of this (Agilo? Felix may want to chime in…).

I think we don't use parameters for init in Agilo. We try to adher to the Trac way if possible ;-)

in reply to:  19 comment:21 by Christian Boos, 13 years ago

… enforce the __init__(self) signature for non-ComponentManager components.

Done in r10300.

comment:22 by Christian Boos, 12 years ago

API Changes: modified (diff)
Release Notes: modified (diff)
Note: See TracTickets for help on using tickets.