Edgewall Software

Opened 19 years ago

Last modified 18 years ago

#2170 closed defect

Crash while doing a login/logout in the Timeline — at Version 7

Reported by: Christian Boos Owned by: Christian Boos
Priority: highest Milestone: 0.9
Component: general Version: devel
Severity: critical Keywords: crash
Cc: vyt@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christian Boos)

This is somehow reproductible, though I got slightly different stack traces each time. But it always bombs in connection_commit, though.

Configuration:

  • tracd on Linux
  • python 2.3.4
  • pysqlite 2.0.4 with sqlite 3.2.7.

With older versions of pysqlite, I got simply dead-locks, but this was not a good thing either…

Here's the stack trace:

dserver.bct.de - - [04/Oct/2005 15:59:05] "GET /stable-bct-trac/login HTTP/1.1" 401 -
[New Thread 557091 (LWP 12565)]
Trac[session] DEBUG: Promoting anonymous session 90d7a758b262bd4e88eced41 to authenticated session for user cboos

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 557091 (LWP 12565)]
0x40582056 in pysqlite_sleep (seconds=-21474836.48) at src/util.c:37
37          Py_DECREF(ret);
(gdb)
(gdb)
(gdb) bt
#0  0x40582056 in pysqlite_sleep (seconds=-21474836.48) at src/util.c:37
#1  0x4058222d in _sqlite_step_with_busyhandler (statement=0x8550dc8, _connection=0x4084e988) at src/util.c:86
#2  0x4057d940 in connection_commit (self=0x4084e988, args=0x0) at src/connection.c:230
#3  0x080c0c42 in call_function (pp_stack=0x45363450, oparg=0) at Python/ceval.c:3425
#4  0x080bd5d4 in eval_frame (f=0x844ee8c) at Python/ceval.c:2116
#5  0x080c102c in fast_function (func=0x40839e44, pp_stack=0x453635d0, n=2, na=2, nk=0) at Python/ceval.c:3518
#6  0x080c0e46 in call_function (pp_stack=0x453635d0, oparg=1) at Python/ceval.c:3458
#7  0x080bd5d4 in eval_frame (f=0x84b675c) at Python/ceval.c:2116
...

pyframe points to _do_login, in source:trunk/trac/web/auth.py@2326#L119

Change History (7)

comment:1 by Christian Boos, 19 years ago

Ok, now I manage to reproduce the crash systematically:

  • in Thunderbird, I refresh the Timeline feed
  • half a second later, in Firefox, I reload the page which does a logout

Now, looking more closely to the pysqlite2 code within gdb, I located the cause of the crash itself:

  • util.c

    old new  
    8080        }
    8181
    8282        how_long = 0.01* (1 << counter);
    83         if (how_long > 1.0) {
     83        if (how_long > 1.0 || counter > 30) {
    8484            how_long = 1.0;
    8585        }
    8686        pysqlite_sleep(how_long);
    8787
    8888        counter++;

… now I'm back with my database locks :)

Ok, it looks like a commit is not "atomic" at all, and with our database pool, we reuse database connections which are in a commit in progress, hence the deadlock.

comment:2 by Christian Boos, 19 years ago

I disabled 2.0.4 and reinstalled pysqlite-1.1.6: Now I can still trigger the database lock by doing the operations described above but the database recovers from that state for subsequent queries, whereas with 2.0.4, the database remains locked forever.

comment:3 by Christian Boos, 19 years ago

Description: modified (diff)
Owner: changed from Jonas Borgström to Christian Boos
Status: newassigned

I just filled pysqlite:ticket:124, which contains some additional explanations about the issue: in short, we have to use 10 and not 10000 for the timeout value when using pysqlite2:

  • db.py

     
    254254                                 'directory it is located in.' \
    255255                                 % (getuser(), path)
    256256
    257         timeout = int(params.get('timeout', 10000))
    258257        if have_pysqlite == 2:
     258            timeout = int(params.get('timeout', 10)) # 10 seconds
    259259            # Convert unicode to UTF-8 bytestrings. This is case-sensitive, so
    260260            # we need two converters
    261261            sqlite.register_converter('text', str)
     
    264264            cnx = sqlite.connect(path, detect_types=sqlite.PARSE_DECLTYPES,
    265265                                 check_same_thread=False, timeout=timeout)
    266266        else:
     267            timeout = int(params.get('timeout', 10000)) # 10 seconds
    267268            cnx = sqlite.connect(path, timeout=timeout)
    268269        ConnectionWrapper.__init__(self, cnx)

comment:4 by Christian Boos, 19 years ago

… not that simple: applying the previous change was not enough, because the database effectively remains locked when using 2.0.4.

I see in the log that I got this exception:

Exception pysqlite2.dbapi2.OperationalError: 'cannot rollback transaction - SQL statements in progress' in <bound method PooledConnection.__del__ of <trac.db.PooledConnection object at 0x40cf2814>> ignored

which I can relate to the SQLiteConnection object which was doing the COMMIT. So it appears that the connection object doesn't recover cleanly after a commit which timed out.

comment:5 by Christian Boos, 19 years ago

I reverted to pysqlite-1.1.6, but I still manage to have persistent database locks, as soon as Trac needs to COMMIT anything:

Traceback (most recent call last):
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/web/standalone.py", line 235, in _do_trac_req
    dispatch_request(path_info, req, env)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/web/main.py", line 139, in dispatch_request
    dispatcher.dispatch(req)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/web/main.py", line 107, in dispatch
    resp = chosen_handler.process_request(req)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/web_ui/browser.py", line 122, in process_request
    self._render_directory(req, repos, node, rev)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/web_ui/browser.py", line 154, in _render_directory
    changes = get_changes(self.env, repos, [i['rev'] for i in info])
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/web_ui/util.py", line 30, in get_changes
    changeset = repos.get_changeset(rev)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 42, in get_changeset
    self.sync()
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 81, in sync
    self.db.commit()
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/db.py", line 131, in commit
    self.cnx.commit()
  File "/usr/lib/python2.3/site-packages/sqlite/main.py", line 539, in commit
    self.db.execute("COMMIT")
OperationalError: database is locked

Read access is OK, which is a little bit better than with 2.0.4 …

comment:6 by Christian Boos, 19 years ago

Isn't the problems occurring because we lend ourselves to database locks?

  1. request A - thread A : involves a repo sync.
    This will start a COMMIT, the COMMIT will be "in progress" for a while
  2. shortly after: request B - thread B : only needs to get access to the DB for a SELECT. But there's a COMMIT in progress, so it will fail, database lock for B.
  3. since B initiated a query, the commit in A will also fail! Database lock for A too.

Step 3. is wrong, IMO, A shouldn't fail. At this point, I'm not sure who's the culprit: SQLite, pysqlite, or the fact that we allow B even if there's a COMMIT in progress.

comment:7 by Christian Boos, 19 years ago

Description: modified (diff)

Found some relevant discussion about the problem:

Consider a scenario where one process is holding a read lock that it is trying to promote to a reserved lock and a second process is holding a reserved lock that it is trying to promote to an exclusive lock. The first process cannot proceed because it is blocked by the second and the second process cannot proceed because it is blocked by the first. If both processes invoke the busy handlers, neither will make any progress.

  • Karl Mierle had a similar issue back in June with Argon, which he describe there Locking
  • The SQLite author was contacted by the PySQLite author about the former problem, and he replied
    • use EXCLUSIVE transactions, but this reduce the amount of concurrency
    • one of the involved threads must ROLLBACK
Note: See TracTickets for help on using tickets.