#2170 closed defect (fixed)
Crash while doing a login/logout in the Timeline
Reported by: | Christian Boos | Owned by: | Christian Boos |
---|---|---|---|
Priority: | highest | Milestone: | 0.9 |
Component: | general | Version: | devel |
Severity: | critical | Keywords: | crash |
Cc: | vyt@… | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description
This is somewhat reproducible, though I got slightly different stack traces each time. It always bombs in `connection_commit`, though.
Configuration:
- tracd on Linux
- python 2.3.4
- pysqlite 2.0.4 with sqlite 3.2.7.
With older versions of pysqlite I simply got deadlocks, which was not a good thing either…
Here's the stack trace:
```
dserver.bct.de - - [04/Oct/2005 15:59:05] "GET /stable-bct-trac/login HTTP/1.1" 401 -
[New Thread 557091 (LWP 12565)]
Trac[session] DEBUG: Promoting anonymous session 90d7a758b262bd4e88eced41 to authenticated session for user cboos

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 557091 (LWP 12565)]
0x40582056 in pysqlite_sleep (seconds=-21474836.48) at src/util.c:37
37          Py_DECREF(ret);
(gdb)
(gdb)
(gdb) bt
#0  0x40582056 in pysqlite_sleep (seconds=-21474836.48) at src/util.c:37
#1  0x4058222d in _sqlite_step_with_busyhandler (statement=0x8550dc8, _connection=0x4084e988) at src/util.c:86
#2  0x4057d940 in connection_commit (self=0x4084e988, args=0x0) at src/connection.c:230
#3  0x080c0c42 in call_function (pp_stack=0x45363450, oparg=0) at Python/ceval.c:3425
#4  0x080bd5d4 in eval_frame (f=0x844ee8c) at Python/ceval.c:2116
#5  0x080c102c in fast_function (func=0x40839e44, pp_stack=0x453635d0, n=2, na=2, nk=0) at Python/ceval.c:3518
#6  0x080c0e46 in call_function (pp_stack=0x453635d0, oparg=1) at Python/ceval.c:3458
#7  0x080bd5d4 in eval_frame (f=0x84b675c) at Python/ceval.c:2116
...
```
The pyframe points to `_do_login`, in source:trunk/trac/web/auth.py@2326#L119.
Attachments (2)
Change History (17)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
I disabled 2.0.4 and reinstalled pysqlite-1.1.6: now I can still trigger the database lock by doing the operations described above, but the database recovers from that state for subsequent queries, whereas with 2.0.4 the database remains locked forever.
comment:3 by , 19 years ago
Description: | modified (diff) |
---|---|
Owner: | changed from | to
Status: | new → assigned |
I just filed pysqlite:ticket:124, which contains some additional explanations about the issue. In short, we have to use 10 and not 10000 for the timeout value when using pysqlite2:
db.py:

```diff
         'directory it is located in.' \
         % (getuser(), path)

-    timeout = int(params.get('timeout', 10000))
     if have_pysqlite == 2:
+        timeout = int(params.get('timeout', 10))  # 10 seconds
         # Convert unicode to UTF-8 bytestrings. This is case-sensitive, so
         # we need two converters
         sqlite.register_converter('text', str)
         ...
         cnx = sqlite.connect(path, detect_types=sqlite.PARSE_DECLTYPES,
                              check_same_thread=False, timeout=timeout)
     else:
+        timeout = int(params.get('timeout', 10000))  # 10 seconds
         cnx = sqlite.connect(path, timeout=timeout)
     ConnectionWrapper.__init__(self, cnx)
```
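For reference, a minimal sketch of the unit mismatch behind this change. The 10 vs. 10000 values come from the diff above; the database path is just a placeholder, and the import/connect calls assume the usual pysqlite 1.x / 2.x entry points:

```python
# Sketch only: pysqlite 1.x expects the busy timeout in milliseconds,
# while pysqlite 2.x expects it in seconds, so the same "wait ~10 s
# before giving up on a lock" intent needs two different numbers.
try:
    from pysqlite2 import dbapi2 as sqlite          # pysqlite 2.x
    cnx = sqlite.connect('trac.db', timeout=10)     # 10 -> 10 seconds
except ImportError:
    import sqlite                                   # pysqlite 1.x
    cnx = sqlite.connect('trac.db', timeout=10000)  # 10000 ms -> 10 seconds
```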
comment:4 by , 19 years ago
… not that simple: applying the previous change was not enough, because the database effectively remains locked when using 2.0.4.
I see in the log that I got this exception:
```
Exception pysqlite2.dbapi2.OperationalError: 'cannot rollback transaction - SQL statements in progress' in <bound method PooledConnection.__del__ of <trac.db.PooledConnection object at 0x40cf2814>> ignored
```
which I can relate to the `SQLiteConnection` object that was doing the `COMMIT`. So it appears that the connection object doesn't recover cleanly after a commit that has timed out.
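For context, a hedged sketch of what "SQL statements in progress" usually means and how a connection could be reset after a timed-out `COMMIT`. The `reset_connection` helper and its `open_cursors` argument are hypothetical, not Trac code:

```python
# Hypothetical sketch: SQLite refuses to ROLLBACK while any statement on the
# connection is still active, so closing (or fully fetching) open cursors
# first lets the connection recover instead of staying locked.
def reset_connection(cnx, open_cursors):
    for cur in open_cursors:      # cursors previously opened on `cnx`
        try:
            cur.close()           # finalizes the pending statement
        except Exception:
            pass
    try:
        cnx.rollback()            # no longer fails with "statements in progress"
    except Exception:
        pass
```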
comment:5 by , 19 years ago
I reverted to pysqlite-1.1.6, but I still manage to get persistent database locks as soon as Trac needs to `COMMIT` anything:
```
Traceback (most recent call last):
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/web/standalone.py", line 235, in _do_trac_req
    dispatch_request(path_info, req, env)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/web/main.py", line 139, in dispatch_request
    dispatcher.dispatch(req)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/web/main.py", line 107, in dispatch
    resp = chosen_handler.process_request(req)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/web_ui/browser.py", line 122, in process_request
    self._render_directory(req, repos, node, rev)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/web_ui/browser.py", line 154, in _render_directory
    changes = get_changes(self.env, repos, [i['rev'] for i in info])
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/web_ui/util.py", line 30, in get_changes
    changeset = repos.get_changeset(rev)
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 42, in get_changeset
    self.sync()
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 81, in sync
    self.db.commit()
  File "/opt/trac/stable-bct-trac/lib/python2.3/site-packages/trac/db.py", line 131, in commit
    self.cnx.commit()
  File "/usr/lib/python2.3/site-packages/sqlite/main.py", line 539, in commit
    self.db.execute("COMMIT")
OperationalError: database is locked
```
Read access is OK, which is a little bit better than with 2.0.4 …
comment:6 by , 19 years ago
Isn't the problem occurring because we lend ourselves to database locks?

1. Request A / thread A: involves a repository sync. This starts a `COMMIT`, and the `COMMIT` will be "in progress" for a while.
2. Shortly after, request B / thread B: only needs access to the DB for a `SELECT`. But there's a `COMMIT` in progress, so it fails: database lock for B.
3. Since B initiated a query, the commit in A will also fail! Database lock for A too.

Step 3 is wrong, IMO; A shouldn't fail.
At this point, I'm not sure who's the culprit: SQLite, pysqlite, or the fact that we allow B to proceed even though there's a `COMMIT` in progress.
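To make the scenario concrete, here is a hedged, minimal repro sketch using the stdlib `sqlite3` module (same DB-API as pysqlite2). The file name, table, timings, and thread structure are invented for illustration, and the exact outcome depends on timing and SQLite version, but with short timeouts both threads typically end up with "database is locked":

```python
# Thread A starts a write and leaves its COMMIT for later; thread B runs a
# SELECT (the unfinished cursor keeps a SHARED lock) and then tries to write.
# B's write is blocked by A's pending transaction, and A's COMMIT is blocked
# by B's read lock, so both can fail.
import sqlite3
import threading
import time

DB = "locktest.db"   # throwaway example database


def thread_a():
    a = sqlite3.connect(DB, timeout=1)
    a.execute("INSERT INTO t VALUES (1)")         # transaction open, RESERVED lock
    time.sleep(2)                                 # COMMIT "in progress" for a while
    try:
        a.commit()                                # needs EXCLUSIVE -> blocked by B's read
    except sqlite3.OperationalError as e:
        print("A:", e)


def thread_b():
    time.sleep(0.5)
    b = sqlite3.connect(DB, timeout=1)
    cur = b.execute("SELECT v FROM t")            # unfinished SELECT keeps a SHARED lock
    cur.fetchone()
    try:
        b.execute("INSERT INTO t VALUES (2)")     # wants RESERVED -> blocked by A
    except sqlite3.OperationalError as e:
        print("B:", e)
    time.sleep(2)                                 # keep the read cursor open while A commits


if __name__ == "__main__":
    init = sqlite3.connect(DB)
    init.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")
    init.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
    init.commit()
    init.close()
    ta = threading.Thread(target=thread_a)
    tb = threading.Thread(target=thread_b)
    ta.start(); tb.start(); ta.join(); tb.join()
```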
comment:7 by , 19 years ago
Description: | modified (diff) |
---|
Found some relevant discussion about the problem:
- The SQLite API for sqlite3_busy_handler describes a similar scenario
In particular:
Consider a scenario where one process is holding a read lock that it is trying to promote to a reserved lock and a second process is holding a reserved lock that it is trying to promote to an exclusive lock. The first process cannot proceed because it is blocked by the second and the second process cannot proceed because it is blocked by the first. If both processes invoke the busy handlers, neither will make any progress.
comment:9 by , 19 years ago
I might have a fix for the issue: attachment:trac_no_writer_contention.patch
This uses the workaround described in #py126.
The sqlite binary has to be reconfigured with the `--enable-threadsafe` flag and recompiled, because this flag is turned off by default (on Linux), and the binaries used until now by Trac are most likely not thread-safe (see #2196).
Therefore, the above patch also contains cmlenz's patch for #2196, which is a prerequisite for being able to run Trac with a thread-safe sqlite.
The patches to pysqlite that can be found on #py126 are not absolutely required, but they should help.
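For orientation only, a hedged sketch of the general "no writer contention" idea, i.e. serializing writes within the process. This is not the attached patch; the `SerializedConnection` class and the module-level lock are made up for illustration:

```python
# Hypothetical sketch: if only one thread in the process is ever inside a
# COMMIT at a time, two Trac threads can no longer dead-lock each other on
# the write lock; readers still rely on a thread-safe sqlite build.
import threading

_writer_lock = threading.Lock()      # one writer per process (illustrative)


class SerializedConnection(object):
    """Wrap a DB-API connection so that commits are serialized."""

    def __init__(self, cnx):
        self.cnx = cnx

    def commit(self):
        _writer_lock.acquire()
        try:
            self.cnx.commit()        # COMMIT can no longer race another writer
        finally:
            _writer_lock.release()

    def __getattr__(self, name):     # delegate cursor(), rollback(), etc.
        return getattr(self.cnx, name)
```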
Some caveats:

- I haven't tested that on Windows yet.
- I've got segfaults due to a second `__del__` call on a Cursor object (!?). This happens on SuSE 9.0 with the original Python (`sys.version == "2.3+ (#1, Sep 23 2003, 23:07:16) \n[GCC 3.3.1 (SuSE Linux)]"`), which appears to be an early 2.3 release (2.3.1, judging from the build date).
- The Python 2.3.4 that I compiled myself seems to be immune to the problem so far. The difference might be due to changes introduced in Python 2.3.3c1, which included two critical bug fixes involving weakrefs and garbage collection.
by , 19 years ago
Attachment: | trac_no_writer_contention.2.patch added |
---|
Second iteration of the patch, integrating cmlenz's feedback. The previous patch survived one day of intensive use; let's see how it goes with this one :)
comment:10 by , 19 years ago
Also, I tested the patch on Windows and it works well there too, but it's so fast that it's not easy to stress test :)
We would need some kind of web test automation there…
by , 19 years ago
Attachment: | trac_no_writer_contention.patch added |
---|
Minor cleanup to the previous patch
comment:11 by , 19 years ago
I've attached another revision of the patch. Nothing major, but it avoids the duplicate handling of `OperationalError`s in the `PyFormatCursor`, etc.
comment:12 by , 19 years ago
Doh, “replace existing attachment” hit me here :-/ Should probably not be checked by default.
comment:13 by , 19 years ago
Cc: | added |
---|
comment:14 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
OK, this week not a single database lock, and no other problem detected while testing the patches in our "production" environment.
Therefore I checked in the patch (the latest iteration from cmlenz) in r2355.
comment:15 by , 19 years ago
Keywords: | crash added |
---|
Ok, now I manage to reproduce the crash systematically.

Looking more closely at the pysqlite2 code within gdb, I located the cause of the crash itself, in `util.c`:

[util.c code excerpt missing]

… now I'm back with my database locks :)
Ok, it looks like a commit is not "atomic" at all: with our database pool, we reuse database connections that still have a commit in progress, hence the deadlock.
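A hedged sketch of the pool-side consequence, not Trac's actual `PooledConnection`: a connection should not be handed back out while its commit is still in flight. The `GuardedConnection` name and the `is_idle` probe are invented for illustration:

```python
# Hypothetical sketch: track the commit-in-progress window explicitly so the
# pool can skip (or wait for) connections whose COMMIT has not finished yet.
import threading


class GuardedConnection(object):
    def __init__(self, cnx):
        self.cnx = cnx
        self._busy = threading.Lock()    # held for the duration of a commit

    def commit(self):
        self._busy.acquire()
        try:
            self.cnx.commit()
        finally:
            self._busy.release()

    def is_idle(self):
        """True if no commit is in progress, so the pool may reuse us."""
        if self._busy.acquire(False):    # non-blocking probe
            self._busy.release()
            return True
        return False
```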