Opened 16 years ago
Closed 4 years ago
#8067 closed defect (duplicate)
Stopped sync in 0.11.2.1
Reported by: | Christian Boos | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | version control | Version: | 0.11-stable |
Severity: | major | Keywords: | cache, postgresql, mysql, pysqlite |
Cc: | eric.petit@…, techtonik@…, dale.miller@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
As reported by Anatoly on googlegroups:trac-dev:401827e96364f901, there are situations where the version control cache tables in the database can be left in an inconsistent state, from which it is impossible to resume synchronization.
… It turned out to be a locked cache after a timeout. At first I thought it was a race condition, but that was wrong. The cache can easily be jammed by a long resync from the repository on a busy server. It took about 40 seconds to sync 200 revisions before the timeout occurred. Take a look at cache.py sync(): on line 159 we check whether a revision is being entered into the DB by looking at whether the next revision number is already present in the revisions table. If it is, that means another thread has made the INSERT at line 184. The cache is now locked. To unlock it, that other thread must update the youngest revision at line 221.
So the problem occurs when a timeout happens during execution of lines 202:221 - the update of the node_change table can potentially take long enough to hit a timeout (especially when the server is under heavy load). The cset.get_changes() and self.repos.next_rev(next_youngest) calls can also contribute to the issue. I haven't found any other way to recover than to manually patch the DB.
Also, with a synchronization attempt on every request, this can lead to a serious slowdown (#7490).
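The locking pattern Anatoly describes can be modeled in miniature as follows. This is a hypothetical sketch, not the actual cache.py code: the schema, function names, and line-number comments are only illustrative. Running it against a connection that effectively autocommits each statement (as MyISAM does) shows how a crash between the "lock" INSERT and the youngest_rev UPDATE wedges the cache permanently:

```python
import sqlite3

def make_cache_db():
    # Illustrative schema only -- not Trac's real schema.
    # isolation_level=None gives autocommit, standing in for MyISAM.
    db = sqlite3.connect(':memory:', isolation_level=None)
    db.execute("CREATE TABLE system (name TEXT, value TEXT)")
    db.execute("INSERT INTO system VALUES ('youngest_rev', '122')")
    db.execute("CREATE TABLE revision (rev INTEGER PRIMARY KEY)")
    db.execute("CREATE TABLE node_change (rev INTEGER, path TEXT)")
    return db

def sync(db, repos_youngest, get_changes, crash=False):
    cur = db.cursor()
    cached = int(cur.execute(
        "SELECT value FROM system WHERE name='youngest_rev'").fetchone()[0])
    next_rev = cached + 1
    while next_rev <= repos_youngest:
        # ~line 159: a pre-existing row means some other process is (or was)
        # entering this revision -- the cache is considered locked.
        if cur.execute("SELECT 1 FROM revision WHERE rev=?",
                       (next_rev,)).fetchone():
            return 'locked'
        # ~line 184: this INSERT acts as the lock acquisition.
        cur.execute("INSERT INTO revision VALUES (?)", (next_rev,))
        # ~lines 202..220: populating node_change can be slow on big changesets.
        for path in get_changes(next_rev):
            cur.execute("INSERT INTO node_change VALUES (?, ?)",
                        (next_rev, path))
        if crash:
            raise RuntimeError('request timed out / process killed')
        # ~line 221: updating youngest_rev releases the lock.
        cur.execute("UPDATE system SET value=? WHERE name='youngest_rev'",
                    (str(next_rev),))
        next_rev += 1
    return 'synced'
```

Because the connection autocommits, a crashed run leaves revision 123 present while youngest_rev still says 122, so every later sync() call concludes the cache is locked. With real multi-statement transactions, the crash would instead roll the INSERT back.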
Attachments (1)
Change History (20)
comment:1 by , 16 years ago
Cc: | added |
---|
by , 16 years ago
Attachment: | t8067-stopped-sync-r7921.diff added |
---|
follow-up: 3 comment:2 by , 16 years ago
The t8067-stopped-sync-r7921.diff change should be more robust in case of an exception raised within the 1.2 - 1.5 part of the resync.
If the finally clause won't execute because the process is killed, then I think it's safe to assume there will also be no follow-up commit of the changes. Hopefully this doesn't leave a stale lock ("operation in progress" kind).
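The shape of such a fix can be sketched as follows. This is hypothetical code, not the actual t8067-stopped-sync-r7921.diff: the finally clause removes the partially entered revision so a later run can resume, while a hard kill that skips the finally is assumed to also skip the commit:

```python
import sqlite3

def sync_one_revision(db, rev, populate):
    """Enter a single revision into the cache, undoing the 'lock row' by
    hand if populating the change data fails.  Illustrative only."""
    cur = db.cursor()
    cur.execute("INSERT INTO revision VALUES (?)", (rev,))  # acts as the lock
    done = False
    try:
        populate(cur, rev)        # the potentially slow 1.2 - 1.5 part
        done = True
    finally:
        if not done:
            # Undo the lock row so a follow-up run can resume the sync,
            # even on a backend without working transactions.
            cur.execute("DELETE FROM revision WHERE rev=?", (rev,))
    cur.execute("UPDATE system SET value=? WHERE name='youngest_rev'",
                (str(rev),))
    db.commit()
```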
comment:3 by , 16 years ago
Hopefully this doesn't leave a stale lock ("operation in progress" kind).
That doesn't seem to be the case on Windows Vista with Python 2.5.4 and its default sqlite3. tracd was killed from the task manager while it was populating the node_change table for a big changeset, and a follow-up run of tracd managed to resume the sync.
It would be nice to get confirmation that this works also for other platforms/databases.
comment:4 by , 16 years ago
I should note that I have actually not been able to reproduce the original problem. Also, in normal circumstances, if an exception occurred during the sync, then the connection wrapper would be garbage collected and a rollback should happen anyway before the actual connection could be reused through the pool. Weird.
comment:5 by , 16 years ago
See also #3444, another weird commit that happened when it shouldn't have, with SQLite. Also, Anatoly mentioned that he was using MySQL, so maybe there's a "transaction-less" mode there which could explain the behavior in his case.
comment:6 by , 16 years ago
My MySQL uses MyISAM tables. I've rechecked the documentation and it appears that they lack multi-statement transaction support. See the comparison at http://dev.mysql.com/tech-resources/articles/storage-engine/part_3.html
Perhaps the MySQL pages in the Trac installation manual should instruct users to create InnoDB tables instead of MyISAM. Maybe it is possible to add a warning in the logs, or to convert the tables automatically on upgrade?
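A run-time check along those lines could enumerate the MyISAM tables via information_schema and generate the conversion statements. A sketch follows; the information_schema query quoted in the docstring is standard MySQL, but the helper name and surrounding plumbing are made up here:

```python
def innodb_conversion_sql(tables):
    """Given (table_name, engine) rows, e.g. fetched with:

        SELECT table_name, engine FROM information_schema.tables
        WHERE table_schema = DATABASE();

    return the ALTER statements needed to convert the MyISAM tables."""
    return ["ALTER TABLE `%s` ENGINE=InnoDB" % name
            for name, engine in tables
            if (engine or '').upper() == 'MYISAM']
```

Whether Trac should run such statements automatically on upgrade, or merely warn, is the open question discussed here and in #8089.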
comment:8 by , 16 years ago
Cc: | added |
---|
comment:9 by , 16 years ago
Thanks for the clarifications, Anatoly. The behavior of MyISAM tables would indeed explain what happened in your case. I agree that we should make the MySQL requirements clear about that and add a run-time check as you suggested in #8089 (for the automatic upgrade, I'm less convinced).
Now, the big question is: could this possibly happen with other backends as well?
One possible scenario with PySqlite would be that "someone" (i.e. a plugin) sets the db connection in autocommit isolation level, then that connection is pushed back in the pool and later reused for sync().
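That scenario is easy to demonstrate in isolation with the sqlite3 module (a standalone sketch, not Trac's pool code): a connection left in isolation_level=None commits every statement immediately, making its writes visible to other connections at once, while a default connection's pending INSERT stays invisible until commit():

```python
import os
import sqlite3
import tempfile

def autocommit_visibility():
    """Return (rows another connection sees behind an autocommit writer,
               rows it sees behind a default-transaction writer)."""
    tmp = tempfile.mkdtemp()

    # Writer in autocommit mode (isolation_level=None), as a plugin might
    # leave a pooled connection.
    p1 = os.path.join(tmp, 'auto.db')
    w1 = sqlite3.connect(p1, isolation_level=None)
    w1.execute("CREATE TABLE t (i INTEGER PRIMARY KEY)")
    w1.execute("INSERT INTO t VALUES (1)")          # persists immediately
    seen_auto = sqlite3.connect(p1).execute(
        "SELECT COUNT(*) FROM t").fetchone()[0]

    # Writer with the default deferred transaction: the INSERT is pending
    # and invisible to other connections until commit().
    p2 = os.path.join(tmp, 'txn.db')
    w2 = sqlite3.connect(p2)
    w2.execute("CREATE TABLE t (i INTEGER PRIMARY KEY)")
    w2.commit()                                     # commit the DDL only
    w2.execute("INSERT INTO t VALUES (1)")          # pending, not committed
    seen_txn = sqlite3.connect(p2).execute(
        "SELECT COUNT(*) FROM t").fetchone()[0]
    return seen_auto, seen_txn
```

If sync() ever ran on a connection in the first mode, each of its statements would be durable on its own, reproducing the partial-update symptoms.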
For PostgreSQL, the sync() algorithm introduced in #4586 was designed to minimize the risk of inconsistencies, which used to happen in the 0.10.x days (#4043). In particular (ticket:4586#comment:8, point 2) there was the assumption that when concurrent inserts are attempted with the same primary key, only one attempt will succeed and the others will fail. Though documentation on this is scarce, I've found the following clarification (Tom Lane's slides, http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/concurrency.pdf, slide 31). Quoting:
If two concurrent transactions try to insert rows having the same key value, then the second one will block until the first one finishes. If the first transaction commits, the second one must abort because of the uniqueness constraint; but if the first one aborts the second one can proceed.
So while it's not exactly how I supposed it worked (no immediate exception when inserting a duplicate key), in practice it's the same: the exception will be deferred until the other attempts are committed, or execution is resumed if other attempts have aborted.
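For the record, the end result that assumption requires (two inserts of the same key, at most one can succeed) can be illustrated sequentially. The sketch below uses sqlite3, which raises immediately where PostgreSQL would instead block the second inserter until the first transaction commits or aborts:

```python
import sqlite3

def duplicate_key_outcome():
    db = sqlite3.connect(':memory:')
    db.execute("CREATE TABLE revision (rev INTEGER PRIMARY KEY)")
    db.execute("INSERT INTO revision VALUES (123)")  # first inserter wins
    db.commit()
    try:
        db.execute("INSERT INTO revision VALUES (123)")
        return 'both succeeded'
    except sqlite3.IntegrityError:
        # The uniqueness constraint guarantees at most one success.
        return 'second failed'
```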
So in theory we should be safe on the PostgreSQL side, even though I have the feeling that we traded our weird update problem between the revision and node_change tables for a similar one between the revision and system tables…
comment:10 by , 16 years ago
To be clear, the "weird" problem common to #4043 and this ticket is the following.
Let's say we start with the value 122 in both tables a and b (which for this example are simple tables like CREATE TABLE a (i INT PRIMARY KEY)).
We then have two (or more) concurrent executions of the following code:
```sql
BEGIN;
INSERT INTO a VALUES (123);
INSERT INTO b VALUES (123);
COMMIT;
```
(and there's no other possible kind of update for the above tables)
Then how can we possibly end up with 123 in a, but still have the previous value 122 in b? I.e. how could only part of the transaction succeed?
This can be explained if there's actually a COMMIT happening behind the scenes after each INSERT (pysqlite's autocommit), or if the BEGIN/COMMIT are no-ops and each successful INSERT is immediately persistent (MySQL MyISAM tables).
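That explanation can be reproduced in miniature, with sqlite3 in autocommit mode (isolation_level=None) standing in for a transaction-less engine; the simulated crash is of course artificial:

```python
import sqlite3

def partial_commit_demo():
    """With autocommit, each INSERT persists on its own, so a crash between
    the two inserts leaves table a at 123 but table b still at 122."""
    db = sqlite3.connect(':memory:', isolation_level=None)
    db.execute("CREATE TABLE a (i INTEGER PRIMARY KEY)")
    db.execute("CREATE TABLE b (i INTEGER PRIMARY KEY)")
    db.execute("INSERT INTO a VALUES (122)")
    db.execute("INSERT INTO b VALUES (122)")
    try:
        db.execute("INSERT INTO a VALUES (123)")   # already durable
        raise RuntimeError('process killed between the two inserts')
        db.execute("INSERT INTO b VALUES (123)")   # never reached
    except RuntimeError:
        pass
    return (db.execute("SELECT MAX(i) FROM a").fetchone()[0],
            db.execute("SELECT MAX(i) FROM b").fetchone()[0])
```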
But there was no real explanation for when this happened in a PostgreSQL database (see in particular ticket:4043#comment:9 and 10), where transactions are supposed to be reliable ;-)
comment:11 by , 16 years ago
So are we ready to close this ticket now? Or is there anything left to investigate?
At least until someone manages to reproduce something similar to ticket:4043#comment:9
follow-up: 19 comment:12 by , 16 years ago
Keywords: | postgresql mysql pysqlite added |
---|---|
Milestone: | 0.11.4 → not applicable |
Priority: | high → normal |
The only satisfying answer for this issue at this time is for MySQL with MyISAM tables. Whether we warn or not when such an engine is used should be discussed in #8089.
For the other backends, it might be a problem with the db or the bindings, but as we don't know yet, I'm a bit reluctant to close the ticket, so I think we'll leave it open as a 3rd-party issue.
comment:13 by , 16 years ago
Cc: | added |
---|
comment:14 by , 12 years ago
Got the same problem again. Needed to resync 5000+ revisions, request got killed during the process.
comment:15 by , 12 years ago
I see that my MySQL tables are in MyISAM again. The Trac upgrade procedure should execute SET storage_engine=INNODB; or create the InnoDB tables explicitly.
comment:16 by , 12 years ago
For reference, here is how to manually unwedge the sync:
```
$ tail log/trac.log
2012-12-23 16:29:43,544 Trac[cache] INFO: repos rev [9238] != cached rev [5770]
2012-12-23 16:29:43,566 Trac[api] INFO: Synchronized '(default)' repository in 0.08 seconds
2012-12-23 16:29:46,320 Trac[cache] INFO: repos rev [9238] != cached rev [5770]
2012-12-23 16:29:46,323 Trac[api] INFO: Synchronized '(default)' repository in 0.02 seconds
$ mysql -h host -u user -p dbname
DELETE FROM `node_change` WHERE `rev` > 5750;
DELETE FROM `revision` WHERE `rev` > 5750;
UPDATE `system` SET `value` = 5750 WHERE `name` = 'youngest_rev' LIMIT 1;
```
comment:17 by , 10 years ago
Keywords: | cache postgresql mysql pysqlite → cache, postgresql, mysql, pysqlite |
---|
comment:18 by , 9 years ago
If the MyISAM storage engine is the root cause of the issue, then I suggest looking into the patch at #12097.
comment:19 by , 4 years ago
Milestone: | not applicable |
---|---|
Resolution: | → duplicate |
Status: | new → closed |
Replying to Christian Boos:
The only satisfying answer for this issue at this time is for MySQL with MyISAM tables. Whether we warn or not when such an engine is used should be discussed in #8089.
This was done in #10993, so closing the ticket.
We can make another issue if we find evidence of the behavior for other database types.
Take care to do a rollback in case something went wrong.