Edgewall Software

Opened 9 years ago

Last modified 2 years ago

#8067 new defect

Stopped sync in 0.11.2.1

Reported by: Christian Boos
Owned by: Christian Boos
Priority: normal
Milestone: not applicable
Component: version control
Version: 0.11-stable
Severity: major
Keywords: cache, postgresql, mysql, pysqlite
Cc: eric.petit@…, techtonik@…, dale.miller@…
Release Notes:
API Changes:

Description

As reported by Anatoly on googlegroups:trac-dev:401827e96364f901, there can be situations where versioncontrol cache tables in the database are left in an inconsistent state, from which it is impossible to resume synchronization.

… It turned out to be a locked cache after a timeout. At first I thought about a race condition, but that was a false lead. The cache can easily be jammed by a long resync from the repository on a busy server. It took about 40 seconds to sync 200 revisions before the timeout occurred. Take a look at cache.py sync(): on line 159 we check whether a revision is being entered in the DB by looking for the next revision number in the revisions table. That means another thread has made an INSERT at line 184. The cache is now locked. To unlock it, that other thread should update the youngest revision at line 221.
So, the problem occurs when a timeout hits during execution of 202:221 - there is an update to the node_change table that can potentially take long enough to run into a timeout (esp. when the server is under heavy load). The cset.get_changes() and self.repos.next_rev(next_youngest) calls can also contribute to the issue. I haven't found any other way to recover than to manually patch the DB.
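The hand-off described above can be condensed into a small sketch. This is a hypothetical simplification, not the actual cache.py code; the table and column names (revision, system, youngest_rev) follow the schema quoted later in this ticket:

```python
import sqlite3


def sync_step(conn, next_youngest):
    """Simplified sketch of the cache sync hand-off described above.

    If the next revision is already present in the revision table,
    another process is assumed to be busy inserting it, so we back off.
    The cache stays "locked" until that process also advances
    youngest_rev in the system table.
    """
    cur = conn.cursor()
    cur.execute("SELECT rev FROM revision WHERE rev = ?", (next_youngest,))
    if cur.fetchone() is not None:
        return "locked"  # another sync is (or was) in progress
    cur.execute("INSERT INTO revision (rev) VALUES (?)", (next_youngest,))
    # ... populate node_change, fetch changeset details, etc. ...
    # If a timeout kills the process right here, the revision row exists
    # but youngest_rev was never advanced: the cache stays wedged.
    cur.execute("UPDATE system SET value = ? WHERE name = 'youngest_rev'",
                (str(next_youngest),))
    conn.commit()
    return "synced"
```

The failure mode in the report corresponds to the process dying between the INSERT and the UPDATE: every later sync attempt then sees the revision row and concludes "locked".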

Also, with a synchronization attempt on every request, this can lead to a serious slowdown (#7490).

Attachments (1)

t8067-stopped-sync-r7921.diff (1.0 KB ) - added by Christian Boos 9 years ago.
Take care to do a rollback in case something went wrong.


Change History (19)

comment:1 Changed 9 years ago by Eric Petit <eric.petit@…>

Cc: eric.petit@… added

Changed 9 years ago by Christian Boos

Take care to do a rollback in case something went wrong.

comment:2 Changed 9 years ago by Christian Boos

The t8067-stopped-sync-r7921.diff change should be more robust in case of an exception raised within the 1.2 - 1.5 part of the resync.

If the finally clause doesn't execute because the process is killed, then I think it's safe to assume there will also be no follow-up commit of the changes. Hopefully this doesn't leave a stale lock (of the "operation in progress" kind).
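The pattern the patch aims for can be illustrated generically like this (a sketch with sqlite3, not the actual t8067-stopped-sync-r7921.diff code; the revision table stands in for the cache tables):

```python
import sqlite3


def guarded_resync(conn, revs):
    """Rollback-on-error pattern: whatever goes wrong mid-resync,
    roll the transaction back so the cache tables are not left
    half-updated, then re-raise for the caller."""
    try:
        for rev in revs:
            conn.execute("INSERT INTO revision (rev) VALUES (?)", (rev,))
        conn.commit()
    except Exception:
        conn.rollback()  # leave the cache tables consistent
        raise
```

With a transactional backend, a failure anywhere in the loop leaves the table exactly as it was before the call.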

comment:3 in reply to:  2 Changed 9 years ago by Christian Boos

Hopefully this doesn't leave a stale lock ("operation in progress" kind).

That doesn't seem to be the case on Windows Vista with Python 2.5.4 and its default sqlite3. tracd was killed from the task manager while it was populating the node_change table for a big changeset, and a follow-up run of tracd managed to resume the sync.

It would be nice to get confirmation that this also works on other platforms/databases.

comment:4 Changed 9 years ago by Christian Boos

I should note that I have actually not been able to reproduce the original problem. Also, in normal circumstances, if an exception occurred during the sync, then the connection wrapper would be garbage collected and a rollback should happen anyway before the actual connection could be reused through the pool. Weird.

comment:5 Changed 9 years ago by Christian Boos

See also #3444, another weird commit that happened when it shouldn't have, with SQLite. Also, Anatoly mentioned that he was using MySQL, so maybe there's a "transaction-less" mode there which could explain the behavior in his case.

comment:6 Changed 9 years ago by anonymous

My MySQL setup uses MyISAM tables. I've rechecked the documentation, and it appears that they lack multi-statement transaction support. See the comparison at http://dev.mysql.com/tech-resources/articles/storage-engine/part_3.html

Perhaps the MySQL pages in the Trac installation manual should instruct users to create InnoDB tables instead of MyISAM ones. Maybe it is possible to add a warning in the logs, or to convert the tables automatically on upgrade?
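For reference, an existing installation can be checked and converted by hand with standard MySQL statements (the `trac` schema name below is a placeholder for the actual Trac database name):

```sql
-- List tables still using the non-transactional MyISAM engine
SELECT table_name, engine
  FROM information_schema.tables
 WHERE table_schema = 'trac' AND engine = 'MyISAM';

-- Convert the version control cache tables to InnoDB
ALTER TABLE revision    ENGINE = InnoDB;
ALTER TABLE node_change ENGINE = InnoDB;
```

After conversion, the BEGIN/COMMIT pairs issued during sync() actually take effect, so a failed resync rolls back as a unit.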

comment:7 Changed 9 years ago by anatoly techtonik <techtonik@…>

It was me. SPAM filter is a real annoyance.

comment:8 Changed 9 years ago by anatoly techtonik <techtonik@…>

Cc: techtonik@… added

comment:9 Changed 9 years ago by Christian Boos

Thanks for the details, Anatoly. The behavior of MyISAM tables would indeed explain what happened in your case. I agree that we should make the MySQL requirements clear about that and add a run-time check as you suggested in #8089 (for the automatic upgrade, I'm less convinced).

Now, the big question is: could this possibly happen with other backends as well?

One possible scenario with PySqlite would be that "someone" (i.e. a plugin) sets the db connection in autocommit isolation level, then that connection is pushed back in the pool and later reused for sync().

For PostgreSQL, the sync() algorithm introduced in #4586 was trying to minimize the risks of inconsistencies, which used to happen in the 0.10.x days (#4043). In particular (ticket:4586#comment:8, point 2), there was the assumption that of several concurrent attempts to insert rows with the same primary key, only one would succeed and the others would fail. Though documentation on this is scarce, I've found the following clarification (Tom Lane's paper http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/concurrency.pdf, slide 31). Quoting:

If two concurrent transactions try to insert rows having the same key value, then the second one will block until the first one finishes. If the first transaction commits, the second one must abort because of the uniqueness constraint; but if the first one aborts the second one can proceed.

So while it's not exactly how I supposed it worked (there is no immediate exception when inserting a duplicate key), in practice the outcome is the same: the exception is deferred until the competing insert is committed, or execution resumes if the competing insert is aborted.

So in theory we should be safe on the PostgreSQL side, even though I have the feeling that we traded our weird update problem between the revision and node_change tables for a similar one between the revision and system tables…
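Stripped of the blocking subtleties described in the quote, the guarantee relied on here is plain primary-key uniqueness: of two attempts to insert the same key, exactly one can succeed. A toy demonstration (sqlite3 stands in for PostgreSQL here, and the single-connection case skips the inter-transaction blocking, so this shows only the end result):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev INTEGER PRIMARY KEY)")

outcomes = []
for attempt in ("sync A", "sync B"):      # two competing sync attempts
    try:
        conn.execute("INSERT INTO revision (rev) VALUES (123)")
        outcomes.append((attempt, "ok"))
    except sqlite3.IntegrityError:        # duplicate key: the loser backs off
        outcomes.append((attempt, "duplicate"))

# Exactly one attempt wins the right to populate the cache for rev 123;
# the other sees a constraint violation and can simply retry the sync.
```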

comment:10 Changed 9 years ago by Christian Boos

To be clear, the "weird" problem common to #4043 and this ticket is the following.

Let's say we start with the value 122 in both tables a and b (which for this example are simple tables like CREATE TABLE a (i INT PRIMARY KEY)).

We then have two (or more) concurrent executions of the following code:

BEGIN;
 INSERT INTO a VALUES (123);
 INSERT INTO b VALUES (123);
COMMIT;

(and there's no other possible kind of update for the above tables)

Then how can we possibly end up with 123 in a, but still have the previous value 122 in b? I.e., how could only part of the transaction succeed?

This can be explained if there's actually a COMMIT happening behind the scenes after each INSERT (pysqlite's autocommit), or if the BEGIN/COMMIT are no-ops and each successful INSERT is immediately persistent (MySQL MyISAM tables).
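The autocommit failure mode is easy to reproduce with pysqlite itself: with isolation_level=None every INSERT is committed immediately, so a rollback after a mid-transaction failure no longer keeps the two tables consistent. A minimal demonstration, using throwaway tables a and b as in the example above (the simulated "kill" is just an exception raised between the two inserts):

```python
import sqlite3


def crash():
    raise RuntimeError("killed mid-transaction")  # simulated timeout/kill


def partial_update(conn):
    """Run the BEGIN/INSERT/INSERT/COMMIT sequence from the example,
    failing between the two inserts and rolling back."""
    try:
        conn.execute("INSERT INTO a VALUES (123)")
        crash()
        conn.execute("INSERT INTO b VALUES (123)")
        conn.commit()
    except RuntimeError:
        conn.rollback()


SCHEMA = "CREATE TABLE a (i INT PRIMARY KEY); CREATE TABLE b (i INT PRIMARY KEY)"

# Transactional connection: the rollback undoes the first INSERT,
# so a and b stay consistent.
tx = sqlite3.connect(":memory:")
tx.executescript(SCHEMA)
partial_update(tx)

# Autocommit connection (isolation_level=None): the first INSERT is
# already committed when the failure hits, so a ends up ahead of b --
# exactly the inconsistency described above.
ac = sqlite3.connect(":memory:", isolation_level=None)
ac.executescript(SCHEMA)
partial_update(ac)
```

This is why a plugin flipping a pooled connection into autocommit mode, as speculated in comment:9, could silently break the sync's atomicity.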

But there was no real explanation for when this happened in a PostgreSQL database (see in particular ticket:4043#comment:9 and 10), where transactions are supposed to be reliable ;-)

comment:11 Changed 9 years ago by Jonas Borgström

So are we ready to close this ticket now? Or is there anything left to investigate?

At least until someone manages to reproduce something similar to ticket:4043#comment:9

comment:12 Changed 9 years ago by Christian Boos

Keywords: postgresql mysql pysqlite added
Milestone: 0.11.4 → not applicable
Priority: high → normal

The only satisfying answer for this issue at this time is for MySQL/MyISAM. Whether we warn or not when such an engine is used should be discussed in #8089.

For the other backends, it might be a problem with the db or the bindings, but as we don't know yet, I'm a bit reluctant to close the ticket, so I think we'll leave it open as a 3rd party issue.

comment:13 Changed 9 years ago by Dale Miller <dale.miller@…>

Cc: dale.miller@… added

comment:14 Changed 5 years ago by anatoly techtonik <techtonik@…>

Got the same problem again. I needed to resync 5000+ revisions, and the request got killed during the process.

comment:15 Changed 5 years ago by anatoly techtonik <techtonik@…>

I see that my MySQL tables are again MyISAM. The Trac upgrade procedure should execute SET storage_engine=INNODB; or create the tables as InnoDB explicitly.

comment:16 Changed 5 years ago by anatoly techtonik <techtonik@…>

For reference, here is how to manually unwedge the sync:

$ tail log/trac.log
2012-12-23 16:29:43,544 Trac[cache] INFO: repos rev [9238] != cached rev [5770]
2012-12-23 16:29:43,566 Trac[api] INFO: Synchronized '(default)' repository in 0.08 seconds
2012-12-23 16:29:46,320 Trac[cache] INFO: repos rev [9238] != cached rev [5770]
2012-12-23 16:29:46,323 Trac[api] INFO: Synchronized '(default)' repository in 0.02 seconds
$ mysql -h host -u user -p dbname
DELETE FROM `node_change` WHERE `rev` > 5750;
DELETE FROM `revision` WHERE `rev` > 5750;
UPDATE `system` SET `value` = 5750 WHERE `name` = 'youngest_rev' LIMIT 1;

comment:17 Changed 3 years ago by Ryan J Ollos

Keywords: cache postgresql mysql pysqlite → cache, postgresql, mysql, pysqlite

comment:18 Changed 2 years ago by figaro

If the MyISAM storage engine is the root cause of the issue, then I suggest looking into the patch at #12097.
