Ticket #185 (closed defect: fixed)

Opened 5 years ago

Last modified 5 years ago

errors which occur in SQLITE_SCHEMA retry of execute are dropped

Reported by: glyph Owned by: gh
Priority: high Milestone:
Component: implementation Version:
Severity: critical Keywords:
Cc:

Description

I'm not exactly sure what the root cause here is, but I am pretty sure the error-handling logic in _pysqlite_query_execute is broken.

We discovered an issue with Axiom upon upgrading to pysqlite 2.1 (and greater) and the issue still exists at SVN trunk@HEAD. See http://divmod.org/trac/ticket/1717 - there's a patch attached there, but it's not a serious fix, I'm not quite sure how to write a test or a fix for this particular bug.

I originally thought it had something to do with the statement cache or the transaction handling, but later realized that it was more likely the lack of any error handling on that second _sqlite_step_with_busyhandler call.

In the right situation, this bug can cause data corruption, so I'm marking it as high priority, critical severity.

Attachments

retry-loop.diff (1.5 KB) - added by glyph 5 years ago.
A solution which puts the SQLITE_SCHEMA retry into a loop, so that error handling is consistent

Change History

Changed 5 years ago by glyph

A solution which puts the SQLITE_SCHEMA retry into a loop, so that error handling is consistent

comment:1 Changed 5 years ago by glyph

I've attached a possible fix for this issue which makes the re-attempt to execute the statement happen in a loop, which repeats until it gets a run that does not error out with SQLITE_SCHEMA.

Although I think I've fixed this issue, I can't construct a simple test which quickly executes the offending lines. However, the stress-test which we have been running that typically triggers this after 4 or 5 runs ran successfully for more than 1000 runs.

I tried to construct the simplest fix I could think of, so I didn't submit a patch that fixed more issues, but I should mention that I think this particular function is a rat's nest and should be both broken down into smaller units and implemented as a convenience layer in Python rather than in the extension module.

comment:2 Changed 5 years ago by glyph

GH, will you have the opportunity to include this fix, or one like it, in the next release of pysqlite? I'd really like it if Axiom could depend on the bundled sqlite bindings in Python 2.5.

comment:3 Changed 5 years ago by gerhard.haering

I'll try to look into this issue soon, and if I find a fix, I'll of course put it into the next pysqlite release, and make sure it also goes into Python 2.5.1.

Do you think that maybe #170 is related? It kinda smells like that for me. This even could lead to crashes, I managed to make it crash once with small viariations to timing, but unfortunately didn't have core dumps on at the time.

comment:4 Changed 5 years ago by glyph

It definitely looks like it to me. In fact, given the description on the ticket, this looks like a duplicate report.

Without the patch I've attached, the test-case on #170 fails instantly. After applying the patch, the test case on #170 does not fail for me after 30 minutes of running.

comment:5 Changed 5 years ago by gh

  • Status changed from new to closed
  • Resolution set to fixed

Glyph, after reviewing I applied your patch. I agree with you that the function in case is rather complex, and I will perhaps look into simplifiying it and splitting it into multiple functions, but going the way of low-level in C and a higher-level layer in Python is not an option for pysqlite. I did that in the first generation of pysqlite and it was an explicit decision to go the C-only way :-)

Thanks again for analyzing this so thouroughly and providing the patch.

comment:6 Changed 5 years ago by glyph

You're welcome. Thank you for applying it promptly.

Forget that I said the separate layer should be in Python - my main point is that the separate layer should be separate :). In Axiom, I want to interact with SQLite as directly as possible, without any intermediary behavior - especially transaction management, which Axiom has a lot more information about than it is possible for PySQLite to.

I would like to talk more about ideas for improving the architecture here, and I'm very curious about why you'd decide to put so much logic into C, but something tells me ticket comments aren't the right place to do it. Do you already have the rationale documented somewhere that I could read? And what's a more appropriate forum to continue this discussion?

comment:7 Changed 5 years ago by gh

As for a better place to discuss this, I'd suggest the pysqlite mailing list instead. It's mostly used for users, but there's no reason why one couldn't discuss more technical things there :-)

I will make a post there in a few minues and we can discuss things there if you like.

Note: See TracTickets for help on using tickets.