Context Navigation

Modify ↓

#11378 new defect

SQLite backend does not use auto-incrementing primary keys

Reported by:	ejucovy@…	Owned by:
Priority:	normal	Milestone:	next-major-releases
Component:	database backend	Version:
Severity:	normal	Keywords:
Cc:	Ryan J Ollos	Branch:
Release Notes:
API Changes:
Internal Changes:

Description

When using the SQLite backend, Trac does not actually create auto-incrementing columns when generating CREATE TABLE statements for Table objects that contain a Column(..., auto_increment=True).

The relevant code is in the sqlite_backend's _to_sql method:

        if column.auto_increment:
            ctype = "integer PRIMARY KEY"

According to the SQLite documentation, this will assign a unique value to each existing row — but a newly created row might reuse a value that was previously used by a deleted row. In order to prevent reuse of deleted values, the column must instead be created with type "integer PRIMARY KEY AUTOINCREMENT."

Specifically, the behavior only differs in cases when a deleted row happens to have the largest existing value for its primary key. To see this behavior in Trac:

Start up an environment using the SQLite backend
Enable the tracopt.ticket.deleter.ticketdeleter component
Log in as an admin
Create a new ticket (#1)
Create a second new ticket (#2)
Delete ticket #2
Create a new ticket

The newly created ticket will be #2, instead of #3. If you had instead deleted ticket #1 in step 6 above, the newly created ticket in step 7 would be #3.

I believe this behavior only occurs with the SQLite backend.

This does not only affect tickets, and does not only affect Trac core — any plugin that defines a database table with a Column(..., auto_increment=True) is also affected. For example I found out about this while looking into a TracPastePlugin bug, th:#4265, which exhibits the same behavior — the ID of a deleted paste will be reused by a new paste, if and only if the deleted paste is the most recently created, and the site is using the sqlite backend.

The SQLite documentation does explain that using the AUTOINCREMENT keyword means slower INSERTs:

The behavior implemented by the AUTOINCREMENT keyword is subtly different from the default behavior. With AUTOINCREMENT, rows with automatically selected ROWIDs are guaranteed to have ROWIDs that have never been used before by the same table in the same database. And the automatically generated ROWIDs are guaranteed to be monotonically increasing. These are important properties in certain applications. But if your application does not need these properties, you should probably stay with the default behavior since the use of AUTOINCREMENT requires additional work to be done as each row is inserted and thus causes INSERTs to run a little slower.

Was the omission of AUTOINCREMENT a deliberate decision, or is this a bug?

Attachments (1)

ticket-11378-with-test.diff (1.4 KB ) - added by ethan.jucovy@… 10 years ago.

Download all attachments as: .zip

Change History (9)

in reply to: description comment:1 by Ryan J Ollos, 10 years ago

Cc:	Ryan J Ollos added
Milestone:	→ next-dev-1.1.x

Replying to ejucovy@…:

I believe this behavior only occurs with the SQLite backend.

I can confirm that the behavior exists with SQLite, but does not exist with PostgreSQL. I think we should fix it, among other reasons in order to make the behavior consistent across database back-ends. We could do some profiling on the inserts to see if there is a significant penalty.

Was the omission of AUTOINCREMENT a deliberate decision, or is this a bug?

I have no idea, and I haven't found any relevant ticket or changeset comments. I'll leave the ticket open for a while in hopes that the more experienced Trac devs will comment on it. Hopefully we can fix the issue for 1.1.3.

by ethan.jucovy@…, 10 years ago

Attachment:	ticket-11378-with-test.diff added

follow-up: 3 comment:2 by ethan.jucovy@…, 10 years ago

If this is determined to be a bug, attachment:ticket-11378-with-test.diff provides a (currently failing) regression test and a fix. But I have a few questions on the proposed solution and on the tests:

Should the AUTOINCREMENT also be applied to integer primary keys if the auto_increment keyword is not specified? (i.e. the elif len(table.key) == 1 and column.name in table.key case) I think not, but I thought I should flag this.

I wanted to write a test that sets up two or three different environments (sqlite, postgres, and maybe mysql) and repeats the same actions and assertions in all three, to demonstrate the current inconsistency across backends. Is there an existing test suite and mechanism I could use for this?

It might also be worth writing a more abstract test that demonstrates this behavior in an arbitrary table defined just for the test, instead of / in addition to testing it on the ticket table.

Of course a migration/upgrade needs to be written; I haven't tried tackling that (yet). Looking around at old upgrades, I think this involves copying all ticket data into a temporary table, dropping the ticket table, and then recreating it; is that correct? (This would also need to be done for the report table; I think those are the only tables in the Trac core schema that declare auto_increment columns.) And I guess this should only happen if the environment is using the sqlite backend.

Should the upgrade also attempt to detect and upgrade any plugin-defined tables that use the auto_increment keyword, or should that be left up to each plugin's maintainers? I can't think of any good way to find those tables in general from within Trac core code, but on the flip side I can't think of any good way for a plugin to recognize whether the Trac codebase includes this fix and the installation is therefore ready for a migration to be applied.

in reply to: 2 comment:3 by Ryan J Ollos, 10 years ago

Replying to ethan.jucovy@…:

If this is determined to be a bug, attachment:ticket-11378-with-test.diff provides a (currently failing) regression test and a fix. But I have a few questions on the proposed solution and on the tests:

Should the AUTOINCREMENT also be applied to integer primary keys if the auto_increment keyword is not specified? (i.e. the elif len(table.key) == 1 and column.name in table.key case) I think not, but I thought I should flag this.

Your implementation looks correct to me.

I wanted to write a test that sets up two or three different environments (sqlite, postgres, and maybe mysql) and repeats the same actions and assertions in all three, to demonstrate the current inconsistency across backends. Is there an existing test suite and mechanism I could use for this?

I believe the recommended approach is to setup environments with different databases and run the tests in those environments with an environment variable set: TracDev/UnitTests#TestDatabase.

It might also be worth writing a more abstract test that demonstrates this behavior in an arbitrary table defined just for the test, instead of / in addition to testing it on the ticket table.

I've felt similarly when writing tests. When it comes down to it, I wouldn't criticize for taking the more practical approach of using an existing model. Defining a simple class and adding a table for an abstract test case is probably doable as well though.

Of course a migration/upgrade needs to be written; I haven't tried tackling that (yet). Looking around at old upgrades, I think this involves copying all ticket data into a temporary table, dropping the ticket table, and then recreating it; is that correct? (This would also need to be done for the report table; I think those are the only tables in the Trac core schema that declare auto_increment columns.) And I guess this should only happen if the environment is using the sqlite backend.

That all sounds correct to me.

Should the upgrade also attempt to detect and upgrade any plugin-defined tables that use the auto_increment keyword, or should that be left up to each plugin's maintainers? I can't think of any good way to find those tables in general from within Trac core code, but on the flip side I can't think of any good way for a plugin to recognize whether the Trac codebase includes this fix and the installation is therefore ready for a migration to be applied.

My feeling is that the best approach is probably to push the change in a major version, describing the changes in the API change notes, and let plugin maintainers adapt to the change.

follow-ups: 5 6 comment:4 by Jun Omae, 10 years ago

Propose a wontfix.

MySQL's InnoDB has the same behavior. See http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html.

InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier.

If you absolutely want to prevent reusing id, I don't that auto-increment should be used. Instead, should use uuid or your own sequence implementation.

mysql> CREATE TABLE testtab (id INTEGER PRIMARY KEY NOT NULL auto_increment, foo TEXT) ENGINE=InnoDB;
Query OK, 0 rows affected (0.09 sec)

mysql> INSERT INTO testtab (foo) VALUES ('row1');
Query OK, 1 row affected (0.07 sec)

mysql> INSERT INTO testtab (foo) VALUES ('row2');
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM testtab;
+----+------+
| id | foo  |
+----+------+
|  1 | row1 |
|  2 | row2 |
+----+------+
2 rows in set (0.00 sec)

mysql> DELETE FROM testtab WHERE id=2;
Query OK, 1 row affected (0.03 sec)

mysql> \! sudo /sbin/service mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]
mysql> INSERT INTO testtab (foo) VALUES ('row3');
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    2
Current database: testdb

Query OK, 1 row affected (0.04 sec)

mysql> SELECT * FROM testtab;
+----+------+
| id | foo  |
+----+------+
|  1 | row1 |
|  2 | row3 |
+----+------+
2 rows in set (0.00 sec)

Version 0, edited 10 years ago by Jun Omae (next)

in reply to: 4 comment:5 by ethan.jucovy@…, 10 years ago

Replying to jomae:

Propose a wontfix.

MySQL's InnoDB has the same behavior. See http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html.

InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier.

Wow — I didn't know that. But note that this is not quite the same behavior that the sqlite backend currently has; it's actually a far more unlikely corner case. For mysql, auto-incrementing IDs will only be recycled if the database server restarts before the value of MAX(col) changes. In other words, in order for an ID to be reused, the sequence of events would have to be:

First ticket is created
Second ticket is created
Second ticket is deleted
Server is restarted
Third ticket is created

If a third ticket is created before the server is restarted, then the ID of the deleted ticket will remain reserved. In other words, the steps described in comment:description would not be sufficient to reproduce the surprising behavior on a mysql backend; a database server restart would have to happen at a very specific point in the sequence in order to reproduce the behavior.

Since it depends on the timing of a database server restart, this seems like a much more infrequent scenario than the sqlite backend's — even more so because a properly configured mysql server probably shouldn't need to be restarted very frequently anyway.

So in my opinion it would still be worth it for Trac to make the guarantee of unique IDs over time, with this one very specific caveat: "if you're using the MySQL/MariaDB backend with InnoDB engine, then auto-incrementing IDs may be reused if your database server restarts after the highest-valued row for a table is deleted and before a new row is inserted." I think the majority of users would never run into this problem, and the more common cases would work consistently and with less surprising behavior.

On the other hand, tracking down a bug related to that mysql edge case would be a nightmare … but that's already the case, and documenting it prominently as the lone exception to a rule might make it a little more visible. :-)

in reply to: 4 comment:6 by Ryan J Ollos, 10 years ago

Replying to jomae:

Propose a wontfix.

MySQL's InnoDB has the same behavior. See http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html.

So MySQL Innodb has similar behavior to SQLite, but both differ from the behavior of PostgreSQL, therefore we have an inconsistency that is undesirable.

My primary concern is to fix the following behavior that I've seen on SQLite:

Create a ticket - id is 1. Email and RSS notifications will be received that point to ticket 1.
Delete ticket with id 1.
Create ticket - id is 1.

The notifications sent in step (1) that have links intended to point to the original ticket #1 will point to a different ticket that has reused the id. It would be better if the links directed to a page that says "Ticket #1 no longer exists" and the ticket created in step (3) has the id 2.

Rather than a wontfix because the behavior of MySQL has some similarities to SQLite, I propose that we should make the behavior of ticket ids and report ids consistent for all databases. Specifically, the behavior I would like to see fixed is that ticket ids and report ids never be reused regardless of the database. If a fix for the behavior comes about through a schema change rather than a change in the implementation of the database API, then we could make a similar change to fix the behavior for the th:TracPastePlugin.

Is there a better way to implemented the auto-increment and never reuse behavior for all database backends?

comment:7 by Ryan J Ollos, 9 years ago

Milestone:	next-dev-1.1.x → next-major-releases

Retargetting tickets to narrow focus for milestone:1.2. Please move the ticket back to milestone:next-dev-1.1.x if you intend to resolve it by milestone:1.2.

comment:8 by joe@…, 4 years ago

I just stumbled on this issue while playing with Flask and SQLite. I am surprised to see the bug is still open. Has anyone figured out an easy to digest workaround yet? Thanks.

Modify Ticket

Change Properties

Summary:
Description:	When using the SQLite backend, Trac does not actually create auto-incrementing columns when generating `CREATE TABLE` statements for Table objects that contain a `Column(..., auto_increment=True)`. The [http://trac.edgewall.org/browser/trunk/trac/db/sqlite_backend.py#L118 relevant code is in the sqlite_backend's _to_sql method]: {{{ #!python if column.auto_increment: ctype = "integer PRIMARY KEY" }}} According to [http://www.sqlite.org/autoinc.html the SQLite documentation], this will assign a unique value to each existing row -- but a newly created row might reuse a value that was previously used by a deleted row. In order to prevent reuse of deleted values, the column must instead be created with type "integer PRIMARY KEY AUTOINCREMENT." Specifically, the behavior only differs in cases when a deleted row happens to have the largest existing value for its primary key. To see this behavior in Trac: 1. Start up an environment using the SQLite backend 2. Enable the `tracopt.ticket.deleter.ticketdeleter` component 3. Log in as an admin 4. Create a new ticket (`#1`) 5. Create a second new ticket (`#2`) 6. Delete ticket `#2` 7. Create a new ticket The newly created ticket will be `#2`, instead of `#3`. If you had instead deleted ticket `#1` in step 6 above, the newly created ticket in step 7 would be `#3`. I believe this behavior only occurs with the SQLite backend. This does not only affect tickets, and does not only affect Trac core -- any plugin that defines a database table with a `Column(..., auto_increment=True)` is also affected. For example I found out about this while looking into a !TracPastePlugin bug, th:#4265, which exhibits the same behavior -- the ID of a deleted paste will be reused by a new paste, if and only if the deleted paste is the most recently created, and the site is using the sqlite backend. The SQLite documentation does explain that using the AUTOINCREMENT keyword means slower INSERTs: > The behavior implemented by the AUTOINCREMENT keyword is subtly different from > the default behavior. With AUTOINCREMENT, rows with automatically selected ROWIDs > are guaranteed to have ROWIDs that have never been used before by the same table in > the same database. And the automatically generated ROWIDs are guaranteed to be > monotonically increasing. These are important properties in certain applications. But if > your application does not need these properties, you should probably stay with the > default behavior since the use of AUTOINCREMENT requires additional work to be done > as each row is inserted and thus causes INSERTs to run a little slower. Was the omission of AUTOINCREMENT a deliberate decision, or is this a bug? You may use WikiFormatting here.
Type:		Priority:
Milestone:		Component:
Version:		Severity:
Keywords:		Cc:	Set your email in Preferences
Branch:
Release Notes:
API Changes:
Internal Changes:

Action

leave as new The ticket will remain with no owner.

unassign The ticket will be disowned.

resolve as The resolution will be set. Next status will be 'closed'.

assign The owner will be changed from (none) to anonymous. Next status will be 'assigned'.

Add Comment

Your email or username:

E-mail address and name can be saved in the Preferences .

You may use WikiFormatting here.

Attachments ↑ Description ↑

Note: See TracTickets for help on using tickets.

Download in other formats: