Edgewall Software
Modify

Opened 12 years ago

Closed 4 years ago

Last modified 3 years ago

#3676 closed defect (fixed)

MySQL: Primary keys are not well designed

Reported by: Martin Burger <mburger@…> Owned by: Jun Omae
Priority: normal Milestone: 1.1.4
Component: version control Version: 0.10b1
Severity: major Keywords: mysql utf8 primary key
Cc:
Release Notes:

Add an auto-increment primary key to node_change table and indices (repos, rev, path) and (repos, path, rev).

API Changes:

Description

The automated computation of primary key sizes in mysql_backend.py (compare #3673) is not well designed. For example: primary key of table node_change.

PRIMARY KEY  (`rev`(111),`path`(111),`change_type`(111)

In this way, all three keys have a maximal length of 111 characters. Now, we have a too strict constraint on the length of pathes in Subversion: A path cannot be longer than 111 characters. If a revision contains too long pathes, you will get an "1062 - Duplicate entry" error (the path is truncated).

The change_type is just one char long, and I have never seen a revision nummber that is 111 digits long. ;-)

In general, I think you should use ids instead of composed primary keys.

Attachments (3)

trac-mysql_backend.diff (1.1 KB ) - added by Dirk Datzert <dummy@…> 12 years ago.
My quick version of the fix
remove_node_change_pk-r5133.diff (2.9 KB ) - added by Christian Boos 12 years ago.
Simply remove the problematic index
node_change_index.txt (8.6 KB ) - added by Matthew Good 12 years ago.
comparison of EXPLAIN with and without key

Download all attachments as: .zip

Change History (34)

comment:1 Changed 12 years ago by Martin Burger <mburger@…>

Quick Fix

Just after running init via trac-admin execute SQL:

ALTER TABLE `node_change` DROP PRIMARY KEY ,
ADD PRIMARY KEY ( `rev` ( 39 ) , `path` ( 255 ) , `change_type` ( 39 ) ) ;

Now, path can contain 255 characters.

comment:2 Changed 12 years ago by Martin Burger <mburger@…>

Keywords: mysql utf8 primary key added

comment:3 Changed 12 years ago by Matthew Good

The rev column does need to be pretty long, since many patch-based version control systems (Darcs, Mercurial, etc.) use hashes as the unique identifier of a patch which can be quite long.

comment:4 Changed 12 years ago by Martin Burger <mburger@…>

Mmh, the MySQL developers classify that problem as a 'limitation', not as a bug.

So, the only solution may be another database schema? I know, in theory such a combined primary key (or multiple-column index in the term of MySQL) is fine, but in practice there may be other issues about that. However, AFAIK Oracle allows very long primary keys. But for the MySQL database provider (mysql_backend) this definitively poses a problem.

comment:5 Changed 12 years ago by Christian Boos

Milestone: 0.10.1
Owner: changed from Jonas Borgström to Christian Boos

If we have #3778 (conversion of PRIMARY KEY to a non-unique index), then we could also use partial columns for creating the index.

comment:6 Changed 12 years ago by Christian Boos

Severity: normalmajor

So I think I'm going to apply attachment:ticket:3778:node_change_upgrade-r3787.patch, even if this will imply a schema update for 0.10.1.

comment:7 Changed 12 years ago by Christian Boos

What we could perhaps do is a 'no-op' upgrade for the backends that don't need that change, and simply increment the database_schema number.

That way, the operation will be quick and painless for SQLite and PostgreSQL users, as there will be no need to resync afterwards.

comment:8 in reply to:  7 Changed 12 years ago by Jonas Borgström

Replying to cboos:

What we could perhaps do is a 'no-op' upgrade for the backends that don't need that change, and simply increment the database_schema number.

That way, the operation will be quick and painless for SQLite and PostgreSQL users, as there will be no need to resync afterwards.

Can this really be a no-op for the other db backends? I think we need to remove the primary key for all backends to avoid node_change having a different schema depending on if the env was created before or after this schema update.

On the other hand it sucks that we need to remove a perfectly functional primary key just because of this MySQL limitation. But mainting a backend-specific schema will probably be too much work.

comment:9 Changed 12 years ago by Matthew Good

Well, I think we could make more effective use of the primary key without dropping it entirely. Obviously the change_type and rev columns should use less bytes than the path, so we could either make the MySQL prefix length calculation better, or explicitly the prefix length in the schema definition.

We could also reduce the storage used by the rev column by changing the charset to latin1, which should be fine for storing the numeric or hexadecimal values used as version control identifiers. The hashes all seem to be based on SHA1 which can be represented as 40 hex digits. Darcs adds some other things to the hash like a timestamp bringing the size up to 64 digits. So, somewhere between 64 and 128 bytes should be reasonable.

So if we used 1 byte for change_type and 128 for rev we could increase the path prefix length to 290 which would be a big improvement.

comment:10 Changed 12 years ago by Matthew Good

Hrm, we could also get away without using the full length of the rev column in the key, since it's much less likely to have collisions if it's truncated than the path. So cutting the rev prefix to 39 would allow for almost the full length of a SHA1 hash, and allow 320 characters for the path key prefix.

Also if we could detect the database's character set we could use a longer prefix for the path on latin1 so that we used the full 1000 bytes for the key.

comment:11 Changed 12 years ago by Christian Boos

Well, no matter how you set the limit, you'll find users that have longer paths that you think is reasonable ;) So, to workaround this limitation of MySQL, the path can't be part of a fixed length primary key, no matter how you cut it.

OTOH, I don't see the reason why the other backends should be penalized from the deficiencies of MySQL in that respect. Having a pk on the path already helped to detect one subtle bug (#3778), so it's a good thing to have IMO.

My preferred solution would be that only the MySQL backend works around this limitation and uses an index (allowing duplicates) instead of a primary key here.

I don't really agree with jonas when he says that this approach would involve maintaining a backend-specific schema: the data model remains the same, it's just about an index, which is primarily here for performance reasons, the uniqueness constraint being only an (useful) side-effect, not something we rely on.

Anyway, when a table gets really modified, we usually blow away the table and recreate it, so the exact kind of index used won't matter.

comment:12 in reply to:  11 Changed 12 years ago by Jonas Borgström

Replying to cboos:

I don't really agree with jonas when he says that this approach would involve maintaining a backend-specific schema: the data model remains the same, it's just about an index, which is primarily here for performance reasons, the uniqueness constraint being only an (useful) side-effect, not something we rely on.

Anyway, when a table gets really modified, we usually blow away the table and recreate it, so the exact kind of index used won't matter.

True, as long as the backend-specific deviation is limited to different index/primary key configurations or other things that doesn't affect the data model I think we can handle it.

Changed 12 years ago by Dirk Datzert <dummy@…>

Attachment: trac-mysql_backend.diff added

My quick version of the fix

comment:13 Changed 12 years ago by Christian Boos

Resolution: fixed
Status: newclosed

patch applied in r4755 (trunk) and r4756 (0.10-stable).

For 0.12 and the VcRefactoring, care must be taken to avoid the problem of long primary keys, so that we can revert the special casing of column names (but not the s/500/333/ part, of course).

comment:14 Changed 12 years ago by Hans

Resolution: fixed
Status: closedreopened

Cboos its not a patch. we use long file names and still have it.

This ssue should be solved the correct way. Not the dirty one which is in the patch.

A patch can not be in a index.

comment:15 Changed 12 years ago by Christian Boos

OK, I think that for 0.10.4 / 0.11, we should simply remove that primary key:

  1. it's not used
  2. the argument I gave in comment:11 no longer applies I think, as we didn't get any other such subtle bug reported, so I think the code is OK now ;-)
  3. there's a new cache planned anyway

Changed 12 years ago by Christian Boos

Simply remove the problematic index

comment:16 Changed 12 years ago by Christian Boos

Please have a try of the attachment:remove_node_change_pk-r5133.diff.

If there's (positive) feedback in time, this may go in 0.10.4.

Changed 12 years ago by Matthew Good

Attachment: node_change_index.txt added

comparison of EXPLAIN with and without key

comment:17 Changed 12 years ago by Matthew Good

Well, the key certainly does impact the execution plan of the databases since it provides useful information for the database to index on. See attachment:node_change_index.txt comparing SQLite and PostgreSQL EXPLAIN output with and without the key. I tried EXPLAIN on MySQL as well, but the output didn't seem to contain enough information to be useful. So, we shouldn't be penalizing other databases' performance because of MySQL's limitations. The key has also caught bugs such as #4586 where without it we would have created duplicate records, so I'm not sure it's valid to say "it's not used".

comment:18 Changed 12 years ago by Christian Boos

There's an index on the rev field, which gets used instead of the primary key, as shown in the explain output. This is where the main cost reside, and for that there's no difference. There's possibly a small additional cost due to the ORDER BY path which can't get optimized without an index on the path, but that should be pretty marginal for anything but changesets containing thousands of entries. And for those, we certainly have rendering costs which marginalize the sorting penalty even more.

As for #4586, well that's a good point, though with the new sync code, there shouldn't be any possibility left for even trying to create duplicate node_change entries… but we never know.

Ok, maybe the right thing to do would be to scrap that pk at the mysql_backend.py level only, and leave the general model and the other backends alone. Trouble is, what do we do for the existing MySQL environments? Do we try to "migrate" them or consider that the problem will be fixed only for new environments?

comment:19 Changed 12 years ago by Christian Boos

Milestone: 0.10.40.10.5

No resolution for 0.10.4 found.

comment:20 Changed 12 years ago by anonymous

Note that this ticket it still mentioned on the 0.10.4 milestone (in the text) which is a little misleading.

comment:21 Changed 12 years ago by Christian Boos

Ok, I'll update this. Note that even if the current fix was not deemed to be enough (see comment:14), the current situation is a bit better than the 0.10.3 one (see comment:13).

Please note that supporting MySQL is not high on my priority list, so I won't make any further attempt to fix this, as I'll concentrate my efforts on the new cache for 0.12 which will make the issue go away. I guess this ticket will end up with a wontfix resolution some day, unless someone comes with a better proposal or there's finally some consensus on comment:15 and comment:16 ;-)

comment:22 Changed 12 years ago by Christian Boos

Component: generalversion control
Milestone: 0.10.50.12

The new repository cache scheduled for milestone:0.12 will fix this and #4378 as well.

Of course, if someone has a better idea in the meantime for 0.11, feel free to submit it. Preferably without involving any database upgrade, so more likely a mysql_backend.py fix only.

comment:23 Changed 6 years ago by anonymous

The solution I got is to change database schema in "mysql_backend.py" . If we drop all primary keys of "node_change" table then there will be no restriction for path size. Now my question is whether it will affect the other trac module functionality or other database functionality. Also is there any plan to fix this in future MYSQL version.

comment:25 Changed 4 years ago by Jun Omae

Milestone: next-major-releases1.1.3

#11782 was closed as duplicate.

I'll try to solve this issue and refresh remove_node_change_pk-r5133.diff on trunk. Also, I think we should add AUTO_INCREMENT as primary key rather than dropping primary key.

comment:26 Changed 4 years ago by Jun Omae

Owner: changed from Christian Boos to Jun Omae
Status: reopenedassigned

comment:27 Changed 4 years ago by Jun Omae

Milestone: 1.1.31.1.4

comment:28 Changed 4 years ago by Jun Omae

Proposed changes in jomae.git@t3676, as the result it would improve performance on MySQL.

  • Replace the primary key with auto-increment id in node_change table.
    • Unique index (repos,rev,path,change_type) would be removed.
  • Add index node_change (repos,rev,path).
  • Add index node_change (repos,path,rev).

SQLite:

1.1.1 1.1.2 1.1.3 r13774 t3676
0.286283s 0.278857s 0.272613s 0.256555s 0.280069s _next_prev_rev('<', 8802, 'trunk/tracopt/mimeview/tests/php.py')
0.070519s 0.067838s 0.066458s 0.056563s 0.057104s _next_prev_rev('>', 8802, 'trunk/tracopt/mimeview/tests/php.py')
0.021353s 0.014886s 0.014168s 0.007420s 0.001173s _next_prev_rev('<', 12000, 'trunk')
0.015980s 0.015816s 0.015060s 0.008035s 0.001219s _next_prev_rev('>', 12000, 'trunk')
0.015870s 0.015786s 0.017921s 0.007970s 0.000907s _next_prev_rev('<', 12345)
0.016744s 0.016615s 0.015915s 0.008493s 0.000902s _next_prev_rev('>', 12345)

PostgreSQL:

1.1.1 1.1.2 1.1.3 r13774 t3676
0.710516s 0.232353s 0.234274s 0.234219s 0.243118s _next_prev_rev('<', 8802, 'trunk/tracopt/mimeview/tests/php.py')
0.147532s 0.050900s 0.050583s 0.053642s 0.053488s _next_prev_rev('>', 8802, 'trunk/tracopt/mimeview/tests/php.py')
0.007082s 0.005189s 0.005201s 0.006669s 0.006331s _next_prev_rev('<', 12000, 'trunk')
0.007629s 0.005721s 0.005705s 0.007117s 0.006422s _next_prev_rev('>', 12000, 'trunk')
0.004330s 0.004323s 0.004324s 0.004459s 0.004288s _next_prev_rev('<', 12345)
0.004012s 0.004007s 0.004024s 0.004757s 0.004594s _next_prev_rev('>', 12345)

MySQL:

1.1.1 1.1.2 1.1.3 r13774 t3676
1.186757s 1.178240s 1.166688s 0.946206s 0.010578s _next_prev_rev('<', 8802, 'trunk/tracopt/mimeview/tests/php.py')
0.157969s 0.153565s 0.151270s 0.119920s 0.015570s _next_prev_rev('>', 8802, 'trunk/tracopt/mimeview/tests/php.py')
0.859452s 0.847304s 0.844246s 0.987854s 0.340267s _next_prev_rev('<', 12000, 'trunk')
0.106523s 0.097104s 0.097142s 0.043159s 0.068852s _next_prev_rev('>', 12000, 'trunk')
1.418827s 1.437706s 1.419752s 0.079679s 0.080730s _next_prev_rev('<', 12345)
0.103221s 0.097151s 0.096800s 0.010424s 0.010672s _next_prev_rev('>', 12345)
Last edited 4 years ago by Jun Omae (previous) (diff)

comment:29 Changed 4 years ago by Jun Omae

Release Notes: modified (diff)
Resolution: fixed
Status: assignedclosed

Committed in [13782].

comment:30 Changed 4 years ago by Anonymous

Hi "jomae",

we are currently struckling with this "behaviour" regarding "Primary_Key" and i was really happy that someone acts on this issue.

My question is: Can this "Change" already being downloaded or applied manually ?

Thanks in advance!

comment:31 in reply to:  30 Changed 4 years ago by Ryan J Ollos

Replying to Anonymous:

My question is: Can this "Change" already being downloaded or applied manually ?

The issue was fixed in milestone:1.1.4. You can download from TracDownload#LatestDevRelease, however please read the caveats about using a developer stable release. Trac 1.2 should be released in a few months, will be production stable, and include the fix.

comment:32 Changed 3 years ago by Ryan J Ollos

#12412 also closed as a duplicate.

I encountered IntegrityErrors on Trac 1.0.10 with MySQL due to repository paths that were not unique in the first 255 characters. A developer committed a library to the repository, which resulted in many very long paths in a single changeset (subversion).

While it looks like the node caching issue will be resolved in Trac 1.0.10, is it correct that paths will still be truncated to 255 characters? Will there be an impact on the changeset view?

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Jun Omae.
The resolution will be deleted.
to The owner will be changed from Jun Omae to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.