Opened 16 years ago
Closed 16 years ago
#8731 closed enhancement (fixed)
experiment with surrogate keys for the repository table
| Reported by: | Christian Boos | Owned by: | Remy Blank |
|---|---|---|---|
| Priority: | normal | Milestone: | 0.12-multirepos |
| Component: | version control | Version: | none |
| Severity: | normal | Keywords: | consider multirepos |
| Cc: | lele@… | Branch: | |
| Release Notes: | |||
| API Changes: | |||
| Internal Changes: | |||
Description
In MultipleRepositorySupport branch, we introduced a new table for storing information about repositories. This table uses the "traditional" natural key approach.
However, we could consider using surrogate keys instead. Some version control backends (e.g. TracDarcs) have to store extra informations for changesets in their own table, and changesets have to be associated with a repository. This works best if the repository key would be a surrogate key, so the relations won't be lost when changing the "natural" key of a repository (i.e. renaming the repository).
Attachments (5)
Change History (28)
comment:1 by , 16 years ago
| Owner: | set to |
|---|
by , 16 years ago
| Attachment: | 8731-surrogate-keys-r8662.patch added |
|---|
First experiment with surrogate keys for repositories.
comment:2 by , 16 years ago
| Cc: | added |
|---|
Here's a first experiment with surrogate keys for repositories. A few comments:
- The DB upgrade code is not yet included, so you'll have to create a new environment for testing.
- Repository ids are generated in
RepositoryManager.get_repository_id(). Basically, for each name passed to that method, a row is created in therepositorytable. This also applies to repositories defined intrac.ini.
- There's a small race condition in
get_repository_id(), but I don't know how to remove it. The same race is also present inAbstractEnum.insert().
- Every
Repositoryobject now has an.idattribute containing its unique id. It is set in the same way as the.reponameattribute, after retrieving the repository from the connector.
- Even with the surrogate keys, we will probably have to add some hooks for repositories that store data in auxiliary tables. For example, the
revisionandnode_changetables are cleared inVersionControlAdmin._sync()prior to callingRepository.sync(). This should probably be moved to the repository itself, in aclean()method.
Lele, it would be great to get some feedback from you about this approach, in particular, if it would be sufficient w.r.t. TracDarcs/Cache.
follow-up: 4 comment:3 by , 16 years ago
I did some initial testing, everything seems to be working fine except the repository alias command.
I think the race condition in get_repository_id is not too worrisome. However, as the repositories defined in the .ini file will also get assigned an id, there's the following scenario to consider: at the first startup, two concurrent requests for the /browser could trigger a conflict - if the only consequence is one of the requests failing, then it's not a serious problem.
The clean method sounds nice, but don't we already use that name? Ah, no, we use clear(). Maybe a longer name would be more explicit, like reset_before_sync.
follow-up: 5 comment:4 by , 16 years ago
Replying to cboos:
I did some initial testing, everything seems to be working fine except the
repository aliascommand.
Right, I hadn't tested that one.
I think the race condition in
get_repository_idis not too worrisome. However, as the repositories defined in the.inifile will also get assigned an id, there's the following scenario to consider: at the first startup, two concurrent requests for the /browser could trigger a conflict - if the only consequence is one of the requests failing, then it's not a serious problem.
That's what I would expect: both processes would try to insert a row with the same primary key (id, 'name'), and one of them will trigger a primary key violation.
I would still be interested in a general solution to this situation, namely inserting a new row atomically if it doesn't exist yet. The other "interesting" situation is inserting a new row if it doesn't exist, or update the existing row (sometimes called UPSERT). You'd think this is standard SQL, but it doesn't seem to be the case.
The
cleanmethod sounds nice, but don't we already use that name? Ah, no, we useclear(). Maybe a longer name would be more explicit, likereset_before_sync.
Makes sense.
So I assume you are convinced that this is the right approach? Then I'll continue with the fixes above and the DB upgrade code.
A side issue: you are going to break the Repository constructor signature in #5640/#7116 by removing the authz argument, right? Could we take this opportunity to pass the reponame and id as arguments to the constructor, instead of setting them afterwards in the RepositoryManager? That would make the code much cleaner (especially in CachedRepository).
follow-up: 6 comment:5 by , 16 years ago
Replying to rblank:
I would still be interested in a general solution to this situation, namely inserting a new row atomically if it doesn't exist yet.
Sure, we should eventually find a nice solution for that, but it's not a blocker as the problem should be very exceptional and have no damaging consequences if it ever happens.
The other "interesting" situation is inserting a new row if it doesn't exist, or update the existing row (sometimes called UPSERT). You'd think this is standard SQL, but it doesn't seem to be the case.
Well, what you did for this in trac/cache.py is perfect for me, are there other places that need a similar handling?
So I assume you are convinced that this is the right approach? Then I'll continue with the fixes above and the DB upgrade code.
Yes, that's the way I'd like to go. But we can wait a bit for additional feedback.
Since yesterday I also tested [repositories] and it seems that this provider is still working as usual (i.e. handy for quickly adding/removing temporary Mercurial repositories, for example).
A side issue: you are going to break the
Repositoryconstructor signature in #5640/#7116 by removing theauthzargument, right? Could we take this opportunity to pass thereponameandidas arguments to the constructor, instead of setting them afterwards in theRepositoryManager? That would make the code much cleaner (especially inCachedRepository).
Good idea, I'll do that.
comment:6 by , 16 years ago
Replying to anonymous (probably cboos):
Sure, we should eventually find a nice solution for that, but it's not a blocker as the problem should be very exceptional and have no damaging consequences if it ever happens.
Sure, that's not critical here. I'm just curious.
Well, what you did for this in trac/cache.py is perfect for me, are there other places that need a similar handling?
No, and the code in the cache is the best I could come up with. The only thing I'm not absolutely sure is whether an UPDATE that doesn't update any rows actually starts a transaction or not.
It's just that these two operations sound so common that I am amazed that SQL doesn't provide a definite answer to that. I have even asked on StackOverflow, and the replies were all DB-specific.
comment:7 by , 16 years ago
Here's a patch with (I believe) the full functionality. Changes compared to the previous one:
- Fixed repository aliases.
- Added the database upgrade code.
- Moved the repository cache clean code into
CachedRepository.sync(), triggered by a new optional argumentclean. - Added the
reponameandidarguments to theRepositoryconstructor.
Feedback and testing welcome, especially for database upgrades on backends other than SQLite.
by , 16 years ago
| Attachment: | 8731-mercurial-r8664.patch added |
|---|
Adds then new reponame and id arguments to MercurialRepository.__init__().
comment:8 by , 16 years ago
Upgrade from a 0.11-stable database for which the default repository was a mirror of Trac's repository worked great for PostgreSQL.
For MySQL unfortunately I had this issue:
17:23:38 Trac[util] DEBUG: SQL: "INSERT INTO revision (repos,rev,time,author,message) SELECT '',rev,time,author,message FROM rev_old" 17:23:38 Trac[util] DEBUG: execute exception: OperationalError(1366, "Incorrect string value: '\\xC5\\x88 thr...' for co lumn 'message' at row 5342") OperationalError: (1366, "Incorrect string value: '\\xC5\\x88 thr...' for column 'message' at row 5342")
Looks like this corresponds to r5806. I'm investigating (maybe it's only due to the way I copied the MySQL db).
comment:9 by , 16 years ago
Ok, my bad. Both the original db and copy db seemed to have the same content (by doing a mysqldump and comparing them), but actually I just did a create database <copy> instead of the whole incantation (create database <copy> default character set utf8 collate utf8_bin).
The upgrade worked flawlessly after fixing that.
I also added a teo alias to the default repository, and that worked as well.
Now doing a last test with a SQLite db and Mercurial.
comment:11 by , 16 years ago
Thanks for the extensive testing!
Are you also ok with the signature change on Repository.__init__()?
comment:12 by , 16 years ago
Some details in the docstring about what are each of reponame, id, name would be useful , other than that it's OK.
comment:13 by , 16 years ago
Patch applied in [8672], with the additional documentation.
I will document the signature change on ApiChanges/0.12 in the next few days.
comment:15 by , 16 years ago
Committed 8731-mercurial-r8664.patch in r8676. Anything else or can we close?
comment:16 by , 16 years ago
Only the documentation of the API change. I can probably do that tonight.
comment:17 by , 16 years ago
I'm sorry, while I do follow the thread I couldn't dedicate the needed effort to understand the details and adapt my TracDarcs backend to the new API. Thanks for doing it, anyway! :-)
comment:18 by , 16 years ago
I wonder about one thing: instead of passing the reponame and id to Repository.__init__(), hence requiring all repository connectors to extract them from the options, would it be a better idea to pass the options dict directly to Repository.__init__()? Something like the following patch.
by , 16 years ago
| Attachment: | 8731-repo-options-r8673.patch added |
|---|
Replace separate reponame and id arguments with an options dict in Repository.__init__().
follow-up: 20 comment:19 by , 16 years ago
I thought about that, but I didn't propose it because I don't like putting mandatory arguments in something called options ;-)
But the API simplification is appealing. Maybe rename options to params?
comment:20 by , 16 years ago
Replying to cboos:
But the API simplification is appealing. Maybe rename
optionstoparams?
Sounds good. I'll make the change and commit tonight.
comment:21 by , 16 years ago
Simplified API committed in [8677]. Patch for mercurial-plugin following shortly.
comment:23 by , 16 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
The last patch has been applied in [8692], so this can be closed.



As discussed on IRC.