Opened 7 years ago
Closed 5 years ago
#4378 closed defect (fixed)
MySQL structure case insensitive causes collision with file names
| Reported by: | jackie.m@… | Owned by: | jonas |
|---|---|---|---|
| Priority: | high | Milestone: | 0.11.1 |
| Component: | version control | Version: | 0.10.3rc1 |
| Severity: | critical | Keywords: | mysql, 4378, patch |
| Cc: | |||
| Release Notes: | |||
| API Changes: | |||
Description (last modified by cboos)
When running init with a MySQL backend on a subversion repository that has a case collision, the following error comes up. The repo in this case has two symlinks inside of it named "CTE" and "cte". This needs to be changed for case sensitive file systems.
Failed to initialize environment. (1062, "Duplicate entry '1-html/cte-A' for key 1")
Traceback (most recent call last):
File "/usr/lib/python2.3/site-packages/trac/scripts/admin.py", line 628, in do_initenv
repos = self.__env.get_repository()
File "/usr/lib/python2.3/site-packages/trac/env.py", line 195, in get_repository
return RepositoryManager(self).get_repository(authname)
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository
repos = self._connector.get_repository(rtype, rdir, authname)
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository
crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log)
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__
self.sync()
File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync
(str(current_rev), path, kind, action,
File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
return self.cursor.execute(sql_escape_percent(sql), args)
File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
return self.cursor.execute(sql_escape_percent(sql), args)
File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 163, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '1-html/cte-A' for key 1")
Attachments (3)
Change History (21)
comment:1 in reply to: ↑ description Changed 7 years ago by cboos
- Component changed from trac-admin to general
- Description modified (diff)
- Milestone set to 0.10.4
- Owner changed from cmlenz to jonas
comment:2 Changed 7 years ago by jackie.m@…
(The case sensitive file systems comment was referring to what's in the subversion repo. If subversion was storing from something on a case insensitive file system, the file system used would not allow the prerequisite condition to exist, and no error would occur in trac either. The comment was not applying to how the database or trac is storing anything, but how it tracks the subversion repo)
We're doing some investigation, and though this is page talks about char/varchar, it appears to be true of text fields as well. From http://sql-info.de/mysql/gotchas.html#1_4:
The manual says: … Values in CHAR and VARCHAR columns are sorted and compared in case-insensitive fashion, unless the BINARY attribute was specified when the table was created. The BINARY attribute means that column values are sorted and compared in case-sensitive fashion according to the ASCII order of the machine where the MySQL server is running. BINARY doesn't affect how the column is stored or retrieved. … —http://dev.mysql.com/doc/mysql/en/CHAR.html
Note that MySQL's behaviour in this regard is the diametrical opposite of the default behaviour of most other databases. (Tested: DB2 8.1, Firebird 1.5.1, Oracle 8.1.7 and PostgreSQL 7.4.3, details of other databases welcome).
The database structure for trac in mysql would need to be changed to account for this, and since this doesn't happen in Postgres (assumed from above) or sqlite (confirmed by testing), it's definitely a MySQL issue that needs to be worked around.
comment:3 Changed 7 years ago by jackie.m@…
Workaround found, patched attached. It basically makes the path column in the node change table be a binary (latin1_bin) field. I'm not sure if this will effect unicode.
Changed 7 years ago by jackie.m@…
changes mysql_backend.py to use binary to store the path in node_change
Changed 7 years ago by jackie.m@…
changes mysql_backend.py to use binary to store the path in node_change
comment:4 Changed 6 years ago by cboos
- Keywords needinfo added
comment:5 follow-up: ↓ 6 Changed 6 years ago by jackie.m@…
CREATE DATABASE trac DEFAULT CHARACTER SET utf8 COLLATE utf8_bin; and to avoid #3673 changed limit to 333, still get the same error.
Patching as the above attachement still fixes the problem. Text fields in MySQL are apparently still case insensitive even in unicode.
comment:6 in reply to: ↑ 5 Changed 6 years ago by cboos
Replying to jackie.m@iddl.vt.edu:
CREATE DATABASE trac DEFAULT CHARACTER SET utf8 COLLATE utf8_bin; and to avoid #3673 changed limit to 333, still get the same error.
Patching as the above attachement still fixes the problem. Text fields in MySQL are apparently still case insensitive even in unicode.
And with COLLATE utf8_general_ci instead of COLLATE utf8_bin?
comment:7 Changed 6 years ago by cboos
- Component changed from general to version control
- Keywords needinfo removed
- Milestone changed from 0.10.5 to 0.12
- Owner changed from jonas to cboos
The new repository cache scheduled for milestone:0.12 will fix this.
Of course, if someone has a better idea in the meantime for 0.11, feel free to submit it.
comment:8 Changed 6 years ago by cboos
- Milestone 0.12 deleted
- Resolution set to duplicate
- Status changed from new to closed
oops, I intended to make the above comment on #3676, see there.
comment:9 Changed 6 years ago by cboos
- Milestone set to 0.10.5
- Resolution duplicate deleted
- Status changed from closed to reopened
As 0.11 is going to stay for a while, I'm going to apply the patch.
comment:10 Changed 5 years ago by arasm@…
- Keywords 4378 patch added
- Severity changed from normal to critical
Hi all:
After applying the patch, nothing has changed. I was wondering if anyone was able to figure out where the clashing keys are and a routine to just delete it?
comment:11 Changed 5 years ago by arasm@…
btw, the reason I classified this as "critical" is due to the fact that all web based functionality has stopped.
comment:12 Changed 5 years ago by cboos
- Milestone changed from 0.10.5 to 0.12
The new VcCache needs to take this limitation into account.
comment:13 Changed 5 years ago by jonas
Hmm, are you guys sure this patch really works, even with non-ascii text?
Even though I hate mysql as much as the next guy I took a quick look at this and this is how I understand the situation:
- Trac pretty much expect the database to be case insensitive and all other supported backends except mysql always are.
- There are two different ways to get a mysql database to be case insensitive:
2.1. Make the entire database case insensitive by specifying the collation at database creation time:
CREATE DATABASE foo CHARACTER SET utf8 COLLATION utf8_bin;
2.2. Specify the collation at the column level:
CREATE TABLE foo (x VARCHAR(10) BINARY);
Using either approach the end result is the same. MySQL will "tag" all text values returned from the database as having the collation "BINARY". This in turn triggers a python-mysqldb bug where python-mysqldb thinks all text is binary blobs and is returned as str objects instead of unicode.
So until that python-mysqldb bug is fixed I can't see any way to configure mysql to be case sensitive and at the same time get pyhon-mysqldb to return proper unicode strings. The only solution I can think of right now is that we take care of the utf8→unicode decoding ourself in mysql_backend.py.
Changed 5 years ago by jonas
Implements str→conversion in mysql_backend.py instead of relying on python-mysql
comment:14 Changed 5 years ago by jonas
Attached a sample patch that implements the str→unicode conversion in mysql_backend.py instead of relying on the currently broken (bug) implementation in the python-mysql driver.
With this patch Trac seems to work correctly with non-ascii text and a case sensitive database (CREATE DATABASE name CHARACTER SET utf8 COLLATE utf8_bin).
This change actually simplifies mysql_backend.py a bit but also makes the code more dependent on python-mysql internal structures (The Cursor class).
comment:15 Changed 5 years ago by jonas
- Owner changed from cboos to jonas
- Status changed from reopened to new
comment:16 Changed 5 years ago by jonas
- Milestone changed from 0.12 to 0.11.1
- Status changed from new to assigned
OK, I've now done some further testing and it seems to work ok so I've committed it to trunk (r7222).
I'll leave this ticket open for now since I hope to be able to include these changes in 0.11.1 unless some problems with this approach is found.
comment:17 Changed 5 years ago by cboos
- Milestone changed from 0.11.2 to 0.11.1
comment:18 Changed 5 years ago by jonas
- Resolution set to fixed
- Status changed from assigned to closed



Replying to jackie.m@iddl.vt.edu:
Well, the file system has nothing to do here, as we're taking the information straight from the repository. Looks like it would be MySQL which has this case sensitivity issue for its keys. There must be something to configure at the MySQL level, I can't believe it's that broken.
See also #3676, for a different but related problem for the same key.