Edgewall Software
Modify

Opened 17 years ago

Closed 16 years ago

#4378 closed defect (fixed)

MySQL structure case insensitive causes collision with file names

Reported by: jackie.m@… Owned by: Jonas Borgström
Priority: high Milestone: 0.11.1
Component: version control Version: 0.10.3rc1
Severity: critical Keywords: mysql, 4378, patch
Cc: Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christian Boos)

When running init with a MySQL backend on a subversion repository that has a case collision, the following error comes up. The repo in this case has two symlinks inside of it named "CTE" and "cte". This needs to be changed for case sensitive file systems.

Failed to initialize environment. (1062, "Duplicate entry '1-html/cte-A' for key 1")
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/scripts/admin.py", line 628, in do_initenv
    repos = self.__env.get_repository()
  File "/usr/lib/python2.3/site-packages/trac/env.py", line 195, in get_repository
    return RepositoryManager(self).get_repository(authname)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository
    repos = self._connector.get_repository(rtype, rdir, authname)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository
    crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__
    self.sync()
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync
    (str(current_rev), path, kind, action,
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 163, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '1-html/cte-A' for key 1")

Attachments (3)

mysql_backend.path.binary.diff (890 bytes ) - added by jackie.m@… 17 years ago.
changes mysql_backend.py to use binary to store the path in node_change
mysql_backend.path.binary.2.diff (704 bytes ) - added by jackie.m@… 17 years ago.
changes mysql_backend.py to use binary to store the path in node_change
mysql_unicode.patch (3.7 KB ) - added by Jonas Borgström 16 years ago.
Implements str→conversion in mysql_backend.py instead of relying on python-mysql

Download all attachments as: .zip

Change History (21)

in reply to:  description comment:1 by Christian Boos, 17 years ago

Component: trac-admingeneral
Description: modified (diff)
Milestone: 0.10.4
Owner: changed from Christopher Lenz to Jonas Borgström

Replying to jackie.m@iddl.vt.edu:

… This needs to be changed for case sensitive file systems.

Well, the file system has nothing to do here, as we're taking the information straight from the repository. Looks like it would be MySQL which has this case sensitivity issue for its keys. There must be something to configure at the MySQL level, I can't believe it's that broken.

See also #3676, for a different but related problem for the same key.

comment:2 by jackie.m@…, 17 years ago

(The case sensitive file systems comment was referring to what's in the subversion repo. If subversion was storing from something on a case insensitive file system, the file system used would not allow the prerequisite condition to exist, and no error would occur in trac either. The comment was not applying to how the database or trac is storing anything, but how it tracks the subversion repo)

We're doing some investigation, and though this is page talks about char/varchar, it appears to be true of text fields as well. From http://sql-info.de/mysql/gotchas.html#1_4:

The manual says: … Values in CHAR and VARCHAR columns are sorted and compared in case-insensitive fashion, unless the BINARY attribute was specified when the table was created. The BINARY attribute means that column values are sorted and compared in case-sensitive fashion according to the ASCII order of the machine where the MySQL server is running. BINARY doesn't affect how the column is stored or retrieved. … —http://dev.mysql.com/doc/mysql/en/CHAR.html

Note that MySQL's behaviour in this regard is the diametrical opposite of the default behaviour of most other databases. (Tested: DB2 8.1, Firebird 1.5.1, Oracle 8.1.7 and PostgreSQL 7.4.3, details of other databases welcome).

The database structure for trac in mysql would need to be changed to account for this, and since this doesn't happen in Postgres (assumed from above) or sqlite (confirmed by testing), it's definitely a MySQL issue that needs to be worked around.

comment:3 by jackie.m@…, 17 years ago

Workaround found, patched attached. It basically makes the path column in the node change table be a binary (latin1_bin) field. I'm not sure if this will effect unicode.

by jackie.m@…, 17 years ago

changes mysql_backend.py to use binary to store the path in node_change

by jackie.m@…, 17 years ago

changes mysql_backend.py to use binary to store the path in node_change

comment:4 by Christian Boos, 17 years ago

Keywords: needinfo added

The collation setting must be utf8_general_ci, as explained in #3884 and the MySqlDb page.

It would be very interesting to know if this fixes the original problem:

The repo in this case has two symlinks inside of it named "CTE" and "cte".

comment:5 by jackie.m@…, 17 years ago

CREATE DATABASE trac DEFAULT CHARACTER SET utf8 COLLATE utf8_bin; and to avoid #3673 changed limit to 333, still get the same error.

Patching as the above attachement still fixes the problem. Text fields in MySQL are apparently still case insensitive even in unicode.

in reply to:  5 comment:6 by Christian Boos, 17 years ago

Replying to jackie.m@iddl.vt.edu:

CREATE DATABASE trac DEFAULT CHARACTER SET utf8 COLLATE utf8_bin; and to avoid #3673 changed limit to 333, still get the same error.

Patching as the above attachement still fixes the problem. Text fields in MySQL are apparently still case insensitive even in unicode.

And with COLLATE utf8_general_ci instead of COLLATE utf8_bin?

comment:7 by Christian Boos, 17 years ago

Component: generalversion control
Keywords: needinfo removed
Milestone: 0.10.50.12
Owner: changed from Jonas Borgström to Christian Boos

The new repository cache scheduled for milestone:0.12 will fix this.

Of course, if someone has a better idea in the meantime for 0.11, feel free to submit it.

comment:8 by Christian Boos, 17 years ago

Milestone: 0.12
Resolution: duplicate
Status: newclosed

oops, I intended to make the above comment on #3676, see there.

comment:9 by Christian Boos, 16 years ago

Milestone: 0.10.5
Resolution: duplicate
Status: closedreopened

As 0.11 is going to stay for a while, I'm going to apply the patch.

comment:10 by arasm@…, 16 years ago

Keywords: 4378 patch added
Severity: normalcritical

Hi all:

After applying the patch, nothing has changed. I was wondering if anyone was able to figure out where the clashing keys are and a routine to just delete it?

comment:11 by arasm@…, 16 years ago

btw, the reason I classified this as "critical" is due to the fact that all web based functionality has stopped.

comment:12 by Christian Boos, 16 years ago

Milestone: 0.10.50.12

The new VcCache needs to take this limitation into account.

comment:13 by Jonas Borgström, 16 years ago

Hmm, are you guys sure this patch really works, even with non-ascii text?

Even though I hate mysql as much as the next guy I took a quick look at this and this is how I understand the situation:

  1. Trac pretty much expect the database to be case insensitive and all other supported backends except mysql always are.
  1. There are two different ways to get a mysql database to be case insensitive:

2.1. Make the entire database case insensitive by specifying the collation at database creation time:

CREATE DATABASE foo CHARACTER SET utf8 COLLATION utf8_bin;

2.2. Specify the collation at the column level:

CREATE TABLE foo (x VARCHAR(10) BINARY);

Using either approach the end result is the same. MySQL will "tag" all text values returned from the database as having the collation "BINARY". This in turn triggers a python-mysqldb bug where python-mysqldb thinks all text is binary blobs and is returned as str objects instead of unicode.

So until that python-mysqldb bug is fixed I can't see any way to configure mysql to be case sensitive and at the same time get pyhon-mysqldb to return proper unicode strings. The only solution I can think of right now is that we take care of the utf8→unicode decoding ourself in mysql_backend.py.

by Jonas Borgström, 16 years ago

Attachment: mysql_unicode.patch added

Implements str→conversion in mysql_backend.py instead of relying on python-mysql

comment:14 by Jonas Borgström, 16 years ago

Attached a sample patch that implements the str→unicode conversion in mysql_backend.py instead of relying on the currently broken (bug) implementation in the python-mysql driver.

With this patch Trac seems to work correctly with non-ascii text and a case sensitive database (CREATE DATABASE name CHARACTER SET utf8 COLLATE utf8_bin).

This change actually simplifies mysql_backend.py a bit but also makes the code more dependent on python-mysql internal structures (The Cursor class).

comment:15 by Jonas Borgström, 16 years ago

Owner: changed from Christian Boos to Jonas Borgström
Status: reopenednew

comment:16 by Jonas Borgström, 16 years ago

Milestone: 0.120.11.1
Status: newassigned

OK, I've now done some further testing and it seems to work ok so I've committed it to trunk (r7222).

I'll leave this ticket open for now since I hope to be able to include these changes in 0.11.1 unless some problems with this approach is found.

comment:17 by Christian Boos, 16 years ago

Milestone: 0.11.20.11.1

comment:18 by Jonas Borgström, 16 years ago

Resolution: fixed
Status: assignedclosed

Fixed in r7286 for 0.11-stable and r7222 for trunk.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Jonas Borgström.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Jonas Borgström to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.