Edgewall Software
Modify

Opened 8 years ago

Closed 6 years ago

#4378 closed defect (fixed)

MySQL structure case insensitive causes collision with file names

Reported by: jackie.m@… Owned by: jonas
Priority: high Milestone: 0.11.1
Component: version control Version: 0.10.3rc1
Severity: critical Keywords: mysql, 4378, patch
Cc:
Release Notes:
API Changes:

Description (last modified by cboos)

When running init with a MySQL backend on a subversion repository that has a case collision, the following error comes up. The repo in this case has two symlinks inside of it named "CTE" and "cte". This needs to be changed for case sensitive file systems.

Failed to initialize environment. (1062, "Duplicate entry '1-html/cte-A' for key 1")
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/scripts/admin.py", line 628, in do_initenv
    repos = self.__env.get_repository()
  File "/usr/lib/python2.3/site-packages/trac/env.py", line 195, in get_repository
    return RepositoryManager(self).get_repository(authname)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/api.py", line 101, in get_repository
    repos = self._connector.get_repository(rtype, rdir, authname)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/svn_fs.py", line 260, in get_repository
    crepos = CachedRepository(self.env.get_db_cnx(), repos, None, self.log)
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 34, in __init__
    self.sync()
  File "/usr/lib/python2.3/site-packages/trac/versioncontrol/cache.py", line 100, in sync
    (str(current_rev), path, kind, action,
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/trac/db/util.py", line 50, in execute
    return self.cursor.execute(sql_escape_percent(sql), args)
  File "/usr/lib/python2.3/site-packages/MySQLdb/cursors.py", line 163, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.3/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '1-html/cte-A' for key 1")

Attachments (3)

mysql_backend.path.binary.diff (890 bytes) - added by jackie.m@… 8 years ago.
changes mysql_backend.py to use binary to store the path in node_change
mysql_backend.path.binary.2.diff (704 bytes) - added by jackie.m@… 8 years ago.
changes mysql_backend.py to use binary to store the path in node_change
mysql_unicode.patch (3.7 KB) - added by jonas 7 years ago.
Implements str→conversion in mysql_backend.py instead of relying on python-mysql

Download all attachments as: .zip

Change History (21)

comment:1 in reply to: ↑ description Changed 8 years ago by cboos

  • Component changed from trac-admin to general
  • Description modified (diff)
  • Milestone set to 0.10.4
  • Owner changed from cmlenz to jonas

Replying to jackie.m@iddl.vt.edu:

… This needs to be changed for case sensitive file systems.

Well, the file system has nothing to do here, as we're taking the information straight from the repository. Looks like it would be MySQL which has this case sensitivity issue for its keys. There must be something to configure at the MySQL level, I can't believe it's that broken.

See also #3676, for a different but related problem for the same key.

comment:2 Changed 8 years ago by jackie.m@…

(The case sensitive file systems comment was referring to what's in the subversion repo. If subversion was storing from something on a case insensitive file system, the file system used would not allow the prerequisite condition to exist, and no error would occur in trac either. The comment was not applying to how the database or trac is storing anything, but how it tracks the subversion repo)

We're doing some investigation, and though this is page talks about char/varchar, it appears to be true of text fields as well. From http://sql-info.de/mysql/gotchas.html#1_4:

The manual says: … Values in CHAR and VARCHAR columns are sorted and compared in case-insensitive fashion, unless the BINARY attribute was specified when the table was created. The BINARY attribute means that column values are sorted and compared in case-sensitive fashion according to the ASCII order of the machine where the MySQL server is running. BINARY doesn't affect how the column is stored or retrieved. … —http://dev.mysql.com/doc/mysql/en/CHAR.html

Note that MySQL's behaviour in this regard is the diametrical opposite of the default behaviour of most other databases. (Tested: DB2 8.1, Firebird 1.5.1, Oracle 8.1.7 and PostgreSQL 7.4.3, details of other databases welcome).

The database structure for trac in mysql would need to be changed to account for this, and since this doesn't happen in Postgres (assumed from above) or sqlite (confirmed by testing), it's definitely a MySQL issue that needs to be worked around.

comment:3 Changed 8 years ago by jackie.m@…

Workaround found, patched attached. It basically makes the path column in the node change table be a binary (latin1_bin) field. I'm not sure if this will effect unicode.

Changed 8 years ago by jackie.m@…

changes mysql_backend.py to use binary to store the path in node_change

Changed 8 years ago by jackie.m@…

changes mysql_backend.py to use binary to store the path in node_change

comment:4 Changed 8 years ago by cboos

  • Keywords needinfo added

The collation setting must be utf8_general_ci, as explained in #3884 and the MySqlDb page.

It would be very interesting to know if this fixes the original problem:

The repo in this case has two symlinks inside of it named "CTE" and "cte".

comment:5 follow-up: Changed 8 years ago by jackie.m@…

CREATE DATABASE trac DEFAULT CHARACTER SET utf8 COLLATE utf8_bin; and to avoid #3673 changed limit to 333, still get the same error.

Patching as the above attachement still fixes the problem. Text fields in MySQL are apparently still case insensitive even in unicode.

comment:6 in reply to: ↑ 5 Changed 8 years ago by cboos

Replying to jackie.m@iddl.vt.edu:

CREATE DATABASE trac DEFAULT CHARACTER SET utf8 COLLATE utf8_bin; and to avoid #3673 changed limit to 333, still get the same error.

Patching as the above attachement still fixes the problem. Text fields in MySQL are apparently still case insensitive even in unicode.

And with COLLATE utf8_general_ci instead of COLLATE utf8_bin?

comment:7 Changed 8 years ago by cboos

  • Component changed from general to version control
  • Keywords needinfo removed
  • Milestone changed from 0.10.5 to 0.12
  • Owner changed from jonas to cboos

The new repository cache scheduled for milestone:0.12 will fix this.

Of course, if someone has a better idea in the meantime for 0.11, feel free to submit it.

comment:8 Changed 8 years ago by cboos

  • Milestone 0.12 deleted
  • Resolution set to duplicate
  • Status changed from new to closed

oops, I intended to make the above comment on #3676, see there.

comment:9 Changed 7 years ago by cboos

  • Milestone set to 0.10.5
  • Resolution duplicate deleted
  • Status changed from closed to reopened

As 0.11 is going to stay for a while, I'm going to apply the patch.

comment:10 Changed 7 years ago by arasm@…

  • Keywords 4378 patch added
  • Severity changed from normal to critical

Hi all:

After applying the patch, nothing has changed. I was wondering if anyone was able to figure out where the clashing keys are and a routine to just delete it?

comment:11 Changed 7 years ago by arasm@…

btw, the reason I classified this as "critical" is due to the fact that all web based functionality has stopped.

comment:12 Changed 7 years ago by cboos

  • Milestone changed from 0.10.5 to 0.12

The new VcCache needs to take this limitation into account.

comment:13 Changed 7 years ago by jonas

Hmm, are you guys sure this patch really works, even with non-ascii text?

Even though I hate mysql as much as the next guy I took a quick look at this and this is how I understand the situation:

  1. Trac pretty much expect the database to be case insensitive and all other supported backends except mysql always are.
  1. There are two different ways to get a mysql database to be case insensitive:

2.1. Make the entire database case insensitive by specifying the collation at database creation time:

CREATE DATABASE foo CHARACTER SET utf8 COLLATION utf8_bin;

2.2. Specify the collation at the column level:

CREATE TABLE foo (x VARCHAR(10) BINARY);

Using either approach the end result is the same. MySQL will "tag" all text values returned from the database as having the collation "BINARY". This in turn triggers a python-mysqldb bug where python-mysqldb thinks all text is binary blobs and is returned as str objects instead of unicode.

So until that python-mysqldb bug is fixed I can't see any way to configure mysql to be case sensitive and at the same time get pyhon-mysqldb to return proper unicode strings. The only solution I can think of right now is that we take care of the utf8→unicode decoding ourself in mysql_backend.py.

Changed 7 years ago by jonas

Implements str→conversion in mysql_backend.py instead of relying on python-mysql

comment:14 Changed 7 years ago by jonas

Attached a sample patch that implements the str→unicode conversion in mysql_backend.py instead of relying on the currently broken (bug) implementation in the python-mysql driver.

With this patch Trac seems to work correctly with non-ascii text and a case sensitive database (CREATE DATABASE name CHARACTER SET utf8 COLLATE utf8_bin).

This change actually simplifies mysql_backend.py a bit but also makes the code more dependent on python-mysql internal structures (The Cursor class).

comment:15 Changed 6 years ago by jonas

  • Owner changed from cboos to jonas
  • Status changed from reopened to new

comment:16 Changed 6 years ago by jonas

  • Milestone changed from 0.12 to 0.11.1
  • Status changed from new to assigned

OK, I've now done some further testing and it seems to work ok so I've committed it to trunk (r7222).

I'll leave this ticket open for now since I hope to be able to include these changes in 0.11.1 unless some problems with this approach is found.

comment:17 Changed 6 years ago by cboos

  • Milestone changed from 0.11.2 to 0.11.1

comment:18 Changed 6 years ago by jonas

  • Resolution set to fixed
  • Status changed from assigned to closed

Fixed in r7286 for 0.11-stable and r7222 for trunk.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain jonas.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from jonas to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.