
Opened 11 years ago

Closed 9 years ago

#6466 closed defect (fixed)

Ticket.save_changes() timestamp precision is too low...

Reported by: tumma72@… Owned by: Remy Blank
Priority: normal Milestone: 0.12
Component: ticket system Version: 0.10-stable
Severity: normal Keywords: needmajor
Cc: felix.schwarz@…, direvus@… Branch:
Release Notes:
API Changes:

Description

When updating a ticket, calling the save_changes method stores the changed fields in the ticket_change table, whose primary key is (ticket, time, field). Consider the scenario where, using an ITicketChangeListener, you want to manipulate one of the just-changed fields and call save_changes on the ticket again…

Most likely this happens within the same second, and you get the error:
ERROR: duplicate key violates unique constraint "ticket_change_pk"
It would be better to use datetime.now() that has also microseconds has key…
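The collision described above is easy to reproduce against a schema shaped like Trac's ticket_change table. A minimal sketch with SQLite (the table layout follows the description; column names match the real schema, but this is an illustration, not Trac's own code):

```python
import sqlite3

# Minimal reproduction of the duplicate-key error: two saves of the same
# field within the same wall-clock second collide on (ticket, time, field).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ticket_change (
    ticket INTEGER, time INTEGER, author TEXT, field TEXT,
    oldvalue TEXT, newvalue TEXT,
    PRIMARY KEY (ticket, time, field))""")

ts = 1234567890  # second-resolution timestamp, identical for both saves
conn.execute("INSERT INTO ticket_change VALUES (1, ?, 'joe', 'owner', '', 'joe')",
             (ts,))
try:
    # a change listener re-saving the same field in the same second
    conn.execute("INSERT INTO ticket_change VALUES (1, ?, 'listener', 'owner', 'joe', 'anne')",
                 (ts,))
except sqlite3.IntegrityError as exc:
    print("collision:", exc)
```

With sub-second timestamps the two inserts would carry different time values and both succeed.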

Attachments (7)

6466-float-timestamps-r8757.patch (24.2 KB ) - added by Remy Blank 9 years ago.
Convert all timestamps in the database from int to real.
6466-float-timestamps-2-r8770.patch (34.0 KB ) - added by Remy Blank 9 years ago.
Improved patch, SQLite and PostgreSQL work.
6466-microsecond-times-r8770.patch (61.3 KB ) - added by Remy Blank 9 years ago.
Represent timestamps as microseconds since the epoch.
6466-alter-table-r8222.patch (70.1 KB ) - added by Remy Blank 9 years ago.
Use ALTER TABLE for the upgrade.
6466-microsecond-times-alter-table-r8829.patch (71.6 KB ) - added by Remy Blank 9 years ago.
Complete patch using ALTER TABLE and tested with all DB backends.
6466-microsecond-times-r9047.patch (73.3 KB ) - added by Remy Blank 9 years ago.
Updated patch for current trunk.
6466-microsecond-times-9198.patch (79.1 KB ) - added by Remy Blank 9 years ago.
Updated patch for current trunk (post multirepos and transaction merges).


Change History (47)

comment:1 by anonymous, 11 years ago

It would be better to use datetime.now() that has also microseconds as key…

Sorry :-(

comment:2 by ThurnerRupert, 11 years ago

this means this shows up in timeline, and humans should read that? i am not sure if this would be a very user friendly way to go … maybe there is an underlying design problem which requires so many saves?

in reply to:  2 comment:3 by anonymous, 11 years ago

Replying to ThurnerRupert:

this means this shows up in timeline, and humans should read that? i am not sure if this would be a very user friendly way to go … maybe there is an underlying design problem which requires so many saves?

Maybe that is the case, I hope so too; consider the following… we have a custom field named Resources, where we put a comma-separated list of resources that will work on the ticket. If the owner of a ticket is not set, but the resources are set, the first resource in the list is promoted to owner of the ticket.

So far we have implemented an ITicketChangeListener that, on the change event, reads the resources field, checks whether the owner field is empty, and eventually updates it… this results in two consecutive saves of the same ticket, and the same applies to any other property you would like to rework on commit of the ticket… got it?

The fact that a primary key constraint is enforced, and that its granularity is 1 second, makes this case very hard… it is quite likely that you will have to change a field based on its content before saving the change… is there any way to get the ticket values "before change" and commit them later on?

In the timeline view it should not be a problem to format a datetime object in a comfortable way… or am I missing the point?

Thanks :-)

comment:4 by osimons, 11 years ago

Milestone: 0.12

I think what you actually want is for the current ITicketManipulator interface to start using one of the methods defined in the API:

    def prepare_ticket(req, ticket, fields, actions):
        """Not currently called, but should be provided for future
        compatibility."""

The idea is that your plugin gets the ticket data and can modify it before it goes into the first save.

I'll put it for 0.12 to keep it on the radar - not making any promises.

comment:5 by Felix Schwarz <felix.schwarz@…>, 10 years ago

Cc: felix.schwarz@… added

comment:6 by smrobinson62@…, 10 years ago

We ran into similar problems trying to use BatchModifyPlugin to update multiple blocking tickets. The plugin ends up updating the same "blocked by" field in a master ticket more than once in a second, resulting in the constraint violation. We used a trigger in PostgreSQL to delete existing ticket_change rows with the same PK as a new row being inserted before allowing it to insert. See http://trac-hacks.org/ticket/5505.

comment:7 by direvus@…, 10 years ago

I just ran into this error using Git with Trac. I am using the git post-receive hook on my repo to call trac-post-commit-hook whenever someone pushes commits to the repo.

This fails when I push more than one commit at once to the repo which reference the same ticket. The git hook fires off one call to trac-post-commit-hook for each commit in the push, and the hook dies on the second one because the ticket change is occurring in the same second as the first.

The problem seems to be that the primary key is incorrectly defined. (ticket, time, field) doesn't uniquely identify a record in the ticket_change table. As we've just seen, it's entirely possible for two ticket_change records to have the same ticket and field, and occur in the same second of time, and yet be two totally different changes.

Therefore the PK is wrong.

The fix, if you really want to keep a natural key, would be to define the primary key as across all six columns (ticket, time, author, field, oldvalue, newvalue). Or you could simply add a synthetic key to the table instead.

comment:8 by direvus@…, 10 years ago

Cc: direvus@… added

in reply to:  7 comment:9 by Remy Blank, 10 years ago

Replying to direvus@…:

The fix, if you really want to keep a natural key, would be to define the primary key as across all six columns (ticket, time, author, field, oldvalue, newvalue). Or you could simply add a synthetic key to the table instead.

Or we could change the column type of the timestamp from integer to floating-point.

comment:10 by Christian Boos, 10 years ago

Good timing for the suggestion, as I imagine Remy is currently pulling his hair out seeing the weird grouping of ticket changes by date, author (maybe), permanent flag (?) ;-)

This, along with the work currently being done for #454, the recent discussion we had in #8592, and the long time #1890 issue, makes me think that maybe it's time to do an upgrade of the model?

Introducing a changeid surrogate key for grouping changes done together would be one step. Dissociating the property changes themselves from the metadata for the change would be the next (#1890).

The particular example discussed in comment:173:ticket:454 could be implemented more easily with such a model.

For example, we would have an initial ticket change:

ticket_change

id   changeid  name      value
343  1001      priority  high
343  1001      version   0.11.6

ticket_meta

id   changeid  name           value
343  1001      author         joe
343  1001      date           2131321321
343  1001      comment        I did it
343  1001      authenticated  1

Later, someone else edits the original comment.

This operation is in itself a change of the original change, not of the ticket. Therefore we don't have a new ticket change; we just have some new metadata that has to be associated with the ticket change (a supersede kind of relation). We can reuse Remy's idea from #454 of a _comment{n} virtual field:

ticket_change

id   changeid  name       value
343  1001      priority   high
343  1001      version    0.11.6
343  1001      _comment1  1002

ticket_meta

id   changeid  name           value
343  1002      author         me
343  1002      date           2131321321
343  1002      comment        joe did it
343  1002      authenticated  1

This model would presumably make the implementation of the edit history of a comment a bit easier. Correction: this would presumably make the implementation of the whole ticket internals a lot easier.

in reply to:  10 ; comment:11 by Remy Blank, 10 years ago

Replying to cboos:

Good timing for the suggestion, as I imagine Remy is currently pulling his hair out seeing the weird grouping of ticket changes by date, author (maybe), permanent flag (?) ;-)

Exactly! Well, not exactly pulling my hair, I think I have a simple solution to that.

This model would presumably make the implementation of the edit history of a comment a bit easier. Correction: this would presumably make the implementation of the whole ticket internals a lot easier.

But won't it make all ticket operations much slower as well, as we're going to have to join the ticket_meta table several times for author, date, comment, etc?

Also, with the _comment{n} idea, retrieving the latest version of the ticket comment (for e.g. a query) becomes more difficult, as you have to search for the last _comment{n} for a given changeid, something you can't do in a single SQL query.

Maybe something like this would be easier?

ticket_change

id   changeid  name      value
343  1001      priority  high
343  1001      version   0.11.6
343  1001      meta      124

ticket_meta

id   name           value
123  author         joe
123  date           2131321321
123  comment        I did it
123  authenticated  1
124  author         me
124  date           2131321788
124  comment        joe did it
124  authenticated  1
124  previous       123

…and how about creating a new wiki page to discuss the new scheme for the ticket tables? At least I would have been able to copy-paste your tables ;-)

OTOH, this starts looking suspiciously similar to tracobject, doesn't it? The ticket_meta table structure could be used for wiki pages, attachments, actually everything that is versioned.

But still: what's wrong with using floating-point timestamps?

in reply to:  11 comment:12 by direvus@…, 10 years ago

Replying to rblank:

But still: what's wrong with using floating-point timestamps?

Nothing at all =)

It doesn't make the design problem go away (the PK would still be incorrect), but it does make conflicts less likely.

in reply to:  11 comment:13 by Christian Boos, 10 years ago

Milestone: 0.13 → 0.12

Replying to rblank:

But won't it make all ticket operations much slower as well, as we're going to have to join the ticket_meta table several times for author, date, comment, etc?

No, in the general case we would always get the changes wholesale, e.g.

SELECT changeid, name, value FROM ticket_change WHERE id=?
ORDER BY changeid

Similarly for ticket_meta for the original changes:

SELECT changeid, name, value FROM ticket_meta 
WHERE changeid IN (SELECT DISTINCT changeid FROM ticket_change WHERE id=?)
ORDER BY changeid

For specialized uses, we could imagine querying a specific name, but I don't think we would ever need to have JOINs.

Also, with the _comment{n} idea, retrieving the latest version of the ticket comment (for e.g. a query) becomes more difficult, as you have to search for the last _comment{n} for a given changeid, something you can't do in a single SQL query.

Well, with the approach above, we would have all the _comment* values in memory, so extracting the corresponding numbers and picking the max would be easy. Then we issue one extra query to get the ticket_meta values for changeid = max.

But there's a better approach. We actually don't need to use '_comment{n}' as a name here, as the value itself is the changeid. So we could simply use the single '_meta' name, multi-valued fields being allowed.

This would make it possible to get the last edit metadata for all changes in one query:

SELECT t.changeid AS original, m.changeid AS lastedit, name, value 
FROM ticket_meta AS m,
     (SELECT changeid, MAX(value) AS lastedit FROM ticket_change
      WHERE id=? AND name='_meta' GROUP BY changeid) AS t
WHERE m.changeid = t.lastedit
ORDER BY m.changeid

(something like that, untested)

This also means that for the history of edits for a specific change, we could get all the relevant ticket_meta information at once:

SELECT changeid, name, value FROM ticket_meta
WHERE changeid IN (SELECT value FROM ticket_change 
                   WHERE id=? AND changeid=? AND name='_meta')
ORDER BY changeid

Maybe something like this would be easier? (snip)

If you link revisions like that, you have no choice but to issue one query after the other until you find the original one. This will be less efficient.

…and how about creating a new wiki page to discuss the new scheme for the ticket tables? At least I would have been able to copy-paste your tables ;-)

Yes but…

OTOH, this starts looking suspiciously similar to tracobject, doesn't it? The ticket_meta table structure could be used for wiki pages, attachments, actually everything that is versioned.

… right, that's pretty close ;-)

I'll update the GenericTrac page with the ideas developed here.

But still: what's wrong with using floating-point timestamps?

Nothing a priori, except we currently don't use floating point columns in any of the tables, so we might come across new issues. Doing so would be OK for 0.12, so we can adapt the milestone and refocus this ticket to that solution.

I still think that in the longer term we really ought to do something along the lines I suggested above.

comment:14 by Remy Blank, 9 years ago

Owner: changed from Jonas Borgström to Remy Blank

I'd like to try the floating-point timestamps.

by Remy Blank, 9 years ago

Convert all timestamps in the database from int to real.

comment:15 by Remy Blank, 9 years ago

The patch above changes all timestamps in the database from int to float, and to_timestamp() returns a float as well.

All unit and functional tests pass on all database backends. The patch includes a database upgrade from 22 to 23, which has been tested on SQLite only (so far).

At this point, it seems that using float values for storing timestamps doesn't bring any special issues. Testing and comments welcome, as usual ;-)

(If we decide to apply this patch, I'll wait until after the multirepos merge, and move the DB upgrade to version 25.)

comment:16 by Remy Blank, 9 years ago

Scratch that, there's an issue with PostgreSQL, which seems to only keep 3 decimals. Very strange…

comment:17 by Remy Blank, 9 years ago

Weird. It seems that all real values sent by Trac are truncated to 12 digits instead of the 15 for doubles. Inserting the values by hand doesn't exhibit the same problem, so this could be an issue with psycopg2.

comment:18 by Remy Blank, 9 years ago

Ok, got it: it's a bug in psycopg2. The library uses str() to convert floats into strings, when it should rather use repr():

>>> v = 1234567890.12345678
>>> repr(v)
'1234567890.1234567'
>>> str(v)
'1234567890.12'

The following patch fixes the issue (patch against psycopg-2.0.12), and all unit tests pass afterwards:

  • psycopg/adapter_pfloat.c

        else if (isinf(n))
            return PyString_FromString("'Infinity'::float");
        else
    -       return PyObject_Str(self->wrapped);
    +       return PyObject_Repr(self->wrapped);
    }

    static PyObject *

Let's see if I can work around that in Trac itself…
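The precision loss can be reproduced in pure Python without psycopg2: `"%.12g"` mimics the 12-significant-digit conversion that Python 2's float-to-string formatting effectively produced, while repr() keeps enough digits to round-trip the double exactly (a sketch of the symptom, not psycopg2's internals):

```python
v = 1234567890.12345678

# 12 significant digits -- the sub-second part is lost entirely:
truncated = "%.12g" % v
print(truncated)  # 1234567890.12

# repr() round-trips the double without loss:
assert float(repr(v)) == v
```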

comment:19 by Remy Blank, 9 years ago

Yay, registering a custom adapter for floats also seems to fix the issue:

register_adapter(float, lambda value: AsIs(repr(value)))

by Remy Blank, 9 years ago

Improved patch, SQLite and PostgreSQL work.

comment:20 by Remy Blank, 9 years ago

This is an improved patch that passes all tests on SQLite and PostgreSQL. Unfortunately, now MySQL fails the "float round-trip" test with the database. Looks like this is tricky after all.

Another solution would be to keep int columns and use milli- (or micro-) second timestamps instead. Thoughts?

in reply to:  20 ; comment:21 by Christian Boos, 9 years ago

Replying to rblank:

… Looks like this is tricky after all.

Another solution would be to keep int columns and use milli- (or micro-) second timestamps instead. Thoughts?

What about yes another solution, use a fixed-length text field and we handle the conversion from float ourselves? (something like "%015.4f", 0 padding for making sure the sorting is correct).

comment:22 by Christian Boos, 9 years ago

s/yes/yet ;-) When do we get ticket edit on t.e.o? ;-)

in reply to:  21 comment:23 by Remy Blank, 9 years ago

Replying to cboos:

What about yet another solution, use a fixed-length text field and we handle the conversion from float ourselves?

That would probably work, too. Or use the various (exact) decimal types, but SQLite doesn't have one, so that's actually a no-go. I have the feeling that numeric values (integer or float) would have better performance, but I don't have any data to back that up. With 64 bit integers, we should have enough for microsecond timestamps.

If you don't have a strong preference for either, I'll implement a short test program and compare the performance of both approaches.
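As a quick sanity check on the claim that 64-bit integers are enough for microsecond timestamps, the signed range covers roughly 292,000 years:

```python
# Range of a signed 64-bit integer, interpreted as microseconds since the epoch:
max_us = 2**63 - 1
years = max_us / 1_000_000 / 86_400 / 365.25
print(round(years))  # ample headroom for microsecond-resolution timestamps
```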

When do we get ticket edit on t.e.o? ;-)

That, and the recent ticket entry improvements… Speaking about improvements, how's multirepos doing? ;-)

in reply to:  18 comment:24 by Remy Blank, 9 years ago

Replying to rblank:

Ok, got it: it's a bug in psycopg2.

For reference, the patch has been applied to psycopg and the issue should therefore be fixed in a future 2.0.14.

by Remy Blank, 9 years ago

Represent timestamps as microseconds since the epoch.

comment:25 by Remy Blank, 9 years ago

The patch above implements timestamps as microseconds since the epoch, stored as 64-bit integers. All unit and functional tests pass on all DB backends (yay!), and upgrading an environment has been tested on SQLite.

I think that's the most sensible approach to store datetime objects without loss. Testing (especially DB upgrades) most welcome.

Thoughts?
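The conversion the patch implements can be sketched as a pair of helpers; the names follow the to_utimestamp/from_utimestamp functions Trac eventually shipped in trac.util.datefmt, but treat this as an illustrative sketch rather than the patch itself. Integer arithmetic on the timedelta avoids any float rounding:

```python
from datetime import datetime, timedelta, timezone

_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utimestamp(dt):
    """Microseconds since the epoch as a 64-bit int (lossless for datetime)."""
    delta = dt - _epoch
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

def from_utimestamp(ts):
    """Inverse conversion back to an aware UTC datetime."""
    return _epoch + timedelta(microseconds=ts)

t = datetime(2001, 2, 3, 4, 5, 6, 123456, tzinfo=timezone.utc)
assert to_utimestamp(t) == 981173106123456
assert from_utimestamp(to_utimestamp(t)) == t
```

The round trip preserves the microsecond part exactly, which is what the patch's tests check.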

in reply to:  25 comment:26 by andrea Tomasini <andrea.tomasini@…>, 9 years ago

Replying to rblank:

The patch above implements timestamps as microseconds since the epoch, stored as 64-bit integers. All unit and functional tests pass on all DB backends (yay!), and upgrading an environment has been tested on SQLite.

I think that's the most sensible approach to store datetime objects without loss. Testing (especially DB upgrades) most welcome.

Hi Remy,

I think this is the best way we can do it. Also, I like the idea of having it as an integer rather than a string, and the timestamp conversion looks like the natural Trac way of storing that information, so go for it :-)

Thoughts?

I like this :-) I was the one who submitted the issue 2 years ago :-(

comment:27 by Christian Boos, 9 years ago

Patch looks good, I'll test it this evening.

Also, the custom SQL reports involving time fields should be easier to convert from using seconds to microseconds than if string conversions were needed.

comment:28 by Remy Blank, 9 years ago

One comment: I have used the type tag 'time' in the database schema to indicate columns that contain times. I am considering changing this to 'int64' to avoid issues during upgrades if we change the representation of times in the future. Indeed, if at some point we changed times to e.g. floating-point or decimal, this would invalidate the upgrade scripts that use 'time'. So we may as well use the concrete description of the type, which is a 64-bit integer, hence 'int64'.

comment:29 by Remy Blank, 9 years ago

Also, I remember reading something about a sequence counter being reset when modifying tables with autoincrement columns on PostgreSQL during an upgrade, probably on the Bitten mailing list. I must find that thread again, and test for that specific case.

comment:30 by Christian Boos, 9 years ago

Keywords: needmajor added

Maybe we can add a utility method convert_column(table, col, type), backend-specific.

In PostgreSQL (ref):

ALTER TABLE table ALTER COLUMN anycol TYPE anytype;

In MySQL (ref):

ALTER TABLE t1 MODIFY col1 BIGINT;

For SQLite, I'm not sure we need to do anything, but if so, we could use the usual way with the temporary table.
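The backend dispatch could be sketched like this; convert_column_sql is a hypothetical helper, the SQL follows the statements quoted above, and SQLite needs no change because its INTEGER columns already store 64-bit values:

```python
def convert_column_sql(table, column, backend):
    """Return the ALTER statement widening an int column to 64 bits,
    or None where nothing needs to be done (hypothetical helper)."""
    if backend == "postgres":
        return "ALTER TABLE %s ALTER COLUMN %s TYPE bigint" % (table, column)
    if backend == "mysql":
        return "ALTER TABLE %s MODIFY %s BIGINT" % (table, column)
    return None  # SQLite: INTEGER already holds 64-bit values

print(convert_column_sql("ticket_change", "time", "postgres"))
```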

in reply to:  30 comment:31 by Remy Blank, 9 years ago

Replying to cboos:

Maybe we can add a utility method convert_column(table, col, type), backend-specific.

That's an excellent idea, provided the data in the changed columns remains intact.

For SQLite, I'm not sure we need to do anything, but if so, we could use the usual way with the temporary table.

The standard integer type can already hold 64 bits in SQLite, so there's nothing to do in this particular case. For a generic convert_column(), I don't know.

by Remy Blank, 9 years ago

Use ALTER TABLE for the upgrade.

comment:32 by Remy Blank, 9 years ago

The patch above uses ALTER TABLE for upgrading the database. Unfortunately, trying to upgrade a PostgreSQL database yields a:

DataError: integer out of range

and the column type is not altered. I haven't found anything in the documentation, but does ALTER TABLE require a COMMIT for the column type to be changed, before new values can be written?

comment:33 by Remy Blank, 9 years ago

Ah, I have found the error: time values must be cast to bigint instead of int on PostgreSQL and MySQL, during the upgrade (when fixing ticket comment edit times) and in ticket queries.

With this fix, I have been able to upgrade an environment running with PostgreSQL, so I should have a complete patch ready tomorrow.

by Remy Blank, 9 years ago

Complete patch using ALTER TABLE and tested with all DB backends.

comment:34 by Remy Blank, 9 years ago

The patch above is complete and has been tested with all DB backends, both for environment creation and upgrading. It uses ALTER TABLE on PostgreSQL and MySQL, and therefore avoids the issue with sequences.

Feedback appreciated, especially for the parts in trac.db.

(Now going back to waiting impatiently for the multirepos merge, so that I can update and apply this :-)

by Remy Blank, 9 years ago

Updated patch for current trunk.

by Remy Blank, 9 years ago

Updated patch for current trunk (post multirepos and transaction merges).

comment:35 by Remy Blank, 9 years ago

The patch above is an update after the multirepos and transaction refactoring merges. It should be ready to apply, but some more testing (especially upgrades with MySQL and PostgreSQL) would be welcome.

comment:36 by Christian Boos, 9 years ago

Patch looks great, I've tested the upgrade here on t.e.o (demo-0.12), everything seems to work fine.

In the UTimestampTestCase.test_sub_second, I would have added:

self.assertEqual(981173106123456L, ts)

Otherwise you just test the round trip which could also succeed if there's no change ;-) Or is there some fragility to expect in this conversion?

in reply to:  36 comment:37 by Remy Blank, 9 years ago

Replying to cboos:

Otherwise you just test the round trip which could also succeed if there's no change ;-) Or is there some fragility to expect in this conversion?

The lossless round-trip is indeed what I was interested in, that is, that we don't lose the microsecond part. But it also makes sense to test for the actual representation, so I'll add the check you suggest.

comment:38 by Remy Blank, 9 years ago

Resolution: fixed
Status: new → closed

Patch applied in [9210] (including the suggestion in comment:36), and added missing files in [9211].

comment:39 by Christian Boos, 9 years ago

Resolution: fixed
Status: closed → reopened

I've seen that you added an elif in PostgreSQLConnector.to_sql, I think it's wrong:

Note: Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a serial column to be in a unique constraint or a primary key, it must now be specified, same as with any other data type. http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-SERIAL

As it stands, the ticket.id is no longer a primary key.

comment:40 by Remy Blank, 9 years ago

Resolution: fixed
Status: reopened → closed

Good catch, thanks. I can't remember why I changed that. Fixed in [9213].
