Edgewall Software
Modify

Ticket #6466 (closed defect: fixed)

Opened 4 years ago

Last modified 2 years ago

Ticket.save_changes() timestamp precision is too low...

Reported by: tumma72@… Owned by: rblank
Priority: normal Milestone: 0.12
Component: ticket system Version: 0.10-stable
Severity: normal Keywords: needmajor
Cc: felix.schwarz@…, direvus@…
Release Notes:
API Changes:

Description

When updating a Ticket, calling the save_changes method, the changed fields are stored into the table ticket_change, which has a key formed by (ticket, time, field). Consider the scenario where using a ITicketChangeListener you want to manipulate and change one of the just changed fields, and call again on the ticket the method save_changes...

Most likely it happen within the same second, and you get the error:
ERROR: duplicate key violates unique constraint "ticket_change_pk"

It would be better to use datetime.now() that has also microseconds has key...

Attachments

6466-float-timestamps-r8757.patch (24.2 KB) - added by rblank 2 years ago.
Convert all timestamps in the database from int to real.
6466-float-timestamps-2-r8770.patch (34.0 KB) - added by rblank 2 years ago.
Improved patch, SQLite and PostgreSQL work.
6466-microsecond-times-r8770.patch (61.3 KB) - added by rblank 2 years ago.
Represent timestamps as microseconds since the epoch.
6466-alter-table-r8222.patch (70.1 KB) - added by rblank 2 years ago.
Use ALTER TABLE for the upgrade.
6466-microsecond-times-alter-table-r8829.patch (71.6 KB) - added by rblank 2 years ago.
Complete patch using ALTER TABLE and tested with all DB backends.
6466-microsecond-times-r9047.patch (73.3 KB) - added by rblank 2 years ago.
Updated patch for current trunk.
6466-microsecond-times-9198.patch (79.1 KB) - added by rblank 2 years ago.
Updated patch for current trunk (post multirepos and transaction merges).

Download all attachments as: .zip

Change History

comment:1 Changed 4 years ago by anonymous

It would be better to use datetime.now() that has also microseconds as key...

Sorry :-(

comment:2 follow-up: Changed 4 years ago by ThurnerRupert

this means this shows up in timeline, and humans should read that? i am not sure if this would be a very user friendly way to go ... maybe there is an underlying design problem which requires so many saves?

comment:3 in reply to: ↑ 2 Changed 4 years ago by anonymous

Replying to ThurnerRupert:

this means this shows up in timeline, and humans should read that? i am not sure if this would be a very user friendly way to go ... maybe there is an underlying design problem which requires so many saves?

May be it is the case, I hope too, consider to following... we have a custom field named Resources, where we want to put a comma "," separated list of resources which will work on the ticket. If the owner of a ticket is not set, but the resources are set, the first resource in the list is promoted to Owner of the ticket.

So far what we implemented is a ITicketChangeListener, that on the change event is getting the resources field and checking if the owner field is empty or not, and eventually update it... there will be 2 consequent save of the same ticket, and this applies to any other property you would like to rework on the commit of the ticket... got it?

The fact that a primary key constraint is enforced and that its granularity is at 1 sec. time, it is in this case very hard... it is most likely that you would have to change a field base on its content before saving the change... is there anyway to get the Ticket values "Before Change" and commit them later on?

In Timeline view should not be a problem to format a datatime object in a confortable way... or I am missing the point?

Thanks :-)

comment:4 Changed 4 years ago by osimons

  • Milestone set to 0.12

I think actually what you want is the current ITicketManipulator interface to actually start using one of the methods defined in the API:

    def prepare_ticket(req, ticket, fields, actions):
        """Not currently called, but should be provided for future
        compatibility."""

The idea is that your plugin gets the ticket data and can modify it before it goes into the first save.

I'll put it for 0.12 to keep it on the radar - not making any promises.

comment:5 Changed 3 years ago by Felix Schwarz <felix.schwarz@…>

  • Cc felix.schwarz@… added

comment:6 Changed 3 years ago by smrobinson62@…

We ran into similar problems trying to use BatchModifyPlugin? to update multiple blocking tickets. The plugin ends up updating the same "blocked by" field in a master ticket more than once in a second, resulting in the constraint violation. We used a trigger in PostgreSQL to delete existing ticket_change rows with the same PK as a new row being inserted before allowing it to insert. See http://trac-hacks.org/ticket/5505.

comment:7 follow-up: Changed 2 years ago by direvus@…

I just ran into this error using Git with Trac. I am using the git post-receive hook on my repo to call trac-post-commit-hook whenever someone pushes commits to the repo.

This fails when I push more than one commit at once to the repo which reference the same ticket. The git hook fires off one call to trac-post-commit-hook for each commit in the push, and the hook dies on the second one because the ticket change is occurring in the same second as the first.

The problem seems to be that the primary key is incorrectly defined. (ticket, time, field) doesn't uniquely identify a record in the ticket_change table. As we've just seen, it's entirely possible for two ticket_change records to have the same ticket and field, and occur in the same second of time, and yet be two totally different changes.

Therefore the PK is wrong.

The fix, if you really want to keep a natural key, would be to define the primary key as across all six columns (ticket, time, author, field, oldvalue, newvalue). Or you could simply add a synthetic key to the table instead.

comment:8 Changed 2 years ago by direvus@…

  • Cc direvus@… added

comment:9 in reply to: ↑ 7 Changed 2 years ago by rblank

Replying to direvus@…:

The fix, if you really want to keep a natural key, would be to define the primary key as across all six columns (ticket, time, author, field, oldvalue, newvalue). Or you could simply add a synthetic key to the table instead.

Or we could change the column type of the timestamp from integer to floating-point.

comment:10 follow-up: Changed 2 years ago by cboos

Good timing for the suggestion, as I imagine Remy is currently pulling his hairs seeing the weird grouping of ticket changes by date, author (maybe), permanent flag (?) ;-)

This, along with the work currently being done for #454, the recent discussion we had in #8592, and the long time #1890 issue, makes me think that maybe it's time to do an upgrade of the model?

Introducing a changeid surrogate key for grouping changes done together would be one step. Dissociating the property changes themselves from the metadata for the change would be the next (#1890).

The particular example discussed in comment:173:ticket:454, could be implemented more easily with such a model.

For example, we would have an initial ticket change:

ticket_change

id changeid name value
343 1001 priority high
343 1001 version 0.11.6

ticket_meta

id changeid name value
343 1001 author joe
343 1001 date 2131321321
343 1001 comment I did it
343 1001 authenticated 1

Later, someone else edits the original comment.

This operation in itself is a change of the original change, not of the ticket.
Therefore we don't have a new ticket change, we just have some new metadata, that has to be associated with the ticket change (supersede kind of relation).
We can reuse the idea of Remy in #454 of a _comment{n} virtual field:

ticket_change

id changeid name value
343 1001 priority high
343 1001 version 0.11.6
343 1001 _comment1 1002

ticket_meta

id changeid name value
343 1002 author me
343 1002 date 2131321321
343 1002 comment joe did it
343 1002 authenticated 1

This model would presumably make the implementation of the edit history of a comment a bit easier. Correction: this would presumably make the implementation of the whole ticket internals a lot easier.

comment:11 in reply to: ↑ 10 ; follow-ups: Changed 2 years ago by rblank

Replying to cboos:

Good timing for the suggestion, as I imagine Remy is currently pulling his hairs seeing the weird grouping of ticket changes by date, author (maybe), permanent flag (?) ;-)

Exactly! Well, not exactly pulling my hair, I think I have a simple solution to that.

This model would presumably make the implementation of the edit history of a comment a bit easier. Correction: this would presumably make the implementation of the whole ticket internals a lot easier.

But won't it make all ticket operations much slower as well, as we're going to have to join the ticket_meta table several times for author, date, comment, etc?

Also, with the _comment{n} idea, retrieving the latest version of the ticket comment (for e.g. a query) becomes more difficult, as you have to search for the last _comment{n} for a given changeid, something you can't do in a single SQL query.

Maybe something like this would be easier?

ticket_change

idchangeidnamevalue
3431001priorityhigh
3431001version0.11.6
3431001meta124

ticket_meta

idnamevalue
123authorjoe
123date2131321321
123commentI did it
123authenticated1
124authorme
124date2131321788
124commentjoe did it
124authenticated1
124previous123

...and how about creating a new wiki page to discuss the new scheme for the ticket tables? At least I would have been able to copy-paste your tables ;-)

OTOH, this starts looking suspiciously similar to tracobject, doesn't it? The ticket_meta table structure could be used for wiki pages, attachments, actually everything that is versioned.

But still: what's wrong with using floating-point timestamps?

comment:12 in reply to: ↑ 11 Changed 2 years ago by direvus@…

Replying to rblank:

But still: what's wrong with using floating-point timestamps?

Nothing at all =)

It doesn't make the design problem go away (the PK would still be incorrect), but it does make conflicts less likely.

comment:13 in reply to: ↑ 11 Changed 2 years ago by cboos

  • Milestone changed from 0.13 to 0.12

Replying to rblank:

But won't it make all ticket operations much slower as well, as we're going to have to join the ticket_meta table several times for author, date, comment, etc?

No, in the general case we would always get the changes wholesale, e.g.

SELECT changeid, name, value FROM ticket_change WHERE id=?
ORDER BY changeid

Similarly for ticket_meta for the original changes:

SELECT changeid, name, value FROM ticket_meta 
WHERE changeid IN (SELECT DISTINCT changeid FROM ticket_change WHERE id=?)
ORDER BY changeid

For specialized uses, we could could imagine querying a specific name
but I don't think we would ever need to have JOINs.

Also, with the _comment{n} idea, retrieving the latest version of the ticket comment (for e.g. a query) becomes more difficult, as you have to search for the last _comment{n} for a given changeid, something you can't do in a single SQL query.

Well, with the approach above, we would have all the _comment* values in memory, so extracting the corresponding numbers and pick the max would be easy. Then we issue one extra query to get the ticket_meta values for changeid = max.

But there's a better approach. We actually don't need to use '_comment{n}' as a name here, as the value itself is the changeid. So we could simply use the single '_meta' name, multi-valued fields being allowed.

This would make it possible to get the last edit metadata for all changes in one query:

SELECT t.changeid AS original, m.changeid AS lastedit, name, value 
FROM ticket_meta AS m,
     (SELECT changeid, MAX(value) AS lastedit FROM ticket_change
      WHERE id=? AND name='_meta' GROUP BY changeid) AS t
WHERE m.changeid = t.lastedit
ORDER BY m.changeid

(something like that, untested)

This also means that for the history of edits for a specific change, we could get all the relevant ticket_meta information at once:

SELECT changeid, name, value FROM ticket_meta
WHERE changeid IN (SELECT value FROM ticket_change 
                   WHERE id=? AND changeid=? AND name='_meta')
ORDER BY changeid


Maybe something like this would be easier?
(snip)

If you link revisions like that, you have no choice but to issue one query after the other until you find the original one. This will be less efficient.

...and how about creating a new wiki page to discuss the new scheme for the ticket tables? At least I would have been able to copy-paste your tables ;-)

Yes but...

OTOH, this starts looking suspiciously similar to tracobject, doesn't it? The ticket_meta table structure could be used for wiki pages, attachments, actually everything that is versioned.

... right, that's pretty close ;-)

I'll update the GenericTrac page with the ideas developed here.

But still: what's wrong with using floating-point timestamps?

Nothing a priori, except we currently don't use floating point columns in any of the tables, so we might come across new issues. Doing so would be OK for 0.12, so we can adapt the milestone and refocus this ticket to that solution.

I still think that in the longer term we really ought to do something along the lines I suggested above.

comment:14 Changed 2 years ago by rblank

  • Owner changed from jonas to rblank

I'd like to try the floating-point timestamps.

Changed 2 years ago by rblank

Convert all timestamps in the database from int to real.

comment:15 Changed 2 years ago by rblank

The patch above changes all timestamps in the database from int to float, and to_timestamp() returns a float as well.

All unit and functional tests pass on all database backends. The patch includes a database upgrade from 22 to 23, which has been tested on SQLite only (so far).

At this point, it seems that using float values for storing timestamps doesn't bring any special issues. Testing and comments welcome, as usual ;-)

(If we decide to apply this patch, I'll wait until after the multirepos merge, and move the DB upgrade to version 25.)

comment:16 Changed 2 years ago by rblank

Scratch that, there's an issue with PostgreSQL, which seems to only keep 3 decimals. Very strange...

comment:17 Changed 2 years ago by rblank

Weird. It seems that all real values sent by Trac are truncated to 12 digits instead of the 15 for doubles. Inserting the values by hand doesn't exhibit the same problem, so this could be an issue with psycopg2.

comment:18 follow-up: Changed 2 years ago by rblank

Ok, got it: it's a bug in psycopg2. The library uses str() to convert floats into strings, when it should rather use repr():

>>> v = 1234567890.12345678
>>> repr(v)
'1234567890.1234567'
>>> str(v)
'1234567890.12'

The following patch fixes the issue (patch against psycopg-2.0.12), and all unit tests pass afterwards:

  • psycopg/adapter_pfloat.c

    old new  
    4444    else if (isinf(n)) 
    4545        return PyString_FromString("'Infinity'::float"); 
    4646    else 
    47         return PyObject_Str(self->wrapped); 
     47        return PyObject_Repr(self->wrapped); 
    4848} 
    4949 
    5050static PyObject * 

Let's see if I can work around that in Trac itself...

comment:19 Changed 2 years ago by rblank

Yay, registering a custom adapter for floats also seems to fix the issue:

register_adapter(float, lambda value: AsIs(repr(value)))

Changed 2 years ago by rblank

Improved patch, SQLite and PostgreSQL work.

comment:20 follow-up: Changed 2 years ago by rblank

This is an improved patch that passes all tests on SQLite and PostgreSQL. Unfortunately, now MySQL fails the "float round-trip" test with the database. Looks like this is tricky after all.

Another solution would be to keep int columns and use milli- (or micro-) second timestamps instead. Thoughts?

comment:21 in reply to: ↑ 20 ; follow-up: Changed 2 years ago by cboos

Replying to rblank:

... Looks like this is tricky after all.

Another solution would be to keep int columns and use milli- (or micro-) second timestamps instead. Thoughts?

What about yes another solution, use a fixed-length text field and we handle the conversion from float ourselves? (something like "%015.4f", 0 padding for making sure the sorting is correct).

comment:22 Changed 2 years ago by cboos

s/yes/yet ;-) When do we get ticket edit on t.e.o? ;-)

comment:23 in reply to: ↑ 21 Changed 2 years ago by rblank

Replying to cboos:

What about yet another solution, use a fixed-length text field and we handle the conversion from float ourselves?

That would probably work, too. Or use the various (exact) decimal types, but SQLite doesn't have one, so that's actually a no-go. I have the feeling that numeric values (integer or float) would have better performance, but I don't have any data to back that up. With 64 bit integers, we should have enough for microsecond timestamps.

If you don't have a strong preference for either, I'll implement a short test program and compare the performance of both approaches.

When do we get ticket edit on t.e.o? ;-)

That, and the recent ticket entry improvements... Speaking about improvements, how's multirepos doing? ;-)

comment:24 in reply to: ↑ 18 Changed 2 years ago by rblank

Replying to rblank:

Ok, got it: it's a bug in psycopg2.

For reference, the patch has been applied to psycopg and the issue should therefore be fixed in a future 2.0.14.

Changed 2 years ago by rblank

Represent timestamps as microseconds since the epoch.

comment:25 follow-up: Changed 2 years ago by rblank

The patch above implements timestamps as microseconds since the epoch, stored as 64-bit integers. All unit and functional tests pass on all DB backends (yay!), and upgrading an environment has been tested on SQLite.

I thinks that's the most sensible approach to store datetime objects without loss. Testing (especially DB upgrades) most welcome.

Thoughts?

comment:26 in reply to: ↑ 25 Changed 2 years ago by andrea Tomasini <andrea.tomasini@…>

Replying to rblank:

The patch above implements timestamps as microseconds since the epoch, stored as 64-bit integers. All unit and functional tests pass on all DB backends (yay!), and upgrading an environment has been tested on SQLite.

I thinks that's the most sensible approach to store datetime objects without loss. Testing (especially DB upgrades) most welcome.

Hi Remi,

I think this is the best way we can do that. Also I like more the idea of having it as an integer, rather than a string, and timestamp conversion looks like the natural Trac way of storing that information, so go for it :-)

Thoughts?

I like this :-) I was the one submitting the issue 2 years ago :-(

comment:27 Changed 2 years ago by cboos

Patch looks good, I'll test it this evening.

Also, the custom SQL reports involving time fields should be easier to convert from using seconds to microseconds than if string conversions were needed.

comment:28 Changed 2 years ago by rblank

One comment: I have used the type tag 'time' in the database schema to indicate columns that contain times. I am considering changing this to 'int64' to avoid issues during upgrades if we change the representation of times in the future. Indeed, if at some point we changed times to e.g. floating-point or decimal, this would invalidate the upgrade scripts that use 'time'. So we may as well use the concrete description of the type, which is a 64-bit integer, hence 'int64'.

comment:29 Changed 2 years ago by rblank

Also, I remember reading something about a sequence counter being reset when modifying tables with autoincrement columns on PostgreSQL during an upgrade, probably on the Bitten mailing list. I must find that thread again, and test for that specific case.

comment:30 follow-up: Changed 2 years ago by cboos

  • Keywords needmajor added

Maybe we can add an utility method convert_column(table, col, type), backend specific.

In PostgreSQL (ref):

ALTER TABLE table ALTER COLUMN anycol TYPE anytype;

In MySQL (ref):

ALTER TABLE t1 MODIFY col1 BIGINT;

For SQLite, I'm not sure we need to do anything, but if so, we could use the usual way with the temporary table.

comment:31 in reply to: ↑ 30 Changed 2 years ago by rblank

Replying to cboos:

Maybe we can add an utility method convert_column(table, col, type), backend specific.

That's an excellent idea, provided the data in the changed columns remains intact.

For SQLite, I'm not sure we need to do anything, but if so, we could use the usual way with the temporary table.

The standard integer type can already hold 64 bits in SQLite, so there's nothing to do in this particular case. For a generic convert_column(), I don't know.

Changed 2 years ago by rblank

Use ALTER TABLE for the upgrade.

comment:32 Changed 2 years ago by rblank

The patch above uses ALTER TABLE for upgrading the database. Unfortunately, trying to upgrade a PostgreSQL database yields a:

DataError: integer out of range

and the column type is not altered. I haven't found anything in the documentation, but does ALTER TABLE require a COMMIT for the column type to be changed, before new values can be written?

comment:33 Changed 2 years ago by rblank

Ah, I have found the error: time values must be cast to bigint instead of int on PostgreSQL and MySQL, during the upgrade (when fixing ticket comment edit times) and in ticket queries.

With this fix, I have been able to upgrade an environment running with PostgreSQL, so I should have a complete patch ready tomorrow.

Changed 2 years ago by rblank

Complete patch using ALTER TABLE and tested with all DB backends.

comment:34 Changed 2 years ago by rblank

The patch above is complete and has been tested with all DB backends, both for environment creation and upgrading. It uses ALTER TABLE on PostgreSQL and MySQL, and therefore avoids the issue with sequences.

Feedback appreciated, especially for the parts in trac.db.

(Now going back to waiting impatiently for the multirepos merge, so that I can update and apply this :-)

Changed 2 years ago by rblank

Updated patch for current trunk.

Changed 2 years ago by rblank

Updated patch for current trunk (post multirepos and transaction merges).

comment:35 Changed 2 years ago by rblank

The patch above is an update after the multirepos and transaction refactoring merges. It should be ready to apply, but some more testing (especially upgrades with MySQL and PostgreSQL) would be welcome.

comment:36 follow-up: Changed 2 years ago by cboos

Patch looks great, I've tested the upgrade here on t.e.o (demo-0.12), everything seems to work fine.

In the UTimestampTestCase.test_sub_second, I would have added:

self.assertEqual(981173106123456L, ts)

Otherwise you just test the round trip which could also succeed if there's no change ;-)
Or is there some fragility to expect in this conversion?

comment:37 in reply to: ↑ 36 Changed 2 years ago by rblank

Replying to cboos:

Otherwise you just test the round trip which could also succeed if there's no change ;-)
Or is there some fragility to expect in this conversion?

The loss-less round-trip is indeed what I was interested in, that is, that we don't loose the microsecond part. But it also makes sense to test for the actual representation, so I'll add the check you suggest.

comment:38 Changed 2 years ago by rblank

  • Resolution set to fixed
  • Status changed from new to closed

Patch applied in [9210] (including the suggestion in comment:36), and added missing files in [9211].

comment:39 Changed 2 years ago by cboos

  • Resolution fixed deleted
  • Status changed from closed to reopened

I've seen that you added an elif in PostgreSQLConnector.to_sql, I think it's wrong:

Note: Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a serial column to be in a unique constraint or a primary key, it must now be specified, same as with any other data type.
http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-SERIAL

As it stands, the ticket.id is no longer a primary key.

comment:40 Changed 2 years ago by rblank

  • Resolution set to fixed
  • Status changed from reopened to closed

Good catch, thanks. I can't remember why I changed that. Fixed in [9213].

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
to The owner will be changed from rblank. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.