Opened 17 years ago
Closed 15 years ago
#6466 closed defect (fixed)
Ticket.save_changes() timestamp precision is too low...
Reported by: | | Owned by: | Remy Blank
---|---|---|---|
Priority: | normal | Milestone: | 0.12 |
Component: | ticket system | Version: | 0.10-stable |
Severity: | normal | Keywords: | needmajor |
Cc: | felix.schwarz@…, direvus@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
When a ticket is updated by calling the save_changes method, the changed fields are stored in the ticket_change table, whose primary key is formed by (ticket, time, field). Consider the scenario where, using an ITicketChangeListener, you want to manipulate one of the just-changed fields and call save_changes on the ticket again…
Most likely this happens within the same second, and you get the error:
ERROR: duplicate key violates unique constraint "ticket_change_pk"
It would be better to use datetime.now(), which also has microseconds, as the key…
Attachments (7)
Change History (47)
comment:1 by , 17 years ago
follow-up: 3 comment:2 by , 17 years ago
this means this shows up in the timeline, and humans should read that? i am not sure if this would be a very user-friendly way to go … maybe there is an underlying design problem which requires so many saves?
comment:3 by , 17 years ago
Replying to ThurnerRupert:
this means this shows up in the timeline, and humans should read that? i am not sure if this would be a very user-friendly way to go … maybe there is an underlying design problem which requires so many saves?
Maybe that is the case, I hope so too; consider the following… we have a custom field named Resources, holding a comma-separated list of resources that will work on the ticket. If the owner of a ticket is not set but the resources are, the first resource in the list is promoted to owner of the ticket.
So far we have implemented an ITicketChangeListener that, on the change event, reads the resources field, checks whether the owner field is empty, and updates it if needed… this results in two consecutive saves of the same ticket, and the same applies to any other property you would like to rework when the ticket is committed… got it?
The fact that a primary key constraint is enforced and that its time granularity is 1 second makes this case very hard… it is quite likely that you would have to change a field based on its content before saving the change… is there any way to get the ticket values "before change" and commit them later on?
In the Timeline view it should not be a problem to format a datetime object in a comfortable way… or am I missing the point?
Thanks :-)
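To make the scenario concrete, here is a minimal sketch (not the reporter's actual plugin; the 'resources' field name and the promotion logic are assumptions) of a change listener that re-saves the ticket and triggers the collision:

from trac.core import Component, implements
from trac.ticket.api import ITicketChangeListener

class PromoteFirstResource(Component):
    """Illustrative only: re-saves the ticket from a change listener."""
    implements(ITicketChangeListener)

    def ticket_created(self, ticket):
        pass

    def ticket_changed(self, ticket, comment, author, old_values):
        # Promote the first entry of the (hypothetical) 'resources' custom
        # field to owner.  This second save_changes() usually happens within
        # the same second as the first one, so with 1-second timestamps the
        # (ticket, time, field) key collides and ticket_change_pk is violated.
        if not ticket['owner'] and ticket['resources']:
            ticket['owner'] = ticket['resources'].split(',')[0].strip()
            ticket.save_changes(author, 'Promoted first resource to owner')

    def ticket_deleted(self, ticket):
        pass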
comment:4 by , 17 years ago
Milestone: | → 0.12 |
---|
I think what you actually want is for the current ITicketManipulator interface to start using one of the methods already defined in the API:
def prepare_ticket(req, ticket, fields, actions):
    """Not currently called, but should be provided for future
    compatibility."""
The idea is that your plugin gets the ticket data and can modify it before it goes into the first save.
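A hedged sketch of how the example from comment:3 could be handled that way, assuming Trac did start calling prepare_ticket() (the 'resources' field name is again an assumption):

from trac.core import Component, implements
from trac.ticket.api import ITicketManipulator

class PromoteResourceManipulator(Component):
    """Illustrative only: adjusts the ticket data before the first save."""
    implements(ITicketManipulator)

    def prepare_ticket(self, req, ticket, fields, actions):
        # Runs before the ticket is saved, so no second save_changes() call
        # (and no duplicate (ticket, time, field) key) is needed.
        if not ticket['owner'] and ticket['resources']:
            ticket['owner'] = ticket['resources'].split(',')[0].strip()

    def validate_ticket(self, req, ticket):
        # Must return a list of (field, message) tuples; nothing to report.
        return []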
I'll put it for 0.12 to keep it on the radar - not making any promises.
comment:5 by , 15 years ago
Cc: | added |
---|
comment:6 by , 15 years ago
We ran into similar problems trying to use BatchModifyPlugin to update multiple blocking tickets. The plugin ends up updating the same "blocked by" field in a master ticket more than once in a second, resulting in the constraint violation. We used a trigger in PostgreSQL to delete existing ticket_change rows with the same PK as a new row being inserted before allowing it to insert. See http://trac-hacks.org/ticket/5505.
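For reference, a sketch of what such a trigger could look like (an assumption based on the description above, not the actual code from th:#5505), installed here through psycopg2:

import psycopg2

DEDUP_TRIGGER_DDL = """
CREATE OR REPLACE FUNCTION ticket_change_dedup() RETURNS trigger AS $$
BEGIN
    -- Drop any row that would collide with the incoming one on the
    -- (ticket, time, field) primary key, then let the INSERT proceed.
    DELETE FROM ticket_change
     WHERE ticket = NEW.ticket AND time = NEW.time AND field = NEW.field;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ticket_change_dedup
    BEFORE INSERT ON ticket_change
    FOR EACH ROW EXECUTE PROCEDURE ticket_change_dedup();
"""

def install_dedup_trigger(dsn):
    """Install the workaround trigger on a Trac PostgreSQL database."""
    conn = psycopg2.connect(dsn)
    try:
        with conn:
            with conn.cursor() as cursor:
                cursor.execute(DEDUP_TRIGGER_DDL)
    finally:
        conn.close()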
follow-up: 9 comment:7 by , 15 years ago
I just ran into this error using Git with Trac. I am using the git post-receive hook on my repo to call trac-post-commit-hook whenever someone pushes commits to the repo.
This fails when I push more than one commit at once to the repo which reference the same ticket. The git hook fires off one call to trac-post-commit-hook for each commit in the push, and the hook dies on the second one because the ticket change is occurring in the same second as the first.
The problem seems to be that the primary key is incorrectly defined. (ticket, time, field) doesn't uniquely identify a record in the ticket_change table. As we've just seen, it's entirely possible for two ticket_change records to have the same ticket and field, and occur in the same second of time, and yet be two totally different changes.
Therefore the PK is wrong.
The fix, if you really want to keep a natural key, would be to define the primary key as across all six columns (ticket, time, author, field, oldvalue, newvalue). Or you could simply add a synthetic key to the table instead.
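In Trac's schema notation, the two alternatives would look roughly as follows (illustrative only; as the rest of this ticket shows, the fix that was eventually applied keeps the three-column key and raises the time precision instead):

from trac.db import Column, Table

# Option 1: widen the natural key to all columns.
ticket_change_wide_key = Table('ticket_change',
                               key=('ticket', 'time', 'author', 'field',
                                    'oldvalue', 'newvalue'))[
    Column('ticket', type='int'),
    Column('time', type='int'),
    Column('author'),
    Column('field'),
    Column('oldvalue'),
    Column('newvalue'),
]

# Option 2: add a synthetic auto-increment key instead.
ticket_change_surrogate = Table('ticket_change', key='id')[
    Column('id', auto_increment=True),
    Column('ticket', type='int'),
    Column('time', type='int'),
    Column('author'),
    Column('field'),
    Column('oldvalue'),
    Column('newvalue'),
]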
comment:8 by , 15 years ago
Cc: | added |
---|
comment:9 by , 15 years ago
Replying to direvus@…:
The fix, if you really want to keep a natural key, would be to define the primary key as across all six columns (ticket, time, author, field, oldvalue, newvalue). Or you could simply add a synthetic key to the table instead.
Or we could change the column type of the timestamp from integer to floating-point.
follow-up: 11 comment:10 by , 15 years ago
Good timing for the suggestion, as I imagine Remy is currently pulling his hair out seeing the weird grouping of ticket changes by date, author (maybe), permanent flag (?) ;-)
This, along with the work currently being done for #454, the recent discussion we had in #8592, and the long time #1890 issue, makes me think that maybe it's time to do an upgrade of the model?
Introducing a changeid surrogate key for grouping changes done together would be one step. Dissociating the property changes themselves from the metadata for the change would be the next (#1890).
The particular example discussed in comment:173:ticket:454, could be implemented more easily with such a model.
For example, we would have an initial ticket change:
ticket_change
id | changeid | name | value
---|---|---|---
343 | 1001 | priority | high
343 | 1001 | version | 0.11.6
ticket_meta
id | changeid | name | value
---|---|---|---
343 | 1001 | author | joe
343 | 1001 | date | 2131321321
343 | 1001 | comment | I did it
343 | 1001 | authenticated | 1
Later, someone else edits the original comment.
This operation in itself is a change of the original change, not of the ticket.
Therefore we don't have a new ticket change, we just have some new metadata that has to be associated with the original ticket change (a supersede kind of relation).
We can reuse the idea of Remy in #454 of a _comment{n} virtual field:
ticket_change
id | changeid | name | value
---|---|---|---
343 | 1001 | priority | high
343 | 1001 | version | 0.11.6
343 | 1001 | _comment1 | 1002
ticket_meta
id | changeid | name | value
---|---|---|---
343 | 1002 | author | me
343 | 1002 | date | 2131321321
343 | 1002 | comment | joe did it
343 | 1002 | authenticated | 1
This model would presumably make the implementation of the edit history of a comment a bit easier. Correction: this would presumably make the implementation of the whole ticket internals a lot easier.
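In Trac's schema notation, the proposed pair of tables might be declared roughly like this (purely illustrative; this redesign was not implemented as part of this ticket, and the column types are assumptions):

from trac.db import Column, Table

ticket_change = Table('ticket_change', key=('id', 'changeid', 'name'))[
    Column('id', type='int'),        # ticket id
    Column('changeid', type='int'),  # surrogate key grouping one edit
    Column('name'),
    Column('value'),
]

ticket_meta = Table('ticket_meta', key=('id', 'changeid', 'name'))[
    Column('id', type='int'),
    Column('changeid', type='int'),
    Column('name'),                  # author, date, comment, authenticated, ...
    Column('value'),
]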
follow-ups: 12 13 comment:11 by , 15 years ago
Replying to cboos:
Good timing for the suggestion, as I imagine Remy is currently pulling his hairs seeing the weird grouping of ticket changes by date, author (maybe), permanent flag (?) ;-)
Exactly! Well, not exactly pulling my hair, I think I have a simple solution to that.
This model would presumably make the implementation of the edit history of a comment a bit easier. Correction: this would presumably make the implementation of the whole ticket internals a lot easier.
But won't it make all ticket operations much slower as well, as we're going to have to join the ticket_meta table several times for author, date, comment, etc?
Also, with the _comment{n} idea, retrieving the latest version of the ticket comment (e.g. for a query) becomes more difficult, as you have to search for the last _comment{n} for a given changeid, something you can't do in a single SQL query.
Maybe something like this would be easier?
ticket_change
id | changeid | name | value
---|---|---|---
343 | 1001 | priority | high
343 | 1001 | version | 0.11.6
343 | 1001 | meta | 124
ticket_meta
id | name | value
---|---|---
123 | author | joe
123 | date | 2131321321
123 | comment | I did it
123 | authenticated | 1
124 | author | me
124 | date | 2131321788
124 | comment | joe did it
124 | authenticated | 1
124 | previous | 123
…and how about creating a new wiki page to discuss the new scheme for the ticket tables? At least I would have been able to copy-paste your tables ;-)
OTOH, this starts looking suspiciously similar to tracobject, doesn't it? The ticket_meta table structure could be used for wiki pages, attachments, actually everything that is versioned.
But still: what's wrong with using floating-point timestamps?
comment:12 by , 15 years ago
Replying to rblank:
But still: what's wrong with using floating-point timestamps?
Nothing at all =)
It doesn't make the design problem go away (the PK would still be incorrect), but it does make conflicts less likely.
comment:13 by , 15 years ago
Milestone: | 0.13 → 0.12 |
---|
Replying to rblank:
But won't it make all ticket operations much slower as well, as we're going to have to join the ticket_meta table several times for author, date, comment, etc?
No, in the general case we would always get the changes wholesale, e.g.
SELECT changeid, name, value FROM ticket_change WHERE id=? ORDER BY changeid
Similarly for ticket_meta for the original changes:
SELECT changeid, name, value FROM ticket_meta
 WHERE changeid IN (SELECT DISTINCT changeid FROM ticket_change WHERE id=?)
 ORDER BY changeid
For specialized uses, we could imagine querying a specific name, but I don't think we would ever need JOINs.
Also, with the _comment{n} idea, retrieving the latest version of the ticket comment (e.g. for a query) becomes more difficult, as you have to search for the last _comment{n} for a given changeid, something you can't do in a single SQL query.
Well, with the approach above, we would have all the _comment* values in memory, so extracting the corresponding numbers and picking the max would be easy. Then we issue one extra query to get the ticket_meta values for changeid = max.
But there's a better approach. We actually don't need to use '_comment{n}' as a name here, as the value itself is the changeid. So we could simply use the single '_meta' name, multi-valued fields being allowed.
This would make it possible to get the last edit metadata for all changes in one query:
SELECT t.changeid AS original, m.changeid AS lastedit, name, value
  FROM ticket_meta AS m,
       (SELECT changeid, MAX(value) AS lastedit
          FROM ticket_change
         WHERE id=? AND name='_meta'
         GROUP BY changeid) AS t
 WHERE m.changeid = t.lastedit
 ORDER BY m.changeid
(something like that, untested)
This also means that for the history of edits for a specific change, we could get all the relevant ticket_meta information at once:
SELECT changeid, name, value FROM ticket_meta
 WHERE changeid IN (SELECT value FROM ticket_change
                     WHERE id=? AND changeid=? AND name='_meta')
 ORDER BY changeid
Maybe something like this would be easier? (snip)
If you link revisions like that, you have no choice but to issue one query after the other until you find the original one. This will be less efficient.
…and how about creating a new wiki page to discuss the new scheme for the ticket tables? At least I would have been able to copy-paste your tables ;-)
Yes but…
OTOH, this starts looking suspiciously similar to tracobject, doesn't it? The ticket_meta table structure could be used for wiki pages, attachments, actually everything that is versioned.
… right, that's pretty close ;-)
I'll update the GenericTrac page with the ideas developed here.
But still: what's wrong with using floating-point timestamps?
Nothing a priori, except we currently don't use floating point columns in any of the tables, so we might come across new issues. Doing so would be OK for 0.12, so we can adapt the milestone and refocus this ticket to that solution.
I still think that in the longer term we really ought to do something along the lines I suggested above.
by , 15 years ago
Attachment: | 6466-float-timestamps-r8757.patch added |
---|
Convert all timestamps in the database from int to real.
comment:15 by , 15 years ago
The patch above changes all timestamps in the database from int to float, and to_timestamp() returns a float as well.
All unit and functional tests pass on all database backends. The patch includes a database upgrade from 22 to 23, which has been tested on SQLite only (so far).
At this point, it seems that using float values for storing timestamps doesn't bring any special issues. Testing and comments welcome, as usual ;-)
(If we decide to apply this patch, I'll wait until after the multirepos merge, and move the DB upgrade to version 25.)
comment:16 by , 15 years ago
Scratch that, there's an issue with PostgreSQL, which seems to only keep 3 decimals. Very strange…
comment:17 by , 15 years ago
Weird. It seems that all real values sent by Trac are truncated to 12 digits instead of the 15 for doubles. Inserting the values by hand doesn't exhibit the same problem, so this could be an issue with psycopg2.
follow-up: 24 comment:18 by , 15 years ago
Ok, got it: it's a bug in psycopg2. The library uses str() to convert floats into strings, when it should rather use repr():
>>> v = 1234567890.12345678
>>> repr(v)
'1234567890.1234567'
>>> str(v)
'1234567890.12'
The following patch fixes the issue (patch against psycopg-2.0.12), and all unit tests pass afterwards:
--- psycopg/adapter_pfloat.c    (old)
+++ psycopg/adapter_pfloat.c    (new)
@@ -44,7 +44,7 @@
     else if (isinf(n))
         return PyString_FromString("'Infinity'::float");
     else
-        return PyObject_Str(self->wrapped);
+        return PyObject_Repr(self->wrapped);
 }
 
 static PyObject *
Let's see if I can work around that in Trac itself…
comment:19 by , 15 years ago
Yay, registering a custom adapter for floats also seems to fix the issue:
from psycopg2.extensions import register_adapter, AsIs
register_adapter(float, lambda value: AsIs(repr(value)))
by , 15 years ago
Attachment: | 6466-float-timestamps-2-r8770.patch added |
---|
Improved patch, SQLite and PostgreSQL work.
follow-up: 21 comment:20 by , 15 years ago
This is an improved patch that passes all tests on SQLite and PostgreSQL. Unfortunately, now MySQL fails the "float round-trip" test with the database. Looks like this is tricky after all.
Another solution would be to keep int columns and use milli- (or micro-) second timestamps instead. Thoughts?
follow-up: 23 comment:21 by , 15 years ago
Replying to rblank:
… Looks like this is tricky after all.
Another solution would be to keep int columns and use milli- (or micro-) second timestamps instead. Thoughts?
What about yet another solution: use a fixed-length text field and handle the conversion from float ourselves? (something like "%015.4f", zero-padded to make sure the sorting is correct)
comment:23 by , 15 years ago
Replying to cboos:
What about yet another solution, use a fixed-length text field and we handle the conversion from float ourselves?
That would probably work, too. Or use the various (exact) decimal types, but SQLite doesn't have one, so that's actually a no-go. I have the feeling that numeric values (integer or float) would have better performance, but I don't have any data to back that up. With 64 bit integers, we should have enough for microsecond timestamps.
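As a back-of-the-envelope check (illustrative, not from the ticket): a signed 64-bit integer comfortably covers microsecond timestamps, giving roughly 292,000 years of range.
>>> (2**63 - 1) // 10**6 // 86400 // 365   # usable range in years
292471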
If you don't have a strong preference for either, I'll implement a short test program and compare the performance of both approaches.
When do we get ticket edit on t.e.o? ;-)
That, and the recent ticket entry improvements… Speaking about improvements, how's multirepos doing? ;-)
comment:24 by , 15 years ago
by , 15 years ago
Attachment: | 6466-microsecond-times-r8770.patch added |
---|
Represent timestamps as microseconds since the epoch.
follow-up: 26 comment:25 by , 15 years ago
The patch above implements timestamps as microseconds since the epoch, stored as 64-bit integers. All unit and functional tests pass on all DB backends (yay!), and upgrading an environment has been tested on SQLite.
I think that's the most sensible approach to store datetime objects without loss. Testing (especially DB upgrades) is most welcome.
Thoughts?
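For reference, a minimal sketch of what such a lossless datetime-to-integer conversion can look like (the helper names to_utimestamp/from_utimestamp are illustrative here; the actual implementation in the patch may differ):

from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utimestamp(dt):
    """Convert an aware datetime to integer microseconds since the epoch."""
    delta = dt - _EPOCH
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds

def from_utimestamp(ts):
    """Convert integer microseconds since the epoch back to an aware datetime."""
    return _EPOCH + timedelta(microseconds=ts)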
comment:26 by , 15 years ago
Replying to rblank:
The patch above implements timestamps as microseconds since the epoch, stored as 64-bit integers. All unit and functional tests pass on all DB backends (yay!), and upgrading an environment has been tested on SQLite.
I think that's the most sensible approach to store datetime objects without loss. Testing (especially DB upgrades) is most welcome.
Hi Remy,
I think this is the best way we can do it. Also, I prefer the idea of having it as an integer rather than as a string, and a timestamp conversion looks like the natural Trac way of storing that information, so go for it :-)
Thoughts?
I like this :-) I was the one who submitted the issue 2 years ago :-(
comment:27 by , 15 years ago
Patch looks good, I'll test it this evening.
Also, the custom SQL reports involving time fields should be easier to convert from using seconds to microseconds than if string conversions were needed.
comment:28 by , 15 years ago
One comment: I have used the type tag 'time' in the database schema to indicate columns that contain times. I am considering changing this to 'int64' to avoid issues during upgrades if we change the representation of times in the future. Indeed, if at some point we changed times to e.g. floating-point or decimal, this would invalidate the upgrade scripts that use 'time'. So we may as well use the concrete description of the type, which is a 64-bit integer, hence 'int64'.
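With that change, the schema declaration for e.g. ticket_change would read roughly as follows (a sketch based on this comment, not a verbatim excerpt of the patch):

from trac.db import Column, Table

ticket_change = Table('ticket_change', key=('ticket', 'time', 'field'))[
    Column('ticket', type='int'),
    Column('time', type='int64'),   # microseconds since the epoch
    Column('author'),
    Column('field'),
    Column('oldvalue'),
    Column('newvalue'),
]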
comment:29 by , 15 years ago
Also, I remember reading something about a sequence counter being reset when modifying tables with autoincrement columns on PostgreSQL during an upgrade, probably on the Bitten mailing list. I must find that thread again, and test for that specific case.
follow-up: 31 comment:30 by , 15 years ago
Keywords: | needmajor added |
---|
Maybe we can add a utility method convert_column(table, col, type), backend specific.
In PostgreSQL (ref):
ALTER TABLE table ALTER COLUMN anycol TYPE anytype;
In MySQL (ref):
ALTER TABLE t1 MODIFY col1 BIGINT;
For SQLite, I'm not sure we need to do anything, but if so, we could use the usual way with the temporary table.
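A hedged sketch of such a helper (the name convert_column comes from the suggestion above; the signature and the cursor-based style are assumptions, not Trac's actual API):

def convert_column(cursor, table, column, new_type, backend):
    """Change the type of an existing column, using backend-specific SQL."""
    if backend == 'postgres':
        cursor.execute("ALTER TABLE %s ALTER COLUMN %s TYPE %s"
                       % (table, column, new_type))
    elif backend == 'mysql':
        cursor.execute("ALTER TABLE %s MODIFY %s %s"
                       % (table, column, new_type))
    elif backend == 'sqlite':
        # SQLite's INTEGER affinity already stores 64-bit values, so there is
        # nothing to do for the int -> int64 case; other conversions would
        # need the usual temporary-table dance.
        pass

# e.g. convert_column(cursor, 'ticket_change', 'time', 'bigint', 'postgres')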
comment:31 by , 15 years ago
Replying to cboos:
Maybe we can add a utility method convert_column(table, col, type), backend specific.
That's an excellent idea, provided the data in the changed columns remains intact.
For SQLite, I'm not sure we need to do anything, but if so, we could use the usual way with the temporary table.
The standard integer type can already hold 64 bits in SQLite, so there's nothing to do in this particular case. For a generic convert_column(), I don't know.
comment:32 by , 15 years ago
The patch above uses ALTER TABLE for upgrading the database. Unfortunately, trying to upgrade a PostgreSQL database yields a:
DataError: integer out of range
and the column type is not altered. I haven't found anything in the documentation, but does ALTER TABLE require a COMMIT for the column type to be changed, before new values can be written?
comment:33 by , 15 years ago
Ah, I have found the error: time values must be cast to bigint instead of int on PostgreSQL and MySQL, during the upgrade (when fixing ticket comment edit times) and in ticket queries.
With this fix, I have been able to upgrade an environment running with PostgreSQL, so I should have a complete patch ready tomorrow.
by , 15 years ago
Attachment: | 6466-microsecond-times-alter-table-r8829.patch added |
---|
Complete patch using ALTER TABLE and tested with all DB backends.
comment:34 by , 15 years ago
The patch above is complete and has been tested with all DB backends, both for environment creation and upgrading. It uses ALTER TABLE on PostgreSQL and MySQL, and therefore avoids the issue with sequences.
Feedback appreciated, especially for the parts in trac.db.
(Now going back to waiting impatiently for the multirepos merge, so that I can update and apply this :-)
by , 15 years ago
Attachment: | 6466-microsecond-times-r9047.patch added |
---|
Updated patch for current trunk.
by , 15 years ago
Attachment: | 6466-microsecond-times-9198.patch added |
---|
Updated patch for current trunk (post multirepos and transaction merges).
comment:35 by , 15 years ago
The patch above is an update after the multirepos and transaction refactoring merges. It should be ready to apply, but some more testing (especially upgrades with MySQL and PostgreSQL) would be welcome.
follow-up: 37 comment:36 by , 15 years ago
Patch looks great, I've tested the upgrade here on t.e.o (demo-0.12), everything seems to work fine.
In UTimestampTestCase.test_sub_second, I would have added:
self.assertEqual(981173106123456L, ts)
Otherwise you just test the round trip which could also succeed if there's no change ;-) Or is there some fragility to expect in this conversion?
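For context, the complete test might then look like this (a sketch assuming the conversion helpers are exposed as to_utimestamp/from_utimestamp/utc in trac.util.datefmt; written in Python 3 syntax, hence no trailing L on the literal):

import unittest
from datetime import datetime

from trac.util.datefmt import from_utimestamp, to_utimestamp, utc

class UTimestampTestCase(unittest.TestCase):

    def test_sub_second(self):
        t = datetime(2001, 2, 3, 4, 5, 6, 123456, utc)
        ts = to_utimestamp(t)
        self.assertEqual(981173106123456, ts)      # exact representation
        self.assertEqual(t, from_utimestamp(ts))   # lossless round trip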
comment:37 by , 15 years ago
Replying to cboos:
Otherwise you just test the round trip which could also succeed if there's no change ;-) Or is there some fragility to expect in this conversion?
The lossless round trip is indeed what I was interested in, that is, that we don't lose the microsecond part. But it also makes sense to test for the actual representation, so I'll add the check you suggest.
comment:38 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Patch applied in [9210] (including the suggestion in comment:36), and added missing files in [9211].
comment:39 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I've seen that you added an elif in PostgreSQLConnector.to_sql; I think it's wrong:
Note: Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a serial column to be in a unique constraint or a primary key, it must now be specified, same as with any other data type. http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-SERIAL
As it stands, the ticket.id is no longer a primary key.
comment:40 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Good catch, thanks. I can't remember why I changed that. Fixed in [9213].