#10530 closed defect (fixed)
Fix pagination for reports
Reported by: | Christian Boos | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | 0.12.3 |
Component: | general | Version: | 0.11 |
Severity: | blocker | Keywords: | report query |
Cc: | osimons, peter@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
We just noticed that the way we implemented paginated reports (#216) simply makes some buggy assumptions about SQL subqueries.
In short, when one does:
SELECT * (SELECT ... FROM ... ORDER BY <criterion>) LIMIT x OFFSET y
there's absolutely no guarantee that the order defined by <criterion>
in the inner query will be kept in the final result…
See this discussion thread in trac-dev.
Pagination for custom queries is not affected by this problem.
Attachments (9)
Change History (30)
comment:1 by , 13 years ago
Cc: | added |
---|
by , 13 years ago
Attachment: | 10530-rework-report-pagination-r10914.patch added |
---|
follow-up: 4 comment:3 by , 13 years ago
Good going. Ordering seems to work, at least for the default reports and some other rather basic reports I had access to. Will be interesting to see how the strategy performs in the wild, though.
That said, no doubt any report can be changed to work even if this change makes it fail. Perhaps if the original query work (like when we get columns) but final rewritten query fails, we can raise a useful error message that points to the changes and asks for the report to be "tuned".
by , 13 years ago
Attachment: | 10530-nested-c-comments-r10917.patch added |
---|
sql_skeleton
now also strips nested C comments (applies on top of attachment:10530-rework-report-pagination-r10914.patch)
comment:4 by , 13 years ago
Replying to osimons:
if the original query work (like when we get columns) but final rewritten query fails, we can raise a useful error message that points to the changes and asks for the report to be "tuned".
Sure, good idea! I'll add that tomorrow.
by , 13 years ago
Attachment: | 10530-rework-report-pagination-r10914.2.patch added |
---|
second take, sql_skeleton
now uses a single regexp which is simpler and more correct…
comment:5 by , 13 years ago
For the "Mine Ticket First" issue (r10892), I propose that we simply document that the __group__
(or __group_asc__
) column, if given, will be used as the first column in the ORDER BY clause and will have the ASC qualifier. Likewise a new __group_desc__
column alias can be used for specifying a group which will have a DESC qualifier. Patch follows.
by , 13 years ago
Attachment: | 10530-__group_desc__-support-r10921.patch added |
---|
use __group_desc__
to specify DESC group ordering (applies on 10530-rework-report-pagination-r10914.2.patch
comment:6 by , 13 years ago
I really don't like adding things to the Report module and documentation. It just complicates. Having split everything so far, surely splitting the ORDER BY
clause should be reasonable?
If __group__
is in columns, all we need to do is cut out the first clause and re-inject it as-is in first position when re-sorting. We don't need to know if it is ASC
or DESC
or what and how it orders at all - we just need to put the first clause first, then add the new sort as requested by the user.
With your other patch for correct ordering, r10892 needs to be reverted so that we only inject it when there is a custom sort - and then we just use the first clause whatever it is. We should not write our own criteria as we do now (by adding __group__
).
comment:7 by , 13 years ago
I haven't much considered the pre-r10892 code, but you're right: before that, people wouldn't rely on having the __group__
column being injected in the ORDER BY clause, so they most certainly already have it. So I tried this approach (patch attached) but it will break for some report: look at {11} for example. There, the sorting order for the group spreads on two expressions.
by , 13 years ago
Attachment: | 10530-only-insert-sort_col-r10921.diff added |
---|
experiment alternative approach, only inject the sort_col (applies on 10530-__group_desc__-support-r10921.patch) [attachment:
by , 13 years ago
Attachment: | 10530-only-insert-sort_col-r10921.2.diff added |
---|
experiment alternative approach, only inject the sort_col - take 2, more cases handled
comment:8 by , 13 years ago
And it's not just {11}, a good deal of our default reports use this approach (source:trunk/trac/db_default.py).
follow-up: 12 comment:9 by , 13 years ago
Hmm. You are right. That also shoots down the idea of using __group_desc__
and similar as that will not be able to pick up the complexity of ordering either.
Is there any syntax in SQL that allows the two clauses of ORDER BY (milestone IS NULL), milestone DESC, ...
to be seen as one - or changed to be just one clause? Or otherwise grouped using a marker/comment that allows us to detect the ordering related to groups?
follow-up: 11 comment:10 by , 13 years ago
We could use a comment marker like /*group_end*/
to mark end of grouping clauses - if more than one clause is in use.
ORDER BY (milestone IS NULL), milestone DESC /*group_end*/, ...
We cannot reliably detect it, but we can default just pick the first clause. For multi-clause group ordering some changes must be done to the SQL to support it correctly - regardless of strategy. A marker makes as much sense as anything else.
Oh, and at no point has this ever worked correctly before, so anything better is a bonus.
comment:11 by , 13 years ago
Replying to osimons:
Oh, and at no point has this ever worked correctly before, so anything better is a bonus.
If think I understand now from where the trouble comes…
It's all about this 'sorting_enabled': len(row_groups)==1
test which replaced a test if sql.find('__group__') == -1: req.hdf['report.sorting.enabled'] = 1
(r3816). That was wrong as the intent was really to disable user sorting if a __group__
was specified.
Before paging both tests would achieve the same effect, but no longer with paging: if a page contains only rows from a single group, then suddenly custom sorting becomes possible…
So the obvious fix would be to really disable custom sorting when there's a __group__
specified. But given all the work I've done now to make grouping and custom sorting work, I hope we can find a better solution ;-) (not to mention that theoretically we could still have a SELECT * FROM (...) LIMIT ...
to scramble the order given by the inner query).
follow-up: 13 comment:12 by , 13 years ago
Replying to osimons:
[…] That also shoots down the idea of using
__group_desc__
and similar as that will not be able to pick up the complexity of ordering either.
That seems cleaner than the alternative you suggested, though; it's a pretty straightforward extension to the __group__
notation.
Is there any syntax in SQL that allows the two clauses of
ORDER BY (milestone IS NULL), milestone DESC, ...
to be seen as one
Precisely the alias!
If you have a complex way to specify the grouping and create a __group__
alias for this expression then reusing __group__
in the ORDER BY will give you a predictable order.
Or otherwise grouped using a marker/comment that allows us to detect the ordering related to groups?
Probably also doable, but looks more clumsy to me than just having another special column name.
follow-up: 14 comment:13 by , 13 years ago
Replying to cboos:
Replying to osimons:
[…] That also shoots down the idea of using
__group_desc__
and similar as that will not be able to pick up the complexity of ordering either.That's seems cleaner than the alternative though, pretty straightforward extension to the
__group__
notation.Is there any syntax in SQL that allows the two clauses of
ORDER BY (milestone IS NULL), milestone DESC, ...
to be seen as onePrecisely the alias!
If you have a complex way to specify the grouping and create a
__group__
alias for this expression then reusing__group__
in the ORDER BY will give you a predictable order.
But the alias won't say anything at all about the order - the multi-clause select order won't be preserved when put into the ORDER BY
as a single-column alias? At that point the column can sort it's values ASC
or DESC
, but that is really all it can do. Or, is it me not understanding?
So the effect of just using the first clause in ORDER BY
could be just as 'correct', depending on query. It has the advantage of no actual changes.
comment:14 by , 13 years ago
Replying to osimons:
But the alias won't say anything at all about the order - the multi-clause select order won't be preserved when put into the
ORDER BY
as a single-column alias?
It's a meaningful approximation to the order. The __group__
is what is being displayed as group titles, so if those are sorted (ASC
or DESC
), I think that makes sense.
Granted, perhaps that won't match a carefully crafted custom ordering for the groups, in which case we would need a marker.
So the effect of just using the first clause in
ORDER BY
could be just as 'correct', depending on query. It has the advantage of no actual changes.
You only need to look at the default reports to see the difference it would make.
What we have there is that the first clause is simply coping with null or empty values:
Report | __group__ | ORDER BY
|
---|---|---|
Active Tickets by Version | version AS __group__ | ORDER BY (version IS NULL),version,
|
Active Tickets by Milestone | ('Milestone ' || milestone) AS group | ORDER BY (milestone IS NULL),milestone,
|
All Tickets By Milestone (Including closed)' | t.milestone AS __group__ | ORDER BY (milestone IS NULL), milestone DESC,
|
For these ones, taking only the first group will not work at all, while using __group__
would work.
And also for these two:
My Tickets | (CASE status WHEN 'accepted' THEN 'Accepted' ELSE 'Owned' END) AS __group__ | (status = 'accepted') DESC,
|
Active Tickets, Mine first | (CASE owner WHEN $USER THEN 'My Tickets' ELSE 'Active Tickets' END) AS __group__` | ORDER BY (COALESCE(owner, '') = $USER) DESC,
|
While the __group__
is not strictly matching the first order criterion, using it would also work (ok, __group_desc
for the second).
Bottom line:
- using
__group__
/__group_desc__
would cover most of the cases, its syntax is in line with what we have today - using
/* group_end */
would cover all the cases, but feels more at odds with our current "syntax" … and the code is much more complex
comment:15 by , 13 years ago
SELECT * FROM t WHERE a = '\\' ORDER BY b,''
SELECT * FROM t WHERE a = '\\' ORDER BY b --'
And other cases with escaped \
inside string literals, with another '
somewhere later seems to break the regex parsing. Changing the (\\'|[^'])*
regex to (\\.|[^'\\])*
seems to fix it.
follow-up: 17 comment:16 by , 13 years ago
Cc: | added |
---|
Hi, I'm the original reporter of this problem, as observed after migrating libusb.org from sqlite to MariaDB, which is MySQL compatible.
Parsing SQL with a regex is not a very robust solution, among other things because each supported SQL implementation supports slightly different dialect, and a Trac user should be able to use the full SQL of the underlying database. This might even be a reason for the user to switch database from sqlite in the first place. (Although it wasn't for me. I wanted to move the database workload to another physical machine.)
Parsing SQL with one regex per SQL backend might work out. I would prefer a simpler and completely robust solution.
As cboos identified, before this issue was looked at there was never any user sorting, if an SQL report specified ordering using a group column. I think this is perfectly acceptable, and a good compromise!
When SQL reports are used it is perhaps easy enough for TRAC_ADMIN to customize and/or create new reports for users as neccessary. In addition, since this bug has existed for nearly forever without an avalance of ticket, there is probably not a big immediate demand for the parsing of report SQL and injecting of extra order by fields.
Parsing SQL by regex is a bad idea IMO. I'm happy with the design decision that SQL queries will not be user orderable if an explicit group column is selected by the report.
group_desc is a nice idea, although I don't need it. :)
comment:17 by , 13 years ago
Replying to peter@…:
As cboos identified, before this issue was looked at there was never any user sorting, if an SQL report specified ordering using a group column. I think this is perfectly acceptable, and a good compromise!
When SQL reports are used it is perhaps easy enough for TRAC_ADMIN to customize and/or create new reports for users as neccessary. In addition, since this bug has existed for nearly forever without an avalance of ticket, there is probably not a big immediate demand for the parsing of report SQL and injecting of extra order by fields.
Parsing SQL by regex is a bad idea IMO. I'm happy with the design decision that SQL queries will not be user orderable if an explicit group column is selected by the report.
FWIW, I pretty much agree with all that Peter wrote above. Especially the part about not having received (m)any reports about this issue despite its being around for a long time, and the user not being able to change the order being acceptable.
comment:18 by , 13 years ago
Ok, so it seems we're heading towards a simple fix for 0.12-stable, disable user sorting when __group__
is specified.
However, I'm very much interested in finishing this topic on a more satisfying note (for trunk probably), by:
- not using a subquery for paging
- having the possibility to manually sort the
__group__
reports (which I found to be a quite useful feature while working on this topic!)
My first solution (10530-rework-report-pagination-r10914.2.patch) mostly works, but as discussed above doesn't cover all the cases.
So in situations where the above is not good enough, I propose a more explicit way to specify how the report should be edited, which doesn't involve parsing the SQL:
- substitute
@SORT_COLUMN@
with the user selected sort column; for example, the Active Tickets, Mine first report becomes:... ORDER BY (COALESCE(owner, '') = $USER) DESC, @SORT_COLUMN@, ...
- substitute
@LIMIT_OFFEST@
with the LIMIT/OFFSET from the paging parameters (this also provides a way to disable automatic paging by placing that in a comment, for example)
This is implemented in:
- 10530-fix-SQL-quote.patch (shesek's fix from comment:15)
- 10530-SORT_COLUMN.patch (the "explicit" editing)
And, besides, the quickfix for 0.12-stable would be to do only the following, on top of r10921:
by , 13 years ago
Attachment: | 10530-fix-SQL-quote.patch added |
---|
fix SQL string literal parsing (applies on top of 10530-rework-report-pagination-r10914.2.patch)
by , 13 years ago
Attachment: | 10530-SORT_COLUMN.patch added |
---|
add support for explicit sort column insertion with @SORT_COLUMN@ (applies on 10530-fix-SQL-quote.patch)
by , 13 years ago
Attachment: | 10530-sorting-disabled-r10921.patch added |
---|
comment:19 by , 13 years ago
OK. Let's go with "simple" for 0.12 (revert + disable), and leave the improvements for trunk.
comment:20 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:21 by , 13 years ago
Owner: | set to |
---|
rework report pagination to not use a subquery.