Edgewall Software
Modify

Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#10530 closed defect (fixed)

Fix pagination for reports

Reported by: Christian Boos Owned by: Christian Boos
Priority: normal Milestone: 0.12.3
Component: general Version: 0.11
Severity: blocker Keywords: report query
Cc: osimons, peter@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

We just noticed that the way we implemented paginated reports (#216) simply makes some buggy assumptions about SQL subqueries.

In short, when one does:

SELECT * (SELECT ... FROM ... ORDER BY <criterion>) LIMIT x OFFSET y

there's absolutely no guarantee that the order defined by <criterion> in the inner query will be kept in the final result…

See this discussion thread in trac-dev.

Pagination for custom queries is not affected by this problem.

Attachments (9)

10530-rework-report-pagination-r10914.patch (5.2 KB ) - added by Christian Boos 13 years ago.
rework report pagination to not use a subquery.
10530-nested-c-comments-r10917.patch (1.5 KB ) - added by Christian Boos 13 years ago.
sql_skeleton now also strips nested C comments (applies on top of attachment:10530-rework-report-pagination-r10914.patch)
10530-rework-report-pagination-r10914.2.patch (5.6 KB ) - added by Christian Boos 13 years ago.
second take, sql_skeleton now uses a single regexp which is simpler and more correct…
10530-__group_desc__-support-r10921.patch (3.8 KB ) - added by Christian Boos 13 years ago.
use __group_desc__ to specify DESC group ordering (applies on 10530-rework-report-pagination-r10914.2.patch
10530-only-insert-sort_col-r10921.diff (3.3 KB ) - added by Christian Boos 13 years ago.
experiment alternative approach, only inject the sort_col (applies on 10530-__group_desc__-support-r10921.patch) [attachment:
10530-only-insert-sort_col-r10921.2.diff (3.5 KB ) - added by Christian Boos 13 years ago.
experiment alternative approach, only inject the sort_col - take 2, more cases handled
10530-fix-SQL-quote.patch (1.4 KB ) - added by Christian Boos 13 years ago.
fix SQL string literal parsing (applies on top of 10530-rework-report-pagination-r10914.2.patch)
10530-SORT_COLUMN.patch (9.7 KB ) - added by Christian Boos 13 years ago.
add support for explicit sort column insertion with @SORT_COLUMN@ (applies on 10530-fix-SQL-quote.patch)
10530-sorting-disabled-r10921.patch (702 bytes ) - added by Christian Boos 13 years ago.
the simple fix for 0.12.3 (applies on r10921, with r10892 reverted)

Download all attachments as: .zip

Change History (30)

comment:1 by osimons, 13 years ago

Cc: osimons added

by Christian Boos, 13 years ago

rework report pagination to not use a subquery.

comment:2 by Christian Boos, 13 years ago

Above patch currently installed on demo-0.12.

comment:3 by osimons, 13 years ago

Good going. Ordering seems to work, at least for the default reports and some other rather basic reports I had access to. Will be interesting to see how the strategy performs in the wild, though.

That said, no doubt any report can be changed to work even if this change makes it fail. Perhaps if the original query work (like when we get columns) but final rewritten query fails, we can raise a useful error message that points to the changes and asks for the report to be "tuned".

by Christian Boos, 13 years ago

sql_skeleton now also strips nested C comments (applies on top of attachment:10530-rework-report-pagination-r10914.patch)

in reply to:  3 comment:4 by Christian Boos, 13 years ago

Replying to osimons:

if the original query work (like when we get columns) but final rewritten query fails, we can raise a useful error message that points to the changes and asks for the report to be "tuned".

Sure, good idea! I'll add that tomorrow.

Last edited 13 years ago by Christian Boos (previous) (diff)

by Christian Boos, 13 years ago

second take, sql_skeleton now uses a single regexp which is simpler and more correct…

comment:5 by Christian Boos, 13 years ago

For the "Mine Ticket First" issue (r10892), I propose that we simply document that the __group__ (or __group_asc__) column, if given, will be used as the first column in the ORDER BY clause and will have the ASC qualifier. Likewise a new __group_desc__ column alias can be used for specifying a group which will have a DESC qualifier. Patch follows.

by Christian Boos, 13 years ago

use __group_desc__ to specify DESC group ordering (applies on 10530-rework-report-pagination-r10914.2.patch

comment:6 by osimons, 13 years ago

I really don't like adding things to the Report module and documentation. It just complicates. Having split everything so far, surely splitting the ORDER BY clause should be reasonable?

If __group__ is in columns, all we need to do is cut out the first clause and re-inject it as-is in first position when re-sorting. We don't need to know if it is ASC or DESC or what and how it orders at all - we just need to put the first clause first, then add the new sort as requested by the user.

With your other patch for correct ordering, r10892 needs to be reverted so that we only inject it when there is a custom sort - and then we just use the first clause whatever it is. We should not write our own criteria as we do now (by adding __group__).

comment:7 by Christian Boos, 13 years ago

I haven't much considered the pre-r10892 code, but you're right: before that, people wouldn't rely on having the __group__ column being injected in the ORDER BY clause, so they most certainly already have it. So I tried this approach (patch attached) but it will break for some report: look at {11} for example. There, the sorting order for the group spreads on two expressions.

by Christian Boos, 13 years ago

experiment alternative approach, only inject the sort_col (applies on 10530-__group_desc__-support-r10921.patch) [attachment:

by Christian Boos, 13 years ago

experiment alternative approach, only inject the sort_col - take 2, more cases handled

comment:8 by Christian Boos, 13 years ago

And it's not just {11}, a good deal of our default reports use this approach (source:trunk/trac/db_default.py).

comment:9 by osimons, 13 years ago

Hmm. You are right. That also shoots down the idea of using __group_desc__ and similar as that will not be able to pick up the complexity of ordering either.

Is there any syntax in SQL that allows the two clauses of ORDER BY (milestone IS NULL), milestone DESC, ... to be seen as one - or changed to be just one clause? Or otherwise grouped using a marker/comment that allows us to detect the ordering related to groups?

comment:10 by osimons, 13 years ago

We could use a comment marker like /*group_end*/ to mark end of grouping clauses - if more than one clause is in use.

ORDER BY (milestone IS NULL), milestone DESC /*group_end*/, ...

We cannot reliably detect it, but we can default just pick the first clause. For multi-clause group ordering some changes must be done to the SQL to support it correctly - regardless of strategy. A marker makes as much sense as anything else.

Oh, and at no point has this ever worked correctly before, so anything better is a bonus.

in reply to:  10 comment:11 by Christian Boos, 13 years ago

Replying to osimons:

Oh, and at no point has this ever worked correctly before, so anything better is a bonus.

If think I understand now from where the trouble comes…

It's all about this 'sorting_enabled': len(row_groups)==1 test which replaced a test if sql.find('__group__') == -1: req.hdf['report.sorting.enabled'] = 1 (r3816). That was wrong as the intent was really to disable user sorting if a __group__ was specified.

Before paging both tests would achieve the same effect, but no longer with paging: if a page contains only rows from a single group, then suddenly custom sorting becomes possible…

So the obvious fix would be to really disable custom sorting when there's a __group__ specified. But given all the work I've done now to make grouping and custom sorting work, I hope we can find a better solution ;-) (not to mention that theoretically we could still have a SELECT * FROM (...) LIMIT ... to scramble the order given by the inner query).

in reply to:  9 ; comment:12 by Christian Boos, 13 years ago

Replying to osimons:

[…] That also shoots down the idea of using __group_desc__ and similar as that will not be able to pick up the complexity of ordering either.

That seems cleaner than the alternative you suggested, though; it's a pretty straightforward extension to the __group__ notation.

Is there any syntax in SQL that allows the two clauses of ORDER BY (milestone IS NULL), milestone DESC, ... to be seen as one

Precisely the alias!

If you have a complex way to specify the grouping and create a __group__ alias for this expression then reusing __group__ in the ORDER BY will give you a predictable order.

Or otherwise grouped using a marker/comment that allows us to detect the ordering related to groups?

Probably also doable, but looks more clumsy to me than just having another special column name.

Last edited 13 years ago by Christian Boos (previous) (diff)

in reply to:  12 ; comment:13 by osimons, 13 years ago

Replying to cboos:

Replying to osimons:

[…] That also shoots down the idea of using __group_desc__ and similar as that will not be able to pick up the complexity of ordering either.

That's seems cleaner than the alternative though, pretty straightforward extension to the __group__ notation.

Is there any syntax in SQL that allows the two clauses of ORDER BY (milestone IS NULL), milestone DESC, ... to be seen as one

Precisely the alias!

If you have a complex way to specify the grouping and create a __group__ alias for this expression then reusing __group__ in the ORDER BY will give you a predictable order.

But the alias won't say anything at all about the order - the multi-clause select order won't be preserved when put into the ORDER BY as a single-column alias? At that point the column can sort it's values ASC or DESC, but that is really all it can do. Or, is it me not understanding?

So the effect of just using the first clause in ORDER BY could be just as 'correct', depending on query. It has the advantage of no actual changes.

in reply to:  13 comment:14 by Christian Boos, 13 years ago

Replying to osimons:

But the alias won't say anything at all about the order - the multi-clause select order won't be preserved when put into the ORDER BY as a single-column alias?

It's a meaningful approximation to the order. The __group__ is what is being displayed as group titles, so if those are sorted (ASC or DESC), I think that makes sense.

Granted, perhaps that won't match a carefully crafted custom ordering for the groups, in which case we would need a marker.

So the effect of just using the first clause in ORDER BY could be just as 'correct', depending on query. It has the advantage of no actual changes.

You only need to look at the default reports to see the difference it would make.

What we have there is that the first clause is simply coping with null or empty values:

Report __group__ ORDER BY
Active Tickets by Version version AS __group__ ORDER BY (version IS NULL),version,
Active Tickets by Milestone ('Milestone ' || milestone) AS group ORDER BY (milestone IS NULL),milestone,
All Tickets By Milestone (Including closed)' t.milestone AS __group__ ORDER BY (milestone IS NULL), milestone DESC,

For these ones, taking only the first group will not work at all, while using __group__ would work.

And also for these two:

My Tickets (CASE status WHEN 'accepted' THEN 'Accepted' ELSE 'Owned' END) AS __group__ (status = 'accepted') DESC,
Active Tickets, Mine first
(CASE owner 
     WHEN $USER THEN 'My Tickets' 
     ELSE 'Active Tickets' 
    END) AS __group__` 

ORDER BY (COALESCE(owner, '') = $USER) DESC,

While the __group__ is not strictly matching the first order criterion, using it would also work (ok, __group_desc for the second).

Bottom line:

  • using __group__ / __group_desc__ would cover most of the cases, its syntax is in line with what we have today
  • using /* group_end */ would cover all the cases, but feels more at odds with our current "syntax" … and the code is much more complex

comment:15 by shesek <nadav@…>, 13 years ago

SELECT * FROM t WHERE a = '\\' ORDER BY b,'' SELECT * FROM t WHERE a = '\\' ORDER BY b --'

And other cases with escaped \ inside string literals, with another ' somewhere later seems to break the regex parsing. Changing the (\\'|[^'])* regex to (\\.|[^'\\])* seems to fix it.

comment:16 by peter@…, 13 years ago

Cc: peter@… added

Hi, I'm the original reporter of this problem, as observed after migrating libusb.org from sqlite to MariaDB, which is MySQL compatible.

Parsing SQL with a regex is not a very robust solution, among other things because each supported SQL implementation supports slightly different dialect, and a Trac user should be able to use the full SQL of the underlying database. This might even be a reason for the user to switch database from sqlite in the first place. (Although it wasn't for me. I wanted to move the database workload to another physical machine.)

Parsing SQL with one regex per SQL backend might work out. I would prefer a simpler and completely robust solution.

As cboos identified, before this issue was looked at there was never any user sorting, if an SQL report specified ordering using a group column. I think this is perfectly acceptable, and a good compromise!

When SQL reports are used it is perhaps easy enough for TRAC_ADMIN to customize and/or create new reports for users as neccessary. In addition, since this bug has existed for nearly forever without an avalance of ticket, there is probably not a big immediate demand for the parsing of report SQL and injecting of extra order by fields.

Parsing SQL by regex is a bad idea IMO. I'm happy with the design decision that SQL queries will not be user orderable if an explicit group column is selected by the report.

group_desc is a nice idea, although I don't need it. :)

in reply to:  16 comment:17 by Remy Blank, 13 years ago

Replying to peter@…:

As cboos identified, before this issue was looked at there was never any user sorting, if an SQL report specified ordering using a group column. I think this is perfectly acceptable, and a good compromise!

When SQL reports are used it is perhaps easy enough for TRAC_ADMIN to customize and/or create new reports for users as neccessary. In addition, since this bug has existed for nearly forever without an avalance of ticket, there is probably not a big immediate demand for the parsing of report SQL and injecting of extra order by fields.

Parsing SQL by regex is a bad idea IMO. I'm happy with the design decision that SQL queries will not be user orderable if an explicit group column is selected by the report.

FWIW, I pretty much agree with all that Peter wrote above. Especially the part about not having received (m)any reports about this issue despite its being around for a long time, and the user not being able to change the order being acceptable.

comment:18 by Christian Boos, 13 years ago

Ok, so it seems we're heading towards a simple fix for 0.12-stable, disable user sorting when __group__ is specified.

However, I'm very much interested in finishing this topic on a more satisfying note (for trunk probably), by:

  • not using a subquery for paging
  • having the possibility to manually sort the __group__ reports (which I found to be a quite useful feature while working on this topic!)

My first solution (10530-rework-report-pagination-r10914.2.patch) mostly works, but as discussed above doesn't cover all the cases.

So in situations where the above is not good enough, I propose a more explicit way to specify how the report should be edited, which doesn't involve parsing the SQL:

  • substitute @SORT_COLUMN@ with the user selected sort column; for example, the Active Tickets, Mine first report becomes:
    ... ORDER BY (COALESCE(owner, '') = $USER) DESC, @SORT_COLUMN@, ...
    
  • substitute @LIMIT_OFFEST@ with the LIMIT/OFFSET from the paging parameters (this also provides a way to disable automatic paging by placing that in a comment, for example)

This is implemented in:

And, besides, the quickfix for 0.12-stable would be to do only the following, on top of r10921:

by Christian Boos, 13 years ago

Attachment: 10530-fix-SQL-quote.patch added

fix SQL string literal parsing (applies on top of 10530-rework-report-pagination-r10914.2.patch)

by Christian Boos, 13 years ago

Attachment: 10530-SORT_COLUMN.patch added

add support for explicit sort column insertion with @SORT_COLUMN@ (applies on 10530-fix-SQL-quote.patch)

by Christian Boos, 13 years ago

the simple fix for 0.12.3 (applies on r10921, with r10892 reverted)

comment:19 by osimons, 13 years ago

OK. Let's go with "simple" for 0.12 (revert + disable), and leave the improvements for trunk.

comment:20 by Christian Boos, 13 years ago

Resolution: fixed
Status: newclosed

Fine, did that in r10922 and r10923 and followed-up for trunk in #10540.

comment:21 by Christian Boos, 13 years ago

Owner: set to Christian Boos

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.