Opened 18 years ago
Closed 18 years ago
#3837 closed enhancement (duplicate)
Inefficient sql query
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | version control | Version: | 0.10 |
Severity: | major | Keywords: | svn |
Cc: | jeremy@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
The sql query in get_youngest_rev_in_cache is quite inefficient and does a full table scan in the revision table which scales quite bad. The test repository used for the following numbers is the repository from crystalspace3d.org which has around 25000 revisions and is 1.5gb big.
I benchmarked with apachebench with 3 times 20 requests. Everything one with a concurrency of 4 and without. As the concurrency didnt make any significant difference and as they vary much more over each series of requests, i didnt wrote the numbers down.
What i requested was the timeline with a history depth of 5 days. Hardware is a pentium 4 system, with a raid 1 disk system and 1 gig of ram.
The following numbers are for sqlite. With postgres its very much more visible, but i thought that sqlite is more wildly used.
Original query: 1.01 requests/s
SELECT MAX(CAST(rev as int)) FROM revision 1.2 requests/s
SELECT MAX(rev_int) FROM revision with rev_int beeing a column which copied rev as an int and an seperate index over it. The problem is, that sqlite doesnt seems to support indices over expressions which eg postgres does. 3.66 requests/s
The effect is also visible when using a history depth of 90 days but the differences arent _that_ big as before. I didnt really measure those as its sometimes below 0.11 requests/s …
The problem with this is, that the CAST() syntax is only supported with sqlite3 and not 2. Although not tested mysql seems to support that for some time.
And indices over expressions are not supported in sqlite at all. Which is a pity because else you could just add an index like CREATE INDEX revision_rev_asint ON revision ((CAST(rev as int)). The only even halfway practical solution i currently see is an extra column + trigger. But thats very hackish. Any other idea?
Attachments (0)
Change History (9)
comment:1 by , 18 years ago
follow-up: 3 comment:2 by , 18 years ago
Component: | general → version control |
---|---|
Keywords: | svn added |
Milestone: | → 0.10.1 |
Owner: | changed from | to
comment:3 by , 18 years ago
Replying to cboos:
How do you plan to fix this? It sounds like any fix would require a change to the DB schema, which we have avoided doing for bugfix releases. I'm also not sure that changing the revision cache to use ints would be appropriate since version control systems other than SVN may not use numeric identifiers for revisions.
comment:4 by , 18 years ago
Well, for 0.10.1, I wanted to put at least the CAST
, as this is cleaner than the current hack, and apparently a bit faster.
I also wanted to check if there's really no way with SQLite to create an index based on a computed value. It looks like we could do it with user-defined collating sequences.
The first change wouldn't affect the DB at all, the second would create a new index if svn
is the repository_type
. Ideally, this should not be done by a DB upgrade, since you can decide to set for the first time or change the repository_type
at anytime. This should rather be done just after a resync
, by the version control specific code. So probably this is better seen as a TODO item for the VcRefactoring…
About making a DB schema in 0.10.1, we already discussed one: the current PRIMARY KEY
constraint on the node_change
table is problematic for MySQL (see #3676 and attachment:ticket:3778:node_change_upgrade-r3787.patch).
follow-up: 7 comment:5 by , 18 years ago
Severity: | normal → major |
---|
I just ran into this on a beefy, dedicated Trac server during heavy web spidering.
Each youngest rev query was taking 3-4 seconds.
After indexing cast(rev as integer) in PG and fixing the query in Trac my server load went from 30 to 2 and the query is scant milliseconds.
If the table is meant to support non-integer revisions, why would you be using this query which bends over backward to sort the revisions as integers? Using an integer provides significant speedup to all depoyments which exist today.
Please consider fixing with an eye to what exists now rather than what may occur in the future, considering these future developments will require significant query changes anyway.
comment:6 by , 18 years ago
Cc: | added |
---|
comment:7 by , 18 years ago
Priority: | normal → high |
---|
Replying to Jeremy Kemper <jeremy@bitsweat.net>:
I just ran into this on a beefy, dedicated Trac server during heavy web spidering.
Each youngest rev query was taking 3-4 seconds.
Oops, I didn't realize it could be that inefficient… so slow even with the #4732 fix?
… Using an integer provides significant speedup to all deployments which exist today.
… all SVN deployments.
Please consider fixing with an eye to what exists now rather than what may occur in the future, considering these future developments will require significant query changes anyway.
True, the fix for #4586 could be done in such a way that this query shouldn't be needed anymore. Right now (0.10-stable and 0.11) this could be done by storing the latest synced rev in the system
table and in the future (0.12 - VcRefactoring) that could be stored in a specific table (repository_head
).
comment:8 by , 18 years ago
Before #4732 we could only see FETCH 1 from <cursor name>
in the postgresql log. Fixing it is what allowed us to see the actual SELECT queries and thus identify and fix the slow ones.
Luckily this was the only slow query. Alone, it had brought a dual dual-core opteron 285 16GB RAM to its knees under web spider load! Using an index on cast(rev as integer) or simply changing rev to integer both brought the box to ~20% idle.
Understood re. non-SVN deployments. Caching the latest synced revision elsewhere sounds like a great alternative that'd obviate the problem entirely.
comment:9 by , 18 years ago
Milestone: | 0.10.4 |
---|---|
Resolution: | → duplicate |
Status: | new → closed |
btw, if somebody suggests a way to go forward here i will happily provide a patch.