Edgewall Software
Modify

Opened 13 years ago

Closed 13 years ago

#3837 closed enhancement (duplicate)

Inefficient sql query

Reported by: andres@… Owned by: Christian Boos
Priority: high Milestone:
Component: version control Version: 0.10
Severity: major Keywords: svn
Cc: jeremy@… Branch:
Release Notes:
API Changes:

Description

The sql query in get_youngest_rev_in_cache is quite inefficient and does a full table scan in the revision table which scales quite bad. The test repository used for the following numbers is the repository from crystalspace3d.org which has around 25000 revisions and is 1.5gb big.

I benchmarked with apachebench with 3 times 20 requests. Everything one with a concurrency of 4 and without. As the concurrency didnt make any significant difference and as they vary much more over each series of requests, i didnt wrote the numbers down.

What i requested was the timeline with a history depth of 5 days. Hardware is a pentium 4 system, with a raid 1 disk system and 1 gig of ram.

The following numbers are for sqlite. With postgres its very much more visible, but i thought that sqlite is more wildly used.

Original query: 1.01 requests/s

SELECT MAX(CAST(rev as int)) FROM revision 1.2 requests/s

SELECT MAX(rev_int) FROM revision with rev_int beeing a column which copied rev as an int and an seperate index over it. The problem is, that sqlite doesnt seems to support indices over expressions which eg postgres does. 3.66 requests/s

The effect is also visible when using a history depth of 90 days but the differences arent _that_ big as before. I didnt really measure those as its sometimes below 0.11 requests/s …

The problem with this is, that the CAST() syntax is only supported with sqlite3 and not 2. Although not tested mysql seems to support that for some time.

And indices over expressions are not supported in sqlite at all. Which is a pity because else you could just add an index like CREATE INDEX revision_rev_asint ON revision ((CAST(rev as int)). The only even halfway practical solution i currently see is an extra column + trigger. But thats very hackish. Any other idea?

Attachments (0)

Change History (9)

in reply to:  description comment:1 by anonymous, 13 years ago

btw, if somebody suggests a way to go forward here i will happily provide a patch.

comment:2 by Christian Boos, 13 years ago

Component: generalversion control
Keywords: svn added
Milestone: 0.10.1
Owner: changed from Jonas Borgström to Christian Boos

in reply to:  2 comment:3 by Matthew Good, 13 years ago

Replying to cboos:

How do you plan to fix this? It sounds like any fix would require a change to the DB schema, which we have avoided doing for bugfix releases. I'm also not sure that changing the revision cache to use ints would be appropriate since version control systems other than SVN may not use numeric identifiers for revisions.

comment:4 by Christian Boos, 13 years ago

Well, for 0.10.1, I wanted to put at least the CAST, as this is cleaner than the current hack, and apparently a bit faster. I also wanted to check if there's really no way with SQLite to create an index based on a computed value. It looks like we could do it with user-defined collating sequences.

The first change wouldn't affect the DB at all, the second would create a new index if svn is the repository_type. Ideally, this should not be done by a DB upgrade, since you can decide to set for the first time or change the repository_type at anytime. This should rather be done just after a resync, by the version control specific code. So probably this is better seen as a TODO item for the VcRefactoring

About making a DB schema in 0.10.1, we already discussed one: the current PRIMARY KEY constraint on the node_change table is problematic for MySQL (see #3676 and attachment:ticket:3778:node_change_upgrade-r3787.patch).

comment:5 by Jeremy Kemper <jeremy@…>, 13 years ago

Severity: normalmajor

I just ran into this on a beefy, dedicated Trac server during heavy web spidering.

Each youngest rev query was taking 3-4 seconds.

After indexing cast(rev as integer) in PG and fixing the query in Trac my server load went from 30 to 2 and the query is scant milliseconds.

If the table is meant to support non-integer revisions, why would you be using this query which bends over backward to sort the revisions as integers? Using an integer provides significant speedup to all depoyments which exist today.

Please consider fixing with an eye to what exists now rather than what may occur in the future, considering these future developments will require significant query changes anyway.

comment:6 by Jeremy Kemper <jeremy@…>, 13 years ago

Cc: jeremy@… added

in reply to:  5 comment:7 by Christian Boos, 13 years ago

Priority: normalhigh

Replying to Jeremy Kemper <jeremy@bitsweat.net>:

I just ran into this on a beefy, dedicated Trac server during heavy web spidering.

Each youngest rev query was taking 3-4 seconds.

Oops, I didn't realize it could be that inefficient… so slow even with the #4732 fix?

… Using an integer provides significant speedup to all deployments which exist today.

… all SVN deployments.

Please consider fixing with an eye to what exists now rather than what may occur in the future, considering these future developments will require significant query changes anyway.

True, the fix for #4586 could be done in such a way that this query shouldn't be needed anymore. Right now (0.10-stable and 0.11) this could be done by storing the latest synced rev in the system table and in the future (0.12 - VcRefactoring) that could be stored in a specific table (repository_head).

comment:8 by Jeremy Kemper <jeremy@…>, 13 years ago

Before #4732 we could only see FETCH 1 from <cursor name> in the postgresql log. Fixing it is what allowed us to see the actual SELECT queries and thus identify and fix the slow ones.

Luckily this was the only slow query. Alone, it had brought a dual dual-core opteron 285 16GB RAM to its knees under web spider load! Using an index on cast(rev as integer) or simply changing rev to integer both brought the box to ~20% idle.

Understood re. non-SVN deployments. Caching the latest synced revision elsewhere sounds like a great alternative that'd obviate the problem entirely.

comment:9 by Christian Boos, 13 years ago

Milestone: 0.10.4
Resolution: duplicate
Status: newclosed

This ticket is superseded by #4586, which now has a patch.

Testing and feedback appreciated!

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to as closed The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.