Edgewall Software

Ticket #3067 (closed task: fixed)

Opened 2 years ago

Last modified 2 years ago

p.e.c. slowness issues after 0.10dev upgrade

Reported by: cboos Owned by: jonas
Priority: highest Milestone:
Component: general Version: devel
Severity: critical Keywords: pec
Cc: jonas@…

Description

As many may have noticed, there are some issues on this Trac after the upgrade to 0.10dev.

While usually 0.10dev works quite well, we nevertheless have some issues here that are currently being investigated.

This ticket can be used to discuss the issue and for gathering related data.

PS: I just had a database is locked error while submitting this the first time :)

Attachments

create_newticket.png (44.0 kB) - added by cboos 2 years ago.
These are the timings of the first failed ticket creation attempt (numbers are milliseconds). Created by the Tamper Data extension for Firefox.
check_timeline_request.png (62.9 kB) - added by cboos 2 years ago.
Just after the failed attempt (ending up in a database is locked error), I checked the timeline …
check_latest_changeset.png (37.5 kB) - added by cboos 2 years ago.
Then I wanted to have a look at the latest changeset…
view_ticket_3067.2.png (48.8 kB) - added by cboos 2 years ago.
And finally, a graph of the timing of viewing #3067 just after I completed the last three uploads, as I noticed that the server is reasonably fast now (second upload attempt for this one, the first never completed...)
load_test.2.png (33.0 kB) - added by cboos 2 years ago.
After some changes on p.e.c, here's the result of a load test (3 wiki pages, 4 attachement previews, 1 ticket view and 4 timelines)
session.patch (13.5 kB) - added by jonas 2 years ago.
A more optimized session schema (updated).
session3.patch (23.3 kB) - added by jonas 2 years ago.
A new version of the patch with a more normalized table schema

Change History

Changed 2 years ago by cboos

These are the timings of the first failed ticket creation attempt (numbers are milliseconds). Created by the Tamper Data extension for Firefox.

Changed 2 years ago by cboos

Just after the failed attempt (ending up in a database is locked error), I checked the timeline ...

Changed 2 years ago by cboos

Then I wanted to have a look at the latest changeset...

Changed 2 years ago by cboos

And finally, a graph of the timing of viewing #3067 just after I completed the last three uploads, as I noticed that the server is reasonably fast now (second upload attempt for this one, the first never completed...)

Changed 2 years ago by cboos

  • keywords pec added

The first graph shows some long latency for the main create request, which fails (500) after 22 seconds. It might be a normal database lock...

But what I don't understand is that all the first three graphs show a very unusual latency for the static resources, the /trac_chrome/ ones, but also the root /gfx/ or /css/. That's quite surprising. In the third graph (attachment:check_latest_changeset.png), we see in particular that /trac_chrome/css/trac.css takes 5 minutes and /css/header.css takes 10 minutes to load...

Now, when things went back to "normal" (attachment:view_ticket_3067.2.png),

(by the way, if there's a .2, it's that the first upload has at least partially succeeded: the file was received and written on the filesystem, but the attachment entry couldn't be written in the db... certainly another db lock).

the timings seems pretty decent: 406 ms for the main request, about 170ms for each secondary request. For those, there seem to be at first a sequential access pattern, then 8 or so are started simultaneously and finish also in a roughly sequential way, being consistent with a 170ms access time.

So it seems that after a while, there are no available server thread available, all must be caught in some kind of database lock, waiting for a timeout. Even the requests for secondary resources can then take an enormous amount of time (graph 3) as apache control process must wait for a child to come back...

I think the problem is two fold:

  • those static resources should be cached by the browser, and not reloaded every time
  • the MaxClients parameter is too low (or there's some other misconfiguration)

Plus the fact that we have too many database locks, of course.

Changed 2 years ago by cboos

After some changes on p.e.c, here's the result of a load test (3 wiki pages, 4 attachement previews, 1 ticket view and 4 timelines)

Changed 2 years ago by jonas

The slowness seems to come from the way sessions are stored and accessed. The session table on p.e.c contains more than 240k rows. This in combination with the lack of indicies on this table resulted in very poor performance of a couple of queries, especially Env.get_known_users and the query that purges old sessions.

The solution I'm proposing is to use a slightly less normalized session table schema where the "last_visit" variable is stored in a separate column (time). The table will also have indicies on the "sid", "time" and "authenticated" columns.

The patch also adds an index for the time column of the ticket_change table.

attachment:session.patch

Changed 2 years ago by jonas

A more optimized session schema (updated).

Changed 2 years ago by jonas

A new version of the patch with a more normalized table schema

Changed 2 years ago by jonas

  • status changed from new to closed
  • resolution set to fixed

The attachment:session3.patch is now applied so this issue is hopefully resolved.

Changed 2 years ago by jonas

just a test, ignore

Changed 2 years ago by jonas

  • cc jonas@… added

More testing

Add/Change #3067 (p.e.c. slowness issues after 0.10dev upgrade)

Author



Change Properties
<Author field>
Action
as closed
Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.