Opened 14 years ago

Closed 14 years ago

#9025 closed defect (fixed)

Trac hangs with certain text

Reported by: anonymous
Owned by: Christian Boos
Priority: highest
Milestone: 0.11.7
Component: wiki system
Version: 0.11-stable
Severity: critical
Keywords: performance
Cc: srl@…
Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Our server hung with the attached text as the body of a ticket. (Actually, the text comes from a 'tab' export of the ticket, which obviously worked OK.)

I pasted the same text into a comment and it also caused a hang.

This is r9125 of 0.11.7-stabler0.

Environment: sqlite3 database 3.6.22, Red Hat 7, Python 2.6.2.

Attachments (4)

2593.txt (1.8 KB ) - added by srl@… 14 years ago.
2593.txt
t9025-unicode-CamelCase-r9250.patch (4.0 KB ) - added by Christian Boos 14 years ago.
use explicit list of lower/upper unicode characters for identifying CamelCase words
t9025-mgood-t230-r9250.patch (27.7 KB ) - added by Christian Boos 14 years ago.
refreshed patch from mgood on #230, for comparison
t9025-unicode-CamelCase-r9250.2.patch (3.9 KB ) - added by Christian Boos 14 years ago.
A slightly faster version, albeit with a twist to the WikiPageNames rules (Page/Sub is now a valid wiki name)

Change History (13)

by srl@…, 14 years ago

Attachment: 2593.txt added

2593.txt

comment:1 by Steven R. Loomis <srl@…>, 14 years ago

Cc: srl@… added

I am the author of this ticket.

comment:2 by Steven R. Loomis <srl@…>, 14 years ago

Milestone: 0.11.7

comment:3 by Christian Boos, 14 years ago

Component: general → wiki system
Keywords: performance added
Milestone: 0.11.7
Priority: normal → high

Reduced test cases and the approximate time they take:

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞ] > 1s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠ] > 3s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢ] > 6s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ] > 17s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤሥሦ] > 35s

Note that every other form seems to be fine:

ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ
wiki:ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ
[wiki:ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ] 
["ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ"]
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ with label] 

I've identified the guilty regexp, which is the one for wikipagename_with_label_link.

When a label is actually given, the regexp "works" as expected.

comment:4 by Christian Boos, 14 years ago

Priority: high → highest

comment:5 by Christian Boos, 14 years ago

The regexp for CamelCase page names tries to deal with unicode strings (#230) and looks like this:

...(?:\w(?<![a-z0-9_])(?:\w(?<![A-Z0-9_]))*[\w/](?<![a-z0-9_]\))...
      ----------------   ================
      a word character   a word character
      but no lower case  but no upper case
      digit or _         digit or _
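
The two look-behind building blocks annotated above can be checked in isolation. The following is a minimal sketch, not code from Trac itself: the names NOT_LOWER, NOT_UPPER and classify are hypothetical, and it assumes Python 3, where \w is Unicode-aware for str patterns.

```python
import re

# The two building blocks from the annotated pattern above.
# Each matches one word character and then looks back at that same character.
NOT_LOWER = re.compile(r'\w(?<![a-z0-9_])')  # word char, but not a-z, digit or _
NOT_UPPER = re.compile(r'\w(?<![A-Z0-9_])')  # word char, but not A-Z, digit or _


def classify(ch):
    """Return (accepted by NOT_LOWER, accepted by NOT_UPPER) for one character."""
    return (NOT_LOWER.fullmatch(ch) is not None,
            NOT_UPPER.fullmatch(ch) is not None)


if __name__ == '__main__':
    # '\u1200' is the first character of the reduced test cases in comment:3.
    for ch in ('A', 'a', '7', '\u1200'):
        print(repr(ch), classify(ch))
```

An ASCII letter is accepted by exactly one of the two blocks and a digit by neither, whereas a non-ASCII word character such as '\u1200' is accepted by both, so the two blocks no longer partition the input.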

In presence of a string like ሀሁሂሃሄህሆለሉሊ... which contains neither a-z nor A-Z characters but otherwise only \w characters, the above regexp becomes equivalent to:

(\w(\w)*)+

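When the match fails and the engine has to backtrack, a regexp like that performs very poorly. In the session below, each additional character roughly doubles the time taken by the failed search: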

>>> import re, time
>>> r = re.compile(r'(\w(\w)*)+ ')
>>> t = time.time(); r.search('[abcdefghijklmnopqrstu]'); time.time() - t
1.2868900299072266
>>> t = time.time(); r.search('[abcdefghijklmnopqrstuv]'); time.time() - t
2.3317339420318604
>>> t = time.time(); r.search('[abcdefghijklmnopqrstuvw]'); time.time() - t
4.4360249042510986
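
The same measurement can be scripted to make the growth visible over a range of lengths. This is a sketch assuming Python 3; the helper name time_failed_search is hypothetical, and the absolute numbers will differ from the ones above depending on machine and Python version.

```python
import re
import time

# Same shape as the reduced pattern above: nested quantifiers followed by a
# literal space that never occurs in the test input, so every search fails.
NESTED = re.compile(r'(\w(\w)*)+ ')


def time_failed_search(n):
    """Return the seconds taken by a failed search on '[' + n letters + ']'."""
    text = '[' + 'a' * n + ']'
    start = time.perf_counter()
    result = NESTED.search(text)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no space in the input, so no match
    return elapsed


if __name__ == '__main__':
    # Lengths are kept small on purpose; each step is expected to cost
    # noticeably more than the previous one.
    for n in range(10, 17):
        print('%2d letters: %.4f s' % (n, time_failed_search(n)))
```

The input lengths are deliberately shorter than in the session above so that the script finishes quickly; raising the upper bound shows why a few dozen characters were enough to hang the server.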

comment:6 by Christian Boos, 14 years ago

Owner: set to Christian Boos
Status: new → assigned

I reworked the CamelCase regexp and arrived at a patch which is quite close to mgood's fix for #230 (attachment:unicode_wiki_links.diff:ticket:230).

In this t9025-unicode-CamelCase-r9250.patch I have two variants, one with the full list of characters, the other with regexp ranges, so it's easy to comment out one and test the other.

For me using the full list variant was faster (this is the one enabled in the patch). This is also a tad bit faster than mgood's patch (refreshed version in t9025-mgood-t230-r9250.patch).

This is still unfortunately slower than my original solution using \w and look-behinds, but at least there's no pathological case with this one.

Even better solutions welcomed ;-)
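
To illustrate the "full list of characters" idea described in this comment, here is a rough sketch. It is not the attached patch: the helper chars_in_category, the 0x3000 code point limit and the simplified CAMEL rule (two or more "humps" of one upper-case letter followed by lower-case letters) are all assumptions made for the example, and it targets Python 3.

```python
import re
import unicodedata


def chars_in_category(category, limit=0x3000):
    """Collect all characters below `limit` with the given Unicode category."""
    return ''.join(chr(cp) for cp in range(limit)
                   if unicodedata.category(chr(cp)) == category)


# Explicit, disjoint character lists instead of \w plus look-behinds.
UPPER = chars_in_category('Lu')  # upper-case letters
LOWER = chars_in_category('Ll')  # lower-case letters

# Simplified, hypothetical CamelCase rule: at least two humps, each made of
# exactly one upper-case letter followed by one or more lower-case letters.
CAMEL = re.compile('(?:[%s][%s]+){2,}' % (UPPER, LOWER))

if __name__ == '__main__':
    print(CAMEL.search('[WikiPageName]'))
    # Caseless script from the reduced test case: rejected immediately.
    print(CAMEL.search('[' + '\u1200' * 40 + ']'))
```

Because the two classes are disjoint, a character can only ever play one role in the pattern, so a failed match cannot be retried in exponentially many ways as with the \w-based version.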

by Christian Boos, 14 years ago

use explicit list of lower/upper unicode characters for identifying CamelCase words

by Christian Boos, 14 years ago

refreshed patch from mgood on #230, for comparison

comment:7 by Remy Blank, 14 years ago

Here are the results on my machine, for running:

PYTHONPATH=. python trac/tests/allwiki.py

So we're not that much slower with the first variant. I have tried to find a better solution, but I have to admit that my regexp-fu is probably deficient here. Anyway, a performance loss of about 7% sounds OK to me (compared to the alternative, i.e. a DoS), so I would say apply and close.

comment:8 by Christian Boos, 14 years ago

Thanks for testing. Would you mind timing that last one: t9025-unicode-CamelCase-r9250.2.patch?

I think the slight twist to the WikiPageNames it introduces is OK.

by Christian Boos, 14 years ago

A slightly faster version, albeit with a twist to the WikiPageNames rules (Page/Sub is now a valid wiki name)

comment:9 by Christian Boos, 14 years ago

Resolution: fixed
Status: assigned → closed

Last iteration of the patch + some tests committed as r9252.
