Opened 15 years ago
Closed 15 years ago
#9025 closed defect (fixed)
Trac hangs with certain text
Reported by: | anonymous | Owned by: | Christian Boos |
---|---|---|---|
Priority: | highest | Milestone: | 0.11.7 |
Component: | wiki system | Version: | 0.11-stable |
Severity: | critical | Keywords: | performance |
Cc: | srl@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
Our server hung with the attached text as the body of a ticket. (actually it is from a 'tab' export of the ticket, which obviously worked OK).
I pasted the same text into a comment and it also caused a hang.
this is r9125 of 0.11.7-stabler0
sqlite3 database 3.6.22, red hat 7, python 2.6.2
Attachments (4)
Change History (13)
by , 15 years ago
comment:2 by , 15 years ago
Milestone: | 0.11.7 |
---|
comment:3 by , 15 years ago
Component: | general → wiki system |
---|---|
Keywords: | performance added |
Milestone: | → 0.11.7 |
Priority: | normal → high |
Reduced test cases and the approximate time they take:
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞ]
> 1s
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠ]
> 3s
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢ]
> 6s
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ]
> 17s
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤሥሦ]
> 35s
…
Note that every other form seem to be fine:
ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ wiki:ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ [wiki:ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ] ["ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ"] [ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ with label]
I've identified the guilty regexp, which is the one for wikipagename_with_label_link.
When a label is actually given, the regexp "works" as expected.
comment:4 by , 15 years ago
Priority: | high → highest |
---|
comment:5 by , 15 years ago
The regexp for CamelCase page names tries to deal with unicode strings (#230) and looks like this:
...(?:\w(?<![a-z0-9_])(?:\w(?<![A-Z0-9_]))*[\w/](?<![a-z0-9_]\))... ---------------- ================ a word character a word character but no lower case but no upper case digit or _ digit or _
In presence of a string like ሀሁሂሃሄህሆለሉሊ...
which contains neither a-z nor A-Z characters but otherwise only \w characters, the above regexp becomes equivalent to:
(\w(\w)*)+
When it has to backtrack (failed match), a regexp like that is not performing very well…
>>> r = re.compile(r'(\w(\w)*)+ ') >>> t = time.time(); r.search('[abcdefghijklmnopqrstu]'); time.time() - t 1.2868900299072266 >>> t = time.time(); r.search('[abcdefghijklmnopqrstuv]'); time.time() - t 2.3317339420318604 >>> t = time.time(); r.search('[abcdefghijklmnopqrstuvw]'); time.time() - t 4.4360249042510986
comment:6 by , 15 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
I reworked the CamelCase regexp to come to a patch which is quite close to mgood's fix for #230 (attachment:unicode_wiki_links.diff:ticket:230).
In this t9025-unicode-CamelCase-r9250.patch I have two variants, one with the full list of characters, the other with regexp ranges, so it's easy to comment out one and test the other.
For me using the full list variant was faster (this is the one enabled in the patch). This is also a tad bit faster than mgood's patch (refreshed version in t9025-mgood-t230-r9250.patch).
This is still unfortunately slower than my original solution using \w and look-behinds, but at least there's no pathological case with this one.
Even better solutions welcomed ;-)
by , 15 years ago
Attachment: | t9025-unicode-CamelCase-r9250.patch added |
---|
use explicit list of lower/upper unicode characters for identifying CamelCase words
by , 15 years ago
Attachment: | t9025-mgood-t230-r9250.patch added |
---|
refreshed patch from mgood on #230, for comparison
comment:7 by , 15 years ago
Here are the results on my machine, for running:
PYTHONPATH=. python trac/tests/allwiki.py
- Current trunk: 1.18 s
- t9025-unicode-CamelCase-r9250.patch, full list: 1.27 s
- t9025-unicode-CamelCase-r9250.patch, ranges: 1.72 s
- t9025-mgood-t230-r9250.patch: 1.30 s
So we're not that much slower with the first variant. I have tried to find a better solution, but I have to admit that my regexp-fu is probably deficient here. Anyway, a performance loss of about 7% sounds ok to me (compared to the alternative, i.e. a DOS), so I would say apply and close.
comment:8 by , 15 years ago
Thanks for testing. Would you mind timing that last one: t9025-unicode-CamelCase-r9250.2.patch?
I think the slight twist to the WikiPageNames it introduces is OK.
by , 15 years ago
Attachment: | t9025-unicode-CamelCase-r9250.2.patch added |
---|
A slightly faster version, albeit with a twist to the WikiPageNames rules (Page/Sub is now a valid wiki name)
comment:9 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Last iteration of the patch + some tests committed as r9252.
2593.txt