Edgewall Software
Modify

Ticket #9025 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

Trac hangs with certain text

Reported by: anonymous Owned by: cboos
Priority: highest Milestone: 0.11.7
Component: wiki system Version: 0.11-stable
Severity: critical Keywords: performance
Cc: srl@…
Release Notes:
API Changes:

Description

Our server hung with the attached text as the body of a ticket. (actually it is from a 'tab' export of the ticket, which obviously worked OK).

I pasted the same text into a comment and it also caused a hang.

this is r9125 of 0.11.7-stabler0

sqlite3 database 3.6.22, red hat 7, python 2.6.2

Attachments

2593.txt (1.8 KB) - added by srl@… 2 years ago.
2593.txt
t9025-unicode-CamelCase-r9250.patch (4.0 KB) - added by cboos 2 years ago.
use explicit list of lower/upper unicode characters for identifying CamelCase words
t9025-mgood-t230-r9250.patch (27.7 KB) - added by cboos 2 years ago.
refreshed patch from mgood on #230, for comparison
t9025-unicode-CamelCase-r9250.2.patch (3.9 KB) - added by cboos 2 years ago.
A slightly faster version, albeit with a twist to the WikiPageNames rules (Page/Sub? is now a valid wiki name)

Download all attachments as: .zip

Change History

Changed 2 years ago by srl@…

2593.txt

comment:1 Changed 2 years ago by Steven R. Loomis <srl@…>

  • Cc srl@… added

I am the author of this ticket.

comment:2 Changed 2 years ago by Steven R. Loomis <srl@…>

  • Milestone 0.11.7 deleted

comment:3 Changed 2 years ago by cboos

  • Component changed from general to wiki system
  • Keywords performance added
  • Milestone set to 0.11.7
  • Priority changed from normal to high

Reduced test cases and the approximate time they take:

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞ] > 1s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠ] > 3s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢ] > 6s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ] > 17s

[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤሥሦ] > 35s

...

Note that every other form seem to be fine:

ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ
wiki:ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ
[wiki:ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ] 
["ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ"]
[ሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕሖመሙሚማሜምሞሟሠሡሢሣሤ with label] 

I've identified the guilty regexp, which is the one for wikipagename_with_label_link.

When a label is actually given, the regexp "works" as expected.

comment:4 Changed 2 years ago by cboos

  • Priority changed from high to highest

comment:5 Changed 2 years ago by cboos

The regexp for CamelCase page names tries to deal with unicode strings (#230) and looks like this:

...(?:\w(?<![a-z0-9_])(?:\w(?<![A-Z0-9_]))*[\w/](?<![a-z0-9_]\))...
      ----------------   ================
      a word character   a word character
      but no lower case  but no upper case
      digit or _         digit or _

In presence of a string like ሀሁሂሃሄህሆለሉሊ... which contains neither a-z nor A-Z characters but otherwise only \w characters, the above regexp becomes equivalent to:

(\w(\w)*)+

When it has to backtrack (failed match), a regexp like that is not performing very well...

>>> r = re.compile(r'(\w(\w)*)+ ')
>>> t = time.time(); r.search('[abcdefghijklmnopqrstu]'); time.time() - t
1.2868900299072266
>>> t = time.time(); r.search('[abcdefghijklmnopqrstuv]'); time.time() - t
2.3317339420318604
>>> t = time.time(); r.search('[abcdefghijklmnopqrstuvw]'); time.time() - t
4.4360249042510986

comment:6 Changed 2 years ago by cboos

  • Owner set to cboos
  • Status changed from new to assigned

I reworked the CamelCase regexp to come to a patch which is quite close to mgood's fix for #230 (attachment:unicode_wiki_links.diff:ticket:230).

In this t9025-unicode-CamelCase-r9250.patch I have two variants, one with the full list of characters, the other with regexp ranges, so it's easy to comment out one and test the other.

For me using the full list variant was faster (this is the one enabled in the patch).
This is also a tad bit faster than mgood's patch (refreshed version in t9025-mgood-t230-r9250.patch).

This is still unfortunately slower than my original solution using \w and look-behinds, but at least there's no pathological case with this one.

Even better solutions welcomed ;-)

Changed 2 years ago by cboos

use explicit list of lower/upper unicode characters for identifying CamelCase words

Changed 2 years ago by cboos

refreshed patch from mgood on #230, for comparison

comment:7 Changed 2 years ago by rblank

Here are the results on my machine, for running:

PYTHONPATH=. python trac/tests/allwiki.py

So we're not that much slower with the first variant. I have tried to find a better solution, but I have to admit that my regexp-fu is probably deficient here. Anyway, a performance loss of about 7% sounds ok to me (compared to the alternative, i.e. a DOS), so I would say apply and close.

comment:8 Changed 2 years ago by cboos

Thanks for testing. Would you mind timing that last one: t9025-unicode-CamelCase-r9250.2.patch?

I think the slight twist to the WikiPageNames it introduces is OK.

Changed 2 years ago by cboos

A slightly faster version, albeit with a twist to the WikiPageNames rules (Page/Sub? is now a valid wiki name)

comment:9 Changed 2 years ago by cboos

  • Resolution set to fixed
  • Status changed from assigned to closed

Last iteration of the patch + some tests committed as r9252.

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
to The owner will be changed from cboos. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.