Opened 2 months ago

Last modified 2 months ago

#13786 assigned defect

Issues with long words in description load very slowly

Reported by: lnicola@…
Owned by: Jun Omae
Priority: normal
Milestone: 1.6.1
Component: wiki system
Version: 1.6
Severity: normal

Description (last modified by anonymous)

We have an instance of Trac with one issue that's very slow to load. We can reproduce this by running:

update ticket set description = (select string_agg('A', '') from generate_series(1, 40000)) where id = 4281;

and opening the ticket.

With a description consisting of 40k 'A's, the ticket takes about 18 seconds to load. With more (60k or so), it doesn't load at all.

You can see the original issue live at https://trac.osgeo.org/postgis/ticket/5665, but note that it sometimes fails to load.

We are aware of ticket.max_description_size, and will try to set it to a reasonable value, but 40k isn't that much IMHO.

Change History (6)

comment:1 by anonymous, 2 months ago

Description: modified (diff)

comment:2 by Jun Omae, 2 months ago

Component: general → ticket system
Milestone: 1.6.1
Owner: set to Jun Omae
Status: new → assigned

comment:3 by Jun Omae, 2 months ago

Component: ticket system → wiki system

On SQLite, a long string can be generated like this:

sqlite> update ticket set description=replace(printf('%040000d', 0), '0', 'a') where id=42;

A wiki page that contains a long word has the same issue, so something is wrong in the wiki rendering engine.

>>> from trac.test import MockRequest, EnvironmentStub
>>> from trac.loader import load_components
>>> from trac.wiki.formatter import format_to_html
>>> from trac.web.chrome import web_context
>>> import time
>>> env = EnvironmentStub()
>>> load_components(env)
>>> req = MockRequest(env)
>>> context = web_context(req)
>>> def f(text):
...   ts = time.time()
...   rendered = format_to_html(env, context, text)
...   print('%.3g seconds' % (time.time() - ts))
...
>>> f('a' * 5000)
0.149 seconds
>>> f('a' * 10000)
0.575 seconds
>>> f('a' * 20000)
2.27 seconds
>>> f('a' * 40000)
8.93 seconds
>>>
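
The timings above roughly quadruple each time the input doubles, which is what quadratic growth looks like. A small sketch that estimates the growth exponent from the numbers reported in this session (the helper name growth_exponents is made up for illustration):

```python
import math

# (input length, seconds) pairs copied from the session above.
timings = [(5000, 0.149), (10000, 0.575), (20000, 2.27), (40000, 8.93)]


def growth_exponents(samples):
    """Return log(t2/t1) / log(n2/n1) for each consecutive pair of samples."""
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(samples, samples[1:])
    ]


exponents = growth_exponents(timings)
# An exponent near 1 would mean linear cost, near 2 quadratic cost.
print(["%.2f" % e for e in exponents])
```

All three exponents come out close to 2, i.e. the rendering time grows with roughly the square of the word length.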

comment:4 by Jun Omae, 2 months ago

The root cause is regular-expression backtracking in the wiki rendering engine. The following patch avoids the backtracking. I'll push the changes with unit tests.

  • trac/wiki/parser.py

    diff --git a/trac/wiki/parser.py b/trac/wiki/parser.py
    index 23d22ccf2..1218952c6 100644
    --- a/trac/wiki/parser.py
    +++ b/trac/wiki/parser.py
    @@ -94,13 +94,14 @@ class WikiParser(Component):
             # WikiCreole line breaks
             r"(?P<linebreak_wc>!?\\\\)",
             # e-mails
    -        r"(?P<email>!?%s)" % EMAIL_LOOKALIKE_PATTERN,
    +        r"(?P<email>(?:(?<![a-zA-Z0-9.'+_-])|!)%s)" % EMAIL_LOOKALIKE_PATTERN,
             # <wiki:Trac bracket links>
             r"(?P<shrefbr>!?<(?P<snsbr>%s):(?P<stgtbr>[^>]+)>)" % LINK_SCHEME,
             # &, < and > to &amp;, &lt; and &gt;
             r"(?P<htmlescape>[&<>])",
             # wiki:TracLinks or intertrac:wiki:TracLinks
    -        r"(?P<shref>!?((?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
    +        r"(?P<shref>(?:(?<![a-zA-Z])|!)"
    +        r"(?:(?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
             % (LINK_SCHEME, LINK_SCHEME, QUOTED_STRING, QUOTED_STRING,
                SHREF_TARGET_FIRST, SHREF_TARGET_MIDDLE, SHREF_TARGET_LAST),
             # [wiki:TracLinks with optional label] or [/relative label]

After the patch, the processing time for 40,000 alphabetic characters drops from 8.93 seconds to 0.0698 seconds.

>>> from trac.test import MockRequest, EnvironmentStub
>>> from trac.loader import load_components
>>> from trac.wiki.formatter import Formatter, format_to_html
>>> from trac.web.chrome import web_context
>>> import time
>>> env = EnvironmentStub()
>>> load_components(env)
>>> req = MockRequest(env)
>>> context = web_context(req)
>>> def f(text):
...   ts = time.time()
...   rendered = format_to_html(env, context, text)
...   print('%.3g seconds' % (time.time() - ts))
...
>>> f('a' * 5000)
0.0216 seconds
>>> f('a' * 10000)
0.0179 seconds
>>> f('a' * 20000)
0.0355 seconds
>>> f('a' * 40000)
0.0698 seconds
>>> f('a' * 80000)
0.139 seconds
>>> f('a' * 160000)
0.279 seconds
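
To illustrate why a lookbehind in front of a rule can remove the quadratic cost, here is a minimal sketch using Python's re module. The pattern is a simplified stand-in, not Trac's actual shref rule, and SCHEME, UNGUARDED, GUARDED and scan_time are names made up for this illustration.

```python
import re
import time

# Simplified stand-in for a "scheme:target" link rule (NOT Trac's real pattern).
SCHEME = r"[a-zA-Z][a-zA-Z0-9+.-]*"

# Without a guard, the engine tries the rule at every position of a long word.
# Each attempt consumes the rest of the word, fails to find ":", and backtracks.
UNGUARDED = re.compile(r"!?(?:%s):\w+" % SCHEME)

# With the guard, an attempt is only made where the previous character cannot
# belong to a scheme, or at an explicit "!" escape.
GUARDED = re.compile(r"(?:(?<![a-zA-Z0-9+.-])|!)(?:%s):\w+" % SCHEME)


def scan_time(pattern, text):
    """Return the seconds spent searching text with pattern."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (2000, 4000, 8000):
        word = "a" * n
        print(n,
              "unguarded %.3g s" % scan_time(UNGUARDED, word),
              "guarded %.3g s" % scan_time(GUARDED, word))
```

On a word made only of letters, the unguarded pattern does work proportional to the remaining length at every start position, while the guarded one does that work once at the start of the word and rejects every other position after a single lookbehind check. Absolute timings depend on the machine, so none are quoted here.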

comment:5 by Jun Omae, 2 months ago

Hm, the longest word in that ticket's description has 125,136 alphanumeric characters.

$ curl -s 'https://trac.osgeo.org/postgis/ticket/5665?format=tab' | python3 -c 'import sys, re; print(max(len(m) for m in re.findall(r"\w+", sys.stdin.read())))'
125136

After the patch, the processing time for the description improves from 42 seconds to 13 seconds, but that is still pretty long.

comment:6 by Jun Omae, 2 months ago

Revised the patch. Processing time for the description is less than 1 second.

  • trac/wiki/parser.py

    diff --git a/trac/wiki/parser.py b/trac/wiki/parser.py
    index 23d22ccf2..4249d426b 100644
    --- a/trac/wiki/parser.py
    +++ b/trac/wiki/parser.py
    @@ -94,13 +94,14 @@ class WikiParser(Component):
             # WikiCreole line breaks
             r"(?P<linebreak_wc>!?\\\\)",
             # e-mails
    -        r"(?P<email>!?%s)" % EMAIL_LOOKALIKE_PATTERN,
    +        r"(?P<email>(?:(?<![a-zA-Z0-9.'+-])|!)%s)" % EMAIL_LOOKALIKE_PATTERN,
             # <wiki:Trac bracket links>
             r"(?P<shrefbr>!?<(?P<snsbr>%s):(?P<stgtbr>[^>]+)>)" % LINK_SCHEME,
             # &, < and > to &amp;, &lt; and &gt;
             r"(?P<htmlescape>[&<>])",
             # wiki:TracLinks or intertrac:wiki:TracLinks
    -        r"(?P<shref>!?((?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
    +        r"(?P<shref>(?:(?<![-a-zA-Z0-9+.])|!)"
    +        r"(?:(?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
             % (LINK_SCHEME, LINK_SCHEME, QUOTED_STRING, QUOTED_STRING,
                SHREF_TARGET_FIRST, SHREF_TARGET_MIDDLE, SHREF_TARGET_LAST),
             # [wiki:TracLinks with optional label] or [/relative label]
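
One possible reading of why the revised lookbehind helps on a long alphanumeric word: the first shref patch only excluded a preceding letter, so every letter that follows a digit was still a place where the rule could be attempted. The sketch below counts such positions for both guards. The trailing lookahead for a letter is an assumption standing in for the first character of a link scheme, and candidate_starts is a made-up helper name.

```python
import re

# Lookbehind guards as they appear in the two shref patches in this ticket.
NARROW = r"(?<![a-zA-Z])"        # first patch
WIDE = r"(?<![-a-zA-Z0-9+.])"    # revised patch


def candidate_starts(guard, text):
    """Count positions where the guard passes and a letter follows.

    The lookahead for a letter is an illustrative assumption, not a copy
    of the rule in trac/wiki/parser.py.
    """
    probe = re.compile(guard + r"(?=[a-zA-Z])")
    return sum(1 for _ in probe.finditer(text))


word = "a1" * 4  # "a1a1a1a1": letters and digits interleaved
print(candidate_starts(NARROW, word), candidate_starts(WIDE, word))  # prints "4 1"
```

With the narrow guard the number of attempts grows with the length of the word whenever digits are mixed in. With the wide guard only the start of the word qualifies.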
