Opened 3 months ago
Last modified 3 months ago
#13786 assigned defect
Issues with long words in description load very slowly
Reported by: | | Owned by: | Jun Omae |
---|---|---|---|
Priority: | normal | Milestone: | 1.6.1 |
Component: | wiki system | Version: | 1.6 |
Severity: | normal | Keywords: | |
Cc: | | Branch: | |
Release Notes: | | | |
API Changes: | | | |
Internal Changes: | | | |
Description (last modified by )
We have an instance of Trac with one issue that's very slow to load. We can reproduce this by running:
update ticket set description = (select string_agg('A', '') from generate_series(1, 40000)) where id = 4281;
and opening the ticket.
Setting the description to a sequence of 40k 'A's makes the ticket load in about 18 seconds. With more (60k or so), it doesn't load at all.
You can see the original issue live on https://trac.osgeo.org/postgis/ticket/5665, but note that it sometimes fails to load.
We are aware of ticket.max_description_size, and will try to set it to a reasonable value, but 40k isn't that much IMHO.
Attachments (0)
Change History (6)
comment:1 by , 3 months ago
Description: | modified (diff) |
---|
comment:2 by , 3 months ago
Component: | general → ticket system |
---|---|
Milestone: | → 1.6.1 |
Owner: | set to |
Status: | new → assigned |
comment:3 by , 3 months ago
Component: | ticket system → wiki system |
---|
comment:4 by , 3 months ago
The root cause is regular-expression backtracking in the wiki rendering engine. The following patch avoids the backtracking. I'll push the changes with unit tests.
    diff --git a/trac/wiki/parser.py b/trac/wiki/parser.py
    index 23d22ccf2..1218952c6 100644
    --- a/trac/wiki/parser.py
    +++ b/trac/wiki/parser.py
    @@ -94,13 +94,14 @@ class WikiParser(Component):
             # WikiCreole line breaks
             r"(?P<linebreak_wc>!?\\\\)",
             # e-mails
    -        r"(?P<email>!?%s)" % EMAIL_LOOKALIKE_PATTERN,
    +        r"(?P<email>(?:(?<![a-zA-Z0-9.'+_-])|!)%s)" % EMAIL_LOOKALIKE_PATTERN,
             # <wiki:Trac bracket links>
             r"(?P<shrefbr>!?<(?P<snsbr>%s):(?P<stgtbr>[^>]+)>)" % LINK_SCHEME,
             # &, < and > to &amp;, &lt; and &gt;
             r"(?P<htmlescape>[&<>])",
             # wiki:TracLinks or intertrac:wiki:TracLinks
    -        r"(?P<shref>!?((?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
    +        r"(?P<shref>(?:(?<![a-zA-Z])|!)"
    +        r"(?:(?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
             % (LINK_SCHEME, LINK_SCHEME, QUOTED_STRING, QUOTED_STRING,
                SHREF_TARGET_FIRST, SHREF_TARGET_MIDDLE, SHREF_TARGET_LAST),
             # [wiki:TracLinks with optional label] or [/relative label]
After the patch, the processing time for 40,000 alphabetic characters drops from 8.93 seconds to 0.0698 seconds.
    >>> from trac.test import MockRequest, EnvironmentStub
    >>> from trac.loader import load_components
    >>> from trac.wiki.formatter import Formatter, format_to_html
    >>> from trac.web.chrome import web_context
    >>> import time
    >>> env = EnvironmentStub()
    >>> load_components(env)
    >>> req = MockRequest(env)
    >>> context = web_context(req)
    >>> def f(text):
    ...     ts = time.time()
    ...     rendered = format_to_html(env, context, text)
    ...     print('%.3g seconds' % (time.time() - ts))
    ...
    >>> f('a' * 5000)
    0.0216 seconds
    >>> f('a' * 10000)
    0.0179 seconds
    >>> f('a' * 20000)
    0.0355 seconds
    >>> f('a' * 40000)
    0.0698 seconds
    >>> f('a' * 80000)
    0.139 seconds
    >>> f('a' * 160000)
    0.279 seconds
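To make the backtracking concrete, here is a minimal sketch using a simplified stand-in for the shref rule. The SCHEME pattern and both compiled expressions are assumptions for illustration, not Trac's actual rules. Without an anchor, the engine tries to start a scheme at every offset of a long word and scans to the end of the word each time before failing on the missing colon, which is quadratic in the word length. With the lookbehind, only the first character of the word is a candidate.

```python
import re
import time

# Simplified stand-in for a link scheme; an assumption, not Trac's LINK_SCHEME.
SCHEME = r"[a-zA-Z][-a-zA-Z0-9+.]*"

# Before: "!?" lets a match attempt begin at every offset inside a word.
UNANCHORED = re.compile(r"!?(?P<sns>%s):(?P<stgt>\S+)" % SCHEME)
# After: an attempt may begin only where the previous character is not a
# letter, or at an explicit "!" escape.
ANCHORED = re.compile(r"(?:(?<![a-zA-Z])|!)(?P<sns>%s):(?P<stgt>\S+)" % SCHEME)


def timed_search(pattern, text):
    """Return (match, elapsed seconds) for one pattern.search() call."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


# One long word with no colon, so neither pattern can match.
word = "a" * 5000
for name, pattern in (("unanchored", UNANCHORED), ("anchored", ANCHORED)):
    match, elapsed = timed_search(pattern, word)
    print("%-10s match=%r  %.3g seconds" % (name, match, elapsed))
```

Both patterns still recognise an ordinary link such as wiki:TracLinks and the !wiki:TracLinks escape. The difference is only in where a match attempt is allowed to begin.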
comment:5 by , 3 months ago
Hm, the longest word in that ticket's description is 125,136 alphanumeric characters.
    $ curl -s 'https://trac.osgeo.org/postgis/ticket/5665?format=tab' | python3 -c 'import sys, re; print(max(len(m) for m in re.findall(r"\w+", sys.stdin.read())))'
    125136
After the patch, the processing time for that description improves from 42 seconds to 13 seconds, but that is still quite long.
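The same check can be run offline against any exported text, for example to find which tickets or wiki pages are likely to hit the slow path. A small sketch: longest_word is a hypothetical helper name, the sample string is made up, and a word is defined by the same \w+ pattern as in the one-liner above.

```python
import re

WORD_RE = re.compile(r"\w+")


def longest_word(text):
    """Return the longest run of word characters in text ('' if none)."""
    return max(WORD_RE.findall(text), key=len, default="")


# Made-up sample: a short prefix followed by one 40-character alphanumeric run.
sample = "id=42 " + "ab12" * 10 + " and some ordinary words"
word = longest_word(sample)
print(len(word), word[:16])
```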
comment:6 by , 3 months ago
Revised the patch. Processing time for the description is less than 1 second.
    diff --git a/trac/wiki/parser.py b/trac/wiki/parser.py
    index 23d22ccf2..4249d426b 100644
    --- a/trac/wiki/parser.py
    +++ b/trac/wiki/parser.py
    @@ -94,13 +94,14 @@ class WikiParser(Component):
             # WikiCreole line breaks
             r"(?P<linebreak_wc>!?\\\\)",
             # e-mails
    -        r"(?P<email>!?%s)" % EMAIL_LOOKALIKE_PATTERN,
    +        r"(?P<email>(?:(?<![a-zA-Z0-9.'+-])|!)%s)" % EMAIL_LOOKALIKE_PATTERN,
             # <wiki:Trac bracket links>
             r"(?P<shrefbr>!?<(?P<snsbr>%s):(?P<stgtbr>[^>]+)>)" % LINK_SCHEME,
             # &, < and > to &amp;, &lt; and &gt;
             r"(?P<htmlescape>[&<>])",
             # wiki:TracLinks or intertrac:wiki:TracLinks
    -        r"(?P<shref>!?((?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
    +        r"(?P<shref>(?:(?<![-a-zA-Z0-9+.])|!)"
    +        r"(?:(?P<sns>%s):(?P<stgt>%s:(?:%s)|%s|%s(?:%s*%s)?)))" \
             % (LINK_SCHEME, LINK_SCHEME, QUOTED_STRING, QUOTED_STRING,
                SHREF_TARGET_FIRST, SHREF_TARGET_MIDDLE, SHREF_TARGET_LAST),
             # [wiki:TracLinks with optional label] or [/relative label]
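One plausible reading of why the wider character class in the shref lookbehind helps (an interpretation, not something stated in the comments): with (?<![a-zA-Z]) a scheme attempt can still begin after every digit inside a long alphanumeric word, while (?<![-a-zA-Z0-9+.]) leaves only the start of the word. The sketch below counts candidate start positions for both lookbehinds. The sample word and the helper name candidate_starts are assumptions for illustration.

```python
import re

# Offsets where a scheme attempt could begin: the lookbehind passes and a
# letter follows. Both patterns are zero-width.
FIRST_PATCH = re.compile(r"(?<![a-zA-Z])(?=[a-zA-Z])")
REVISED = re.compile(r"(?<![-a-zA-Z0-9+.])(?=[a-zA-Z])")


def candidate_starts(pattern, text):
    """Count the offsets at which the zero-width pattern matches."""
    return sum(1 for _ in pattern.finditer(text))


# Made-up 8,000-character word that alternates letters and digits.
word = "a1b2c3d4" * 1000
print("first patch:", candidate_starts(FIRST_PATCH, word))
print("revised:    ", candidate_starts(REVISED, word))
```

Each candidate start costs a scan to the end of the word before the missing colon is detected, so a word with thousands of candidate starts stays roughly quadratic, while a single candidate start keeps the cost linear.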
A long string can be generated on SQLite like this:
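A minimal sketch of one way to do this, run through Python's standard sqlite3 module against a throwaway in-memory table. The table layout and the hex(zeroblob(N)) trick are assumptions for illustration. On a real Trac database the UPDATE would target the existing ticket table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Throwaway table standing in for Trac's ticket table (assumed minimal layout).
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO ticket (id, description) VALUES (1, '')")

# hex(zeroblob(N)) yields 2*N '0' characters; replace() turns them into 'A's.
conn.execute(
    "UPDATE ticket SET description = replace(hex(zeroblob(?)), '0', 'A') "
    "WHERE id = ?",
    (20000, 1),
)

(description,) = conn.execute(
    "SELECT description FROM ticket WHERE id = 1"
).fetchone()
print(len(description))
```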
A wiki page that contains a long word has the same issue, so something is wrong in the wiki rendering engine.