#2759 closed defect (fixed)
Search query is too simplistic
Reported by: | Owned by: | Matthew Good | |
---|---|---|---|
Priority: | normal | Milestone: | 0.10 |
Component: | search system | Version: | none |
Severity: | major | Keywords: | |
Cc: | dserodio@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
Hi,
While looking through Trac's Search stuff in trunk, I've noticed that the search logic is far too easily fooled (some requires malicious intent, some does not. Here are some examples (I have quoted what I actually searched for; that does not mean I used quotes in the text input field unless there are quotes inside the string):
- searching for "'core feature'" returns 1 result (not highlighted)
- " 'core feature'" returns 0 results; the problem is the query_to_sql() function only matches quotes at the beginning or end of the query. If someone wants to do a moderately complex search (ie, "foo 'bar baz'"), they will not get the results they expect.
- ""core feature"" returns 1 result, and replaces the " with " in the text input field.
- ""core feature"" (which is what the text input window contains after searching for ""core feature"") returns 0 results
- "a" fails the 'too short' test.
- "%a%" happily returns every result in the database.
- "_a_" also will return every result in the database.
- ditto for "'a'"
- "\%a" doesn't actually tell me how many results it finds (i assume the sql query it's running is invalid, but I'm not sure how to turn on query logging in sqlite). It returns 4 wiki pages, however.
These are all done on a fresh, default Trac install (r2894). 9 is worrisome, as if there is unquoted sql making its way into the database, that's a security risk. I will follow up shortly w/ a patch to rework this search stuff.
Attachments (5)
Change History (23)
comment:1 by , 19 years ago
Description: | modified (diff) |
---|
by , 19 years ago
Attachment: | trac-search-r2894.patch added |
---|
handle search quoting more intelligently; against r2894.
by , 19 years ago
Attachment: | trac-search2-r2894.patch added |
---|
don't replace quotes w/ html entities; against r2894
comment:2 by , 19 years ago
The first patch fixes a bunch of issues; the second patch fixes the third and fourth issues on that list. The highlighting is still a bit screwy, but I'm not sure what's actually implementing that (pointers would be helpful!). Upon further inspection, the ninth entry on the list above isn't an actual problem; I just hadn't realized that the number of search results was only provided if there's more than one page worth of results.
As far as things like '%a%' returning too many results; I think the best solution in this case (instead of pretending that we can enforce some sort of search policy on the user) would be to limit results to 50 (via a SQL modifier, LIMIT 50). The upper limit on search results could be configurable via config file, if that sort of flexibility is desired.
If you agree w/ that last bit, I'll whip up a patch.
comment:3 by , 19 years ago
Cc: | added |
---|
by , 19 years ago
Attachment: | trac-search3-r2894.patch added |
---|
javascript highlighting fix; against r2894
comment:4 by , 19 years ago
patch trac-search3 fixes the highlighting of quoted search terms in the results. Previously, the code was doing a simplistic split(∧s+/) on the search terms, which means that something like "'core feature'" would be split into "'core" and "feature'", which wouldn't match anything (variations on that, such as "'core feature '", would partially match).
This new patch uses the same regex that's used by the python code (in the first patch) to group search terms by quotes, and then proceeds to strip quotes when looking for matches.
comment:5 by , 19 years ago
Regarding the third patch, I also have to wonder why this is being done in (*cringe*) javascript instead of being handled by the backend? It seems like the shorten_*() functions already have most of the necessary logic…
comment:6 by , 19 years ago
The Javascript is used to highlight search terms on any page, either from Trac's own search results or if you were referred from a search on another search engine. The shorten_result
method is meant to take an excerpt of a string that contains some of the words from the query. Javascript is appropriate for the search term highlighting since this makes it easy to post-process every page and highlight any search terms found in the body.
by , 19 years ago
Attachment: | trac-search-combined.diff added |
---|
combination of previous patches applied on source:/trunk@2904
comment:7 by , 19 years ago
The patches look good, though it's easier to apply a single patch than individual diffs, so I went ahead and attached a combined version. I've also added an updated patch that escapes '%' and '_' characters found in the search terms. I'm not sure if PostgreSQL will work right with "ESCAPE '\'"
. Since backslash is already an escape character in PostgreSQL, it may need escaped for that string literal, so it's possible that another character will need to be used for the escape. Can someone test this out?
What is the comment in the sqlite backend referring to? Is there something more you wanted to do there, or can it be ignored?
comment:8 by , 19 years ago
I held off on escaping % and _ because my tests showed it doing weird things when I tried it. For example, if you try searching for "%a%", it will return all results in the database, and the javascript highlighting will not match anything. If you escape the %'s (in the code itself, not in the search query), it will return no results. If you try searching for "99%" (and actually have a wiki page w/ that includes the string "99%"), it will return just that page, and the javascript highlighting will highlight the "99%" string. If you escape the %'s, it will return no results, which is clearly not what is intended.
Thus, I held off on the % and _ escaping for now; I would rather deal w/ that issue separately.
comment:9 by , 19 years ago
Oh, and the comment in the sqlite backend is the same bit of documentation that's in the postgres backend; it snuck in there. :)
The postgres backend mentions that the like() function is a temporary hack; the sqlite backend should probably do the same.
comment:10 by , 19 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Andres: have you tried the last patch I attached? It works for my tests on SQLite searching for text including '%' or '_' (now searching for Python methods like __init__
should work).
Let me know if you have problems with it, but I think that I'll commit it once I get a chance to test it on PostgreSQL.
comment:12 by , 19 years ago
Matt:
The escaping patch works perfectly, I've included it in my bzr repository as well.
comment:14 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ok, as I was afraid, trying to use "\" as the ESCAPE
character in the SQL was a problem for PostgreSQL, since it's already an escape character on that DB. So, I've changed it to "_" and committed as [2940].
Thanks Andres.
comment:15 by , 19 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Hm. Matt, that last change doesn't seem to work properly for me; at least not w/ sqlite. At least, searching for "%a%" and "_a_" returns incorrect results. I haven't looked at why this is, but the last patch is pretty different in terms of what actually gets escaped.
comment:16 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Ok, apparently SQLite doesn't do the escaping properly if you use a SQL wildcard as the ESCAPR character. So, [2944] should resolve this.
follow-up: 18 comment:17 by , 11 years ago
I think the docu needs to be fixed to explain all these?
And, how do I search for "not use this" literally, but without the quotes? And with the quotes?
This is not explained in the manual, and I am getting every page containing "not", "use" or "this" if I try without the quotes.
comment:18 by , 11 years ago
Replying to w.pasman@…:
… I am getting every page containing "not", "use" or "this" if I try without the quotes.
That shouldn't be the case, at least in Trac 1.0.x, though there is defect in Trac 1.0.1 that results in milestones being returned in the results when only one term is matched: #11545.
Please ask on the MailingList if you continue to have issues.
(fixed description)