Edgewall Software
Modify

Opened 18 years ago

Closed 18 years ago

Last modified 10 years ago

#2759 closed defect (fixed)

Search query is too simplistic

Reported by: Andres Salomon <dilinger@…> Owned by: Matthew Good
Priority: normal Milestone: 0.10
Component: search system Version: none
Severity: major Keywords:
Cc: dserodio@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Christopher Lenz)

Hi,

While looking through Trac's Search stuff in trunk, I've noticed that the search logic is far too easily fooled (some requires malicious intent, some does not. Here are some examples (I have quoted what I actually searched for; that does not mean I used quotes in the text input field unless there are quotes inside the string):

  1. searching for "'core feature'" returns 1 result (not highlighted)
  2. " 'core feature'" returns 0 results; the problem is the query_to_sql() function only matches quotes at the beginning or end of the query. If someone wants to do a moderately complex search (ie, "foo 'bar baz'"), they will not get the results they expect.
  3. ""core feature"" returns 1 result, and replaces the " with &#34; in the text input field.
  4. "&#34;core feature&#34;" (which is what the text input window contains after searching for ""core feature"") returns 0 results
  5. "a" fails the 'too short' test.
  6. "%a%" happily returns every result in the database.
  7. "_a_" also will return every result in the database.
  8. ditto for "'a'"
  9. "\%a" doesn't actually tell me how many results it finds (i assume the sql query it's running is invalid, but I'm not sure how to turn on query logging in sqlite). It returns 4 wiki pages, however.

These are all done on a fresh, default Trac install (r2894). 9 is worrisome, as if there is unquoted sql making its way into the database, that's a security risk. I will follow up shortly w/ a patch to rework this search stuff.

Attachments (5)

trac-search-r2894.patch (9.0 KB ) - added by Andres Salomon <dilinger@…> 18 years ago.
handle search quoting more intelligently; against r2894.
trac-search2-r2894.patch (997 bytes ) - added by Andres Salomon <dilinger@…> 18 years ago.
don't replace quotes w/ html entities; against r2894
trac-search3-r2894.patch (1.1 KB ) - added by Andres Salomon <dilinger@…> 18 years ago.
javascript highlighting fix; against r2894
trac-search-combined.diff (9.3 KB ) - added by Matthew Good 18 years ago.
combination of previous patches applied on source:/trunk@2904
trac-search-escape.diff (9.0 KB ) - added by Matthew Good 18 years ago.
escape sql LIKE syntax in search query

Download all attachments as: .zip

Change History (23)

comment:1 by Christopher Lenz, 18 years ago

Description: modified (diff)

(fixed description)

by Andres Salomon <dilinger@…>, 18 years ago

Attachment: trac-search-r2894.patch added

handle search quoting more intelligently; against r2894.

by Andres Salomon <dilinger@…>, 18 years ago

Attachment: trac-search2-r2894.patch added

don't replace quotes w/ html entities; against r2894

comment:2 by Andres Salomon <dilinger@…>, 18 years ago

The first patch fixes a bunch of issues; the second patch fixes the third and fourth issues on that list. The highlighting is still a bit screwy, but I'm not sure what's actually implementing that (pointers would be helpful!). Upon further inspection, the ninth entry on the list above isn't an actual problem; I just hadn't realized that the number of search results was only provided if there's more than one page worth of results.

As far as things like '%a%' returning too many results; I think the best solution in this case (instead of pretending that we can enforce some sort of search policy on the user) would be to limit results to 50 (via a SQL modifier, LIMIT 50). The upper limit on search results could be configurable via config file, if that sort of flexibility is desired.

If you agree w/ that last bit, I'll whip up a patch.

comment:3 by dserodio@…, 18 years ago

Cc: dserodio@… added

by Andres Salomon <dilinger@…>, 18 years ago

Attachment: trac-search3-r2894.patch added

javascript highlighting fix; against r2894

comment:4 by Andres Salomon <dilinger@…>, 18 years ago

patch trac-search3 fixes the highlighting of quoted search terms in the results. Previously, the code was doing a simplistic split(∧s+/) on the search terms, which means that something like "'core feature'" would be split into "'core" and "feature'", which wouldn't match anything (variations on that, such as "'core feature '", would partially match).

This new patch uses the same regex that's used by the python code (in the first patch) to group search terms by quotes, and then proceeds to strip quotes when looking for matches.

comment:5 by Andres Salomon <dilinger@…>, 18 years ago

Regarding the third patch, I also have to wonder why this is being done in (*cringe*) javascript instead of being handled by the backend? It seems like the shorten_*() functions already have most of the necessary logic…

comment:6 by Matthew Good, 18 years ago

The Javascript is used to highlight search terms on any page, either from Trac's own search results or if you were referred from a search on another search engine. The shorten_result method is meant to take an excerpt of a string that contains some of the words from the query. Javascript is appropriate for the search term highlighting since this makes it easy to post-process every page and highlight any search terms found in the body.

by Matthew Good, 18 years ago

Attachment: trac-search-combined.diff added

combination of previous patches applied on source:/trunk@2904

by Matthew Good, 18 years ago

Attachment: trac-search-escape.diff added

escape sql LIKE syntax in search query

comment:7 by Matthew Good, 18 years ago

The patches look good, though it's easier to apply a single patch than individual diffs, so I went ahead and attached a combined version. I've also added an updated patch that escapes '%' and '_' characters found in the search terms. I'm not sure if PostgreSQL will work right with "ESCAPE '\'". Since backslash is already an escape character in PostgreSQL, it may need escaped for that string literal, so it's possible that another character will need to be used for the escape. Can someone test this out?

What is the comment in the sqlite backend referring to? Is there something more you wanted to do there, or can it be ignored?

comment:8 by Andres Salomon <dilinger@…>, 18 years ago

I held off on escaping % and _ because my tests showed it doing weird things when I tried it. For example, if you try searching for "%a%", it will return all results in the database, and the javascript highlighting will not match anything. If you escape the %'s (in the code itself, not in the search query), it will return no results. If you try searching for "99%" (and actually have a wiki page w/ that includes the string "99%"), it will return just that page, and the javascript highlighting will highlight the "99%" string. If you escape the %'s, it will return no results, which is clearly not what is intended.

Thus, I held off on the % and _ escaping for now; I would rather deal w/ that issue separately.

comment:9 by Andres Salomon <dilinger@…>, 18 years ago

Oh, and the comment in the sqlite backend is the same bit of documentation that's in the postgres backend; it snuck in there. :)

The postgres backend mentions that the like() function is a temporary hack; the sqlite backend should probably do the same.

comment:10 by Matthew Good, 18 years ago

Owner: changed from Jonas Borgström to Matthew Good
Status: newassigned

Andres: have you tried the last patch I attached? It works for my tests on SQLite searching for text including '%' or '_' (now searching for Python methods like __init__ should work).

Let me know if you have problems with it, but I think that I'll commit it once I get a chance to test it on PostgreSQL.

comment:11 by Christian Boos, 18 years ago

See also #2718.

comment:12 by Andres Salomon <dilinger@…>, 18 years ago

Matt:

The escaping patch works perfectly, I've included it in my bzr repository as well.

comment:13 by Andres Salomon <dilinger@…>, 18 years ago

Matt, would it help if I tested on postgres?

comment:14 by Matthew Good, 18 years ago

Resolution: fixed
Status: assignedclosed

Ok, as I was afraid, trying to use "\" as the ESCAPE character in the SQL was a problem for PostgreSQL, since it's already an escape character on that DB. So, I've changed it to "_" and committed as [2940].

Thanks Andres.

comment:15 by Andres Salomon <dilinger@…>, 18 years ago

Resolution: fixed
Status: closedreopened

Hm. Matt, that last change doesn't seem to work properly for me; at least not w/ sqlite. At least, searching for "%a%" and "_a_" returns incorrect results. I haven't looked at why this is, but the last patch is pretty different in terms of what actually gets escaped.

comment:16 by Matthew Good, 18 years ago

Resolution: fixed
Status: reopenedclosed

Ok, apparently SQLite doesn't do the escaping properly if you use a SQL wildcard as the ESCAPR character. So, [2944] should resolve this.

comment:17 by w.pasman@…, 10 years ago

I think the docu needs to be fixed to explain all these?

And, how do I search for "not use this" literally, but without the quotes? And with the quotes?

This is not explained in the manual, and I am getting every page containing "not", "use" or "this" if I try without the quotes.

in reply to:  17 comment:18 by Ryan J Ollos, 10 years ago

Replying to w.pasman@…:

… I am getting every page containing "not", "use" or "this" if I try without the quotes.

That shouldn't be the case, at least in Trac 1.0.x, though there is defect in Trac 1.0.1 that results in milestones being returned in the results when only one term is matched: #11545.

Please ask on the MailingList if you continue to have issues.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Matthew Good.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Matthew Good to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.