Opened 13 years ago
Last modified 12 months ago
#10665 new defect
Search through all text fields
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | high | Milestone: | next-major-releases |
Component: | search system | Version: | |
Severity: | normal | Keywords: | custom-field patch |
Cc: | ethan.jucovy@…, Ryan J Ollos | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
When I try to search for VerySpecificTermXYZ VerySpecificTermABC on demo-0.13 Trac finds nothing, because VerySpecificTermXYZ is saved in description field and VerySpecificTermABC is saved in Release Notes in Test-Ticket #600.
It would be nice if the search string will be considered for all fields (not only for one field).
Attachments (4)
Change History (19)
comment:1 by , 13 years ago
Summary: | Search through all fields → Search through all text fields |
---|
comment:2 by , 12 years ago
Keywords: | custom-field added |
---|---|
Priority: | normal → high |
by , 12 years ago
Attachment: | search_across_comments_and_custom_fields.diff added |
---|
by , 12 years ago
Attachment: | search_across_comments_and_custom_fields.2.diff added |
---|
follow-up: 6 comment:3 by , 12 years ago
framay, I've attached a patch attachment:search_across_comments_and_custom_fields.diff which seems to work for me on a local instance, but I haven't tried running the test suite or tested it against any edge cases, and I have no idea whether it would perform badly on large databases. :-)
Instead of the current approach of combining three subqueries (one for ticket fields, one for ticket comments, and one for ticket custom fields) my patch runs a search using JOINs with the following SQL:
SELECT summary, description, reporter, type, id, time, status, resolution FROM ticket WHERE id IN (SELECT DISTINCT ticket.id FROM ticket JOIN ticket_change ON ticket.id=ticket_change.ticket JOIN ticket_custom ON ticket.id=ticket_custom.ticket WHERE (ticket.summary LIKE %VerySpecificTermXYZ% ESCAPE '/' OR ticket.keywords LIKE %VerySpecificTermXYZ% ESCAPE '/' OR ticket.description LIKE %VerySpecificTermXYZ% ESCAPE '/' OR ticket.reporter LIKE %VerySpecificTermXYZ% ESCAPE '/' OR ticket.cc LIKE %VerySpecificTermXYZ% ESCAPE '/' OR CAST(ticket.id AS text) LIKE %VerySpecificTermXYZ% ESCAPE '/' OR ticket_change.newvalue LIKE %VerySpecificTermXYZ% ESCAPE '/' OR ticket_custom.value LIKE %VerySpecificTermXYZ% ESCAPE '/') AND (ticket.summary LIKE %VerySpecificTermABC% ESCAPE '/' OR ticket.keywords LIKE %VerySpecificTermABC% ESCAPE '/' OR ticket.description LIKE %VerySpecificTermABC% ESCAPE '/' OR ticket.reporter LIKE %VerySpecificTermABC% ESCAPE '/' OR ticket.cc LIKE %VerySpecificTermABC% ESCAPE '/' OR CAST(ticket.id AS text) LIKE %VerySpecificTermABC% ESCAPE '/' OR ticket_change.newvalue LIKE %VerySpecificTermABC% ESCAPE '/' OR ticket_custom.value LIKE %VerySpecificTermABC% ESCAPE '/') )
comment:4 by , 12 years ago
Cc: | added |
---|
comment:5 by , 12 years ago
Cc: | added |
---|
follow-up: 7 comment:6 by , 12 years ago
Replying to Ethan Jucovy <ethan.jucovy@…>:
framay, I've attached a patch attachment:search_across_comments_and_custom_fields.diff which seems to work for me on a local instance, but I haven't tried running the test suite or tested it against any edge cases, and I have no idea whether it would perform badly on large databases. :-)
Hi Ethan, thanks for the patch and sorry for answering you so late.
I just quickly tested your patch. So far it works well, but there is one thing which does not work at all: When a ticket is created and not yet modified, the search engine won't find it.
Otherwise it works as desired - i just tested it on our intregation system, which has same amount of data as productive system (~6000 tickets).
follow-ups: 8 9 comment:7 by , 12 years ago
Replying to framay <franz.mayer@…>:
Otherwise it works as desired - i just tested it on our intregation system, which has same amount of data as productive system (~6000 tickets).
And did you measure the time it takes for a search, before and after the change? Those JOINs look very costly to me… (at least the one on ticket_change
). Another approach would be to gather sets of tickets (one per term) and intersect them.
comment:8 by , 12 years ago
Replying to cboos:
Replying to framay <franz.mayer@…>:
Otherwise it works as desired - i just tested it on our intregation system, which has same amount of data as productive system (~6000 tickets).
Those JOINs look very costly to me… (at least the one on
ticket_change
).
Well you are right, it takes a lot longer for searching it.
I build some small logging into Trac core, just for testing:
-
web_ui.py
58 58 add_stylesheet, add_warning, auth_link, prevnext_nav, web_context 59 59 ) 60 60 from trac.wiki.formatter import format_to, format_to_html, format_to_oneliner 61 import time 61 62 62 63 63 64 class InvalidTicket(TracError): … … 197 198 if not 'ticket' in filters: 198 199 return 199 200 ticket_realm = Resource('ticket') 201 start_time = time.time() 200 202 with self.env.db_query as db: 201 203 sql, args = search_to_sql(db, ['summary', 'keywords', 202 204 'description', 'reporter', 'cc', … … 230 232 from_utimestamp(ts), author, 231 233 shorten_result(desc, terms)) 232 234 235 needed_time = time.time() - start_time 236 self.log.info("needed %s seconds for searching terms '%s'" % (needed_time, terms)) 237 233 238 # Attachments 234 239 for result in AttachmentModule(self.env).get_search_results( 235 240 req, ticket_realm, terms):
With Trac core:
INFO: needed 1.28463602066 seconds for searching terms '[u'VerySpecificTermABC']' INFO: needed 1.2873468399 seconds for searching terms '[u'VerySpecificTermXYZ']' INFO: needed 1.29791307449 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'
With applied patch:
INFO: [t10665] needed 9.66306495667 seconds for searching terms '[u'VerySpecificTermABC']' INFO: [t10665] needed 9.68026995659 seconds for searching terms '[u'VerySpecificTermXYZ']' INFO: [t10665] needed 9.67235589027 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'
by , 12 years ago
Attachment: | 10665-sql-intersects.diff added |
---|
comment:9 by , 12 years ago
Replying to cboos:
Another approach would be to gather sets of tickets (one per term) and intersect them.
I took a stab at this approach — grouping unions of ticket + ticket_change + ticket_custom for each search term, and then intersecting all of those groups together — in attachment:10665-sql-intersects.diff. I think it does what it's supposed to do. I don't have a large enough dataset available to measure the performance impact though.
comment:10 by , 12 years ago
Milestone: | → next-dev-1.1.x |
---|
#10200 was closed as duplicate (well, on second thought it's not strictly a duplicate, I'll make a note there).
The patch might be worth testing, but I'm still wary about the performance implications of doing this on the current db model.
comment:11 by , 12 years ago
I'd be curious to see how this patch does on framay's integration system.
I ran some benchmarks using SQLite in test environments with 10K and 100K tickets. Performance is impacted, but it seems to be much better than the previous patch. It looks like t.e.o's own ticket search would still run in well under half a second for any typical query.
Without the patch, the current search performance is a function of the number of tickets; the number of custom fields; and the amount of text in each custom field.
With the latest patch applied, the search performance is additionally a function of the number of search terms — basically it's as if a separate search is run for each term. So a one-term search takes the same amount of time as the current implementation, and a two-term search takes twice as long.
You can review the times I measured (and the script I used to populate the test databases) here: https://gist.github.com/4707911
So, the patch introduces a performance drop that's linear with the number of search terms. That sounds bad, but you still need to use a large number of (non-quoted) search terms for the slowdown to really be noticeable. I'd be surprised if that's common.
And for performance to really suffer, you need to have a rather large Trac database — more than 10K tickets with more than two long custom text fields. A five-term search against a database of 10K tickets with eight paragraph-length custom fields still returns in under a second; you need to have page-length fields, or well over 10K tickets, or more than five separate search terms, for the search to take longer than a second. Once you have that much data to search through, a dedicated indexing/search service will probably be warranted anyway.
Personally I think this performance hit is worth taking to make the search behavior more intuitive. It seems like for typical usage in most Trac installations, the performance hit will be negligible, and the amount of data and/or search terms needed to make performance suffer would justify a completely different solution that doesn't involve a direct database query, whether or not this patch is applied, and regardless of the underlying database model.
by , 12 years ago
Attachment: | 10665-sql-intersects_v1_1_1.diff added |
---|
corrected patch including test-logs for tracking needed time for search queries
comment:12 by , 12 years ago
I just tested your patch quickly (yet I had some problems to apply it with Trac-version 1.1.1
, so I did it manually).
Nevertheless it threw an error at first time:
ERROR: subquery in FROM must have an alias
It seems postgres needs to have a name of that subquery (I just named it "t", see line 225 in 10665-sql-intersects_v1_1_1.diff).
The results of my quick tests on our intregation system (~6500 tickets):
INFO: needed 2.33321213722 seconds for searching terms '[u'Test']' INFO: needed 1.32855892181 seconds for searching terms '[u'VerySpecificTermABC']' INFO: needed 2.59698700905 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']' INFO: needed 2.63305997849 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']' INFO: needed 1.33009886742 seconds for searching terms '[u'VerySpecificTermXYZ']' INFO: needed 2.60466694832 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'
comment:14 by , 10 years ago
Cc: | added; removed |
---|
comment:15 by , 10 years ago
Milestone: | next-dev-1.1.x → next-major-releases |
---|
Retargetting tickets to narrow focus for milestone:1.2. Please move the ticket back to milestone:next-dev-1.1.x if you intend to resolve it by milestone:1.2.
comment:16 by , 7 years ago
Keywords: | patch added |
---|
This defect is quite annoying, since we have a custom-field (similar to Release Notes on t.e.o), which contains quite a lot of information.
Maybe it's just a little thing in the search engine?