Edgewall Software
Modify

Opened 7 years ago

Last modified 22 months ago

#10665 new defect

Search through all text fields

Reported by: framay <franz.mayer@…> Owned by:
Priority: high Milestone: next-major-releases
Component: search system Version:
Severity: normal Keywords: custom-field patch
Cc: ethan.jucovy@…, Ryan J Ollos Branch:
Release Notes:
API Changes:

Description

When I try to search for VerySpecificTermXYZ VerySpecificTermABC on demo-0.13 Trac finds nothing, because VerySpecificTermXYZ is saved in description field and VerySpecificTermABC is saved in Release Notes in Test-Ticket #600.

It would be nice if the search string will be considered for all fields (not only for one field).

Attachments (4)

search_across_comments_and_custom_fields.diff (2.3 KB ) - added by Ethan Jucovy <ethan.jucovy@…> 7 years ago.
search_across_comments_and_custom_fields.2.diff (2.3 KB ) - added by Ethan Jucovy <ethan.jucovy@…> 7 years ago.
10665-sql-intersects.diff (2.9 KB ) - added by Ethan Jucovy <ethan.jucovy@…> 7 years ago.
10665-sql-intersects_v1_1_1.diff (3.8 KB ) - added by franz.mayer@… 6 years ago.
corrected patch including test-logs for tracking needed time for search queries

Download all attachments as: .zip

Change History (19)

comment:1 by framay <franz.mayer@…>, 7 years ago

Summary: Search through all fieldsSearch through all text fields

comment:2 by framay <franz.mayer@…>, 7 years ago

Keywords: custom-field added
Priority: normalhigh

This defect is quite annoying, since we have a custom-field (similar to Release Notes on t.e.o), which contains quite a lot of information.

Maybe it's just a little thing in the search engine?

by Ethan Jucovy <ethan.jucovy@…>, 7 years ago

by Ethan Jucovy <ethan.jucovy@…>, 7 years ago

comment:3 by Ethan Jucovy <ethan.jucovy@…>, 7 years ago

framay, I've attached a patch attachment:search_across_comments_and_custom_fields.diff which seems to work for me on a local instance, but I haven't tried running the test suite or tested it against any edge cases, and I have no idea whether it would perform badly on large databases. :-)

Instead of the current approach of combining three subqueries (one for ticket fields, one for ticket comments, and one for ticket custom fields) my patch runs a search using JOINs with the following SQL:

SELECT summary, description, reporter, type, id, time, status, resolution 
FROM ticket WHERE id IN
(SELECT DISTINCT ticket.id FROM ticket
 JOIN ticket_change ON ticket.id=ticket_change.ticket
 JOIN ticket_custom ON ticket.id=ticket_custom.ticket WHERE
 (ticket.summary LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR ticket.keywords LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR ticket.description LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR ticket.reporter LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR ticket.cc LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR CAST(ticket.id AS text) LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR ticket_change.newvalue LIKE %VerySpecificTermXYZ% ESCAPE '/'
  OR ticket_custom.value LIKE %VerySpecificTermXYZ% ESCAPE '/')
 AND
 (ticket.summary LIKE %VerySpecificTermABC% ESCAPE '/'
  OR ticket.keywords LIKE %VerySpecificTermABC% ESCAPE '/'
  OR ticket.description LIKE %VerySpecificTermABC% ESCAPE '/'
  OR ticket.reporter LIKE %VerySpecificTermABC% ESCAPE '/'
  OR ticket.cc LIKE %VerySpecificTermABC% ESCAPE '/'
  OR CAST(ticket.id AS text) LIKE %VerySpecificTermABC% ESCAPE '/'
  OR ticket_change.newvalue LIKE %VerySpecificTermABC% ESCAPE '/'
  OR ticket_custom.value LIKE %VerySpecificTermABC% ESCAPE '/')
)

comment:4 by Ryan J Ollos <ryano@…>, 7 years ago

Cc: ryano@… added

comment:5 by Ethan Jucovy <ethan.jucovy@…>, 7 years ago

Cc: ethan.jucovy@… added

in reply to:  3 ; comment:6 by framay <franz.mayer@…>, 7 years ago

Replying to Ethan Jucovy <ethan.jucovy@…>:

framay, I've attached a patch attachment:search_across_comments_and_custom_fields.diff which seems to work for me on a local instance, but I haven't tried running the test suite or tested it against any edge cases, and I have no idea whether it would perform badly on large databases. :-)

Hi Ethan, thanks for the patch and sorry for answering you so late.

I just quickly tested your patch. So far it works well, but there is one thing which does not work at all: When a ticket is created and not yet modified, the search engine won't find it.

Otherwise it works as desired - i just tested it on our intregation system, which has same amount of data as productive system (~6000 tickets).

in reply to:  6 ; comment:7 by Christian Boos, 7 years ago

Replying to framay <franz.mayer@…>:

Otherwise it works as desired - i just tested it on our intregation system, which has same amount of data as productive system (~6000 tickets).

And did you measure the time it takes for a search, before and after the change? Those JOINs look very costly to me… (at least the one on ticket_change). Another approach would be to gather sets of tickets (one per term) and intersect them.

in reply to:  7 comment:8 by framay <franz.mayer@…>, 7 years ago

Replying to cboos:

Replying to framay <franz.mayer@…>:

Otherwise it works as desired - i just tested it on our intregation system, which has same amount of data as productive system (~6000 tickets).

Those JOINs look very costly to me… (at least the one on ticket_change).

Well you are right, it takes a lot longer for searching it.

I build some small logging into Trac core, just for testing:

  • web_ui.py

     
    5858    add_stylesheet, add_warning, auth_link, prevnext_nav, web_context
    5959)
    6060from trac.wiki.formatter import format_to, format_to_html, format_to_oneliner
     61import time
    6162
    6263
    6364class InvalidTicket(TracError):
     
    197198        if not 'ticket' in filters:
    198199            return
    199200        ticket_realm = Resource('ticket')
     201        start_time = time.time()
    200202        with self.env.db_query as db:
    201203            sql, args = search_to_sql(db, ['summary', 'keywords',
    202204                                           'description', 'reporter', 'cc',
     
    230232                           from_utimestamp(ts), author,
    231233                           shorten_result(desc, terms))
    232234       
     235        needed_time = time.time() - start_time
     236        self.log.info("needed %s seconds for searching terms '%s'" % (needed_time, terms))
     237       
    233238        # Attachments
    234239        for result in AttachmentModule(self.env).get_search_results(
    235240            req, ticket_realm, terms):

With Trac core:

INFO: needed 1.28463602066 seconds for searching terms '[u'VerySpecificTermABC']'
INFO: needed 1.2873468399 seconds for searching terms '[u'VerySpecificTermXYZ']'
INFO: needed 1.29791307449 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'

With applied patch:

INFO: [t10665] needed 9.66306495667 seconds for searching terms '[u'VerySpecificTermABC']'
INFO: [t10665] needed 9.68026995659 seconds for searching terms '[u'VerySpecificTermXYZ']'
INFO: [t10665] needed 9.67235589027 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'

by Ethan Jucovy <ethan.jucovy@…>, 7 years ago

Attachment: 10665-sql-intersects.diff added

in reply to:  7 comment:9 by Ethan Jucovy <ethan.jucovy@…>, 7 years ago

Replying to cboos:

Another approach would be to gather sets of tickets (one per term) and intersect them.

I took a stab at this approach — grouping unions of ticket + ticket_change + ticket_custom for each search term, and then intersecting all of those groups together — in attachment:10665-sql-intersects.diff. I think it does what it's supposed to do. I don't have a large enough dataset available to measure the performance impact though.

comment:10 by Christian Boos, 6 years ago

Milestone: next-dev-1.1.x

#10200 was closed as duplicate (well, on second thought it's not strictly a duplicate, I'll make a note there).

The patch might be worth testing, but I'm still wary about the performance implications of doing this on the current db model.

comment:11 by Ethan Jucovy <ethan.jucovy@…>, 6 years ago

I'd be curious to see how this patch does on framay's integration system.

I ran some benchmarks using SQLite in test environments with 10K and 100K tickets. Performance is impacted, but it seems to be much better than the previous patch. It looks like t.e.o's own ticket search would still run in well under half a second for any typical query.

Without the patch, the current search performance is a function of the number of tickets; the number of custom fields; and the amount of text in each custom field.

With the latest patch applied, the search performance is additionally a function of the number of search terms — basically it's as if a separate search is run for each term. So a one-term search takes the same amount of time as the current implementation, and a two-term search takes twice as long.

You can review the times I measured (and the script I used to populate the test databases) here: https://gist.github.com/4707911

So, the patch introduces a performance drop that's linear with the number of search terms. That sounds bad, but you still need to use a large number of (non-quoted) search terms for the slowdown to really be noticeable. I'd be surprised if that's common.

And for performance to really suffer, you need to have a rather large Trac database — more than 10K tickets with more than two long custom text fields. A five-term search against a database of 10K tickets with eight paragraph-length custom fields still returns in under a second; you need to have page-length fields, or well over 10K tickets, or more than five separate search terms, for the search to take longer than a second. Once you have that much data to search through, a dedicated indexing/search service will probably be warranted anyway.

Personally I think this performance hit is worth taking to make the search behavior more intuitive. It seems like for typical usage in most Trac installations, the performance hit will be negligible, and the amount of data and/or search terms needed to make performance suffer would justify a completely different solution that doesn't involve a direct database query, whether or not this patch is applied, and regardless of the underlying database model.

by franz.mayer@…, 6 years ago

corrected patch including test-logs for tracking needed time for search queries

comment:12 by franz.mayer@…, 6 years ago

I just tested your patch quickly (yet I had some problems to apply it with Trac-version 1.1.1, so I did it manually).

Nevertheless it threw an error at first time:

ERROR:  subquery in FROM must have an alias

It seems postgres needs to have a name of that subquery (I just named it "t", see line 225 in 10665-sql-intersects_v1_1_1.diff).

The results of my quick tests on our intregation system (~6500 tickets):

INFO: needed 2.33321213722 seconds for searching terms '[u'Test']'
INFO: needed 1.32855892181 seconds for searching terms '[u'VerySpecificTermABC']'
INFO: needed 2.59698700905 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'
INFO: needed 2.63305997849 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'
INFO: needed 1.33009886742 seconds for searching terms '[u'VerySpecificTermXYZ']'
INFO: needed 2.60466694832 seconds for searching terms '[u'VerySpecificTermABC', u'VerySpecificTermXYZ']'

comment:14 by Ryan J Ollos, 5 years ago

Cc: Ryan J Ollos added; ryano@… removed

comment:15 by Ryan J Ollos, 4 years ago

Milestone: next-dev-1.1.xnext-major-releases

Retargetting tickets to narrow focus for milestone:1.2. Please move the ticket back to milestone:next-dev-1.1.x if you intend to resolve it by milestone:1.2.

comment:16 by figaro, 22 months ago

Keywords: patch added

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.
The ticket will be disowned.
as The resolution will be set.
The owner will be changed from (none) to anonymous.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.