Search Refactoring

One of the complaints I've regularly heard about Trac concerns its not-so-useful search module. The usual answer has been that we have nice ideas and plans for an AdvancedSearch at some point in the future (1.0?).

Nevertheless, some simple steps could be taken to improve the current situation incrementally, with changes that don't require any kind of big redesign.

Issues

Ranking

Currently, the search simply finds all the resources matching all the given search terms. A resource is either a hit or a miss, and the results are sorted in chronological order, most recent item first.

There are two things that could be done quite easily:

  • give more importance to the resources that have many hits
  • let the search source increase the relevance of a match if the match happens in some "search sensitive" area, like the summary or the keywords associated with that resource

While chronological ordering is useful, it would be better replaced as the default by relevance ordering, highest ranking results first.

Reuse of results

When you're searching for tickets, you sometimes want to restrict your search to either the open or the closed tickets, or according to some other constraints on tickets. Since 0.10 (#2859), closed tickets are shown using the usual strike-through style, which is already helpful. But what is sometimes really needed are the filtering capabilities of the ticket custom query module. Conversely, the custom query is an inconvenient place to start from a plain text search. This has been discussed in #1329 (and the alternative #4824).

Direct access to resources by their numbers

It has always been possible to use the "quickjump" facility in the search page: writing any kind of TracLinks in the search field goes directly to the resource referred to by the link. Nevertheless, there have been constant requests for the possibility to find resources by their numbers (see #1268 and numerous duplicates: #1644, #2919, #3419, #3856). There's also a need to take special care of short numbers (#4398), which are wrongly disqualified as search terms.

Proposed Solutions

Ranking

While trying different ranking ideas, I found that the major problem was ranking the results coming from different sources together in a fair way. For that, a "neutral" measure was needed. Such a measure can be the "number of hits" for the search terms within the matched resource. The basic idea is that each time the sequence of search terms appears "somewhere" in the text associated with a resource, this counts as one hit. This means that if there's a hit in a multi-line text field (like a wiki page content, or a ticket description or comment), we can search for multiple occurrences of the match and therefore add more "hits". Also, a source can decide that some hits weigh more than others, like a hit in a "keywords" field.

The search page now presents the result entries ordered by relevance, the highest ranking entries first.
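The hit-counting idea can be illustrated with a minimal Python sketch. This is not the code from the actual changesets: the names `FIELD_WEIGHTS`, `count_hits`, `score` and `rank`, the weight values, and the representation of a resource as a name plus a dict of text fields are all assumptions made for the example.

```python
import re

# Hypothetical per-field weights: a hit in a "search sensitive" field
# counts more than a hit in ordinary text (which defaults to 1).
FIELD_WEIGHTS = {"summary": 3, "keywords": 3}


def count_hits(text, terms):
    """Count non-overlapping, case-insensitive occurrences of the
    sequence of search terms in `text`."""
    if not terms:
        return 0
    pattern = r"\s+".join(re.escape(term) for term in terms)
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score(fields, terms):
    """Sum the weighted hits over all text fields of one resource."""
    return sum(
        FIELD_WEIGHTS.get(name, 1) * count_hits(text, terms)
        for name, text in fields.items()
    )


def rank(resources, terms):
    """Return (name, score) pairs for matching resources,
    highest score first. `resources` maps a name to its fields."""
    scored = [(name, score(fields, terms)) for name, fields in resources.items()]
    matches = [(name, s) for name, s in scored if s > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Results from different (made-up) sources, ranked on the same scale.
resources = {
    "WikiStart": {"text": "Search tips: search by keyword, then refine the search."},
    "#12": {
        "summary": "Improve search",
        "keywords": "search ranking",
        "description": "Results need better ordering.",
    },
    "#13": {"summary": "Unrelated", "description": "Nothing to see here."},
}
ranking = rank(resources, ["search"])
```

Here the wiki page gets three plain hits, while the ticket gets two weighted hits (summary and keywords), so the ticket ranks first even though it contains the term fewer times. Because every source reports the same kind of number, results from wiki pages, tickets and changesets can be merged into one list.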

Nevertheless, it's still possible to look at the results in chronological order. This can be achieved by following the "Results by date" page navigation link, which leads to a Timeline style page showing the results ordered by date, most recent entries first.

Reuse of results

The proposal made in trac-dev:335 has been implemented: the list of matched tickets can be "sent" to the Ticket Query view for further refinement of the search.

This is done by adding an 'alternate' format link named Ticket Query.

Likewise, it would be possible to send the changesets found to a log listing, which now also supports a list of individual changesets.

Direct access to resources by their numbers

The initial solution for #1268 suffers from some drawbacks, most notably that it doesn't match the exact number but a substring of it: this generates many false positives. We should not only match the exact id, but also present it as an "outstanding result", like TracLinks matches, presented in a special way. This can now be achieved at the level of the search provider, as it can return a result with a "special" weight.
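The difference between substring matching and exact matching with a "special" weight can be sketched as follows. This is only an illustration under stated assumptions: `EXACT_MATCH_WEIGHT`, `substring_match` and `ticket_id_weight` are hypothetical names, the weight value is arbitrary, and accepting an optional leading `#` is a choice made for the example rather than something r5553 is known to do.

```python
import re

# Hypothetical "special" weight, large enough that an exact id match
# sorts above any result ranked by ordinary hit counts.
EXACT_MATCH_WEIGHT = 1000


def substring_match(term, ticket_id):
    """Substring approach: the term only needs to appear somewhere
    inside the ticket number, which produces false positives."""
    return term in str(ticket_id)


def ticket_id_weight(term, ticket_id):
    """Return the special weight only when the term is exactly the
    ticket number (optionally written with a leading '#'), else 0."""
    match = re.fullmatch(r"#?(\d+)", term.strip())
    if match and int(match.group(1)) == ticket_id:
        return EXACT_MATCH_WEIGHT
    return 0
```

With the substring approach, searching for `12` also matches tickets 123 and 4120. The exact variant only gives ticket 12 the special weight, and it works equally well for short numbers such as `7`.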

r5553 does this for tickets. It should be finished for revisions and also done for reports.


Implemented so far:

TODO

Other minor glitches:

Last modified on May 28, 2008, 10:26:17 PM