Edgewall Software

Search Refactoring

One of the complaints voiced regularly about Trac is that its search module is not very useful, and as a result some nice ideas and plans for an AdvancedSearch have been drafted. Nevertheless, there are some simple steps that could be taken to improve the search capability incrementally, changes that don't require any kind of big redesign.

Issues

Ranking

Currently, the search simply finds all the resources matching all the given search terms. The results are listed in chronological order, most recent item first.

There are two things that could be done quite easily:

  • give more importance to the resources that have many hits
  • let the search source increase the relevance of a match if the match happens in some "search sensitive" area, like the summary or the keywords associated with that resource

While useful, the chronological ordering would be better replaced by a relevance ordering, highest ranking results first.

Reuse of results

When you're searching for tickets, you sometimes want to restrict your search to either the open or the closed tickets, or according to some other constraints on tickets. Since 0.10 (#2859), closed tickets are shown using the usual strike-through style, and this is already helpful. But what is sometimes really needed are the filtering capabilities of the ticket custom query module. Conversely, it is inconvenient to start a plain text search from the custom query. This has been discussed in #1329 (and the alternative #4824).

Direct access to resources by their numbers

While it has always been possible to use the "quickjump" facility in the search page, by writing any kind of TracLinks in the search field to go directly to the resource referred to by the link, there have nevertheless been requests for the possibility to find resources by their numbers (see #1268 and its duplicates: #1644, #2919, #3419, #3856). There's also the need to take special care of short numbers (#4398), which are incorrectly qualified as search terms.

Proposed Solutions

Ranking

While trying different ranking ideas, I found that the major problem was managing to rank the results coming from different sources together in a fair way. For that, a "neutral" measure was needed. Such a measure can be the "number of hits" for the search terms within the matched resource. The basic idea is that each time the sequence of search terms appears "somewhere" in the text associated with a resource, this counts for one hit. This means that if there's a hit in a multi-line text field (like a wiki page content, or a ticket description or comment), we can search for multiple occurrences of the match and therefore add more "hits". Also, a source can decide that some hits weigh more than others, like a hit in a "keywords" field.
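The hit-counting idea can be sketched as follows. This is only an illustration of the measure described above, not the actual Trac code: the field names, the weight values and the helper names (`count_hits`, `relevance`) are all assumptions made for the example.

```python
import re

# Hypothetical per-field weights: the text above only says that a source
# may decide that some hits (e.g. in "keywords") weigh more than others.
FIELD_WEIGHTS = {"summary": 3, "keywords": 5, "description": 1}

def count_hits(text, query):
    """Count non-overlapping, case-insensitive occurrences of the query."""
    if not query:
        return 0
    return len(re.findall(re.escape(query), text, flags=re.IGNORECASE))

def relevance(fields, query, weights=FIELD_WEIGHTS):
    """Sum the weighted hits over all the text fields of one resource."""
    return sum(weights.get(name, 1) * count_hits(text, query)
               for name, text in fields.items())

# Made-up sample resource, for illustration only.
ticket = {
    "summary": "Search ranking is poor",
    "keywords": "search",
    "description": "The search page sorts by date.\nSearch should rank by hits.",
}
score = relevance(ticket, "search")
```

Because every source reduces its matches to a plain number, results coming from tickets, wiki pages and changesets can be compared directly, while each source still keeps control over its own weights.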

The search page now presents the result entries ordered by relevance, the highest ranking entries first.

Nevertheless, it's still possible to look at the results in chronological order. This can be achieved by following the "Results by date" page navigation link, which leads to a Timeline style page showing the results ordered by date, most recent entries first.
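The two orderings amount to sorting the same result list with different keys. A minimal sketch, assuming each result carries a date and a relevance score (the `Result` tuple and the sample entries are made up for the example):

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical result record; not the actual structure used by Trac.
Result = namedtuple("Result", "title date score")

results = [
    Result("ticket A", datetime(2007, 5, 1), 2),
    Result("wiki page B", datetime(2007, 3, 15), 9),
    Result("changeset C", datetime(2007, 6, 20), 4),
]

# Default view: highest ranking entries first, ties broken by most recent date.
by_relevance = sorted(results, key=lambda r: (r.score, r.date), reverse=True)

# "Results by date" view: most recent entries first.
by_date = sorted(results, key=lambda r: r.date, reverse=True)
```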

Reuse of results

Implemented the proposal made in trac-dev:335 to "send" the list of matched tickets to the Ticket Query view, for further refinement of the search.

This is done by adding an 'alternate' format link named Ticket Query.

Likewise, it would be possible to send the changesets found to a log listing, which also now supports a list of individual changesets.
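Sending the matched tickets to the Ticket Query view boils down to building a link that carries their ids. The sketch below shows the general idea only: the `/query` path, the `id` parameter name and the comma-separated form are assumptions made for illustration, and `ticket_query_href` is a hypothetical helper, not a Trac function.

```python
from urllib.parse import urlencode

def ticket_query_href(base_url, ticket_ids):
    """Build a link carrying the matched ticket ids to a query view.

    The path and parameter layout are assumed for this example.
    """
    # Drop duplicates and sort so that the link is stable.
    ids = ",".join(str(i) for i in sorted(set(ticket_ids)))
    return base_url.rstrip("/") + "/query?" + urlencode({"id": ids})

href = ticket_query_href("/trac", [42, 7, 42, 1268])
```

Note that `urlencode` percent-encodes the commas, so the ids end up as `7%2C42%2C1268` in the generated link.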

Direct access to resources by their numbers

The initial solution for #1268 suffers from some drawbacks, most notably that it doesn't match the exact number but a substring of it: this generates many false positives. We should not only match the exact id, but also present it as an "outstanding result", like TracLinks matches, presented in a special way. This can now be achieved at the level of the search provider, as it can return a result with a "special" weight.

r5553 does this for tickets. It should be finished for revisions and also done for reports.
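The exact-match idea can be sketched as follows. This is not the code from r5553, only an illustration under stated assumptions: the weight value, the accepted `#42` form and both helper names are hypothetical.

```python
import re

# Hypothetical "special" weight that lifts an exact id match above
# any ordinary hit count.
EXACT_MATCH_WEIGHT = 1000

def ticket_id_from_query(query):
    """Return the number if the query is only a number like '42' or '#42'."""
    m = re.fullmatch(r"#?(\d+)", query.strip())
    return int(m.group(1)) if m else None

def rank_ticket(ticket_id, text_hits, query):
    """Give an exact id match the special weight, else use the plain hits."""
    if ticket_id_from_query(query) == ticket_id:
        return EXACT_MATCH_WEIGHT + text_hits
    return text_hits
```

With this approach, searching for `12` promotes ticket 12 as an outstanding result, while ticket 123 is only ranked by its ordinary text hits, which avoids the substring false positives.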


Implemented so far:

TODO

Other minor glitch:

#7069
SearchResults doesn't indicate closed by etc.

Last modified on Mar 1, 2023, 5:03:45 PM