= Advanced Search =

One of the expected features for [milestone:1.0] is a much improved search system. But what exactly should be improved? This is the place to discuss it and make proposals. Note that there's currently no development branch dedicated to this topic, but once there is one, this page can be used to discuss the corresponding implementation details.

As usual with Trac, the challenge is that we're not only searching Wiki pages, but other kinds of Trac resources as well: tickets, changesets, etc. Therefore, the way a result is shown should also be adapted to the kind of object retrieved (see e.g. #2859).

A related question is how the TracSearch and the TracQuery should interact, see Trac-Dev:333, #1329, #2644.

== Weighting ==

Right now, the results are returned in reverse chronological order (i.e. most recent first). All matches are considered equal.

It was suggested that we could use some simple weighting techniques to return the results in a more useful order. For example, if a term is found in a ticket summary, this could "weigh" more than if found in a ticket comment. Likewise, the number of times the term is found for a given result could be taken into account, etc.

It should be possible to do a first version of this improvement independently of the rest, by modifying the return type of `ISearchSource.get_search_results` to return a list of `SearchResult` objects (much like the [wiki:"TracDev/ApiChanges/0.11#ITimelineEventProvider" ITimelineEventProvider] change).

== Indexing ==

It would probably be a good idea if objects were indexed as they are created/updated. This would obviously improve search performance greatly, since a search would no longer effectively require a full retrieval of the entire DB. This could be optional I guess.

A generic search system would provide components with a means to index content, query the content in a standard way (i.e. a decent query language) and refer to this content at a later date (e.g.
ticket hits would display the ticket in a useful way, with links to specific comments, etc.)

== Alec's Stream of Consciousness ==

If indexing on creation/update we would need hooks for each resource in Trac (à la `IWikiChangeListener`) to update the indexer. The potential downside is that indexing on the fly could slow down Trac's responsiveness; that is the price of faster search. This could be mitigated by running the indexer in a thread. I like this solution.

For indexing itself, there seem to be two solutions: use a generalised indexing engine (Hyperestraier, Lucene, etc.) or in-database indexers. A generalised indexing engine has the advantage that one interface could be used for all resources (wiki, ticket, source, attachment, ...). I am personally a fan of this option, and in particular [http://swapoff.org/pyndexter pyndexter] (bias!), which provides an abstraction layer for a number of indexers. It also includes a generic query language (similar to Google's) which is transformed into the query language particular to each backend.

So, here is a completely un-thought-out proposal:
{{{
#!python
# trac.wiki.search
from trac.core import Component, implements
from trac.wiki.api import IWikiChangeListener
# SearchSystem.add() and .remove() are the API proposed here, not existing methods.
from trac.search import SearchSystem

class WikiIndexer(Component):
    """Keep the search index in sync with wiki page changes."""
    implements(IWikiChangeListener)

    def _update(self, page, *args):
        # (Re)index the page. *args absorbs the extra arguments that
        # wiki_page_changed receives (version, time, comment, author, ...).
        SearchSystem(self.env).add('wiki:%s' % page.name, content=page.text)

    wiki_page_added = _update
    wiki_page_changed = _update
    wiki_page_version_deleted = _update

    def wiki_page_deleted(self, page):
        # Drop the page from the index.
        SearchSystem(self.env).remove('wiki:%s' % page.name)
}}}

This kind of system could be implemented entirely as a plugin, assuming appropriate ''!ChangeListener'' style interfaces existed for all resources (currently only the versioncontrol module is missing this functionality).

== Search Engines ==

Several search engines could be good candidates for handling the search requests, but this should probably be done in a pluggable way, so that different search engines could be supported.
Among the possible candidates:
 * [http://www.xapian.org Xapian] and [http://divmod.org/trac/wiki/DivmodXapwrap DivmodXapwrap]. See also the discussion about using Xapian in MoinMoin: MoinMoin:FeatureRequests/AdvancedXapianSearch
 * [http://pylucene.osafoundation.org/ PyLucene]
 * [http://hyperestraier.sourceforge.net/ Hyper Estraier] and [http://hype.python-hosting.com/ hype].
 * ... ?
 * There have been some efforts to provide a neutral API for some of the above search engines:
   - [http://swapoff.org/wiki/pyndexter pyndexter] [[BR]] The Hyperestraier adapter works well, Xapian is coming along nicely and the pure Python indexer is based on that used by th:wiki:RepoSearchPlugin (i.e. it works, but has issues). I have yet to write the !PyLucene adapter, but it doesn't look too difficult.
   - [http://blog.case.edu/bmb12/2006/08/merquery_summer_of_code_results merquery] [[BR]] This is now a Django-specific SoC project.
   - [http://opensearch.a9.com/ OpenSearch]
 * Each DatabaseBackend may also have its own way to implement full-text search:
   - Recent SQLite (3.3.8) comes with an experimental full-text search module, sqlite:FtsOne (depends on pysqlite:#180)
   - PostgreSQL full-text search:
     - [http://pgfoundry.org/projects/pgestraier postgres calling hyper estraier], source is [http://svn.rot13.org/index.cgi/pgestraier/browse/trunk/ here]
     - [http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2 Tsearch2]
     - [http://www.devx.com/opensource/Article/21674 article about tsearch2]
     - [http://techdocs.postgresql.org/techdocs/fulltextindexing.php postgres fulltext indexing]
   - [http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html MySQL fulltext indexing]
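
To make the "pluggable" idea a bit more concrete, here is a minimal sketch, not an existing Trac API. The names `SearchBackend`, `InMemoryBackend`, `BACKENDS` and `get_backend`, as well as the field weights, are assumptions made purely for illustration. The `add`/`remove` methods mirror the proposal in Alec's section, and the toy backend ranks hits with the kind of simple field weighting discussed under Weighting. A real adapter would presumably delegate to Xapian, !PyLucene, Hyper Estraier or a database full-text index instead.

```python
class SearchBackend(object):
    """Hypothetical minimal interface that each engine adapter would implement."""

    def add(self, uri, **fields):
        raise NotImplementedError

    def remove(self, uri):
        raise NotImplementedError

    def search(self, term):
        raise NotImplementedError


class InMemoryBackend(SearchBackend):
    """Toy pure-Python backend, only meant to show the shape of an adapter."""

    # Assumed weights: a hit in a summary counts more than one in a comment.
    FIELD_WEIGHTS = {'summary': 5.0, 'content': 1.0, 'comment': 0.5}

    def __init__(self):
        self._docs = {}

    def add(self, uri, **fields):
        # Re-adding a URI replaces the previously indexed version.
        self._docs[uri] = dict((name, text.lower())
                               for name, text in fields.items())

    def remove(self, uri):
        self._docs.pop(uri, None)

    def search(self, term):
        term = term.lower()
        scored = []
        for uri, fields in self._docs.items():
            # Score = sum over fields of (field weight * occurrences of term).
            # Words are split on whitespace only; punctuation is not handled.
            score = sum(self.FIELD_WEIGHTS.get(name, 1.0) * text.split().count(term)
                        for name, text in fields.items())
            if score > 0:
                scored.append((score, uri))
        # Highest score first; ties are broken by URI for a stable order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(uri, score) for score, uri in scored]


# Hypothetical registry: the backend name could come from a trac.ini option.
BACKENDS = {'memory': InMemoryBackend}


def get_backend(name):
    """Instantiate the backend registered under the given name."""
    return BACKENDS[name]()
```

For instance, after adding `ticket:1` with the summary "Search is slow" and the comment "search search search", plus `wiki:Start` with the content "Use the search box", `search('search')` ranks the ticket first (score 6.5 against 1.0), because the single summary hit outweighs the wiki page's content hit.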