Edgewall Software
Modify

Opened 19 years ago

Closed 17 years ago

Last modified 5 years ago

#2561 closed defect (fixed)

[PATCH] can't search for attachment name or description

Reported by: gregj@… Owned by: remy.blank@…
Priority: normal Milestone: 0.11.1
Component: search system Version: 0.9
Severity: normal Keywords: attachment patch
Cc: remy.blank@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Search does not find wiki pages or tickets when the search term appears only in an attachment file name or description. Suggest including attachment name and description in indexed text for searching.

Attachments (3)

attachment-search.patch (2.3 KB ) - added by Remy Blank <remy.blank@…> 17 years ago.
Patch against 0.11-stable [7346] adding searching in attachment metadata
attachment-search-2.patch (3.0 KB ) - added by Remy Blank <remy.blank@…> 17 years ago.
Made search in attachments consistent with timeline handling
attachment-result-format.patch (1.5 KB ) - added by Remy Blank <remy.blank@…> 17 years ago.
Change attachment search result format to <filename> (<parent_name>)

Download all attachments as: .zip

Change History (23)

comment:1 by Don <drj826@…>, 18 years ago

Has anyone suggested making the content of attachments searchable? This might be a major feature add — but VERY useful! The same goes for SVN repository content.

comment:2 by Christian Boos, 18 years ago

Milestone: 1.0

Right, the attachment module could be provide an ISearchSource implementation as well.

comment:3 by Christian Boos, 18 years ago

Keywords: attachment added

#5302 was closed as duplicate.

in reply to:  1 comment:4 by Christian Boos, 18 years ago

Milestone: 1.00.11
Owner: changed from Jonas Borgström to Christian Boos

Replying to Don <drj826@acm.org>:

Has anyone suggested making the content of attachments searchable?

That would be #5589.

Let's focus this ticket on the metadata.

comment:5 by Christian Boos, 17 years ago

#6137 closed as duplicate.

by Remy Blank <remy.blank@…>, 17 years ago

Attachment: attachment-search.patch added

Patch against 0.11-stable [7346] adding searching in attachment metadata

comment:6 by Remy Blank <remy.blank@…>, 17 years ago

Cc: remy.blank@… added

This patch adds an ISearchSource for attachment metadata. It currently searches in the filename, description and author fields. The search results point directly to the attachment page, from where the parent (wiki page, ticket) can be reached in one click.

I would be grateful if I could get some feedback about the patch. I am diving into Trac programming, and it would be great to know how I'm doing.

comment:7 by Remy Blank <remy.blank@…>, 17 years ago

Summary: can't search for attachment name or description[PATCH] can't search for attachment name or description

comment:8 by Christian Boos, 17 years ago

Milestone: 0.11.20.11.1

The patch is looking great, thanks!

One suggestion though: what about taking into account the other filters for narrowing the search? I think it's reasonable to make the assumption that a search filter for a given realm will use that realm string, so we could do the following:

  • when only the attachment filter is given, return all attachments, regardless of the realm
  • when there are other filters given, return only attachments for which the realm is matching

That way, when selecting "wiki + attachment" filters, you would get wiki attachments, but no ticket attachments, for example.

An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events). Instead of the Attachment module being a search source by itself, it only provides a helper function, quite similar to the get_search_results() method you've already written, taking an additional resource_realm parameter for retrieving only attachments for that realm.

I think I even prefer the second option, as it's more consistent with how the timeline works, and doesn't introduce the assumption on filter vs. realm names.

What do you think?

in reply to:  8 ; comment:9 by Remy Blank <remy.blank@…>, 17 years ago

Replying to cboos:

One suggestion though: what about taking into account the other filters for narrowing the search? I think it's reasonable to make the assumption that a search filter for a given realm will use that realm string, so we could do the following:

  • when only the attachment filter is given, return all attachments, regardless of the realm
  • when there are other filters given, return only attachments for which the realm is matching

Sounds like a good idea. But as you said, it makes the attachments a kind of special case. The result would still be quite intuitive, though. I like it.

An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events). Instead of the Attachment module being a search source by itself, it only provides a helper function, quite similar to the get_search_results() method you've already written, taking an additional resource_realm parameter for retrieving only attachments for that realm.

That's exactly how I started implementing it, but I backed out for several reasons:

  • Search results for attachments should be the parent resource, that is, when a search term is found in attachment metadata, the associated parent resource should be given as a result. This has several implications:
    • An extra DB lookup is needed for every result, for example in the case of the wiki, to get the wiki text. I haven't evaluated how this impacts search performance, but I would guess it can become noticeable.
    • When several attachments match for a given parent resource, they should be collapsed into a single search result. While quite simple to implement in the helper function in AttachmentModule, I felt that it was more consistent to treat attachments as first-class citizens and to return all matches.
    • OTOH, you might be more interested in the parent resource than the attachment anyway. With the current implementation, the parent resource is just one click away, so this might not have much weight.
  • It requires some implementation work in every attachment user. This is currently only wiki and ticket, but I still feel that this is unnecessary duplication.

I felt that I preferred the consistency of treating attachments as a normal resource provider and listing every matching attachment. This will also make it easier to transition to searching in attachment content later on, where I will be more interested in the attachment than the parent resource.

Let me know how you would like me to proceed.

in reply to:  9 ; comment:10 by Christian Boos, 17 years ago

Replying to Remy Blank <remy.blank@pobox.com>:

Replying to cboos:

An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events). Instead of the Attachment module being a search source by itself, it only provides a helper function, …

That's exactly how I started implementing it, but I backed out for several reasons:

  • Search results for attachments should be the parent resource, that is, when a search term is found in attachment metadata, the associated parent resource should be given as a result.

Hm, why? That's not how it works for the timeline. The timeline events are pointing to the attachment directly, and it should be done the same way for search results.

  • It requires some implementation work in every attachment user. This is currently only wiki and ticket, but I still feel that this is unnecessary duplication.

That's true, but the duplication would be rather minimal, 2-3 lines for iterating on the helper results and yielding them to the caller. Note however that the milestones (and some other realms used by plugins) also support attachments in 0.11.

I felt that I preferred the consistency of treating attachments as a normal resource provider and listing every matching attachment. This will also make it easier to transition to searching in attachment content later on, where I will be more interested in the attachment than the parent resource.

No question about listing every matching attachment. However, should attachments really be treated as a normal resource provider, or as an "ancillary" resource provider upon which other resource managers have control. We had the same choice for the Timeline, and I remember we decided to delegate to the main resource provider instead of providing an "Attachment" check box. I think both options can certainly be defended.

Let me know how you would like me to proceed.

So my personal preference would be consistency with the timeline.

in reply to:  10 comment:11 by Remy Blank <remy.blank@…>, 17 years ago

Hm, why? That's not how it works for the timeline. The timeline events are pointing to the attachment directly, and it should be done the same way for search results.

Well, for the user, the attachment list is part of the parent resource text, so it just seemed natural to list the parent resource if a search term matches an attachment. But you're right, it is not necessary, and I also prefer listing every matching attachment.

That's true, but the duplication would be rather minimal, 2-3 lines for iterating on the helper results and yielding them to the caller. Note however that the milestones (and some other realms used by plugins) also support attachments in 0.11.

You are right, of course. If we list individual attachments instead of the parent resource, it becomes trivial. Thanks for the reminder about the milestones.

So my personal preference would be consistency with the timeline.

It does make sense. Thanks for the feedback, I'll come back with a new patch shortly.

by Remy Blank <remy.blank@…>, 17 years ago

Attachment: attachment-search-2.patch added

Made search in attachments consistent with timeline handling

comment:12 by Remy Blank <remy.blank@…>, 17 years ago

Keywords: patch added

The version 2 of the patch removes the separate ISearchSource for attachments and treats searching in attachments the same way as timeline events, i.e. every attachment user having search functionality gets its attachment search results from AttachmentModule.

Christian, you mentioned that milestones can now have attachments. However, as far as I can tell, they don't provide search functionality, do they? Would that be desirable? While I'm at it (and now that I'm a bit familiar with the search functionality), I could provide an implementation for that.

comment:13 by Christian Boos, 17 years ago

Owner: changed from Christian Boos to remy.blank@…

Perfect, I'll test that this evening. About the milestone search, sure, another patch for adding the ISearchSource interface to the Milestone module would be welcome (just create another [PATCH] enhancement ticket for that ;-) ).

comment:14 by Remy Blank <remy.blank@…>, 17 years ago

There you go: #7444

comment:15 by Christian Boos, 17 years ago

Patch worked fine, applied in [7350].

One minor glitch though, the attachment match results were displayed the parent resource name first, then the filename. When glancing over the results, it was not that obvious that those were attachments. What about this alternative format? "<filename>" (<resource>)

  • trac/attachment.py

     
    497497            attachment = resource_realm(id=id).child('attachment', filename)
    498498            if 'ATTACHMENT_VIEW' in req.perm(attachment):
    499499                yield (get_resource_url(self.env, attachment, req.href),
    500                        "%s: %s" % (get_resource_shortname(self.env,
    501                                                           attachment.parent),
    502                                    filename),
     500                       get_resource_description(self.env, attachment,
     501                                                'compact'),
    503502                       datetime.fromtimestamp(time, utc), author,
    504503                       shorten_result(desc, terms))
    505504
     
    530529
    531530    def get_resource_description(self, resource, format=None, **kwargs):
    532531        if format == 'compact':
    533             return '%s:%s' % (get_resource_shortname(self.env,
    534                                                      resource.parent),
    535                               resource.filename)
     532            return '"%s" (%s)' % (resource.id,
     533                    get_resource_shortname(self.env, resource.parent))
     534
    536535        elif format == 'summary':
    537536            return Attachment(self.env, resource).description
    538537        if resource.id:
    539             return _("Attachment '%(id)s' in %(parent)s", id=resource.id,
     538            return _('Attachment "%(id)s" in %(parent)s', id=resource.id,
    540539                     parent=get_resource_name(self.env, resource.parent))
    541540        else:
    542541            return _("Attachments of %(parent)s",

comment:16 by Remy Blank <remy.blank@…>, 17 years ago

The reason I chose <parent_shortname>: <filename> is that the heading of the attachment page is <parent_name>: <filename> (and the search results in general seem to have the shortname instead of the name).

I like your proposal much better. But I have a few comments:

  • I would drop the quotes around the filename. I find that they make the output look weird, and there are no quotes in the heading of the attachment page either. This is higly cosmetic, of course.
  • I would use the resource name instead of the shortname between the brackets. It doesn't make the string much longer, but I find it to be clearer (compare e.g. attachment-search.patch (#2561) with attachment-search.patch (Ticket #2561).
  • I would use get_resource_shortname() instead of get_resource_description(..., 'compact') in get_search_results().

I'll attach an updated patch with these changes.

by Remy Blank <remy.blank@…>, 17 years ago

Change attachment search result format to <filename> (<parent_name>)

comment:17 by Christian Boos, 17 years ago

Resolution: fixed
Status: newclosed

Fine! Your patch committed as [7357]. I removed the last chunk of the change, which replaced ' by " which triggered unit tests failures. As you said, this is highly cosmetic, so let's keep the status quo rather than changing the tests.

I also took this opportunity to add you to the THANKS file, where you'll be in good company ;-)

in reply to:  17 comment:18 by Remy Blank <remy.blank@…>, 17 years ago

Replying to cboos:

Fine! Your patch committed as [7357].

Thanks!

I removed the last chunk of the change, which replaced ' by " which triggered unit tests failures. As you said, this is highly cosmetic, so let's keep the status quo rather than changing the tests.

Actually, this last hunk was coming directly from your patch proposal above. The quotes I was talking about were the ones around the filename, which you have dropped as I suggested.

I also took this opportunity to add you to the THANKS file, where you'll be in good company ;-)

Thanks!

comment:19 by Christian Boos, 17 years ago

Now after having used this feature a bit, I wonder if there shouldn't be a way to get only the attachment matches… That could be done if there were real filters. What is currently called filters are actually more selectors for the sources, but there's no API yet for filtering out matches (by author, by type…). Same problematic for the timeline btw (see #1198).

comment:20 by Remy Blank <remy.blank@…>, 17 years ago

So the idea would be to have source selectors:

  • tickets
  • changesets
  • milestones
  • wiki

and additionally a (variable?) set of filters:

  • search in resource body
  • search in attachments
  • author(s)
  • date range

Did I get this right? I wonder if this is worth the trouble, as it will get superseded by AdvancedSearch. It still makes sense for the timeline, though, so maybe we should start experimenting there.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain remy.blank@….
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from remy.blank@… to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.