Edgewall Software

Ticket #2561 (closed defect: fixed)

Opened 3 years ago

Last modified 7 weeks ago

[PATCH] can't search for attachment name or description

Reported by: gregj@… Owned by: remy.blank@…
Priority: normal Milestone: 0.11.1
Component: search system Version: 0.9
Severity: normal Keywords: attachment patch
Cc: remy.blank@…

Description

Search does not find wiki pages or tickets when the search term appears only in an attachment file name or description. Suggest including attachment name and description in indexed text for searching.

Attachments

attachment-search.patch (2.3 kB) - added by Remy Blank <remy.blank@…> 7 weeks ago.
Patch against 0.11-stable [7346] adding searching in attachment metadata
attachment-search-2.patch (3.0 kB) - added by Remy Blank <remy.blank@…> 7 weeks ago.
Made search in attachments consistent with timeline handling
attachment-result-format.patch (1.5 kB) - added by Remy Blank <remy.blank@…> 7 weeks ago.
Change attachment search result format to <filename> (<parent_name>)

Change History

follow-up: ↓ 4   Changed 2 years ago by Don <drj826@…>

Has anyone suggested making the content of attachments searchable? This might be a major feature add -- but VERY useful! The same goes for SVN repository content.

  Changed 18 months ago by cboos

  • milestone set to 1.0

Right, the attachment module could be provide an ISearchSource implementation as well.

  Changed 16 months ago by cboos

  • keywords attachment added

#5302 was closed as duplicate.

in reply to: ↑ 1   Changed 15 months ago by cboos

  • owner changed from jonas to cboos
  • milestone changed from 1.0 to 0.11

Replying to Don <drj826@acm.org>:

Has anyone suggested making the content of attachments searchable?

That would be #5589.

Let's focus this ticket on the metadata.

  Changed 11 months ago by cboos

#6137 closed as duplicate.

Changed 7 weeks ago by Remy Blank <remy.blank@…>

Patch against 0.11-stable [7346] adding searching in attachment metadata

  Changed 7 weeks ago by Remy Blank <remy.blank@…>

  • cc remy.blank@… added

This patch adds an ISearchSource for attachment metadata. It currently searches in the filename, description and author fields. The search results point directly to the attachment page, from where the parent (wiki page, ticket) can be reached in one click.

I would be grateful if I could get some feedback about the patch. I am diving into Trac programming, and it would be great to know how I'm doing.

  Changed 7 weeks ago by Remy Blank <remy.blank@…>

  • summary changed from can't search for attachment name or description to [PATCH] can't search for attachment name or description

follow-up: ↓ 9   Changed 7 weeks ago by cboos

  • milestone changed from 0.11.2 to 0.11.1

The patch is looking great, thanks!

One suggestion though: what about taking into account the other filters for narrowing the search? I think it's reasonable to make the assumption that a search filter for a given realm will use that realm string, so we could do the following:

  • when only the attachment filter is given, return all attachments, regardless of the realm
  • when there are other filters given, return only attachments for which the realm is matching

That way, when selecting "wiki + attachment" filters, you would get wiki attachments, but no ticket attachments, for example.

An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events). Instead of the Attachment module being a search source by itself, it only provides a helper function, quite similar to the get_search_results() method you've already written, taking an additional resource_realm parameter for retrieving only attachments for that realm.

I think I even prefer the second option, as it's more consistent with how the timeline works, and doesn't introduce the assumption on filter vs. realm names.

What do you think?

in reply to: ↑ 8 ; follow-up: ↓ 10   Changed 7 weeks ago by Remy Blank <remy.blank@…>

Replying to cboos:

One suggestion though: what about taking into account the other filters for narrowing the search? I think it's reasonable to make the assumption that a search filter for a given realm will use that realm string, so we could do the following: - when only the attachment filter is given, return all attachments, regardless of the realm - when there are other filters given, return only attachments for which the realm is matching

Sounds like a good idea. But as you said, it makes the attachments a kind of special case. The result would still be quite intuitive, though. I like it.

An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events). Instead of the Attachment module being a search source by itself, it only provides a helper function, quite similar to the get_search_results() method you've already written, taking an additional resource_realm parameter for retrieving only attachments for that realm.

That's exactly how I started implementing it, but I backed out for several reasons:

  • Search results for attachments should be the parent resource, that is, when a search term is found in attachment metadata, the associated parent resource should be given as a result. This has several implications:
    • An extra DB lookup is needed for every result, for example in the case of the wiki, to get the wiki text. I haven't evaluated how this impacts search performance, but I would guess it can become noticeable.
    • When several attachments match for a given parent resource, they should be collapsed into a single search result. While quite simple to implement in the helper function in AttachmentModule, I felt that it was more consistent to treat attachments as first-class citizens and to return all matches.
    • OTOH, you might be more interested in the parent resource than the attachment anyway. With the current implementation, the parent resource is just one click away, so this might not have much weight.
  • It requires some implementation work in every attachment user. This is currently only wiki and ticket, but I still feel that this is unnecessary duplication.

I felt that I preferred the consistency of treating attachments as a normal resource provider and listing every matching attachment. This will also make it easier to transition to searching in attachment content later on, where I will be more interested in the attachment than the parent resource.

Let me know how you would like me to proceed.

in reply to: ↑ 9 ; follow-up: ↓ 11   Changed 7 weeks ago by cboos

Replying to Remy Blank <remy.blank@pobox.com>:

Replying to cboos:

An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events). Instead of the Attachment module being a search source by itself, it only provides a helper function, ...

That's exactly how I started implementing it, but I backed out for several reasons: * Search results for attachments should be the parent resource, that is, when a search term is found in attachment metadata, the associated parent resource should be given as a result.

Hm, why? That's not how it works for the timeline. The timeline events are pointing to the attachment directly, and it should be done the same way for search results.

* It requires some implementation work in every attachment user. This is currently only wiki and ticket, but I still feel that this is unnecessary duplication.

That's true, but the duplication would be rather minimal, 2-3 lines for iterating on the helper results and yielding them to the caller. Note however that the milestones (and some other realms used by plugins) also support attachments in 0.11.

I felt that I preferred the consistency of treating attachments as a normal resource provider and listing every matching attachment. This will also make it easier to transition to searching in attachment content later on, where I will be more interested in the attachment than the parent resource.

No question about listing every matching attachment. However, should attachments really be treated as a normal resource provider, or as an "ancillary" resource provider upon which other resource managers have control. We had the same choice for the Timeline, and I remember we decided to delegate to the main resource provider instead of providing an "Attachment" check box. I think both options can certainly be defended.

Let me know how you would like me to proceed.

So my personal preference would be consistency with the timeline.

in reply to: ↑ 10   Changed 7 weeks ago by Remy Blank <remy.blank@…>

Hm, why? That's not how it works for the timeline. The timeline events are pointing to the attachment directly, and it should be done the same way for search results.

Well, for the user, the attachment list is part of the parent resource text, so it just seemed natural to list the parent resource if a search term matches an attachment. But you're right, it is not necessary, and I also prefer listing every matching attachment.

That's true, but the duplication would be rather minimal, 2-3 lines for iterating on the helper results and yielding them to the caller. Note however that the milestones (and some other realms used by plugins) also support attachments in 0.11.

You are right, of course. If we list individual attachments instead of the parent resource, it becomes trivial. Thanks for the reminder about the milestones.

So my personal preference would be consistency with the timeline.

It does make sense. Thanks for the feedback, I'll come back with a new patch shortly.

Changed 7 weeks ago by Remy Blank <remy.blank@…>

Made search in attachments consistent with timeline handling

  Changed 7 weeks ago by Remy Blank <remy.blank@…>

  • keywords patch added

The version 2 of the patch removes the separate ISearchSource for attachments and treats searching in attachments the same way as timeline events, i.e. every attachment user having search functionality gets its attachment search results from AttachmentModule.

Christian, you mentioned that milestones can now have attachments. However, as far as I can tell, they don't provide search functionality, do they? Would that be desirable? While I'm at it (and now that I'm a bit familiar with the search functionality), I could provide an implementation for that.

  Changed 7 weeks ago by cboos

  • owner changed from cboos to remy.blank@…

Perfect, I'll test that this evening. About the milestone search, sure, another patch for adding the ISearchSource interface to the Milestone module would be welcome (just create another [PATCH] enhancement ticket for that ;-) ).

  Changed 7 weeks ago by Remy Blank <remy.blank@…>

There you go: #7444

  Changed 7 weeks ago by cboos

Patch worked fine, applied in [7350].

One minor glitch though, the attachment match results were displayed the parent resource name first, then the filename. When glancing over the results, it was not that obvious that those were attachments. What about this alternative format? "<filename>" (<resource>)

  • trac/attachment.py

     
    497497            attachment = resource_realm(id=id).child('attachment', filename) 
    498498            if 'ATTACHMENT_VIEW' in req.perm(attachment): 
    499499                yield (get_resource_url(self.env, attachment, req.href), 
    500                        "%s: %s" % (get_resource_shortname(self.env, 
    501                                                           attachment.parent), 
    502                                    filename), 
     500                       get_resource_description(self.env, attachment, 
     501                                                'compact'), 
    503502                       datetime.fromtimestamp(time, utc), author, 
    504503                       shorten_result(desc, terms)) 
    505504 
     
    530529 
    531530    def get_resource_description(self, resource, format=None, **kwargs): 
    532531        if format == 'compact': 
    533             return '%s:%s' % (get_resource_shortname(self.env, 
    534                                                      resource.parent), 
    535                               resource.filename) 
     532            return '"%s" (%s)' % (resource.id, 
     533                    get_resource_shortname(self.env, resource.parent)) 
     534 
    536535        elif format == 'summary': 
    537536            return Attachment(self.env, resource).description 
    538537        if resource.id: 
    539             return _("Attachment '%(id)s' in %(parent)s", id=resource.id, 
     538            return _('Attachment "%(id)s" in %(parent)s', id=resource.id, 
    540539                     parent=get_resource_name(self.env, resource.parent)) 
    541540        else: 
    542541            return _("Attachments of %(parent)s", 

  Changed 7 weeks ago by Remy Blank <remy.blank@…>

The reason I chose <parent_shortname>: <filename> is that the heading of the attachment page is <parent_name>: <filename> (and the search results in general seem to have the shortname instead of the name).

I like your proposal much better. But I have a few comments:

  • I would drop the quotes around the filename. I find that they make the output look weird, and there are no quotes in the heading of the attachment page either. This is higly cosmetic, of course.
  • I would use the resource name instead of the shortname between the brackets. It doesn't make the string much longer, but I find it to be clearer (compare e.g. attachment-search.patch (#2561) with attachment-search.patch (Ticket #2561).
  • I would use get_resource_shortname() instead of get_resource_description(..., 'compact') in get_search_results().

I'll attach an updated patch with these changes.

Changed 7 weeks ago by Remy Blank <remy.blank@…>

Change attachment search result format to <filename> (<parent_name>)

follow-up: ↓ 18   Changed 7 weeks ago by cboos

  • status changed from new to closed
  • resolution set to fixed

Fine! Your patch committed as [7357]. I removed the last chunk of the change, which replaced ' by " which triggered unit tests failures. As you said, this is highly cosmetic, so let's keep the status quo rather than changing the tests.

I also took this opportunity to add you to the THANKS file, where you'll be in good company ;-)

in reply to: ↑ 17   Changed 7 weeks ago by Remy Blank <remy.blank@…>

Replying to cboos:

Fine! Your patch committed as [7357].

Thanks!

I removed the last chunk of the change, which replaced ' by " which triggered unit tests failures. As you said, this is highly cosmetic, so let's keep the status quo rather than changing the tests.

Actually, this last hunk was coming directly from your patch proposal above. The quotes I was talking about were the ones around the filename, which you have dropped as I suggested.

I also took this opportunity to add you to the THANKS file, where you'll be in good company ;-)

Thanks!

  Changed 7 weeks ago by cboos

Now after having used this feature a bit, I wonder if there shouldn't be a way to get only the attachment matches... That could be done if there were real filters. What is currently called filters are actually more selectors for the sources, but there's no API yet for filtering out matches (by author, by type...). Same problematic for the timeline btw (see #1198).

  Changed 7 weeks ago by Remy Blank <remy.blank@…>

So the idea would be to have source selectors:

  • tickets
  • changesets
  • milestones
  • wiki

and additionally a (variable?) set of filters:

  • search in resource body
  • search in attachments
  • author(s)
  • date range

Did I get this right? I wonder if this is worth the trouble, as it will get superseded by AdvancedSearch. It still makes sense for the timeline, though, so maybe we should start experimenting there.

Add/Change #2561 ([PATCH] can't search for attachment name or description)

Author



Change Properties
<Author field>
Action
as closed
Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.