#2561 closed defect (fixed)
[PATCH] can't search for attachment name or description
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | normal | Milestone: | 0.11.1 |
Component: | search system | Version: | 0.9 |
Severity: | normal | Keywords: | attachment patch |
Cc: | remy.blank@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
Search does not find wiki pages or tickets when the search term appears only in an attachment file name or description. Suggest including attachment name and description in indexed text for searching.
Attachments (3)
Change History (23)
follow-up: 4 comment:1 by , 18 years ago
comment:2 by , 18 years ago
Milestone: | → 1.0 |
---|
Right, the attachment module could be provide an ISearchSource implementation as well.
comment:4 by , 17 years ago
Milestone: | 1.0 → 0.11 |
---|---|
Owner: | changed from | to
Replying to Don <drj826@acm.org>:
Has anyone suggested making the content of attachments searchable?
That would be #5589.
Let's focus this ticket on the metadata.
by , 16 years ago
Attachment: | attachment-search.patch added |
---|
Patch against 0.11-stable [7346] adding searching in attachment metadata
comment:6 by , 16 years ago
Cc: | added |
---|
This patch adds an ISearchSource
for attachment metadata. It currently searches in the filename
, description
and author
fields. The search results point directly to the attachment page, from where the parent (wiki page, ticket) can be reached in one click.
I would be grateful if I could get some feedback about the patch. I am diving into Trac programming, and it would be great to know how I'm doing.
comment:7 by , 16 years ago
Summary: | can't search for attachment name or description → [PATCH] can't search for attachment name or description |
---|
follow-up: 9 comment:8 by , 16 years ago
Milestone: | 0.11.2 → 0.11.1 |
---|
The patch is looking great, thanks!
One suggestion though: what about taking into account the other filters for narrowing the search? I think it's reasonable to make the assumption that a search filter for a given realm will use that realm string, so we could do the following:
- when only the attachment filter is given, return all attachments, regardless of the realm
- when there are other filters given, return only attachments for which the realm is matching
That way, when selecting "wiki + attachment" filters, you would get wiki attachments, but no ticket attachments, for example.
An alternative approach would be to do something similar as what is done for timeline event providers (see Attachment.get_timeline_events
).
Instead of the Attachment module being a search source by itself, it only provides a helper function, quite similar to the get_search_results()
method you've already written, taking an additional resource_realm parameter for retrieving only attachments for that realm.
I think I even prefer the second option, as it's more consistent with how the timeline works, and doesn't introduce the assumption on filter vs. realm names.
What do you think?
follow-up: 10 comment:9 by , 16 years ago
Replying to cboos:
One suggestion though: what about taking into account the other filters for narrowing the search? I think it's reasonable to make the assumption that a search filter for a given realm will use that realm string, so we could do the following:
- when only the attachment filter is given, return all attachments, regardless of the realm
- when there are other filters given, return only attachments for which the realm is matching
Sounds like a good idea. But as you said, it makes the attachments a kind of special case. The result would still be quite intuitive, though. I like it.
An alternative approach would be to do something similar as what is done for timeline event providers (see
Attachment.get_timeline_events
). Instead of the Attachment module being a search source by itself, it only provides a helper function, quite similar to theget_search_results()
method you've already written, taking an additional resource_realm parameter for retrieving only attachments for that realm.
That's exactly how I started implementing it, but I backed out for several reasons:
- Search results for attachments should be the parent resource, that is, when a search term is found in attachment metadata, the associated parent resource should be given as a result. This has several implications:
- An extra DB lookup is needed for every result, for example in the case of the wiki, to get the wiki text. I haven't evaluated how this impacts search performance, but I would guess it can become noticeable.
- When several attachments match for a given parent resource, they should be collapsed into a single search result. While quite simple to implement in the helper function in
AttachmentModule
, I felt that it was more consistent to treat attachments as first-class citizens and to return all matches. - OTOH, you might be more interested in the parent resource than the attachment anyway. With the current implementation, the parent resource is just one click away, so this might not have much weight.
- It requires some implementation work in every attachment user. This is currently only wiki and ticket, but I still feel that this is unnecessary duplication.
I felt that I preferred the consistency of treating attachments as a normal resource provider and listing every matching attachment. This will also make it easier to transition to searching in attachment content later on, where I will be more interested in the attachment than the parent resource.
Let me know how you would like me to proceed.
follow-up: 11 comment:10 by , 16 years ago
Replying to Remy Blank <remy.blank@pobox.com>:
Replying to cboos:
An alternative approach would be to do something similar as what is done for timeline event providers (see
Attachment.get_timeline_events
). Instead of the Attachment module being a search source by itself, it only provides a helper function, …That's exactly how I started implementing it, but I backed out for several reasons:
- Search results for attachments should be the parent resource, that is, when a search term is found in attachment metadata, the associated parent resource should be given as a result.
Hm, why? That's not how it works for the timeline. The timeline events are pointing to the attachment directly, and it should be done the same way for search results.
- It requires some implementation work in every attachment user. This is currently only wiki and ticket, but I still feel that this is unnecessary duplication.
That's true, but the duplication would be rather minimal, 2-3 lines for iterating on the helper results and yielding them to the caller. Note however that the milestones (and some other realms used by plugins) also support attachments in 0.11.
I felt that I preferred the consistency of treating attachments as a normal resource provider and listing every matching attachment. This will also make it easier to transition to searching in attachment content later on, where I will be more interested in the attachment than the parent resource.
No question about listing every matching attachment. However, should attachments really be treated as a normal resource provider, or as an "ancillary" resource provider upon which other resource managers have control. We had the same choice for the Timeline, and I remember we decided to delegate to the main resource provider instead of providing an "Attachment" check box. I think both options can certainly be defended.
Let me know how you would like me to proceed.
So my personal preference would be consistency with the timeline.
comment:11 by , 16 years ago
Hm, why? That's not how it works for the timeline. The timeline events are pointing to the attachment directly, and it should be done the same way for search results.
Well, for the user, the attachment list is part of the parent resource text, so it just seemed natural to list the parent resource if a search term matches an attachment. But you're right, it is not necessary, and I also prefer listing every matching attachment.
That's true, but the duplication would be rather minimal, 2-3 lines for iterating on the helper results and yielding them to the caller. Note however that the milestones (and some other realms used by plugins) also support attachments in 0.11.
You are right, of course. If we list individual attachments instead of the parent resource, it becomes trivial. Thanks for the reminder about the milestones.
So my personal preference would be consistency with the timeline.
It does make sense. Thanks for the feedback, I'll come back with a new patch shortly.
by , 16 years ago
Attachment: | attachment-search-2.patch added |
---|
Made search in attachments consistent with timeline handling
comment:12 by , 16 years ago
Keywords: | patch added |
---|
The version 2 of the patch removes the separate ISearchSource
for attachments and treats searching in attachments the same way as timeline events, i.e. every attachment user having search functionality gets its attachment search results from AttachmentModule
.
Christian, you mentioned that milestones can now have attachments. However, as far as I can tell, they don't provide search functionality, do they? Would that be desirable? While I'm at it (and now that I'm a bit familiar with the search functionality), I could provide an implementation for that.
comment:13 by , 16 years ago
Owner: | changed from | to
---|
Perfect, I'll test that this evening. About the milestone search, sure, another patch for adding the ISearchSource
interface to the Milestone module would be welcome (just create another [PATCH] enhancement ticket for that ;-) ).
comment:15 by , 16 years ago
Patch worked fine, applied in [7350].
One minor glitch though, the attachment match results were displayed the parent resource name first, then the filename. When glancing over the results, it was not that obvious that those were attachments. What about this alternative format? "<filename>" (<resource>)
-
trac/attachment.py
497 497 attachment = resource_realm(id=id).child('attachment', filename) 498 498 if 'ATTACHMENT_VIEW' in req.perm(attachment): 499 499 yield (get_resource_url(self.env, attachment, req.href), 500 "%s: %s" % (get_resource_shortname(self.env, 501 attachment.parent), 502 filename), 500 get_resource_description(self.env, attachment, 501 'compact'), 503 502 datetime.fromtimestamp(time, utc), author, 504 503 shorten_result(desc, terms)) 505 504 … … 530 529 531 530 def get_resource_description(self, resource, format=None, **kwargs): 532 531 if format == 'compact': 533 return ' %s:%s' % (get_resource_shortname(self.env,534 resource.parent),535 resource.filename) 532 return '"%s" (%s)' % (resource.id, 533 get_resource_shortname(self.env, resource.parent)) 534 536 535 elif format == 'summary': 537 536 return Attachment(self.env, resource).description 538 537 if resource.id: 539 return _( "Attachment '%(id)s' in %(parent)s", id=resource.id,538 return _('Attachment "%(id)s" in %(parent)s', id=resource.id, 540 539 parent=get_resource_name(self.env, resource.parent)) 541 540 else: 542 541 return _("Attachments of %(parent)s",
comment:16 by , 16 years ago
The reason I chose <parent_shortname>: <filename>
is that the heading of the attachment page is <parent_name>: <filename>
(and the search results in general seem to have the shortname instead of the name).
I like your proposal much better. But I have a few comments:
- I would drop the quotes around the filename. I find that they make the output look weird, and there are no quotes in the heading of the attachment page either. This is higly cosmetic, of course.
- I would use the resource name instead of the shortname between the brackets. It doesn't make the string much longer, but I find it to be clearer (compare e.g.
attachment-search.patch (#2561)
withattachment-search.patch (Ticket #2561)
.
- I would use
get_resource_shortname()
instead ofget_resource_description(..., 'compact')
inget_search_results()
.
I'll attach an updated patch with these changes.
by , 16 years ago
Attachment: | attachment-result-format.patch added |
---|
Change attachment search result format to <filename> (<parent_name>)
follow-up: 18 comment:17 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fine! Your patch committed as [7357].
I removed the last chunk of the change, which replaced '
by "
which triggered unit tests failures. As you said, this is highly cosmetic, so let's keep the status quo rather than changing the tests.
I also took this opportunity to add you to the THANKS file, where you'll be in good company ;-)
comment:18 by , 16 years ago
Replying to cboos:
Fine! Your patch committed as [7357].
Thanks!
I removed the last chunk of the change, which replaced
'
by"
which triggered unit tests failures. As you said, this is highly cosmetic, so let's keep the status quo rather than changing the tests.
Actually, this last hunk was coming directly from your patch proposal above. The quotes I was talking about were the ones around the filename, which you have dropped as I suggested.
I also took this opportunity to add you to the THANKS file, where you'll be in good company ;-)
Thanks!
comment:19 by , 16 years ago
Now after having used this feature a bit, I wonder if there shouldn't be a way to get only the attachment matches… That could be done if there were real filters. What is currently called filters are actually more selectors for the sources, but there's no API yet for filtering out matches (by author, by type…). Same problematic for the timeline btw (see #1198).
comment:20 by , 16 years ago
So the idea would be to have source selectors:
- tickets
- changesets
- milestones
- wiki
and additionally a (variable?) set of filters:
- search in resource body
- search in attachments
- author(s)
- date range
Did I get this right? I wonder if this is worth the trouble, as it will get superseded by AdvancedSearch. It still makes sense for the timeline, though, so maybe we should start experimenting there.
Has anyone suggested making the content of attachments searchable? This might be a major feature add — but VERY useful! The same goes for SVN repository content.