Ticket #2672 (closed defect: fixed)
Opened 6 years ago
Last modified 2 years ago
Trac is trying to diff binary content
| Reported by: | sgeertgens@… | Owned by: | rblank |
|---|---|---|---|
| Priority: | high | Milestone: | 0.11.6 |
| Component: | version control/changeset view | Version: | 0.9.2 |
| Severity: | minor | Keywords: | mimeview |
| Cc: | sgeertgens@…, yellen@…, ybarthelemy@…, thorsten.holtkaemper@…, public@…, joel@… | ||
| Release Notes: | |||
| API Changes: | |||
Description
In one of my repositories (trac0.9.2+apache2.0.54+mod_python+svn1.2.3), I have a few folders filled with .PDF files. A recent commit added and modified a number of them (~20 files affected). When trying to view the changeset through Trac, Trac is trying to create text diffs for some of them (3-4 files)... at the same time, the apache process on the server pegs the CPU and things don't seem to recover without an Apache restart.
I've verified that all the .PDF files all have their mimetype set to application/octet-stream, and I'm unable to determine a difference between a PDF that Trac is treating correctly and a PDF that Trac is trying to diff as text.
Attachments
Change History
comment:1 Changed 6 years ago by sgeertgens@…
comment:2 Changed 6 years ago by cboos
- Milestone set to 0.10
- Owner changed from jonas to cboos
Correct :)
If the fix ends up being simple enough, maybe we can backport it to 0.9-stable.
comment:3 Changed 6 years ago by anonymous
- Cc sgeertgens@… added
comment:4 Changed 6 years ago by cboos
- Keywords mimeview added
- Milestone changed from 0.10 to 0.11
- Severity changed from normal to minor
Maybe the changes from #3332 will also solve this issue.
If not, post-poning to 0.11.
comment:5 Changed 6 years ago by anonymous
- Cc yellen@… added
comment:6 Changed 5 years ago by mgood
#4206 has been marked as a duplicate.
comment:7 Changed 5 years ago by anonymous
- Cc sascha@… added
comment:8 Changed 4 years ago by osimons
Confirmed this still happens. svn:mimetype set to application/octet-stream for a pdf file. Subversion client will not let me diff (marked as binary), but commit change and Trac will show the diff anyway.
comment:9 Changed 4 years ago by cboos
- Milestone changed from 0.11.1 to 0.12
- Priority changed from normal to high
See also #3428 and googlegroups:trac-users:f142b7822fe9df40
comment:10 Changed 4 years ago by ybarthelemy@…
- Cc ybarthelemy@… added
comment:11 Changed 4 years ago by sascha@…
- Cc sascha@… removed
comment:12 Changed 4 years ago by thorsten.holtkaemper@…
- Cc thorsten.holtkaemper@… added
comment:13 Changed 3 years ago by public@…
- Cc public@… added
I ran into this problem to with Trac 0.11. (svn:mime-type set to either application/octet-stream or application/pdf, doesn't matter)
I created a simple work-around which makes the is_binary() function check wether the data it is supposed to check starts with '%PDF-' and returns True if it does. (I will attach the diff file.)
I assume that you do not want to apply that to the Development tree of Trac as there seem to be better solutions, but it may be useful for other users in the meantime. Maybe you even want to apply it to a maintenance release of Trac 0.11. (I use Trac to manage .tex and generated .pdf files, and the diff view is basically useless if you always have to skip pages of binary PDF changes...)
Regards,
Milan Holzäpfel
Changed 3 years ago by public@…
- Attachment trac-0.11.2.1-dont_show_binary_pdf_differences.diff added
Workaround for trac-0.11.2.1
comment:14 Changed 3 years ago by rblank
- Milestone changed from 0.13 to 0.11.4
- Owner changed from cboos to rblank
I'll try to fix this properly. I haven't looked at the code yet, but shouldn't the criterion for trying a diff based on the mime type be that it starts with text/?
comment:15 Changed 3 years ago by cboos
Please take a look at r2108. Redoing this change, with the list coming from a [mimeview] treat_as_binary setting instead of being hard-coded would be fine.
There are many mime-types that are not starting with text/ and for which it's nevertheless interesting to have diffs for. Mostly XML stuff, like application/xslt+xml or image/svg+xml. But there are others (e.g. application/x-sh, model/vrml), so I don't think we can simply look for a xml substring.
As the complaints heard so far were mostly about PDF and RTF files, I think the treat_as_binary approach is good enough.
comment:16 follow-up: ↓ 17 Changed 3 years ago by ybarthelemy@…
Shouldn't application/octet-stream be added to the list of binary types as well ?
comment:17 in reply to: ↑ 16 Changed 3 years ago by cboos
Replying to ybarthelemy@…:
Shouldn't application/octet-stream be added to the list of binary types as well ?
Yes, we should do that as well. Together with Subversion svn:mime-type, that would allow to mark any kind of file for being treated as binary, even if it's not.
Note that attachment:trac-mimetype-no-binary-diff.patch:ticket:3428 proposed a fix along the lines proposed by Remy in comment:14 above.
I also closed #7213 as duplicate.
comment:19 Changed 3 years ago by cboos
- Milestone changed from 0.11.4 to 0.11.5
comment:20 Changed 3 years ago by rblank
- Resolution set to fixed
- Status changed from new to closed
Fixed in [8120] by adding the suggested [mimeview] treat_as_binary option, defaulting to:
- application/octet-stream
- application/pdf
- application/postscript
- application/rtf
and using the option in the plain text and diff viewers.
comment:21 follow-up: ↓ 22 Changed 3 years ago by treaves@…
What version is this fixed in? It's still a defect in 0.11.4.
comment:22 in reply to: ↑ 21 Changed 3 years ago by anonymous
Replying to treaves@…:
What version is this fixed in? It's still a defect in 0.11.4.
Actually, it's still broken in 0.11.5
comment:23 Changed 3 years ago by cboos
If you can provide us with two versions of a file that can be used to reproduce the problem, please do so and then reopen the ticket, we'll look at it.
comment:24 Changed 3 years ago by anonymous
Hmm... I wonder if this was due to stale info? After a reboot of my server (for an OS upgrade), I can no longer reproduce. Odd.
comment:25 follow-up: ↓ 26 Changed 2 years ago by batrick
- Resolution fixed deleted
- Status changed from closed to reopened
i am running trac 0.11.5 on debian lenny. after manual easy_install, i was hoping to have treat_as_binary support. however, pdfs still appear in the changeset view.
what i tried
- adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
- adding mime_map = (...),application/pdf:pdf to the same section
result: unfortunately nothing. pdfs still appear under "view changes"
i don't really know how to program python, but to see what was going on i added some trace code around the treat_as_binary part of api.py, recompiled - and got nothing. so it seemed to me that the code was not even called (with ample salt, as i said i hardly know any python)
it would be great if this could be fixed - i would really like to use trac to follow up on my latex files, but have the pdfs in svn for quick downloads also.
thanks,
batrick
comment:26 in reply to: ↑ 25 ; follow-up: ↓ 27 Changed 2 years ago by rblank
- Milestone changed from 0.11.5 to 0.11.6
Replying to batrick:
- adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
- adding mime_map = (...),application/pdf:pdf to the same section
result: unfortunately nothing. pdfs still appear under "view changes"
I assume you haven't set the svn:mime-type property for your pdf files? The mime_map is not used in that situation, which is an omission on my part. I'll fix that.
comment:27 in reply to: ↑ 26 Changed 2 years ago by batrick
thanks for the quick reply.
I assume you haven't set the svn:mime-type property for your pdf files? The mime_map is not used in that situation, which is an omission on my part. I'll fix that.
i am not aware of the svn:mime-type property. could give a hint how it can be used to avoid the pdf diff'ing problem?
thanks,
batrick
comment:28 Changed 2 years ago by rblank
If you use Subversion from the command line, you can set the property to "application/pdf" on .pdf files with the following command:
svn pset svn:mime-type application/pdf *.pdf
If you use a GUI for that, you should look out for a "Properties" dialog, where you could add a property named "svn:mime-type" with the value "application/pdf" for all of your .pdf files.
Of course, this is only needed temporarily. Once this is fixed, you won't need the properties anymore (but they won't hurt, either).
comment:29 Changed 2 years ago by rblank
- Resolution set to fixed
- Status changed from reopened to closed
Fixed in [8578], along with an environment reload bug.
comment:30 follow-up: ↓ 31 Changed 2 years ago by cboos
Just a hint about a possible optimization: when we do
old_content = old_node.get_content().read() if mview.is_binary(old_node.content_type, old_node.path, old_content): return None
We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary. For binary files, this cost is usually not negligible.
So what about:
if mview.is_binary(old_node.content_type, old_node.path): return None old_content = old_node.get_content().read() if mview.is_binary(content=old_content): return None
comment:31 in reply to: ↑ 30 Changed 2 years ago by rblank
- Resolution fixed deleted
- Status changed from closed to reopened
Replying to cboos:
We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary.
Yeah, I know. My feeling was that this is only an issue for very large binary files, something that should be relatively rare in a Subversion repository. And, you know, premature optimization is the root of all evil ;-)
comment:32 Changed 2 years ago by rblank
- Resolution set to fixed
- Status changed from reopened to closed
Suggestion implemented in [8585]. Thanks!



C.Boos replied to my original query in the mailing list with the following:
The changeset module is only using the is_binary heuristic in
order to decide if there's something to diff or not.
We should probably first check the mimetype as we do in the
browser module.
Seems like it's worth to fill a ticket for that issue.
-- Christian