Opened 19 years ago
Closed 15 years ago
#2672 closed defect (fixed)
Trac is trying to diff binary content
Reported by: | Owned by: | Remy Blank | |
---|---|---|---|
Priority: | high | Milestone: | 0.11.6 |
Component: | version control/changeset view | Version: | 0.9.2 |
Severity: | minor | Keywords: | mimeview |
Cc: | sgeertgens@…, yellen@…, ybarthelemy@…, thorsten.holtkaemper@…, public@…, joel@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
In one of my repositories (trac0.9.2+apache2.0.54+mod_python+svn1.2.3), I have a few folders filled with .PDF files. A recent commit added and modified a number of them (~20 files affected). When trying to view the changeset through Trac, Trac is trying to create text diffs for some of them (3-4 files)… at the same time, the apache process on the server pegs the CPU and things don't seem to recover without an Apache restart.
I've verified that all the .PDF files all have their mimetype set to application/octet-stream, and I'm unable to determine a difference between a PDF that Trac is treating correctly and a PDF that Trac is trying to diff as text.
Attachments (1)
Change History (33)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
Milestone: | → 0.10 |
---|---|
Owner: | changed from | to
Correct :)
If the fix ends up being simple enough, maybe we can backport it to 0.9-stable.
comment:3 by , 19 years ago
Cc: | added |
---|
comment:4 by , 18 years ago
Keywords: | mimeview added |
---|---|
Milestone: | 0.10 → 0.11 |
Severity: | normal → minor |
Maybe the changes from #3332 will also solve this issue. If not, post-poning to 0.11.
comment:5 by , 18 years ago
Cc: | added |
---|
comment:7 by , 18 years ago
Cc: | added |
---|
comment:8 by , 17 years ago
Confirmed this still happens. svn:mimetype
set to application/octet-stream
for a pdf file. Subversion client will not let me diff (marked as binary), but commit change and Trac will show the diff anyway.
comment:9 by , 17 years ago
Milestone: | 0.11.1 → 0.12 |
---|---|
Priority: | normal → high |
See also #3428 and googlegroups:trac-users:f142b7822fe9df40
comment:10 by , 16 years ago
Cc: | added |
---|
comment:11 by , 16 years ago
Cc: | removed |
---|
comment:12 by , 16 years ago
Cc: | added |
---|
comment:13 by , 16 years ago
Cc: | added |
---|
I ran into this problem to with Trac 0.11. (svn:mime-type set to either application/octet-stream or application/pdf, doesn't matter)
I created a simple work-around which makes the is_binary() function check wether the data it is supposed to check starts with '%PDF-' and returns True if it does. (I will attach the diff file.)
I assume that you do not want to apply that to the Development tree of Trac as there seem to be better solutions, but it may be useful for other users in the meantime. Maybe you even want to apply it to a maintenance release of Trac 0.11. (I use Trac to manage .tex and generated .pdf files, and the diff view is basically useless if you always have to skip pages of binary PDF changes…)
Regards, Milan Holzäpfel
by , 16 years ago
Attachment: | trac-0.11.2.1-dont_show_binary_pdf_differences.diff added |
---|
Workaround for trac-0.11.2.1
comment:14 by , 16 years ago
Milestone: | 0.13 → 0.11.4 |
---|---|
Owner: | changed from | to
I'll try to fix this properly. I haven't looked at the code yet, but shouldn't the criterion for trying a diff based on the mime type be that it starts with text/
?
comment:15 by , 16 years ago
Please take a look at r2108. Redoing this change, with the list coming from a [mimeview] treat_as_binary
setting instead of being hard-coded would be fine.
There are many mime-types that are not starting with text/ and for which it's nevertheless interesting to have diffs for. Mostly XML stuff, like application/xslt+xml or image/svg+xml. But there are others (e.g. application/x-sh, model/vrml), so I don't think we can simply look for a xml substring.
As the complaints heard so far were mostly about PDF and RTF files, I think the treat_as_binary
approach is good enough.
follow-up: 17 comment:16 by , 16 years ago
Shouldn't application/octet-stream be added to the list of binary types as well ?
comment:17 by , 16 years ago
Replying to ybarthelemy@…:
Shouldn't application/octet-stream be added to the list of binary types as well ?
Yes, we should do that as well. Together with Subversion svn:mime-type, that would allow to mark any kind of file for being treated as binary, even if it's not.
Note that attachment:trac-mimetype-no-binary-diff.patch:ticket:3428 proposed a fix along the lines proposed by Remy in comment:14 above.
I also closed #7213 as duplicate.
comment:19 by , 16 years ago
Milestone: | 0.11.4 → 0.11.5 |
---|
comment:20 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in [8120] by adding the suggested [mimeview] treat_as_binary
option, defaulting to:
- application/octet-stream
- application/pdf
- application/postscript
- application/rtf
and using the option in the plain text and diff viewers.
follow-up: 22 comment:21 by , 15 years ago
What version is this fixed in? It's still a defect in 0.11.4.
comment:22 by , 15 years ago
Replying to treaves@…:
What version is this fixed in? It's still a defect in 0.11.4.
Actually, it's still broken in 0.11.5
comment:23 by , 15 years ago
If you can provide us with two versions of a file that can be used to reproduce the problem, please do so and then reopen the ticket, we'll look at it.
comment:24 by , 15 years ago
Hmm… I wonder if this was due to stale info? After a reboot of my server (for an OS upgrade), I can no longer reproduce. Odd.
follow-up: 26 comment:25 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
i am running trac 0.11.5 on debian lenny. after manual easy_install, i was hoping to have treat_as_binary support. however, pdfs still appear in the changeset view.
what i tried
- adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
- adding mime_map = (…),application/pdf:pdf to the same section
result: unfortunately nothing. pdfs still appear under "view changes"
i don't really know how to program python, but to see what was going on i added some trace code around the treat_as_binary part of api.py, recompiled - and got nothing. so it seemed to me that the code was not even called (with ample salt, as i said i hardly know any python)
it would be great if this could be fixed - i would really like to use trac to follow up on my latex files, but have the pdfs in svn for quick downloads also.
thanks,
batrick
follow-up: 27 comment:26 by , 15 years ago
Milestone: | 0.11.5 → 0.11.6 |
---|
Replying to batrick:
- adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
- adding mime_map = (…),application/pdf:pdf to the same section
result: unfortunately nothing. pdfs still appear under "view changes"
I assume you haven't set the svn:mime-type
property for your pdf files? The mime_map
is not used in that situation, which is an omission on my part. I'll fix that.
comment:27 by , 15 years ago
thanks for the quick reply.
I assume you haven't set the
svn:mime-type
property for your pdf files? Themime_map
is not used in that situation, which is an omission on my part. I'll fix that.
i am not aware of the svn:mime-type property. could give a hint how it can be used to avoid the pdf diff'ing problem?
thanks, batrick
comment:28 by , 15 years ago
If you use Subversion from the command line, you can set the property to "application/pdf" on .pdf
files with the following command:
svn pset svn:mime-type application/pdf *.pdf
If you use a GUI for that, you should look out for a "Properties" dialog, where you could add a property named "svn:mime-type" with the value "application/pdf" for all of your .pdf
files.
Of course, this is only needed temporarily. Once this is fixed, you won't need the properties anymore (but they won't hurt, either).
comment:29 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Fixed in [8578], along with an environment reload bug.
follow-up: 31 comment:30 by , 15 years ago
Just a hint about a possible optimization: when we do
old_content = old_node.get_content().read() if mview.is_binary(old_node.content_type, old_node.path, old_content): return None
We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary. For binary files, this cost is usually not negligible.
So what about:
if mview.is_binary(old_node.content_type, old_node.path): return None old_content = old_node.get_content().read() if mview.is_binary(content=old_content): return None
comment:31 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Replying to cboos:
We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary.
Yeah, I know. My feeling was that this is only an issue for very large binary files, something that should be relatively rare in a Subversion repository. And, you know, premature optimization is the root of all evil ;-)
comment:32 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Suggestion implemented in [8585]. Thanks!
C.Boos replied to my original query in the mailing list with the following:
The changeset module is only using the
is_binary
heuristic in order to decide if there's something to diff or not. We should probably first check the mimetype as we do in the browser module.Seems like it's worth to fill a ticket for that issue.
— Christian