Edgewall Software
Modify

Opened 14 years ago

Closed 10 years ago

#2672 closed defect (fixed)

Trac is trying to diff binary content

Reported by: sgeertgens@… Owned by: Remy Blank
Priority: high Milestone: 0.11.6
Component: version control/changeset view Version: 0.9.2
Severity: minor Keywords: mimeview
Cc: sgeertgens@…, yellen@…, ybarthelemy@…, thorsten.holtkaemper@…, public@…, joel@… Branch:
Release Notes:
API Changes:

Description

In one of my repositories (trac0.9.2+apache2.0.54+mod_python+svn1.2.3), I have a few folders filled with .PDF files. A recent commit added and modified a number of them (~20 files affected). When trying to view the changeset through Trac, Trac is trying to create text diffs for some of them (3-4 files)… at the same time, the apache process on the server pegs the CPU and things don't seem to recover without an Apache restart.

I've verified that all the .PDF files all have their mimetype set to application/octet-stream, and I'm unable to determine a difference between a PDF that Trac is treating correctly and a PDF that Trac is trying to diff as text.

Attachments (1)

trac-0.11.2.1-dont_show_binary_pdf_differences.diff (460 bytes ) - added by public@… 11 years ago.
Workaround for trac-0.11.2.1

Download all attachments as: .zip

Change History (33)

comment:1 by sgeertgens@…, 14 years ago

C.Boos replied to my original query in the mailing list with the following:

The changeset module is only using the is_binary heuristic in order to decide if there's something to diff or not. We should probably first check the mimetype as we do in the browser module.

Seems like it's worth to fill a ticket for that issue.

— Christian

comment:2 by Christian Boos, 14 years ago

Milestone: 0.10
Owner: changed from Jonas Borgström to Christian Boos

Correct :)

If the fix ends up being simple enough, maybe we can backport it to 0.9-stable.

comment:3 by anonymous, 14 years ago

Cc: sgeertgens@… added

comment:4 by Christian Boos, 13 years ago

Keywords: mimeview added
Milestone: 0.100.11
Severity: normalminor

Maybe the changes from #3332 will also solve this issue. If not, post-poning to 0.11.

comment:5 by anonymous, 13 years ago

Cc: yellen@… added

comment:6 by Matthew Good, 13 years ago

#4206 has been marked as a duplicate.

comment:7 by anonymous, 13 years ago

Cc: sascha@… added

comment:8 by osimons, 12 years ago

Confirmed this still happens. svn:mimetype set to application/octet-stream for a pdf file. Subversion client will not let me diff (marked as binary), but commit change and Trac will show the diff anyway.

comment:9 by Christian Boos, 12 years ago

Milestone: 0.11.10.12
Priority: normalhigh

See also #3428 and googlegroups:trac-users:f142b7822fe9df40

Moving to 0.12, as this is related to #3332.

comment:10 by ybarthelemy@…, 11 years ago

Cc: ybarthelemy@… added

comment:11 by sascha@…, 11 years ago

Cc: sascha@… removed

comment:12 by thorsten.holtkaemper@…, 11 years ago

Cc: thorsten.holtkaemper@… added

comment:13 by public@…, 11 years ago

Cc: public@… added

I ran into this problem to with Trac 0.11. (svn:mime-type set to either application/octet-stream or application/pdf, doesn't matter)

I created a simple work-around which makes the is_binary() function check wether the data it is supposed to check starts with '%PDF-' and returns True if it does. (I will attach the diff file.)

I assume that you do not want to apply that to the Development tree of Trac as there seem to be better solutions, but it may be useful for other users in the meantime. Maybe you even want to apply it to a maintenance release of Trac 0.11. (I use Trac to manage .tex and generated .pdf files, and the diff view is basically useless if you always have to skip pages of binary PDF changes…)

Regards, Milan Holzäpfel

by public@…, 11 years ago

Workaround for trac-0.11.2.1

comment:14 by Remy Blank, 11 years ago

Milestone: 0.130.11.4
Owner: changed from Christian Boos to Remy Blank

I'll try to fix this properly. I haven't looked at the code yet, but shouldn't the criterion for trying a diff based on the mime type be that it starts with text/?

comment:15 by Christian Boos, 11 years ago

Please take a look at r2108. Redoing this change, with the list coming from a [mimeview] treat_as_binary setting instead of being hard-coded would be fine.

There are many mime-types that are not starting with text/ and for which it's nevertheless interesting to have diffs for. Mostly XML stuff, like application/xslt+xml or image/svg+xml. But there are others (e.g. application/x-sh, model/vrml), so I don't think we can simply look for a xml substring.

As the complaints heard so far were mostly about PDF and RTF files, I think the treat_as_binary approach is good enough.

comment:16 by ybarthelemy@…, 11 years ago

Shouldn't application/octet-stream be added to the list of binary types as well ?

in reply to:  16 comment:17 by Christian Boos, 11 years ago

Replying to ybarthelemy@…:

Shouldn't application/octet-stream be added to the list of binary types as well ?

Yes, we should do that as well. Together with Subversion svn:mime-type, that would allow to mark any kind of file for being treated as binary, even if it's not.

Note that attachment:trac-mimetype-no-binary-diff.patch:ticket:3428 proposed a fix along the lines proposed by Remy in comment:14 above.

I also closed #7213 as duplicate.

comment:18 by joel@…, 11 years ago

Cc: joel@… added

I also have this problem…

comment:19 by Christian Boos, 11 years ago

Milestone: 0.11.40.11.5

comment:20 by Remy Blank, 10 years ago

Resolution: fixed
Status: newclosed

Fixed in [8120] by adding the suggested [mimeview] treat_as_binary option, defaulting to:

  • application/octet-stream
  • application/pdf
  • application/postscript
  • application/rtf

and using the option in the plain text and diff viewers.

comment:21 by treaves@…, 10 years ago

What version is this fixed in? It's still a defect in 0.11.4.

in reply to:  21 comment:22 by anonymous, 10 years ago

Replying to treaves@…:

What version is this fixed in? It's still a defect in 0.11.4.

Actually, it's still broken in 0.11.5

comment:23 by Christian Boos, 10 years ago

If you can provide us with two versions of a file that can be used to reproduce the problem, please do so and then reopen the ticket, we'll look at it.

comment:24 by anonymous, 10 years ago

Hmm… I wonder if this was due to stale info? After a reboot of my server (for an OS upgrade), I can no longer reproduce. Odd.

comment:25 by batrick, 10 years ago

Resolution: fixed
Status: closedreopened

i am running trac 0.11.5 on debian lenny. after manual easy_install, i was hoping to have treat_as_binary support. however, pdfs still appear in the changeset view.

what i tried

  • adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
  • adding mime_map = (…),application/pdf:pdf to the same section

result: unfortunately nothing. pdfs still appear under "view changes"

i don't really know how to program python, but to see what was going on i added some trace code around the treat_as_binary part of api.py, recompiled - and got nothing. so it seemed to me that the code was not even called (with ample salt, as i said i hardly know any python)

it would be great if this could be fixed - i would really like to use trac to follow up on my latex files, but have the pdfs in svn for quick downloads also.

thanks,

batrick

in reply to:  25 ; comment:26 by Remy Blank, 10 years ago

Milestone: 0.11.50.11.6

Replying to batrick:

  • adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
  • adding mime_map = (…),application/pdf:pdf to the same section

result: unfortunately nothing. pdfs still appear under "view changes"

I assume you haven't set the svn:mime-type property for your pdf files? The mime_map is not used in that situation, which is an omission on my part. I'll fix that.

in reply to:  26 comment:27 by batrick, 10 years ago

thanks for the quick reply.

I assume you haven't set the svn:mime-type property for your pdf files? The mime_map is not used in that situation, which is an omission on my part. I'll fix that.

i am not aware of the svn:mime-type property. could give a hint how it can be used to avoid the pdf diff'ing problem?

thanks, batrick

comment:28 by Remy Blank, 10 years ago

If you use Subversion from the command line, you can set the property to "application/pdf" on .pdf files with the following command:

svn pset svn:mime-type application/pdf *.pdf

If you use a GUI for that, you should look out for a "Properties" dialog, where you could add a property named "svn:mime-type" with the value "application/pdf" for all of your .pdf files.

Of course, this is only needed temporarily. Once this is fixed, you won't need the properties anymore (but they won't hurt, either).

comment:29 by Remy Blank, 10 years ago

Resolution: fixed
Status: reopenedclosed

Fixed in [8578], along with an environment reload bug.

comment:30 by Christian Boos, 10 years ago

Just a hint about a possible optimization: when we do

    old_content = old_node.get_content().read() 
    if mview.is_binary(old_node.content_type, old_node.path, 
                       old_content):
        return None

We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary. For binary files, this cost is usually not negligible.

So what about:

    if mview.is_binary(old_node.content_type, old_node.path):
        return None
    old_content = old_node.get_content().read() 
    if mview.is_binary(content=old_content):
        return None

in reply to:  30 comment:31 by Remy Blank, 10 years ago

Resolution: fixed
Status: closedreopened

Replying to cboos:

We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary.

Yeah, I know. My feeling was that this is only an issue for very large binary files, something that should be relatively rare in a Subversion repository. And, you know, premature optimization is the root of all evil ;-)

comment:32 by Remy Blank, 10 years ago

Resolution: fixed
Status: reopenedclosed

Suggestion implemented in [8585]. Thanks!

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Remy Blank.
The resolution will be deleted. Next status will be 'reopened'.
to as closed The owner will be changed from Remy Blank to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.