Edgewall Software
Modify

Ticket #2672 (closed defect: fixed)

Opened 6 years ago

Last modified 2 years ago

Trac is trying to diff binary content

Reported by: sgeertgens@… Owned by: rblank
Priority: high Milestone: 0.11.6
Component: version control/changeset view Version: 0.9.2
Severity: minor Keywords: mimeview
Cc: sgeertgens@…, yellen@…, ybarthelemy@…, thorsten.holtkaemper@…, public@…, joel@…
Release Notes:
API Changes:

Description

In one of my repositories (trac0.9.2+apache2.0.54+mod_python+svn1.2.3), I have a few folders filled with .PDF files. A recent commit added and modified a number of them (~20 files affected). When trying to view the changeset through Trac, Trac is trying to create text diffs for some of them (3-4 files)... at the same time, the apache process on the server pegs the CPU and things don't seem to recover without an Apache restart.

I've verified that all the .PDF files all have their mimetype set to application/octet-stream, and I'm unable to determine a difference between a PDF that Trac is treating correctly and a PDF that Trac is trying to diff as text.

Attachments

trac-0.11.2.1-dont_show_binary_pdf_differences.diff (460 bytes) - added by public@… 3 years ago.
Workaround for trac-0.11.2.1

Download all attachments as: .zip

Change History

comment:1 Changed 6 years ago by sgeertgens@…

C.Boos replied to my original query in the mailing list with the following:

The changeset module is only using the is_binary heuristic in
order to decide if there's something to diff or not.
We should probably first check the mimetype as we do in the
browser module.

Seems like it's worth to fill a ticket for that issue.

-- Christian

comment:2 Changed 6 years ago by cboos

  • Milestone set to 0.10
  • Owner changed from jonas to cboos

Correct :)

If the fix ends up being simple enough, maybe we can backport it to 0.9-stable.

comment:3 Changed 6 years ago by anonymous

  • Cc sgeertgens@… added

comment:4 Changed 6 years ago by cboos

  • Keywords mimeview added
  • Milestone changed from 0.10 to 0.11
  • Severity changed from normal to minor

Maybe the changes from #3332 will also solve this issue.
If not, post-poning to 0.11.

comment:5 Changed 6 years ago by anonymous

  • Cc yellen@… added

comment:6 Changed 5 years ago by mgood

#4206 has been marked as a duplicate.

comment:7 Changed 5 years ago by anonymous

  • Cc sascha@… added

comment:8 Changed 4 years ago by osimons

Confirmed this still happens. svn:mimetype set to application/octet-stream for a pdf file. Subversion client will not let me diff (marked as binary), but commit change and Trac will show the diff anyway.

comment:9 Changed 4 years ago by cboos

  • Milestone changed from 0.11.1 to 0.12
  • Priority changed from normal to high

See also #3428 and googlegroups:trac-users:f142b7822fe9df40

Moving to 0.12, as this is related to #3332.

comment:10 Changed 4 years ago by ybarthelemy@…

  • Cc ybarthelemy@… added

comment:11 Changed 4 years ago by sascha@…

  • Cc sascha@… removed

comment:12 Changed 4 years ago by thorsten.holtkaemper@…

  • Cc thorsten.holtkaemper@… added

comment:13 Changed 3 years ago by public@…

  • Cc public@… added

I ran into this problem to with Trac 0.11. (svn:mime-type set to either application/octet-stream or application/pdf, doesn't matter)

I created a simple work-around which makes the is_binary() function check wether the data it is supposed to check starts with '%PDF-' and returns True if it does. (I will attach the diff file.)

I assume that you do not want to apply that to the Development tree of Trac as there seem to be better solutions, but it may be useful for other users in the meantime. Maybe you even want to apply it to a maintenance release of Trac 0.11. (I use Trac to manage .tex and generated .pdf files, and the diff view is basically useless if you always have to skip pages of binary PDF changes...)

Regards,
Milan Holzäpfel

Changed 3 years ago by public@…

Workaround for trac-0.11.2.1

comment:14 Changed 3 years ago by rblank

  • Milestone changed from 0.13 to 0.11.4
  • Owner changed from cboos to rblank

I'll try to fix this properly. I haven't looked at the code yet, but shouldn't the criterion for trying a diff based on the mime type be that it starts with text/?

comment:15 Changed 3 years ago by cboos

Please take a look at r2108. Redoing this change, with the list coming from a [mimeview] treat_as_binary setting instead of being hard-coded would be fine.

There are many mime-types that are not starting with text/ and for which it's nevertheless interesting to have diffs for. Mostly XML stuff, like application/xslt+xml or image/svg+xml. But there are others (e.g. application/x-sh, model/vrml), so I don't think we can simply look for a xml substring.

As the complaints heard so far were mostly about PDF and RTF files, I think the treat_as_binary approach is good enough.

comment:16 follow-up: Changed 3 years ago by ybarthelemy@…

Shouldn't application/octet-stream be added to the list of binary types as well ?

comment:17 in reply to: ↑ 16 Changed 3 years ago by cboos

Replying to ybarthelemy@…:

Shouldn't application/octet-stream be added to the list of binary types as well ?

Yes, we should do that as well. Together with Subversion svn:mime-type, that would allow to mark any kind of file for being treated as binary, even if it's not.

Note that attachment:trac-mimetype-no-binary-diff.patch:ticket:3428 proposed a fix along the lines proposed by Remy in comment:14 above.

I also closed #7213 as duplicate.

comment:18 Changed 3 years ago by joel@…

  • Cc joel@… added

I also have this problem...

comment:19 Changed 3 years ago by cboos

  • Milestone changed from 0.11.4 to 0.11.5

comment:20 Changed 3 years ago by rblank

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in [8120] by adding the suggested [mimeview] treat_as_binary option, defaulting to:

  • application/octet-stream
  • application/pdf
  • application/postscript
  • application/rtf

and using the option in the plain text and diff viewers.

comment:21 follow-up: Changed 3 years ago by treaves@…

What version is this fixed in? It's still a defect in 0.11.4.

comment:22 in reply to: ↑ 21 Changed 3 years ago by anonymous

Replying to treaves@…:

What version is this fixed in? It's still a defect in 0.11.4.

Actually, it's still broken in 0.11.5

comment:23 Changed 3 years ago by cboos

If you can provide us with two versions of a file that can be used to reproduce the problem, please do so and then reopen the ticket, we'll look at it.

comment:24 Changed 3 years ago by anonymous

Hmm... I wonder if this was due to stale info? After a reboot of my server (for an OS upgrade), I can no longer reproduce. Odd.

comment:25 follow-up: Changed 2 years ago by batrick

  • Resolution fixed deleted
  • Status changed from closed to reopened

i am running trac 0.11.5 on debian lenny. after manual easy_install, i was hoping to have treat_as_binary support. however, pdfs still appear in the changeset view.

what i tried

  • adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
  • adding mime_map = (...),application/pdf:pdf to the same section

result: unfortunately nothing. pdfs still appear under "view changes"

i don't really know how to program python, but to see what was going on i added some trace code around the treat_as_binary part of api.py, recompiled - and got nothing. so it seemed to me that the code was not even called (with ample salt, as i said i hardly know any python)

it would be great if this could be fixed - i would really like to use trac to follow up on my latex files, but have the pdfs in svn for quick downloads also.

thanks,

batrick

comment:26 in reply to: ↑ 25 ; follow-up: Changed 2 years ago by rblank

  • Milestone changed from 0.11.5 to 0.11.6

Replying to batrick:

  • adding treat_as_binary = application/pdf to the [mimeview] section of trac.ini
  • adding mime_map = (...),application/pdf:pdf to the same section

result: unfortunately nothing. pdfs still appear under "view changes"

I assume you haven't set the svn:mime-type property for your pdf files? The mime_map is not used in that situation, which is an omission on my part. I'll fix that.

comment:27 in reply to: ↑ 26 Changed 2 years ago by batrick

thanks for the quick reply.

I assume you haven't set the svn:mime-type property for your pdf files? The mime_map is not used in that situation, which is an omission on my part. I'll fix that.

i am not aware of the svn:mime-type property. could give a hint how it can be used to avoid the pdf diff'ing problem?

thanks,
batrick

comment:28 Changed 2 years ago by rblank

If you use Subversion from the command line, you can set the property to "application/pdf" on .pdf files with the following command:

svn pset svn:mime-type application/pdf *.pdf

If you use a GUI for that, you should look out for a "Properties" dialog, where you could add a property named "svn:mime-type" with the value "application/pdf" for all of your .pdf files.

Of course, this is only needed temporarily. Once this is fixed, you won't need the properties anymore (but they won't hurt, either).

comment:29 Changed 2 years ago by rblank

  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed in [8578], along with an environment reload bug.

comment:30 follow-up: Changed 2 years ago by cboos

Just a hint about a possible optimization: when we do

    old_content = old_node.get_content().read() 
    if mview.is_binary(old_node.content_type, old_node.path, 
                       old_content):
        return None

We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary. For binary files, this cost is usually not negligible.

So what about:

    if mview.is_binary(old_node.content_type, old_node.path):
        return None
    old_content = old_node.get_content().read() 
    if mview.is_binary(content=old_content):
        return None

comment:31 in reply to: ↑ 30 Changed 2 years ago by rblank

  • Resolution fixed deleted
  • Status changed from closed to reopened

Replying to cboos:

We read the whole file eventually for nothing if the content_type and path are enough to decide if it's binary.

Yeah, I know. My feeling was that this is only an issue for very large binary files, something that should be relatively rare in a Subversion repository. And, you know, premature optimization is the root of all evil ;-)

comment:32 Changed 2 years ago by rblank

  • Resolution set to fixed
  • Status changed from reopened to closed

Suggestion implemented in [8585]. Thanks!

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
to The owner will be changed from rblank. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.