Edgewall Software

Opened 5 years ago

Last modified 7 months ago

#10267 new defect

Version control diffs of large text files kill the system

Reported by: anonymous
Owned by:
Priority: high
Milestone: next-stable-1.0.x
Component: version control/changeset view
Version: 0.12.2
Severity: major
Keywords: diff
Release Notes:
API Changes:

Description (last modified by Ryan J Ollos)

We have some large text files (data files we work with), and when you try to view "last revision" or any revision that includes them, it bombs: the service's CPU usage goes to 100% and things come to a screeching halt.

Looks like the issue is in: versioncontrol/web_ui/changeset.py

The _content_changes method tries to diff anything that isn't binary. This is fine for most text files, but not for big ones (even 5 to 10 MB seems too much for the diffing method used). Anyway, I simply added the following to _content_changes:

# Skip the diff entirely when either side is larger than 100000.
if len(old_content) > 100000 or len(new_content) > 100000:
    return None

(Sorry if it isn't proper pep8)

It looks like you have a max_diff_bytes option, but that only affects the diff results. Ideally you would want a threshold that is checked before the diff is computed.
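As a rough illustration of checking a threshold before any diff work is done, here is a minimal standalone sketch. It does not use Trac's code: the function name diff_if_small, the constant MAX_BYTES_FOR_DIFF, and the use of difflib are all assumptions made for the example, and the real _content_changes method works with different objects.

```python
import difflib

# Hypothetical limit, mirroring the 100000 used in the patch above.
MAX_BYTES_FOR_DIFF = 100000


def diff_if_small(old_content, new_content, max_bytes=MAX_BYTES_FOR_DIFF):
    """Return unified diff lines, or None if either input is too large.

    The size check runs before any diffing, so oversized inputs cost
    only two len() calls instead of a full diff computation.
    """
    if len(old_content) > max_bytes or len(new_content) > max_bytes:
        return None
    return list(difflib.unified_diff(
        old_content.splitlines(), new_content.splitlines(), lineterm=''))
```

A caller would treat a None result as "no diff available" and could, for instance, show a plain link to the file instead of rendering the changes.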

Attachments (0)


Change History (5)

comment:1 Changed 5 years ago by Christian Boos

Component: general → version control/changeset view
Keywords: diff added
Milestone: next-major-0.1X
Priority: normal → high
Severity: normal → major

Incidentally I faced the very same problem this morning…

When I didn't get any response in the browser, I logged on to the server and saw the httpd process stuck at 100% CPU. With gdb, I couldn't figure out the problem… because there was no error, just intensive processing to generate a diff for an XML file that grew from 70k lines to 100k lines, as I found out after waiting long enough ;-)

comment:2 Changed 5 years ago by anonymous

Same problem here. The changeset view fails for an update to a huge SQL dump. The suggested patch works very well for me. I suggest including the fix in the next minor update.

comment:3 Changed 3 years ago by Christian Boos

Milestone: next-major-releases → next-stable-1.0.x

We indeed need another threshold, like max_bytes_for_diff. And it probably should take into account the cumulative size, otherwise you'd still have a problem with 10 1MB files if your threshold is at 2MB …
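One way to picture a cumulative threshold is as a byte budget shared by all files in the changeset. The sketch below is only an assumption about how such a check could look; select_files_to_diff and its parameters are hypothetical names, not existing Trac API, and charging each file by the larger of its old and new sizes is an arbitrary choice for the example.

```python
def select_files_to_diff(file_sizes, max_total_bytes):
    """Split files into those to diff and those to skip.

    file_sizes is an iterable of (path, old_size, new_size) tuples.
    Each file is charged max(old_size, new_size) against a shared
    budget; a file that no longer fits in the budget is skipped.
    """
    remaining = max_total_bytes
    to_diff, skipped = [], []
    for path, old_size, new_size in file_sizes:
        cost = max(old_size, new_size)
        if cost <= remaining:
            remaining -= cost
            to_diff.append(path)
        else:
            skipped.append(path)
    return to_diff, skipped
```

With ten 1 MB files and a 2 MB budget, this sketch would diff the first two files and skip the other eight, rather than diffing all ten because each one is individually under the limit.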

comment:4 Changed 15 months ago by Jun Omae

Small improvement for diff_div.html in [537484b1d/jomae.git]. Before the change, the check of whether diff.style is sidebyside or inline was executed for each diff line. With the change, the check is done just once.

comment:5 Changed 14 months ago by Ryan J Ollos

Description: modified (diff)
