
Opened 3 years ago

Last modified 3 weeks ago

#10267 new defect

Version control diffs of large text files kill the system

Reported by: anonymous
Owned by:
Priority: high
Milestone: next-stable-1.0.x
Component: version control/changeset view
Version: 0.12.2
Severity: major
Keywords: diff
Cc:
Release Notes:
API Changes:

Description

We have some large text files (data files we work with), and when you try to view "last revision" or any revision that includes them, the view bombs (i.e. the service's CPU usage goes to 100% and everything comes to a screeching halt).

Looks like the issue is in: versioncontrol/web_ui/changeset.py

The _content_changes method tries to diff anything that isn't binary. This is fine for most text files, unless they are big (even 5 to 10 MB seems to be too much for the diffing method used). Anyway, I simply added the following to _content_changes:

    if (len(old_content) > 100000 or len(new_content) > 100000):
        return None

(Sorry if it isn't proper pep8)

It looks like you have a max_diff_bytes setting, but that only affects the diff results. Ideally you would want a threshold that is checked before the diff results are computed.
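For illustration, here is a minimal standalone sketch of the "check the size before diffing" idea. It is not Trac's actual code: `diff_or_skip` and `MAX_CONTENT_LEN` are hypothetical names, and the standard library's `difflib` stands in for whatever diff routine `_content_changes` really uses.

```python
import difflib

# Hypothetical threshold; the patch above hard-codes the same value.
MAX_CONTENT_LEN = 100000


def diff_or_skip(old_content, new_content, max_len=MAX_CONTENT_LEN):
    """Return unified diff lines, or None when either input is too large.

    The size check runs before any diffing, so oversized inputs never
    reach the expensive comparison step.
    """
    if len(old_content) > max_len or len(new_content) > max_len:
        return None
    # difflib is only a stand-in for the real diff implementation.
    return list(difflib.unified_diff(
        old_content.splitlines(), new_content.splitlines(), lineterm=''))
```

A caller would treat a `None` result the same way as a binary file, i.e. show that the file changed without rendering a diff.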

Attachments (0)

Change History (4)

comment:1 Changed 3 years ago by cboos

  • Component changed from general to version control/changeset view
  • Keywords diff added
  • Milestone set to next-major-0.1X
  • Priority changed from normal to high
  • Severity changed from normal to major

Incidentally I faced the very same problem this morning…

When I didn't get any response in the browser, I logged on to the server and saw the httpd process stuck at 100% CPU. With gdb, I couldn't figure out the problem… because there was no error, just intensive processing in order to generate a diff for an XML file that grew from 70k lines to 100k lines, as I found out after waiting long enough ;-)

comment:2 Changed 3 years ago by anonymous

Same problem here. The changeset view fails for an update to a huge SQL dump. The suggested patch works very well for me. I suggest including the fix in the next minor update.

comment:3 Changed 2 years ago by cboos

  • Milestone changed from next-major-releases to next-stable-1.0.x

We indeed need another threshold, like max_bytes_for_diff. And it probably should take into account the cumulative size, otherwise you'd still have a problem with ten 1 MB files if your threshold is at 2 MB …
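A rough sketch of what a cumulative threshold could look like, independent of Trac's code. The option name `max_bytes_for_diff` comes from the comment above; the function name, the tuple layout and the choice of charging each file the larger of its old and new sizes are assumptions made for the example.

```python
def select_files_for_diff(sizes, max_bytes_for_diff):
    """Decide which files of a changeset to diff under a shared byte budget.

    `sizes` is a list of (path, old_size, new_size) tuples, in the order the
    files would be rendered. Returns a list of (path, should_diff) pairs.
    """
    remaining = max_bytes_for_diff
    decisions = []
    for path, old_size, new_size in sizes:
        # Assumption: a file costs as much as its larger side.
        cost = max(old_size, new_size)
        if cost <= remaining:
            remaining -= cost
            decisions.append((path, True))
        else:
            # Over budget: skip this file, but keep going so that smaller
            # files later in the changeset can still be diffed.
            decisions.append((path, False))
    return decisions
```

With ten 1 MB files and a 2 MB budget, only the first two files would be diffed; the remaining eight are skipped instead of each passing a per-file check.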

comment:4 Changed 6 weeks ago by jomae

Small improvement for diff_div.html in [537484b1d/jomae.git]. Before the change, the check of whether diff.style is sidebyside or inline was executed for each diff line. We could reduce the check to just once.
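The general idea of moving a loop-invariant check out of the loop can be sketched in plain Python. This is not the template change from the commit above: both functions and the tuple-based "rows" are hypothetical and only show the shape of the optimization.

```python
def rows_checked_per_line(lines, style):
    """Test the style once per line (the pattern being replaced)."""
    rows = []
    for line in lines:
        if style == 'sidebyside':
            rows.append(('sidebyside', line))
        else:
            rows.append(('inline', line))
    return rows


def rows_checked_once(lines, style):
    """Test the style a single time, then reuse the result for all lines."""
    mode = 'sidebyside' if style == 'sidebyside' else 'inline'
    return [(mode, line) for line in lines]
```

Both functions return the same rows; the second only avoids repeating a test whose result cannot change while the lines are being processed.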
