Opened 2 years ago
Last modified 3 months ago
#10267 new defect
Version control diffs of large text files kill the system
| Reported by: | anonymous | Owned by: | |
|---|---|---|---|
| Priority: | high | Milestone: | next-stable-1.0.x |
| Component: | version control/changeset view | Version: | 0.12.2 |
| Severity: | major | Keywords: | diff |
| Cc: | | | |
| Release Notes: | | | |
| API Changes: | | | |
Description
We have some large text files (data files we work with), and when you try to view "last revision" or any revision containing them, it bombs (i.e. the server process's CPU usage goes to 100% and things come to a screeching halt).
The issue seems to be in `versioncontrol/web_ui/changeset.py`.
The `_content_changes` method tries to diff anything that isn't binary. That is fine for most text files, but not for big ones (even 5 to 10 MB seems to be too much for the diff method used). So I simply added the following to `_content_changes`:
```python
# Skip the diff entirely if either side is larger than ~100 KB.
if len(old_content) > 100000 or len(new_content) > 100000:
    return None
```
(Sorry if it isn't proper PEP 8.)
It looks like there is already a `max_diff_bytes` setting, but that only affects the diff results. Ideally there would be a threshold that is checked before the diff is computed at all.
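A minimal sketch of that idea, a size check that runs before any diff is computed, assuming the file contents are available as bytes. `content_changes_guarded` and `MAX_FILE_BYTES` are hypothetical names (the 100000 limit just mirrors the patch above), and `difflib.unified_diff` from the Python standard library stands in for whatever diff routine Trac actually uses.

```python
import difflib
from typing import List, Optional

# Hypothetical per-file limit; 100000 mirrors the reporter's patch.
MAX_FILE_BYTES = 100000


def content_changes_guarded(old_content: bytes, new_content: bytes,
                            max_bytes: int = MAX_FILE_BYTES
                            ) -> Optional[List[str]]:
    """Return unified-diff lines, or None if either side is too large.

    The size check happens before any diffing, so an oversized file costs
    two len() calls rather than a full diff computation.
    """
    if len(old_content) > max_bytes or len(new_content) > max_bytes:
        return None  # too large: skip the diff entirely
    old_lines = old_content.decode("utf-8", "replace").splitlines()
    new_lines = new_content.decode("utf-8", "replace").splitlines()
    # Stand-in diff; Trac's own diff routine may differ.
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))
```

Returning `None` for an oversized file lets the caller show a short "diff too large" notice instead of tying up the server process.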
Attachments (0)
Change History (3)
comment:1 Changed 2 years ago by cboos
- Component changed from general to version control/changeset view
- Keywords diff added
- Milestone set to next-major-0.1X
- Priority changed from normal to high
- Severity changed from normal to major
comment:2 Changed 23 months ago by anonymous
Same problem here: the changeset view fails for an update to a huge SQL dump. The suggested patch works very well for me. I suggest including the fix in the next minor update.
comment:3 Changed 9 months ago by cboos
- Milestone changed from next-major-releases to next-stable-1.0.x
We do indeed need another threshold, like `max_bytes_for_diff`. It should probably also take the cumulative size into account; otherwise you would still have a problem with ten 1 MB files if your threshold were 2 MB …
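To make the cumulative concern concrete: with a per-file threshold of 2 MB, each of ten 1 MB files passes on its own, so the changeset view would still diff 10 MB in total. One way to address this, sketched here under assumptions, is to charge every file against a single per-changeset budget. `plan_diffs`, its `(path, old_size, new_size)` input shape and the 2 MiB `MAX_BYTES_FOR_DIFF` default are all hypothetical; the name only echoes the `max_bytes_for_diff` option proposed above and is not an existing Trac setting. Each file is charged the larger of its old and new sizes, mirroring the per-side check in the reporter's patch.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical per-changeset budget: 2 MiB, as in the example above.
MAX_BYTES_FOR_DIFF = 2 * 1024 * 1024


def plan_diffs(files: Iterable[Tuple[str, int, int]],
               budget: int = MAX_BYTES_FOR_DIFF) -> Dict[str, bool]:
    """Map each path to True (compute a diff) or False (skip it).

    `files` yields (path, old_size, new_size). Every file is charged
    max(old_size, new_size) against one budget shared by the whole
    changeset; a file whose cost exceeds what is left is skipped, though
    a later, smaller file may still fit.
    """
    remaining = budget
    plan: Dict[str, bool] = {}
    for path, old_size, new_size in files:
        cost = max(old_size, new_size)
        if cost <= remaining:
            plan[path] = True
            remaining -= cost  # spend part of the shared budget
        else:
            plan[path] = False  # too big for what is left: no diff
    return plan
```

With ten 1 MiB files and the 2 MiB default, only the first two are marked for diffing. The first-come, greedy order is a simplification; an implementation might instead prefer to skip the largest files first.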



Incidentally, I ran into the very same problem this morning…
When I got no response in the browser, I logged on to the server and saw the httpd process stuck at 100% CPU. Attaching gdb didn't reveal the problem… because there was no error, just intensive processing to generate a diff for an XML file that had grown from 70k to 100k lines, as I found out after waiting long enough ;-)