Edgewall Software
Modify

Opened 13 years ago

Last modified 3 years ago

#4474 new defect

diffing two large trees results in a massive list with a lot of empty links

Reported by: theultramage@… Owned by:
Priority: normal Milestone: next-stable-1.2.x
Component: version control/changeset view Version: devel
Severity: normal Keywords: svn diff
Cc: nick-trac@… Branch:
Release Notes:
API Changes:

Description (last modified by theultramage@…)

If I run a manual diff of two large directories in different branches (600 files, 15MB), instead of directly showing all the diffs, it displays a list of links to the diffs of individual files.

That wouldn't be too bad (it's probably intended to reduce browser load but I'd like to be able to customize the behavior somehow - help!), but in the above case, many of the links lead to empty diffs, which is obviously wrong. Please fix it so it only links to non-empty diffs.

Attachments (0)

Change History (27)

in reply to:  description comment:1 by Emmanuel Blot, 13 years ago

Replying to theultramage@gmail.com:

That wouldn't be too bad (it's probably intended to reduce browser load but I'd like to be able to customize the behavior somehow - help!),

About this peculiar point: did you set up the Trac parameters as described in TracIni#changeset-section (max_diff_*)?

comment:2 by theultramage@…, 13 years ago

Yes I noticed that one shortly after, and it works for that particular subdirectory. (although when I ran it on the root dir it rattled for a while and then gave up with no error messages or anything :)

comment:3 by Christian Boos, 13 years ago

Keywords: svn diff added
Milestone: 0.11

Yes, this is a real issue in SubversionRepository.get_changes() that I still need to investigate.

comment:4 by theultramage@…, 13 years ago

This was fixed sometime in january, probably by r4548 (but not sure).

in reply to:  4 ; comment:5 by Christian Boos, 13 years ago

Replying to theultramage@gmail.com:

This was fixed sometime in january, probably by r4548 (but not sure).

Ah, then we were not speaking of the same thing. I noticed that this happened when comparing different branches, and that spurious diffs were appearing for binaries, like in, e.g.

diff:trunk/htdocs@4700//sandbox/security/htdocs@4700

(those are actually the same pictures)

comment:6 by theultramage@…, 13 years ago

Hm, in my case it was probably caused by the tree-diffing engine not skipping empty diffs (see example, the first file). This doesn't happen anymore on trunk. Maybe a case needs to be added for binary files?

comment:7 by Christian Boos, 13 years ago

Ok thanks for the example, it helps to understand the issue.

comment:8 by anonymous, 13 years ago

Update; the situation still happens, but it seems that only for larger diffs. I needed to do a diff of two entire branches (with almost the same content) today, and it still gave me a huge list with "(No files)" inside. I hope it is reproducible.

comment:9 by Christian Boos, 13 years ago

Milestone: 0.110.12

The bug will probably stay until after the 0.11 release, but could probably be then backported to 0.11-stable.

comment:10 by nick-trac@…, 13 years ago

Cc: nick-trac@… added

comment:11 by mark.freund@…, 11 years ago

Cc: mark.freund@… added

in reply to:  5 comment:12 by david.mccabe@…, 11 years ago

Replying to cboos:

I noticed that this happened when comparing different branches, and that spurious diffs were appearing for binaries

I'm having this issue. svn_repos_dir_delta returns a difference for binary files that are not straight copies, even if the contents are identical.

My local Trac (0.11) has a patch for changeset.py:

old_content = old_node.get_content().read()
new_content = new_node.get_content().read()
if is_binary(old_content) or is_binary(new_content):
    if old_content == new_content:
        return []
    return None

comment:13 by Christian Boos, 11 years ago

Milestone: 0.130.12

Great, thanks for sharing this information. Ideally we'd need to be able to only compare checksums here (e.g. svn_fs_file_checksum for Subversion).

comment:14 by theultramage@…, 11 years ago

I re-tested this with latest trunk (r7757). Inputs:

Outputs:

  • A huge list of links <file path (view diffs)>
  • Files: 8 added, 2 removed, 571 modified
  • 402 files are binary identical (according to 'find . | xargs md5')

So trac skipped over some identical files (as it should have), but still reported hundreds of them. A lot of the 'view diffs' links lead to a page that only says '(No files)'. Therefore the issue still exists.

One potential workaround might be increasing the 'max_diff_bytes' config option to a very high value; I faintly recall this helping in cases of sufficiently small dirs.

comment:15 by theultramage@…, 11 years ago

While investigating an unrelated issue, I had an idea… could the following be the cause?

Trac starts to diff the target files one by one, eliminating identical files. In the middle of processing, it detects that it has already consumed max_diff_bytes of input. Then it doesn't continue, and instead just builds a list of all remaining files with links to individual files, for manual diffing. This would be what I was mentioning above.

Of course, such output that just lists (almost) all files has zero informational value. Not sure what to do about it though, except for bumping max_diff_bytes to 2 * sizeof(repository) …

comment:16 by Dave Matthews, 11 years ago

We've just hit this problem (using 0.11.2.1). The scenario is as follows:

  • We're trying to compare the differences between a branch and the trunk.
  • Compared with the base of the branch, there are 618 files modified (all text files, not binary files).
  • Compared with the latest version of the trunk there are only 7 files modified. ie. The branch contains only 6 modified files but there were lots of changes made to the trunk after the branch was created which have now been merged in.
  • If I try to display the differences between the branch and the latest version of the trunk in Trac then:
    • If I set max_diff_files to 618 (and set max_diff_bytes to be large) then the 7 modified files are shown correctly.
    • If I reduce max_diff_files to 617 then I get 618 modified files reported (with 611 of the "view diffs" links pointing to a page saying "No files").

I don't understand why the number of files modified on the branch is relevant here. Subversion happily reports 7 modified files when you compare the 2 URLs - isn't Trac using this?

comment:17 by theultramage@…, 11 years ago

Trac doesn't use the revision history for diffing. Whenever you request a diff of two files, it must manually compute the diff. If you have 1000 files in each directory you're comparing, trac has to diff each of those. Even if the difference between them is minimal, trac doesn't know that unless it actually diffs the files.

The max diff size and max diff files settings are there to limit the complexity of requests users can make. If there are more files to process than the limit allows, trac will just give up when it hits the limit and will dump the rest as unprocessed links that you can click on to request further diffs (on a per-file basis).

The way how trac displays the partial results is very awkward. Nonetheless, these settings provide at least some form of prevention against overloading the server. In our case, displaying of multi-file diff pages was disabled entirely. It's an inconvenience, but it lets us work with trac without long page loading delays.

in reply to:  17 comment:18 by Dave Matthews, 11 years ago

Replying to theultramage@…:

Trac doesn't use the revision history for diffing. Whenever you request a diff of two files, it must manually compute the diff. If you have 1000 files in each directory you're comparing, trac has to diff each of those. Even if the difference between them is minimal, trac doesn't know that unless it actually diffs the files.

The max diff size and max diff files settings are there to limit the complexity of requests users can make. If there are more files to process than the limit allows, trac will just give up when it hits the limit and will dump the rest as unprocessed links that you can click on to request further diffs (on a per-file basis).

I don't understand this. Our project tree contains several thousand files. We frequently show the differences between the trunk and a branch and this has always worked fine. It is only in this case, where a large number of files have been modified on the branch, that we have had a problem. Also, it is this number of files modified which seems important, not the number of files in the tree.

in reply to:  16 comment:19 by Dave Matthews, 11 years ago

A bit of further info …

Replying to Dave Matthews:

  • Compared with the latest version of the trunk there are only 7 files modified. ie. The branch contains only 6 modified files but there were lots of changes made to the trunk after the branch was created which have now been merged in.
  • If I try to display the differences between the branch and the latest version of the trunk in Trac then:
    • If I set max_diff_files to 618 (and set max_diff_bytes to be large) then the 7 modified files are shown correctly.

I was getting confused here as to whether there are 6 or 7 files modified and I now realise why. There are, in fact, only 6 files modified. Even after increasing max_diff_files Trac still shows 7 files modified. The extra file is a binary file which ties in with earlier comments about spurious diffs for binaries. So, are there 2 separate issues here or are they related?

comment:20 by Remy Blank, 10 years ago

Milestone: 0.12next-minor-0.12.x

comment:21 by mark.freund@…, 9 years ago

Cc: mark.freund@… removed

comment:22 by Christian Boos, 8 years ago

#10662 was closed as duplicate of this bug. It contains a reproduction recipe involving text files (i.e. not binaries).

in reply to:  17 comment:23 by Andrew.P.Keen@…, 8 years ago

Replying to theultramage@…:

Trac doesn't use the revision history for diffing. Whenever you request a diff of two files, it must manually compute the diff. If you have 1000 files in each directory you're comparing, trac has to diff each of those. Even if the difference between them is minimal, trac doesn't know that unless it actually diffs the files.

Why doesn't trac to do something like "svn diff --summarize <old_path>@<old> <new_path>@<new>" to satisfy the diff query?

comment:24 by Ryan J Ollos, 5 years ago

Milestone: next-minor-0.12.xnext-stable-1.0.x

comment:25 by theultramage@…, 5 years ago

Description: modified (diff)

Removed dead url; at this point I think the behavior is understood well enough to reproduce.

comment:26 by Ryan J Ollos, 5 years ago

Owner: Christian Boos removed

comment:27 by Ryan J Ollos, 3 years ago

Milestone: next-stable-1.0.xnext-stable-1.2.x

Moved ticket assigned to next-stable-1.0.x since maintenance of 1.0.x is coming to a close. Please move the ticket back if it's critical to fix on 1.0.x.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.
The ticket will be disowned. Next status will be 'new'.
as The resolution will be set. Next status will be 'closed'.
The owner will be changed from (none) to anonymous. Next status will be 'assigned'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.