Edgewall Software

Ticket #2591 (closed enhancement: fixed)

Opened 3 years ago

Last modified 19 months ago

Option to limit the number of diff'ed files in the changeset view

Reported by: eblot Owned by: cboos
Priority: high Milestone: 0.10
Component: version control/changeset view Version: devel
Severity: major Keywords: diff tracdiff
Cc:

Description

It would be nice if the changeset view provided an option to limit the maximum number of files diff'ed:
When Trac attempts to show the changed files of a very large changeset (hundreds of files), it can take several minutes to complete the diff computation and send the result back to the web browser. This delay is quite unusual, and not predictable for the user as he/she does know before selecting a changeset the number of changed files. The maximum number of files value should therefore be persistent.

Attachments

detect_big_changesets.patch (3.9 KB) - added by cboos 3 years ago.
Detect big changesets and in this case don't show the diffs in the same page. Instead, add links to more specific diff views.
detect_big_changesets.2.patch (4.2 KB) - added by cboos 3 years ago.
Improved patch

Change History

Changed 3 years ago by cboos

  • keywords tracdiff added
  • owner changed from jonas to cboos
  • priority changed from low to high
  • milestone set to 0.10

That's an important feature to have for TracDiff too.

I'd suggest the following solution (copied from the ML thread):

[changeset]
max_diff_shown = 100

(suggestion for a better name welcomed)

If there are more than max_diff_shown actual diffs to show (Added/Deleted or Edits on Binary files wouldn't count), then no diffs would be inlined. Rather, the (n diff) link at the right of the file entry would link to the restricted changeset for that file.

Changed 3 years ago by cmlenz

I think it'd be more important to set a maximum size of files that are diffed... which are what actually takes up memory.

Also, we can improve the transcoding... currently we convert both versions to UTF-8 before performing the diff, which loads the complete file contents in memory. However, in 99% of the cases, the encoding of the two versions will match, so we'd only need to transcode to a common encoding when it differs.

These options should be investigated before limiting the number of files IMO.

Changed 3 years ago by cboos

See also #2343.

Changed 3 years ago by eblot

See also #2636, similar request for enhancement about the browser view

Changed 3 years ago by cboos

Marked #2700 as a duplicate (a too large HTML diff can even kill tracd).

Changed 3 years ago by eblot

  • severity changed from normal to major

I just ran once more into troubles with this issue:

Some users use the RSS feed to get the news about the SVN repository changes. When a changeset is "large enough", the Apache server takes ages to generate the changeset diff, which apparently ends up into deadlocking some of the threads of the Apache server.

This is not the most annoying part: as a side effect, the SVN repository (also run with the same Apache server) stops working: "svn copy URL URL" commands fail with the following cryptic message:

svn: MERGE request failed on '/svn/project'
svn: Failed to run '/local/svn/project/hooks/pre-commit' hook

The Apache server needs to be restarted to "fix up" this issue.

I'm increasing the severity of this ticket, as this issue may cause important side effects.

Changed 3 years ago by cboos

You mean the timeline RSS? There shouldn't be any diff generation in this situation. Do you use the changeset_show_file option? If so, try turning it off to see if this is the cause of the problem.

Perhaps it's a bug in Repository.get_changesets, try to monitor the behavior of that method.

Changed 3 years ago by eblot

About the timeline RSS:

Used from Thunderbird for example, when the user selects a news the preview pane shows the linked document. In this case (changeset event) the preview pane shows http://server/trac/project/changeset/N i.e. the changeset view, which implies the full diff of the changeset.

Changed 3 years ago by cboos

  • status changed from new to assigned

Please try out attachment:detect_big_changesets.patch

This make use of the following configuration variables

[changeset]
max_diff_size = 100000
max_diff_files = 1000

This gives two hints for detecting big changesets. If a changeset is determined to be big, then for each diff, only the link to the restricted changeset for that file will be shown.

In the future, that link could be changed to an AJAX query for retrieving only that diff dynamically.

Changed 3 years ago by cboos

Detect big changesets and in this case don't show the diffs in the same page. Instead, add links to more specific diff views.

Changed 3 years ago by eblot

I havn't extensively tested the patch, but AFAICT, it works great.

Thanks a lot !

Changed 3 years ago by eblot

Ooops, there is an issue with the patch:

When the change is about an added file or a removed file, the "full diff" URL is wrong: One of the path is correct, the other path is invalid (most of the time, it is a path to an unrelated file of the same changeset...)

Changed 3 years ago by cboos

Yep. I realized that too :)

There's now a second iteration of the patch available, which displays the full diff link only for modified files and also takes care of linking to the restricted changeset if the big changeset is really a changeset and not a more general diff.

Changed 3 years ago by cboos

Improved patch

Changed 3 years ago by eblot

I've applied the second patch. It seems to work ok. Thanks.

Changed 3 years ago by cboos

There's another alternative, which is to use max_preview_size (see #2343) to filter out big files, instead of relying on a highly approximative diff size estimate: that would give a mix of inlined changes for the files that fall below the limit and links to individual diffs for the big files above that limit. The global max_diff_files would still be used for avoiding huge changesets made of a lot a small changes.

Thoughts?

Changed 3 years ago by cboos

  • status changed from assigned to closed
  • resolution set to fixed

The above modification has been added to the attachment:detect_big_changesets.2.patch and checked in as r2968.

I keep in mind doing the optimization about not converting the files beforehand to UTF8 in case they share the same encoding, but the main problem should be fixed.

Changed 3 years ago by cboos

doing the optimization about not converting the files beforehand to UTF8 in case they share the same encoding,

in r3092 I decided to do the diffs in unicode, even if both contents share the same encoding. If we would do a diff keeping the original encodings, this could reintroduce problems like #2363, as different characters may share an identical starting sequence of bytes...

Add/Change #2591 (Option to limit the number of diff'ed files in the changeset view)

Author



Change Properties
<Author field>
Action
as closed
Next status will be 'reopened'
to The owner will change from cboos. Next status will be 'closed'
 
Note: See TracTickets for help on using tickets.