Edgewall Software
Modify

Opened 8 years ago

Closed 8 years ago

#10225 closed enhancement (fixed)

Provide Python hooks for Mercurial to call trac-admin

Reported by: Remy Blank Owned by: Remy Blank
Priority: normal Milestone: plugin - mercurial
Component: plugin/mercurial Version:
Severity: normal Keywords: review
Cc: eric@… Branch:
Release Notes:
API Changes:

Description

Setting up hooks in Mercurial to call trac-admin $ENV changeset commands is reasonably simple on Unix, but quite difficult on Windows due to the absence of xargs.

We could simplify the setup by providing Python functions for the commit and changegroup in MercurialPlugin. This would make for a very simple setup when the plugin is installed globally:

[hooks]
commit = python:tracext.hg.hooks:commit
changegroup = python:tracext.hg.hooks:changegroup

Or, if the plugin is installed per environment, people could download the file hooks.py from here and use:

[hooks]
commit = python:/path/to/hooks.py:commit
changegroup = python:/path/to/hooks.py:changegroup

Of course, the hooks should be cross-platform. The path to the trac-admin command should be configurable in .hgrc.

Attachments (1)

t10255-use-revs-r10718.patch (2.4 KB ) - added by Christian Boos 8 years ago.
add all revs for the changegroup and optionally use directly Trac's API for the notification

Download all attachments as: .zip

Change History (14)

comment:1 by eric@…, 8 years ago

Cc: eric@… added

comment:2 by psuter <petsuter@…>, 8 years ago

I ended up using the incoming hook instead of changegroup to avoid the need for xargs. Are there any downsides to that?

comment:3 by eric@…, 8 years ago

I don't see any downsides if it works … but it still won't fully solve my problem at least. I'm running Trac in Apache on a Windows box, and Windows doesn't like xargs or $HG_NODE. I tried subbing %HG_NODE% instead without any success, which is why I'm hoping an actual commit/changegroup hook scripted in Python will help.

in reply to:  2 comment:4 by Remy Blank, 8 years ago

Replying to psuter <petsuter@…>:

I ended up using the incoming hook instead of changegroup to avoid the need for xargs. Are there any downsides to that?

Yes, a pretty major one. The incoming hook is called once for every new changeset. The changegroup hook is called once with the first new changeset (and the other ones can be determined from the changelog). So with the former, you have to call trac-admin once for every changeset, whereas the latter allows a single call to trac-admin with a list of revisions. Considering that the setup phase of a Trac environment is very expensive, there's a major performance gain in using changegroup.

I'm almost done, and I will need some testers on Windows :)

comment:5 by Remy Blank, 8 years ago

Keywords: review added

A hook function has been added to TracMercurial in [10718]. It has been tested on Linux, and I need some volunteers to test on Windows. Copy the file hooks.py somewhere accessible, then configure the hooks in hgrc:

[hooks]
commit = python:C:\path\to\hooks.py:add_changesets
changegroup = python:C:\path\to\hooks.py:add_changesets

[trac]
env = C:\path\to\env
trac-admin = C:\path\to\trac-admin

comment:6 by Christian Boos, 8 years ago

Simple testing shows it works (on Windows), but it could still be improved.

First, using the chain of first children is not going to work in all cases, and using changectx.descendants as I first thought neither. Consider the following case, for the incoming sequence b, c, d where d is the merge of b and c (i.e. we have for example a -> b, a -> c, (b, c) -> d, with a being a changeset already present in the repository). If we start with node == b, we don't want to miss c even if it's not a descendant of b. Fortunately there's a much simpler way to get all the relevant changesets, because the hook is executed in a transaction (at least I verified for pull, I assume its the same for other means), so we have a guarantee that all changesets after node are part of the changegroup. So we'll need to use rev numbers here.

Second remark, using subprocess is one approach… and I understand you could sometimes want to notify environments with different versions of Trac for the same repository; in that situation using trac-admin is certainly the simpler way to go. But in the usual case where you have one version of Trac installed and reachable from the same version of Python used to run the hooks, you could as well call the relevant Trac Python code directly. I propose doing that when there's no trac-admin entry specified in the [trac] section of the .hgrc.

See follow-up patch, t10255-use-revs-r10718.patch.

Last edited 8 years ago by Christian Boos (previous) (diff)

by Christian Boos, 8 years ago

add all revs for the changegroup and optionally use directly Trac's API for the notification

in reply to:  6 ; comment:7 by Remy Blank, 8 years ago

Replying to cboos:

First, using the chain of first children is not going to work in all cases, and using changectx.descendants as I first thought neither.

To be honest, I took the code from next_rev() of the plugin, because I wrongly assumed that it was used to make the changelog. Using revision numbers makes sense, and allows increasing the default number of changesets per call to trac-admin. I tried to find out how Mercurial implemented its log command, but the code combines many different cases (with a file specification, with revision specification, …) that I couldn't find an easy way.

Second remark, using subprocess is one approach… and I understand you could sometimes want to notify environments with different versions of Trac for the same repository; in that situation using trac-admin is certainly the simpler way to go.

It's not really due to different versions of Trac, but rather for the situation where you run Trac instances in virtualenvs. I actually wanted to add some functionality to handle this case (instead of requiring to write a wrapper script for trac-admin), but I was afraid that it would be difficult to make it cross-platform.

But in the usual case where you have one version of Trac installed and reachable from the same version of Python used to run the hooks, you could as well call the relevant Trac Python code directly. I propose doing that when there's no trac-admin entry specified in the [trac] section of the .hgrc.

I'm not sure I like the idea. It means that we load a complete Trac environment into the process executing the operation, which is fine when you run the hg executable (local operations, ssh) but not so nice when you run hgweb. All your hgweb processes end up taking much more memory. The upside is that you only pay once for environment startup, which can be nice if you push a lot.

in reply to:  7 ; comment:8 by Christian Boos, 8 years ago

Replying to rblank:

But in the usual case where you have one version of Trac installed and reachable from the same version of Python used to run the hooks, you could as well call the relevant Trac Python code directly. I propose doing that when there's no trac-admin entry specified in the [trac] section of the .hgrc.

I'm not sure I like the idea. It means that we load a complete Trac environment into the process executing the operation, which is fine when you run the hg executable (local operations, ssh) but not so nice when you run hgweb. All your hgweb processes end up taking much more memory.

It's probably worth to measure this cost … and verify the env gets properly gc'ed. If memory usage appears to be a real issue, it's easy to use the alternative way with trac-admin. But if think it's nice to have at least the option to bypass the call to trac-admin.

The upside is that you only pay once for environment startup, which can be nice if you push a lot.

Well, only if we add use_cache=True to open_environment(), which I didn't do here.

in reply to:  8 ; comment:9 by Remy Blank, 8 years ago

Replying to cboos:

It's probably worth to measure this cost … and verify the env gets properly gc'ed. If memory usage appears to be a real issue, it's easy to use the alternative way with trac-admin. But if think it's nice to have at least the option to bypass the call to trac-admin.

IIRC, Python is notoriously bad at releasing memory back to the OS, so even if the environment is garbage-collected, the process will probably keep the memory around. Measuring should indeed clarify the issue.

Well, only if we add use_cache=True to open_environment(), which I didn't do here.

Shame, then I see no upside at all :)

I mean, in the configuration where everything is installed globally, the trac-admin executable should be in PATH, so the default value of trac-admin should already be fine.

in reply to:  9 ; comment:10 by Christian Boos, 8 years ago

Replying to rblank:

Replying to cboos:

It's probably worth to measure this cost … and verify the env gets properly gc'ed. If memory usage appears to be a real issue, it's easy to use the alternative way with trac-admin. But if think it's nice to have at least the option to bypass the call to trac-admin.

IIRC, Python is notoriously bad at releasing memory back to the OS, so even if the environment is garbage-collected, the process will probably keep the memory around.

Not on Windows ;-) But even without gc'ing the env, I'd be surprised if from one hook run to the next you'd have any noticeable memory increase.

Measuring should indeed clarify the issue.

Well, only if we add use_cache=True to open_environment(), which I didn't do here.

Shame, then I see no upside at all :)

For me the upside is to avoid having to launch an external process, pass a possibly long list of command line arguments, when the call could have been done instead directly in the current process, in one go. I indeed didn't think about long running processes, that's why I didn't bother using the cache. But if the memory impact is negligible, it might indeed be interesting to activate the cache.

I mean, in the configuration where everything is installed globally, the trac-admin executable should be in PATH, so the default value of trac-admin should already be fine.

Again, probably not on Windows.

in reply to:  10 comment:11 by Remy Blank, 8 years ago

Replying to cboos:

Not on Windows ;-) But even without gc'ing the env, I'd be surprised if from one hook run to the next you'd have any noticeable memory increase.

Not from one run to the next. You would get one increase on first run, and then constant usage (probably).

For me the upside is to avoid having to launch an external process, pass a possibly long list of command line arguments, when the call could have been done instead directly in the current process, in one go.

Ah, you've been lurking on Mercurial mailing list for too long, and are influenced by all the people moaning when they are told to call hg on the command-line to integrate Mercurial into their software.

In this particular case, I find the subprocess approach quite sensible: start a new process, use whatever memory is necessary, and release all of it to the OS on termination. Nice and clean.

But I don't mind having the option to call Trac directly. Choice is good ;) Patch applied in [10726].

Do you want me to add an option cache_env to allow using the environment cache?

comment:12 by Remy Blank, 8 years ago

Added environment variable expansion in paths in [10727].

comment:13 by Remy Blank, 8 years ago

Resolution: fixed
Status: newclosed

Added the Mercurial option cache_env in [10731]. Please reopen if there's anything else to be done.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Remy Blank.
The resolution will be deleted. Next status will be 'reopened'.
to as closed The owner will be changed from Remy Blank to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.