Edgewall Software
Modify

Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#8128 closed defect (fixed)

tracd with flup can't cope with spaces in URLs

Reported by: anonymous Owned by: Christian Boos
Priority: high Milestone: 0.11.4
Component: web frontend Version: 0.11.3
Severity: major Keywords: flup ajp wsgi whitespace
Cc: srevill@…, nick.messenger@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

Having installed mercurial-plugin-0.11 with trac 0.11.3 we've found that the Browse Source viewer cannot cope with filenames that contain spaces!

This is a real blocker for our intended use as it's quite common to have files under source control with spaces in their names.

I believe the fault to be in the backend.py source file of the mercurial-plugin hg sources, specifically, line 616 states:

if path in self.manifest: # then it's a file

whereas I suspect this test is failing - maybe path contains %20 stuff and the contents of self.manifest don't. I'm no Python programmer so who knows… I would expect the fix to look something like this:

if path.encode('utf-8') in self.manifest: # then it's a file

or similar.

Attachments (5)

backend.py (33.8 KB ) - added by srevill@… 16 years ago.
bugfixed backend.py source file
test_whitespace_t8128.bundle (937 bytes ) - added by Christian Boos 16 years ago.
small hg repository containing a file named "the z file"
output.txt (3.7 KB ) - added by srevill@… 16 years ago.
Output from suggested debug - also prints self.manifest for comparison
test_whitespace_t8128.bbundle (1.2 KB ) - added by Christian Boos 16 years ago.
another try …
t8128-tracd-unquote-r7943.patch (1.9 KB ) - added by Christian Boos 16 years ago.
revised patch adding explicit unquoting

Download all attachments as: .zip

Change History (28)

comment:1 by srevill@…, 16 years ago

Oops. Forgot to add my email address.

by srevill@…, 16 years ago

Attachment: backend.py added

bugfixed backend.py source file

comment:2 by srevill@…, 16 years ago

I've found the cause of the problem, which was pretty much as I suspected above. The attached file is a replacement for backend.py in the mercurial-plugin sources. I'll let you decide if you want to take it.

comment:3 by Christian Boos, 16 years ago

Cc: srevill@… added
Component: version control/browserplugin/mercurial
Keywords: whitespace needinfo added; mercurial spaces sources removed
Owner: set to Christian Boos

Well, "for me it works" ;-) I added a file containing some white spaces "the z file" and access it with /browser/the%20z%20file and I get to the file.

Your change does:

  • tracext/hg/backend.py

     
    1414import time
    1515import posixpath
    1616import re
     17import urllib
    1718
    1819from genshi.builder import tag
    1920
     
    613614
    614615    def _init_path(self, log, path):
    615616        kind = None
    616         if path in self.manifest: # then it's a file
     617
     618        if urllib.unquote(path) in self.manifest: # then it's a file
    617619            kind = Node.FILE
    618             self.filectx = self.repos.repo.filectx(path, self.n)
     620            self.filectx = self.repos.repo.filectx(urllib.unquote(path), self.n)
    619621            log_rev = self.filectx.linkrev()
    620622            node = log.node(log_rev)
    621623        else: # it will be a directory if there are matching entries

but I wonder how you manage to get an URL encoded path in the first place.

It would help to get more information about what you're really doing. Maybe you could do the following:

  • tracext/hg/backend.py

     
    613613
    614614    def _init_path(self, log, path):
    615615        kind = None
     616
     617        import traceback
     618        print '*** _init_path:', repr(path)
     619        traceback.print_stack()
     620
    616621        if path in self.manifest: # then it's a file
    617622            kind = Node.FILE
    618623            self.filectx = self.repos.repo.filectx(path, self.n)

In my test, I get the following output:

*** _init_path: 'the z file'

I attach my test repository as a bundle here, I'd be interested if you could reproduce the problem with that repo. If not, then please create a test repository that exhibits the problem and attach it to this as well.

Maybe you're using some kind of extension?

by Christian Boos, 16 years ago

small hg repository containing a file named "the z file"

comment:4 by srevill@…, 16 years ago

Thanks for the stuff - I'll give it a try. Here's the background. We created a small test repository and a couple of files have spaces in their names. If I go to "Browse Source" then click on my "Filename with spaces" file (without my patch) then I get an error saying: "No node Filename%2520with%2520spaces at revision 5:4be7907d31b5".

I added debug yesterday in the area you suggest above and could see the following:

path= Filename%20with%20spaces manifest= flibble.h manifest= Filename with spaces manifest= flibble.c manifest= ISO-IEC 13818-1 (2000) (MPEG-2 Systems).pdf

clearly, the path variable contents are URL-ified but the manifest contents are not. My debug code didn't use repr() however, just straight "print" with stdio redirected to a file.

My fix (above) is actually insufficient because although it fixes the use case I've just described, you get the same error at the very next page if you click on the " Download in other formats → Original Format" link.

Looking at the source code, I expect a whole related range of failures if I were to start creating directories with spaces in their names, too.

As for what we're running, it's vanilla mercurial-plugin-0.11 with trac 0.11.3. We also added the textile egg and syntax colouring Pygments egg but these won't have anything to do with this issue.

I'll have a look at the stuff you've supplied and see what happens…

comment:5 by anonymous, 16 years ago

Sorry about the formatting above - I'm used to textile. (Would be helpful if there were an option to make that the default markup! ;)

by srevill@…, 16 years ago

Attachment: output.txt added

Output from suggested debug - also prints self.manifest for comparison

comment:6 by srevill@…, 16 years ago

I've added the output from your suggested debug code. Note: I included an additional line:

print '*** self.manifest:', repr(self.manifest)

so that you can see why the 'in' comparison will always fail. To generate this output, I did my usual: click "Browse source" then click on the "Filename with spaces" file link.

I'll try with your bundle next.

comment:7 by srevill@…, 16 years ago

Any chance you could use an archive format that isn't MacOS-specific? I'm using Linux here and can't find any way to unpack that bundle. Ta.

by Christian Boos, 16 years ago

another try …

comment:8 by Christian Boos, 16 years ago

The previous .bundle was created with hg bundle -a -t gzip, but that format might be not supported by your version of Mercurial, so the .bbundle one is now using the default bzip2 format.

Re-create the repository by using the following commands:

$ hg init test-rename
$ cd test-rename
$ hg unbundle ../test_whitespace_t8128.bbundle 

But anyway, the problem is not with your or mine repository, but with the web front-end you're using. By looking at output.txt, I see build/bdist.linux-x86_64/egg/flup/server/ajp_base.py, what kind of deployment is this?

If you avoid using either trac/web/modpython_frontend.py or trac/web/wsgi.py, then the PATH_INFO won't be URL unquoted:

./modpython_frontend.py:0088:             environ['PATH_INFO'] = urllib.unquote(request_uri[len(root_uri):])
--------------------------------
./modpython_frontend.py: 1 match

./wsgi.py:0140:         environ['PATH_INFO'] = urllib.unquote(path_info)
------------------
./wsgi.py: 1 match

… which explains the problem you're seeing.

comment:9 by bavison@…, 16 years ago

what kind of deployment is this?

I should admit to having had a hand in the installation. None of us are python experts, so we were following the instructions at http://trac.edgewall.org/wiki/TracOnRhel5 as closely as possible. We're running Trac on an AMD64 RHEL 5 server, using Apache as the web frontend, using flup as an AJP to WSGI gateway as described on the web page. Relevant versions are

  • python 2.4.3-24
  • httpd 2.2.3-22
  • trac 0.11.3-1
  • mercurial 1.1.2-1
  • TracMercurial 0.11.0.7
  • flup 1.0.1
  • python-setuptools 0.6c6 (needed to be upgraded manually, required by flup)

comment:10 by Christian Boos, 16 years ago

Component: plugin/mercurialweb frontend
Owner: Christian Boos removed

Ok, I've never tried something like that, but it's definitely missing the unquote part, and not only for TracMercurial but in general. So expect the same kind of issues with Wiki page names, attachments, etc. containing spaces.

I see three possibilities:

  • ask on the MailingList for help, maybe other people are using this kind of deployment (as a variant of this, you could also try to e-mail to the author of the TracOnRhel5 page, see the History link there)
  • if I'm not mistaken, tracd can talk to mod_proxy_ajp, using the --protocol ajp switch. It'll also use flup but provides it with a correct Trac middleware application which takes care of this unquote issue
  • try another, more "conventional", deployment, like TracModWsgi or TracModPython

comment:11 by srevill@…, 16 years ago

Thanks for the advice - we're looking into this. It's tricky for us to sort out because the server isn't a dedicated Trac installation so there are other things (Apache, Mercurial, user configuration, Bugzilla, etc) that we need to be careful not to upset.

Any advice that anyone can offer will be more thank welcome - how would we contact the author of the TracOnRhel5 page? The email address appears to be obscured.

in reply to:  11 comment:12 by Christian Boos, 16 years ago

Replying to srevill@…:

… how would we contact the author of the TracOnRhel5 page? The email address appears to be obscured.

Silly me ;P) I'll forward it to you.

Besides, one low risk change you could try is to modify your trac_server_wrapper to call directly tracd with the --protocal ajp flag (and other relevant options like the port, see TracStandalone), instead of that ajp_to_wsgi_gateway program.

comment:13 by bavison@…, 16 years ago

I can report that if we run tracd in AJP mode and redirect URLs to it from Apache exactly as we did with the ajp_to_wsgi_gateway, then we get exactly the same problem with spaces in filenames, page names, milestone names etc. Perhaps that points to it being a bug in flup (since you said tracd —protocol=ajp uses flup too)?

As yet another approach, I've tried running tracd —protocol=http. This fixes the spaces, but is unusable because user authentication doesn't work. http://groups.google.com/group/trac-users/browse_thread/thread/a80be94fd53c9bbe/f8c52d2ce74d62c9?hl=en&ie=UTF-8&oe=utf-8&q=HTDigest%2C+mod_proxy_rewrite%2C+and+infinite+loops+on+login describes identical symptoms, but never seems to have been followed up on.

comment:14 by bavison@…, 16 years ago

Sorry, the link above doesn't go straight to the post in question - search that page for "invalid nonce" and you'll find it.

comment:15 by Christian Boos, 16 years ago

Milestone: 0.11.4
Owner: set to Christian Boos
Status: newassigned
Summary: TracMercurial viewer can't cope with spaces in filenames!!tracd with flup can't cope with spaces in URLs

Please try:

  • trac/web/standalone.py

     
    2323import os
    2424import sys
    2525from SocketServer import ThreadingMixIn
     26import urllib
    2627
    2728from trac import __version__ as VERSION
    2829from trac.util import autoreload, daemon
     
    7071        return self.application(environ, start_response)
    7172
    7273
     74class FlupMiddleware(object):
     75
     76    def __init__(self, application):
     77        self.application = application
     78
     79    def __call__(self, environ, start_response):
     80        environ['PATH_INFO'] = urllib.unquote(environ.get('PATH_INFO', ''))
     81        return self.application(environ, start_response)
     82
     83
    7384class TracEnvironMiddleware(object):
    7485
    7586    def __init__(self, application, env_parent_dir, env_paths, single_env):
     
    236247        def serve():
    237248            server_cls = __import__('flup.server.%s' % options.protocol,
    238249                                    None, None, ['']).WSGIServer
     250            wsgi_app = FlupMiddleware(wsgi_app)
    239251            ret = server_cls(wsgi_app, bindAddress=server_address).run()
    240252            sys.exit(ret and 42 or 0) # if SIGHUP exit with status 42

I haven't tested myself, but that should fix the issue with whitespace.

Also, can you confirm that this tracd/ajp combination can do user authentication when tracd/http can't?

comment:16 by Christian Boos, 16 years ago

Keywords: needinfo removed
Milestone: 0.11.60.11.4

Eventually for 0.11.4, if positive feedback given in time ;-)

comment:17 by Christian Boos, 16 years ago

Wait, looks like there's something that can be done on the Apache side, see http://tomcat.apache.org/connectors-doc/webserver_howto/printer/apache.html#Forwarding:

JkOptions     +ForwardURICompat

I'll write an updated patch which makes the FlupMiddleware conditional: -Q will prevent unquoting, like ajp-wsgi does.

by Christian Boos, 16 years ago

revised patch adding explicit unquoting

comment:18 by Christian Boos, 16 years ago

As you've figured out, the previous patch from comment:15 had an issue, which is fixed in t8128-tracd-unquote-r7943.patch.

Also, I kept the old behavior the default one, as I think it's quite possible that some other flup backends don't require an explicit unquoting on our side. For example, maybe it's not needed for the fcgi protocol and it's not needed for the older mod_jk.

The unquoting will happen only when the new --unquote option is specified. and that seems to be needed when you have:

 JkOptions     +ForwardURIProxy

which is the default in the newer mod_jk.

comment:19 by bavison@…, 16 years ago

I couldn't see any difference when I applied JkOptions +ForwardURICompat, but the updated patch works perfectly! Many thanks.

For reference, the AJP gateway described in the TracOnRhel5 page is different code and is unaffected by the patch. But I think we'll be more than happy to move over to using tracd.

comment:20 by Jonas Borgström, 16 years ago

The patch looks good and should be safe since it's not activated by default anyway.

I've committed it in [7953] to make sure it would make it into 0.11.4rc1.

Any more unresolved issues, or can we close this ticket now?

comment:21 by Christian Boos, 16 years ago

Keywords: flup ajp wsgi added
Resolution: fixed
Severity: criticalmajor
Status: assignedclosed

I think it's fine now, thanks for the review jonas. It might be interesting to know for which other protocols the --unquote flag is needed (scgi? fcgi?).

Ben & Steve, could one of you please update the TracOnRhel5 page so that it reflects your working setup based on tracd --protocol=ajp --unquote, instead of (or in complement to) the ajp_to_wsgi_gateway approach described there?

comment:22 by nick.messenger@…, 16 years ago

Cc: nick.messenger@… added

I ran into this using AJP and IIS 6.0. Is the patch applicable to v0.11.2.1? I'll try it out on Monday. The log seems to suggest it'll work.

in reply to:  22 comment:23 by Christian Boos, 16 years ago

Replying to nick.messenger@…: Yes, this fix should probably be also interesting for the IIS/ajp deployment. But the easiest thing for you should be to simply upgrade to 0.11.4rc1 (so far, no issues was found with the rc1, so it looks like 0.11.4 final has a good chance to stay the same as rc1).

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.