#8128 closed defect (fixed)
tracd with flup can't cope with spaces in URLs
Reported by: | anonymous | Owned by: | Christian Boos |
---|---|---|---|
Priority: | high | Milestone: | 0.11.4 |
Component: | web frontend | Version: | 0.11.3 |
Severity: | major | Keywords: | flup ajp wsgi whitespace |
Cc: | srevill@…, nick.messenger@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
Having installed mercurial-plugin-0.11 with trac 0.11.3 we've found that the Browse Source viewer cannot cope with filenames that contain spaces!
This is a real blocker for our intended use as it's quite common to have files under source control with spaces in their names.
I believe the fault to be in the backend.py source file of the mercurial-plugin hg sources, specifically, line 616 states:
if path in self.manifest: # then it's a file
whereas I suspect this test is failing - maybe path contains %20 stuff and the contents of self.manifest don't. I'm no Python programmer so who knows… I would expect the fix to look something like this:
if path.encode('utf-8') in self.manifest: # then it's a file
or similar.
Attachments (5)
Change History (28)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
I've found the cause of the problem, which was pretty much as I suspected above. The attached file is a replacement for backend.py in the mercurial-plugin sources. I'll let you decide if you want to take it.
comment:3 by , 16 years ago
Cc: | added |
---|---|
Component: | version control/browser → plugin/mercurial |
Keywords: | whitespace needinfo added; mercurial spaces sources removed |
Owner: | set to |
Well, "for me it works" ;-)
I added a file containing some white spaces "the z file" and access it with /browser/the%20z%20file
and I get to the file.
Your change does:
-
tracext/hg/backend.py
14 14 import time 15 15 import posixpath 16 16 import re 17 import urllib 17 18 18 19 from genshi.builder import tag 19 20 … … 613 614 614 615 def _init_path(self, log, path): 615 616 kind = None 616 if path in self.manifest: # then it's a file 617 618 if urllib.unquote(path) in self.manifest: # then it's a file 617 619 kind = Node.FILE 618 self.filectx = self.repos.repo.filectx( path, self.n)620 self.filectx = self.repos.repo.filectx(urllib.unquote(path), self.n) 619 621 log_rev = self.filectx.linkrev() 620 622 node = log.node(log_rev) 621 623 else: # it will be a directory if there are matching entries
but I wonder how you manage to get an URL encoded path
in the first place.
It would help to get more information about what you're really doing. Maybe you could do the following:
-
tracext/hg/backend.py
613 613 614 614 def _init_path(self, log, path): 615 615 kind = None 616 617 import traceback 618 print '*** _init_path:', repr(path) 619 traceback.print_stack() 620 616 621 if path in self.manifest: # then it's a file 617 622 kind = Node.FILE 618 623 self.filectx = self.repos.repo.filectx(path, self.n)
In my test, I get the following output:
*** _init_path: 'the z file'
I attach my test repository as a bundle here, I'd be interested if you could reproduce the problem with that repo. If not, then please create a test repository that exhibits the problem and attach it to this as well.
Maybe you're using some kind of extension?
by , 16 years ago
Attachment: | test_whitespace_t8128.bundle added |
---|
small hg repository containing a file named "the z file"
comment:4 by , 16 years ago
Thanks for the stuff - I'll give it a try. Here's the background. We created a small test repository and a couple of files have spaces in their names. If I go to "Browse Source" then click on my "Filename with spaces" file (without my patch) then I get an error saying: "No node Filename%2520with%2520spaces at revision 5:4be7907d31b5".
I added debug yesterday in the area you suggest above and could see the following:
path= Filename%20with%20spaces manifest= flibble.h manifest= Filename with spaces manifest= flibble.c manifest= ISO-IEC 13818-1 (2000) (MPEG-2 Systems).pdf
clearly, the path variable contents are URL-ified but the manifest contents are not. My debug code didn't use repr() however, just straight "print" with stdio redirected to a file.
My fix (above) is actually insufficient because although it fixes the use case I've just described, you get the same error at the very next page if you click on the " Download in other formats → Original Format" link.
Looking at the source code, I expect a whole related range of failures if I were to start creating directories with spaces in their names, too.
As for what we're running, it's vanilla mercurial-plugin-0.11 with trac 0.11.3. We also added the textile egg and syntax colouring Pygments egg but these won't have anything to do with this issue.
I'll have a look at the stuff you've supplied and see what happens…
comment:5 by , 16 years ago
Sorry about the formatting above - I'm used to textile. (Would be helpful if there were an option to make that the default markup! ;)
by , 16 years ago
Attachment: | output.txt added |
---|
Output from suggested debug - also prints self.manifest for comparison
comment:6 by , 16 years ago
I've added the output from your suggested debug code. Note: I included an additional line:
print '*** self.manifest:', repr(self.manifest)
so that you can see why the 'in' comparison will always fail. To generate this output, I did my usual: click "Browse source" then click on the "Filename with spaces" file link.
I'll try with your bundle next.
comment:7 by , 16 years ago
Any chance you could use an archive format that isn't MacOS-specific? I'm using Linux here and can't find any way to unpack that bundle. Ta.
comment:8 by , 16 years ago
The previous .bundle was created with hg bundle -a -t gzip
, but that format might be not supported by your version of Mercurial, so the .bbundle one is now using the default bzip2 format.
Re-create the repository by using the following commands:
$ hg init test-rename $ cd test-rename $ hg unbundle ../test_whitespace_t8128.bbundle
But anyway, the problem is not with your or mine repository, but with the web front-end you're using.
By looking at output.txt, I see build/bdist.linux-x86_64/egg/flup/server/ajp_base.py
, what kind of deployment is this?
If you avoid using either trac/web/modpython_frontend.py or trac/web/wsgi.py, then the PATH_INFO won't be URL unquoted:
./modpython_frontend.py:0088: environ['PATH_INFO'] = urllib.unquote(request_uri[len(root_uri):]) -------------------------------- ./modpython_frontend.py: 1 match ./wsgi.py:0140: environ['PATH_INFO'] = urllib.unquote(path_info) ------------------ ./wsgi.py: 1 match
… which explains the problem you're seeing.
comment:9 by , 16 years ago
what kind of deployment is this?
I should admit to having had a hand in the installation. None of us are python experts, so we were following the instructions at http://trac.edgewall.org/wiki/TracOnRhel5 as closely as possible. We're running Trac on an AMD64 RHEL 5 server, using Apache as the web frontend, using flup as an AJP to WSGI gateway as described on the web page. Relevant versions are
- python 2.4.3-24
- httpd 2.2.3-22
- trac 0.11.3-1
- mercurial 1.1.2-1
- TracMercurial 0.11.0.7
- flup 1.0.1
- python-setuptools 0.6c6 (needed to be upgraded manually, required by flup)
comment:10 by , 16 years ago
Component: | plugin/mercurial → web frontend |
---|---|
Owner: | removed |
Ok, I've never tried something like that, but it's definitely missing the unquote
part, and not only for TracMercurial but in general. So expect the same kind of issues with Wiki page names, attachments, etc. containing spaces.
I see three possibilities:
- ask on the MailingList for help, maybe other people are using this kind of deployment (as a variant of this, you could also try to e-mail to the author of the TracOnRhel5 page, see the History link there)
- if I'm not mistaken, tracd can talk to
mod_proxy_ajp
, using the--protocol ajp
switch. It'll also use flup but provides it with a correct Trac middleware application which takes care of this unquote issue - try another, more "conventional", deployment, like TracModWsgi or TracModPython
follow-up: 12 comment:11 by , 16 years ago
Thanks for the advice - we're looking into this. It's tricky for us to sort out because the server isn't a dedicated Trac installation so there are other things (Apache, Mercurial, user configuration, Bugzilla, etc) that we need to be careful not to upset.
Any advice that anyone can offer will be more thank welcome - how would we contact the author of the TracOnRhel5 page? The email address appears to be obscured.
comment:12 by , 16 years ago
Replying to srevill@…:
… how would we contact the author of the TracOnRhel5 page? The email address appears to be obscured.
Silly me ;P) I'll forward it to you.
Besides, one low risk change you could try is to modify your trac_server_wrapper to call directly tracd
with the --protocal ajp
flag (and other relevant options like the port, see TracStandalone), instead of that ajp_to_wsgi_gateway
program.
comment:13 by , 16 years ago
I can report that if we run tracd in AJP mode and redirect URLs to it from Apache exactly as we did with the ajp_to_wsgi_gateway, then we get exactly the same problem with spaces in filenames, page names, milestone names etc. Perhaps that points to it being a bug in flup (since you said tracd —protocol=ajp uses flup too)?
As yet another approach, I've tried running tracd —protocol=http. This fixes the spaces, but is unusable because user authentication doesn't work. http://groups.google.com/group/trac-users/browse_thread/thread/a80be94fd53c9bbe/f8c52d2ce74d62c9?hl=en&ie=UTF-8&oe=utf-8&q=HTDigest%2C+mod_proxy_rewrite%2C+and+infinite+loops+on+login describes identical symptoms, but never seems to have been followed up on.
comment:14 by , 16 years ago
Sorry, the link above doesn't go straight to the post in question - search that page for "invalid nonce" and you'll find it.
comment:15 by , 16 years ago
Milestone: | → 0.11.4 |
---|---|
Owner: | set to |
Status: | new → assigned |
Summary: | TracMercurial viewer can't cope with spaces in filenames!! → tracd with flup can't cope with spaces in URLs |
Please try:
-
trac/web/standalone.py
23 23 import os 24 24 import sys 25 25 from SocketServer import ThreadingMixIn 26 import urllib 26 27 27 28 from trac import __version__ as VERSION 28 29 from trac.util import autoreload, daemon … … 70 71 return self.application(environ, start_response) 71 72 72 73 74 class FlupMiddleware(object): 75 76 def __init__(self, application): 77 self.application = application 78 79 def __call__(self, environ, start_response): 80 environ['PATH_INFO'] = urllib.unquote(environ.get('PATH_INFO', '')) 81 return self.application(environ, start_response) 82 83 73 84 class TracEnvironMiddleware(object): 74 85 75 86 def __init__(self, application, env_parent_dir, env_paths, single_env): … … 236 247 def serve(): 237 248 server_cls = __import__('flup.server.%s' % options.protocol, 238 249 None, None, ['']).WSGIServer 250 wsgi_app = FlupMiddleware(wsgi_app) 239 251 ret = server_cls(wsgi_app, bindAddress=server_address).run() 240 252 sys.exit(ret and 42 or 0) # if SIGHUP exit with status 42
I haven't tested myself, but that should fix the issue with whitespace.
Also, can you confirm that this tracd/ajp combination can do user authentication when tracd/http can't?
comment:16 by , 16 years ago
Keywords: | needinfo removed |
---|---|
Milestone: | 0.11.6 → 0.11.4 |
Eventually for 0.11.4, if positive feedback given in time ;-)
comment:17 by , 16 years ago
Wait, looks like there's something that can be done on the Apache side, see http://tomcat.apache.org/connectors-doc/webserver_howto/printer/apache.html#Forwarding:
JkOptions +ForwardURICompat
I'll write an updated patch which makes the FlupMiddleware conditional: -Q will prevent unquoting, like ajp-wsgi does.
by , 16 years ago
Attachment: | t8128-tracd-unquote-r7943.patch added |
---|
revised patch adding explicit unquoting
comment:18 by , 16 years ago
As you've figured out, the previous patch from comment:15 had an issue, which is fixed in t8128-tracd-unquote-r7943.patch.
Also, I kept the old behavior the default one, as I think it's quite possible that some other flup backends don't require an explicit unquoting on our side. For example, maybe it's not needed for the fcgi protocol and it's not needed for the older mod_jk.
The unquoting will happen only when the new --unquote
option is specified. and that seems to be needed when you have:
JkOptions +ForwardURIProxy
which is the default in the newer mod_jk.
comment:19 by , 16 years ago
I couldn't see any difference when I applied JkOptions +ForwardURICompat, but the updated patch works perfectly! Many thanks.
For reference, the AJP gateway described in the TracOnRhel5 page is different code and is unaffected by the patch. But I think we'll be more than happy to move over to using tracd.
comment:20 by , 16 years ago
The patch looks good and should be safe since it's not activated by default anyway.
I've committed it in [7953] to make sure it would make it into 0.11.4rc1.
Any more unresolved issues, or can we close this ticket now?
comment:21 by , 16 years ago
Keywords: | flup ajp wsgi added |
---|---|
Resolution: | → fixed |
Severity: | critical → major |
Status: | assigned → closed |
I think it's fine now, thanks for the review jonas.
It might be interesting to know for which other protocols the --unquote
flag is needed (scgi? fcgi?).
Ben & Steve, could one of you please update the TracOnRhel5 page so that it reflects your working setup based on tracd --protocol=ajp --unquote
, instead of (or in complement to) the ajp_to_wsgi_gateway approach described there?
follow-up: 23 comment:22 by , 16 years ago
Cc: | added |
---|
I ran into this using AJP and IIS 6.0. Is the patch applicable to v0.11.2.1? I'll try it out on Monday. The log seems to suggest it'll work.
comment:23 by , 16 years ago
Replying to nick.messenger@…: Yes, this fix should probably be also interesting for the IIS/ajp deployment. But the easiest thing for you should be to simply upgrade to 0.11.4rc1 (so far, no issues was found with the rc1, so it looks like 0.11.4 final has a good chance to stay the same as rc1).
Oops. Forgot to add my email address.