Opened 18 years ago
Closed 15 years ago
#3663 closed defect (fixed)
Chinese folder or file name error
Reported by: | Owned by: | Christian Boos | |
---|---|---|---|
Priority: | normal | Milestone: | 0.12 |
Component: | web frontend/mod_python | Version: | devel |
Severity: | normal | Keywords: | apache cgi unicode review |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description (last modified by )
i clicked folder of file(encoding:utf-8) trac report this message:
Traceback (most recent call last): File "C:\Python23\Lib\site-packages\trac\web\main.py", line 335, in dispatch_request dispatcher.dispatch(req) File "C:\Python23\Lib\site-packages\trac\web\main.py", line 167, in dispatch if not req.path_info or req.path_info == '/': File "C:\Python23\lib\site-packages\trac\web\api.py", line 193, in <lambda> path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8'), UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 17: unexpected code byte
Attachments (5)
Change History (38)
comment:1 by , 18 years ago
Keywords: | unicode needinfo added |
---|---|
Owner: | changed from | to
comment:3 by , 18 years ago
and a file save as ANSI Encoding,when browse whill raise a encoding error.
ÎÒÊǺÃÈË£¬²»ÊÇ»µÈË¡£
see picture a3
follow-up: 5 comment:4 by , 18 years ago
I've tested the same filename found in the screenshot with FastCGI and tracd and it works for me. Be sure you're using the latest Trac source.
What web server are you using? Also, which frontend are you using with Trac: FastCGI, CGI, mod_python, or something else? Normally I would expect to see another Trac module above dispatch_request
in the stack where it was called from, but that's not there in this traceback.
comment:5 by , 18 years ago
Replying to mgood:
I've tested the same filename found in the screenshot with FastCGI and tracd and it works for me. Be sure you're using the latest Trac source.
What web server are you using? Also, which frontend are you using with Trac: FastCGI, CGI, mod_python, or something else? Normally I would expect to see another Trac module above
dispatch_request
in the stack where it was called from, but that's not there in this traceback.
Apache 2.0.59(Win32) SVN 1.3.2 DAV 2 CGI IE 6
follow-up: 10 comment:9 by , 18 years ago
Description: | modified (diff) |
---|---|
Keywords: | apache cgi win32 added; chinese removed |
Milestone: | 0.10 → 0.10.1 |
Priority: | high → normal |
Ok, so according to attachment:a2.PNG, the request was for path "vendor/试试中文.txt" (url = u'vendor/\u8bd5\u8bd5\u4e2d\u6587'
), but then, the PATH_INFO
CGI env variable contained something that had a byte 0xa8 in position 16…
I couldn't find an encoding that would actually encode the above url
to this, so I'm left wondering what to do here…
Perhaps you could add a print `self.environ.get('PATH_INFO', '')`
statement in trac.web.api.py
before line 193, in order to see the full sequence of bytes. From that, perhaps we could guess what Apache is doing.
Also, this issue is possibly related to: http://twiki.org/cgi-bin/view/Codev.ApacheTwoBreaksNonUTF8EncodedURLsOnWindows
In any case, this is not going to block 0.10…
comment:10 by , 18 years ago
Replying to cboos:
…
url = u'vendor/\u8bd5\u8bd5\u4e2d\u6587'
), but then, thePATH_INFO
CGI env variable contained something that had a byte 0xa8 in position 16…
Sorry, I meant url = u'/browser/vendor/\u8bd5\u8bd5\u4e2d\u6587'
, so that position 16 effectively corresponds to the first non-ascii character.
comment:12 by , 18 years ago
Keywords: | windows added; win32 removed |
---|
follow-up: 16 comment:13 by , 18 years ago
Hello all. I had the same problem. I have windows server 2003 (Russian locale), apache-2.0.52, trac-0.10.3 (is running as cgi), python-2.4.1.
I couldn't submit and view(get) file with filenames were consist of russian characters (Attachements). I couldn't get the same files from svn repository through trac (Browse Source).
I decided the problem the following way. I changed the trac/web/api.py file. Need change line 219: path_info = property(fget=lambda self: self.environ.get('PATH_INFO', ).decode('utf-8'), To: path_info = property(fget=lambda self: self.environ.get('PATH_INFO', ).decode('mbcs'),
It's all. Best regards, Alyabushev Alexander alexhalf@…
comment:14 by , 18 years ago
#4003 closed as duplicate. The reporter of that ticket said that he tried to "update from encode('utf-8') to encode('mbcs') in api.py file" without success (btw, that was a few months before comment:13, so I wonder where that advice came from, initially).
comment:16 by , 17 years ago
#6372 was closed as duplicate.
This is the suggested fix: Replying to alexhalf@gmail.com:
… I decided the problem the following way. I changed the trac/web/api.py file. Need change line 219:
path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8'),
To:
path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('mbcs'),
Please try it out. If that works, then also try to create/view wiki pages containing unicode characters, to check if they are still working with that change.
comment:17 by , 17 years ago
Component: | version control/browser → web frontend/mod_python |
---|---|
Keywords: | windows removed |
Milestone: | 0.10.6 → 0.11.1 |
#7368 was closed as duplicate. That one is the first to be reported for a non-Windows platform.
Also that one is about mod_wsgi and not mod_python, but I think the root cause must be the same.
comment:18 by , 16 years ago
Keywords: | verify added; needinfo removed |
---|---|
Milestone: | 0.11.2 → 0.11.3 |
Still worth investigating/fixing one day, keeping on the -stable TODO list.
follow-up: 21 comment:19 by , 15 years ago
Keywords: | review added; verify removed |
---|---|
Milestone: | next-minor-0.12.x → 0.11.7 |
Status: | new → assigned |
As discussed in #9077, I've seen a few similar errors in the trac.log of the demo-0.11 Trac instace:
2010-02-04 08:21:54,829 Trac[main] ERROR: Internal Server Error: Traceback (most recent call last): File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/web/main.py", line 450, in _dispatch_request dispatcher.dispatch(req) File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/web/main.py", line 167, in dispatch if handler.match_request(req): File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/wiki/intertrac.py", line 39, in match_request match = re.match(r'^/intertrac/(.*)', req.path_info) File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/web/api.py", line 208, in <lambda> path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8'), File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 1: unexpected code byte ... 2010-02-04 08:45:14,858 Trac[main] ERROR: Internal Server Error: Traceback (most recent call last): File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/web/main.py", line 450, in _dispatch_request dispatcher.dispatch(req) File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/web/main.py", line 167, in dispatch if handler.match_request(req): File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/wiki/intertrac.py", line 39, in match_request match = re.match(r'^/intertrac/(.*)', req.path_info) File "/usr/local/virtualenv-0.11/lib/python2.5/site-packages/Trac-0.11.7stable_r8997-py2.5.egg/trac/web/api.py", line 208, in <lambda> path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8'), File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 1: unexpected code byte
We should fallback to .decode('latin1') if .decode('utf-8') fails, i.e. actually use to_unicode()
here.
Reproducing the error is actually easy:
>chcp Active code page: 437 >wget http://localhost:8000/devel-0.11/wiki/été --2010-03-03 15:54:56-- http://localhost:8000/devel-0.11/wiki/%E9t%E9 Resolving localhost... 127.0.0.1 Connecting to localhost|127.0.0.1|:8000... connected. HTTP request sent, awaiting response... 500 Internal Server Error 2010-03-03 15:54:56 ERROR 500: Internal Server Error.
The backtrace corresponding to the above:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 6-8: invalid data (7668) Trac[main] ERROR: Internal Server Error: Traceback (most recent call last): File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\main.py", line 450, in _dispatch_request dispatcher.dispatch(req) File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\main.py", line 167, in dispatch if handler.match_request(req): File "C:\Workspace\src\trac\repos\0.11-stable\trac\wiki\intertrac.py", line 39, in match_request match = re.match(r'^/intertrac/(.*)', req.path_info) File "C:\Workspace\src\trac\repos\0.11-stable\trac\web\api.py", line 208, in <lambda> path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8'), File "C:\Dev\Python261\lib\encodings\utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: ('utf8', '/wiki/\xe9t\xe9', 6, 9, 'invalid data')
Patch:
-
trac/web/api.py
28 28 from trac.core import Interface, TracError 29 29 from trac.util import get_last_traceback, md5, unquote 30 30 from trac.util.datefmt import http_date, localtz 31 from trac.util.text import to_unicode 31 32 from trac.web.href import Href 32 33 from trac.web.wsgi import _FileWrapper 33 34 … … 205 206 206 207 method = property(fget=lambda self: self.environ['REQUEST_METHOD'], 207 208 doc='The HTTP method of the request') 208 path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8'), 209 path_info = property(fget=(lambda self: 210 to_unicode(self.environ.get('PATH_INFO', ''))), 209 211 doc='Path inside the application') 210 212 query_string = property(fget=lambda self: self.environ.get('QUERY_STRING', 211 213 ''),
With the above patch, the same wget
doesn't trigger an internal error anymore:
>wget http://localhost:8000/devel-0.11/wiki/été --2010-03-03 15:53:37-- http://localhost:8000/devel-0.11/wiki/%E9t%E9 Resolving localhost... 127.0.0.1 Connecting to localhost|127.0.0.1|:8000... connected. HTTP request sent, awaiting response... 404 Not Found 2010-03-03 15:53:39 ERROR 404: Not Found.
comment:20 by , 15 years ago
All tests pass.
-
trac/tests/functional/testcases.py
1 # -*- encoding: utf-8 -*- 1 2 #!/usr/bin/python 2 3 import os 3 4 from subprocess import call … … 154 155 tc.notfind('Second Attachment') 155 156 156 157 158 class RegressionTestTicket3663(FunctionalTwillTestCaseSetup): 159 def runTest(self): 160 """Regression test for non-UTF-8 PATH_INFO (#3663)""" 161 self._tester.logout() # as we start logged in as 'admin' 162 self._tester.go_to_wiki('\xe9t\xe9') # PATH_INFO 'wiki/été' in latin1 163 tc.code(404) # page not found (no WIKI_CREATE) 164 self._tester.login('admin') # the other tests assume 'admin' 165 166 157 167 def functionalSuite(): 158 168 suite = FunctionalTestSuite() 159 169 return suite … … 168 178 suite.addTest(RegressionTestTicket3833c()) 169 179 suite.addTest(RegressionTestTicket5572()) 170 180 suite.addTest(RegressionTestTicket7209()) 181 suite.addTest(RegressionTestTicket3663()) 171 182 172 183 import trac.versioncontrol.tests 173 184 trac.versioncontrol.tests.functionalSuite(suite)
follow-up: 22 comment:21 by , 15 years ago
Replying to cboos:
With the above patch, the same
wget
doesn't trigger an internal error anymore:
The test fails in the XHTML validation on my box, because lxml cannot parse the resulting page (it probably contains the \xe9
characters as-is, which cannot be converted from utf-8). If a disable XHTML validation, the test passes.
(BTW, the test case should probably have the .login('admin')
in a finally:
clause, otherwise a failure of the test case will make the following test cases fail due to "collateral damage".)
I'm unsure what the goal of the patch is:
- Don't throw a 500 error. Then we could just change the utf-8 conversion to "replace" instead of the default "strict". Your test case passes with this change.
-
trac/web/api.py
205 205 206 206 method = property(fget=lambda self: self.environ['REQUEST_METHOD'], 207 207 doc='The HTTP method of the request') 208 path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8' ),208 path_info = property(fget=lambda self: self.environ.get('PATH_INFO', '').decode('utf-8', 'replace'), 209 209 doc='Path inside the application') 210 210 query_string = property(fget=lambda self: self.environ.get('QUERY_STRING', 211 211 ''),
-
- Make the system work with iso-8859-1 encoded URLs as well as utf-8 encoded URLs. Then we should be able to create a wiki page named "été" using an utf-8 URL, then retrieve it correctly using an iso-8859-1 URL. I tried this and it doesn't work (it should have been obvious, but I tried anyway).
It seems that RFC 3986 mandates utf-8 encoding for new URI schemes, so the current behavior is arguably correct (except if URLs cannot be considered a "new URI scheme", but in that case the expected behavior is undefined).
I don't think using a "fallback" encoding will make the system work in all cases, as I'm sure we can find iso-8859-1 encoded strings that are also valid utf-8 strings. So the URL will be ambiguous, and even the best guess won't make it work.
So I wonder if the best solution would be to keep a single encoding, but to make it configurable (e.g. through an environment variable). Everything seems to work for most people by using utf-8, and the few people who have reported that changing the encoding to mbcs fixed the issue can do so.
follow-up: 23 comment:22 by , 15 years ago
Replying to rblank:
Replying to cboos:
With the above patch, the same
wget
doesn't trigger an internal error anymore:The test fails in the XHTML validation on my box, because lxml cannot parse the resulting page (it probably contains the
\xe9
characters as-is, which cannot be converted from utf-8). If a disable XHTML validation, the test passes.
Strange, for me the validation passes, and no, the resulting page should not contain \xe9
, see below.
(BTW, the test case should probably have the
.login('admin')
in afinally:
clause, otherwise a failure of the test case will make the following test cases fail due to "collateral damage".)
OK.
I'm unsure what the goal of the patch is:
- Don't throw a 500 error. …
- Make the system work with iso-8859-1 encoded URLs as well as utf-8 encoded URLs. Then we should be able to create a wiki page named "été" using an utf-8 URL, then retrieve it correctly using an iso-8859-1 URL. I tried this and it doesn't work (it should have been obvious, but I tried anyway).
Both, and for the second point, it's not obvious it doesn't work, given that for me it does!
After creating the "été" page via the browser (the normal way):
C:\Workspace\src\trac\repos\0.11-stable>wget http://localhost:8000/devel-0.11/wiki/été --2010-03-04 07:21:29-- http://localhost:8000/devel-0.11/wiki/%E9t%E9 Resolving localhost... 127.0.0.1 Connecting to localhost|127.0.0.1|:8000... connected. HTTP request sent, awaiting response... 200 OK Length: 4974 (4.9K) [text/html] Saving to: `été' 100%[==============================================================================>] 4,974 --.-K/s in 0s 2010-03-04 07:21:29 (13.5 MB/s) - `été' saved [4974/4974]
And on the server side:
127.0.0.1 - - [04/Mar/2010 07:21:29] "GET /devel-0.11/wiki/%E9t%E9 HTTP/1.0" 200 -
It seems that RFC 3986 mandates utf-8 encoding for new URI schemes, so the current behavior is arguably correct (except if URLs cannot be considered a "new URI scheme", but in that case the expected behavior is undefined).
OK.
I don't think using a "fallback" encoding will make the system work in all cases, as I'm sure we can find iso-8859-1 encoded strings that are also valid utf-8 strings. So the URL will be ambiguous, and even the best guess won't make it work.
Well, there's certainly an overlap given that any sequence of bytes is valid latin1, so by first attempting a decoding from utf-8, and if this fails, a decoding to latin1, we're guaranteed to always end up with unicode, "valid" unicode if the actual encoding was UTF-8 or latin1, "garbage" unicode (i.e. an non-meaningful sequence of unicode code points from the U+0000 - U+00FF range) if the original encoding was anything else. In that latter case, we avoid at least 1.
This is the typical use case for to_unicode
… or rather, what I thought to_unicode
would do. Looking again at the code, I see that the we now use locale.getpreferredencoding()
as the fallback, and this explains why it worked for me but not for you. This was introduced part of r3569 and in retrospect I think this is wrong, we should have kept 'iso-8859-15' or rather 'iso-8859-1', as using locale always introduces variance in the results (#6953, #8951).
So I wonder if the best solution would be to keep a single encoding, but to make it configurable (e.g. through an environment variable). Everything seems to work for most people by using utf-8, and the few people who have reported that changing the encoding to mbcs fixed the issue can do so.
Now this would even be a third goal: if some servers always deliver PATH_INFO according to a local encoding (I've seen some hints this could be the case for IIS), then giving the possibility to configure the encoding as the first encoding to try would be useful.
I'll come up with some updated patches, with the minimal change from locale.getpreferredencoding()
to 'latin1'
in to_unicode
for 0.11 and some more changes for 0.12. Or we can simply leaves things as they are on 0.11-stable and move the whole thing to 0.12.
follow-up: 24 comment:23 by , 15 years ago
Replying to cboos:
Strange, for me the validation passes, and no, the resulting page should not contain
\xe9
, see below.
I'll test again with your updated patches. I may have made a mistake somewhere.
Both, and for the second point, it's not obvious it doesn't work, given that for me it does!
Well that's the whole point: it depends on the encoding used by the client, which is not communicated to the server. If I use a client that encodes URLs as utf-8 to create the page, and a client that encodes as iso-8859-1 to retrieve it, the latter will fail with a 404 code. That's the test that failed for me.
I'll come up with some updated patches, with the minimal change from
locale.getpreferredencoding()
to'latin1'
into_unicode
for 0.11 and some more changes for 0.12. Or we can simply leaves things as they are on 0.11-stable and move the whole thing to 0.12.
I'm still not convinced that trying two encodings is the right thing to do, and considering this is becoming non-trivial, I would push (gently :) for 0.12.
follow-up: 25 comment:24 by , 15 years ago
Milestone: | 0.11.7 → 0.12 |
---|
Replying to rblank:
it depends on the encoding used by the client, which is not communicated to the server. If I use a client that encodes URLs as utf-8 to create the page, and a client that encodes as iso-8859-1 to retrieve it, the latter will fail with a 404 code.
- client 1, utf-8 → /wiki/%c3%a9t%c3%a9, PATH_INFO → '\xc3\xa9t\xc3\xa9',
to_unicode(self.environ('PATH_INFO'))
→ u'été' - client 2, latin1 → /wiki/%e9t%e9, PATH_INFO is '\xe9t\xe9',
to_unicode(self.environ('PATH_INFO'))
→ u'été' as well… providedlocale.getpreferredencoding()
returns'latin1'
which is not guaranteed of course
Hence:
… with the minimal change from
locale.getpreferredencoding()
to'latin1'
into_unicode
I'm still not convinced that trying two encodings is the right thing to do, and considering this is becoming non-trivial, I would push (gently :) for 0.12.
What do you mean, in this specific case or in general, the way to_unicode
works?
To me, trying an utf-8 decoding first then a latin1 decoding if the former fails is a better approach than an utf-8 decoding in replace mode, as it's a lossless conversion from str
to unicode
. In replace mode, you may well end up with u'??????' strings and therefore have no clue about what the original input was. It's also fast, as the second encoding is a direct mapping.
As for 0.12, sure, as that ticket has already slipped through all the minor milestones since 0.10, one more is no big deal ;-)
See t3663-more-robust-PATH_INFO-decoding-r9325.diff, which is a bit more involved and definitely not for 0.11.7.
by , 15 years ago
Attachment: | t3663-more-robust-PATH_INFO-decoding-r9325.diff added |
---|
web: don't assume PATH_INFO is always UTF-8 encoded.
follow-up: 26 comment:25 by , 15 years ago
Replying to cboos:
- client 1, utf-8 → /wiki/%c3%a9t%c3%a9, PATH_INFO → '\xc3\xa9t\xc3\xa9',
to_unicode(self.environ('PATH_INFO'))
→ u'été'- client 2, latin1 → /wiki/%e9t%e9, PATH_INFO is '\xe9t\xe9',
to_unicode(self.environ('PATH_INFO'))
→ u'été' as well… providedlocale.getpreferredencoding()
returns'latin1'
which is not guaranteed of course
- client 3, iso-8859-1, trying to access the page "ÃtÃ"
→ /wiki/%c3%a9t%c3%a9, PATH_INFO → '\xc3\xa9t\xc3\xa9',
to_unicode(self.environ('PATH_INFO'))
→ u'été'
That is, not the page intended. The issue is probably worse with other encodings, which result in valid utf-8 strings.
What do you mean, in this specific case or in general, the way
to_unicode
works?
In this specific case (PATH_INFO
).
To me, trying an utf-8 decoding first then a latin1 decoding if the former fails is a better approach than an utf-8 decoding in replace mode, as it's a lossless conversion from
str
tounicode
.
We're trying to guess the encoding used by the client, but only in the case where the string is an invalid utf-8 string. If the string is valid utf-8, the result is wrong. So I don't see much gain compared to "replace" mode, which gets it wrong every time it must guess. We always will get it wrong in some cases as soon as the client encoding is not utf-8.
follow-up: 27 comment:26 by , 15 years ago
Replying to rblank:
- client 3, iso-8859-1, trying to access the page "ÃtÃ"
→ /wiki/%c3%a9t%c3%a9, PATH_INFO → '\xc3\xa9t\xc3\xa9',
to_unicode(self.environ('PATH_INFO'))
→ u'été'
I see, but this of course will "fail" in all cases, even when using 'replace' mode.
We're trying to guess the encoding used by the client, but only in the case where the string is an invalid utf-8 string. If the string is valid utf-8, the result is wrong. So I don't see much gain compared to "replace" mode, which gets it wrong every time it must guess.
Well, admittedly the gain is very marginal, being able to "see" the original byte sequence through the resulting unicode object, which might help for debugging in contrast with '?????', for which we will only be able to tell "the encoding was not UTF-8". The fact that (some) latin1 URLs would work is only a side-effect, and not a justification for the change, I agree. I can modify the test to reflect that, as currently it gives the impression that it's a feature ;-)
follow-up: 28 comment:27 by , 15 years ago
Replying to cboos:
Well, admittedly the gain is very marginal, being able to "see" the original byte sequence through the resulting unicode object, which might help for debugging (…)
Slightly different idea: how about doing the conversion from utf-8 in strict mode, catching the UnicodeDecodeError
and raising a TracError
with a good error message describing the cause of the error and possible solutions (if there are any)?
follow-up: 29 comment:28 by , 15 years ago
Replying to rblank:
Replying to cboos:
Well, admittedly the gain is very marginal, being able to "see" the original byte sequence through the resulting unicode object, which might help for debugging (…)
Slightly different idea: how about doing the conversion from utf-8 in strict mode, catching the
UnicodeDecodeError
and raising aTracError
with a good error message describing the cause of the error and possible solutions (if there are any)?
Good suggestion, I'll try to do something along those lines. The hint could be something like:
Your web browser is trying to access non-ascii URLs using an encoding which is not the one expected by the server (%(encoding)s).
Followed by, depending on the value of the server encoding
:
- utf-8:
Please use a more standard compliant browser. If you're already using a standard browser, something wrong is happening on the server side, please contact your site administrator, who perhaps need to adapt the server side encoding.
- not utf-8:
That encoding doesn't seem to be working in all cases. Please contact your site administrator.
But maybe this starts to be too complex, so perhaps the simpler version would also do:
try: encoding = self.pathinfo_encoding or 'utf-8' pathinfo = unicode(path_info, encoding) except UnicodeDecodeError, e: raise TracError(_("Invalid URL encoding (was %(pathinfo)r, " "server expected %(encoding)s)", pathinfo=pathinfo, encoding=encoding))
comment:29 by , 15 years ago
Replying to cboos:
But maybe this starts to be too complex, so perhaps the simpler version would also do:
Yes, that looks good.
comment:30 by , 15 years ago
Perhaps related?
2010-03-15 06:57:23,481 Trac[main] ERROR: Internal Server Error: Traceback (most recent call last): File "build/bdist.linux-x86_64/egg/trac/web/main.py", line 504, in _dispatch_request dispatcher.dispatch(req) File "build/bdist.linux-x86_64/egg/trac/web/main.py", line 209, in dispatch req.redirect(req.href + target, permanent=True) File "build/bdist.linux-x86_64/egg/trac/web/api.py", line 336, in redirect self.send_header('Location', url) File "build/bdist.linux-x86_64/egg/trac/web/api.py", line 274, in send_header self._outheaders.append((name, unicode(value).encode('utf-8'))) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 25: ordinal not in range(128)
for:
66.249.65.35 trac.edgewall.org - [15/Mar/2010:06:57:34 +0100] "GET /%E3%80%82%E5%AE%83%E9%9B%86%E6%88%90%E4%BA%86%E5%A2%9E%E5%BC%BA%E7%9A%84Wiki%E5%8A%9F%E8%83%BD%E5%92%8C%E7%89%88%E6%9C%AC%E6%8E%A7%E5%88%B6%E5%8A%9F%E8%83%BD%EF%BC%8C%E5%B9%B6%E5%8F%AF%E9%80%9A%E8%BF%87%E6%8F%92%E4%BB%B6%E6%89%A9%E5%B1%95%E5%85%B6%E5%8A%9F%E8%83%BD%E3%80%82%E7%94%B1%E4%BA%8E%E6%8F%92%E4%BB%B6%E4%BC%97%E5%A4%9A%E3%80%81%E5%8A%9F%E8%83%BD%E5%85%A8%E9%9D%A2%EF%BC%8C%E7%94%9A%E8%87%B3%E5%8F%AF%E4%BB%A5%E4%B8%8E%E5%BE%88%E5%A4%9A%E5%95%86%E4%B8%9A%E7%9A%84CMS%E7%B3%BB%E7%BB%9F%E5%AA%B2%E7%BE%8E%EF%BC%8C%E5%9B%A0%E6%AD%A4%E5%BA%94%E7%94%A8%E4%B9%9F%E6%97%A5%E7%9B%8A%E5%B9%BF%E6%B3%9B%E3%80%82%E5%AE%83%E7%9A%84ticket%E7%AE%A1%E7%90%86%E5%8F%8A%E5%B7%A5%E4%BD%9C%E6%B5%81%E6%8F%92%E4%BB%B6(http://trac-hacks.org/ HTTP/1.1" 500 9243 "-" "Mediapartners-Google"
comment:31 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Trying to support non-UTF-8 encodings for the PATH_INFO lead to other errors (notably for redirects). I think that it's not worth the trouble to try to support this kind of server setup (which might even be mis-configured ones from the start), so I gave up on the path_info_encoding
setting and simply report the error in r9377.
I tried to address the other problem with non-UTF-8 URLs reported in comment:30, but I couldn't reproduce the traceback involving send_header
. I nevertheless fixed the other traceback I got at this occasion, in [9380-9381].
Some additional code cleanups for source:trunk/trac/web/api.py were done in r9382.
OTOH, I still wanted to do the simplifications to to_unicode
contained in t3663-more-robust-PATH_INFO-decoding-r9325.diff so I committed them separately in r9383.
comment:32 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I now get a failing functional test, in lxml
:
====================================================================== ERROR: Regression test for non-UTF-8 PATH_INFO (#3663) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/joe/src/trac/trunk/trac/tests/functional/testcases.py", line 171, in runTest self._tester.go_to_wiki(u'été'.encode('latin1')) File "/home/joe/src/trac/trunk/trac/tests/functional/tester.py", line 115, in go_to_wiki tc.go(wiki_url) File "/usr/lib/python2.6/site-packages/twill/commands.py", line 112, in go browser.go(url) File "/usr/lib/python2.6/site-packages/twill/browser.py", line 113, in go self._journey('open', u) File "/usr/lib/python2.6/site-packages/twill/browser.py", line 549, in _journey callable(func_name, *args, **kwargs) File "/home/joe/src/trac/trunk/trac/tests/functional/better_twill.py", line 104, in _validate_xhtml etree.parse(StringIO(page), base_url=b.get_url()) File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958) File "parser.pxi", line 1518, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71971) File "apihelpers.pxi", line 1381, in lxml.etree._encodeFilenameUTF8 (src/lxml/lxml.etree.c:21091) UnicodeDecodeError: 'utf8' codec can't decode bytes in position 27-29: invalid data
Some relevant data: Linux, de_CH.utf8 locale, lxml 2.2.6.
comment:33 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Can you please dump the byte sequence corresponding to the UTF-8 encoding of that path, so that I can try to reproduce the issue?