#5697 closed defect (worksforme)
Ephemeral wsgi.input stream makes it nearly impossible for plugins to change URL structure
Reported by: | Owned by: | Jonas Borgström | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | general | Version: | devel |
Severity: | normal | Keywords: | |
Cc: | daniel@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
See, for example, the TracForge project list.
Some information is lost as req.args is constructed, and all of that information needs to be reassembled exactly as-is for, e.g. file upload (which uses a multipart-formdata request), even though the plugin may not even be interested in touching it (as in the case of TracForge). Python gives us no facility for undoing the parsing of cgi.FieldStorage
, so anyone attempting this sort of hook will have to repeat an extremely complicated series of steps; it would be better to capture the request in a string that can be reused.
The following simple patch remedies the problem:
-
api.py
old new 434 434 args = _RequestArgs() 435 435 436 436 fp = self.environ['wsgi.input'] 437 # We need to be able to read the data more than once. 438 from StringIO import StringIO 439 data = fp.read() 440 fp = StringIO(data) 441 self.environ['wsgi.input'] = StringIO(data) 437 442 438 443 # Avoid letting cgi.FieldStorage consume the input stream when the 439 444 # request does not contain form data
Attachments (0)
Change History (12)
follow-up: 2 comment:1 by , 17 years ago
comment:2 by , 17 years ago
Cc: | added |
---|
Replying to Graham.Dumpleton@gmail.com:
I don't know the problem you are trying to solve,
The problem of being able to, in a plugin, modify dispatching of URLs without completely taking it over. In other words, I need to be able to change some details of the Request and send it back to Trac for processing.
but a few comments.
First note that the result of calling read() on 'wsgi.input' with no argument will have an undefined result. For some WSGI adapters it will return all data from the request, but on others it will hang indefinitely if HTTP/1.1 keep alive is enabled.
Yes, I discovered as much, the hard way, a few days ago. The patch was actually produced by a colleague of mine and I overlooked that problem.
When calling read() you should always be using the CONTENT_LENGTH from the WSGI environment. Ie.,
clen = int(environ.get('CONTENT_LENGTH', '0')) data = fp.read(clen)
Right.
That said, one shouldn't be doing what you are wanting to do anyway
One shouldn't want to write plugins that alter the URL dispatching structure? I think you should take that up with Noah Kantrowitz, then. TracForge is his plugin; I'm just trying to figure out how to make it work with 0.11.
as it will have the result when doing POST with very large input files of reading the complete files into memory. If you are trying to upload files which are GBs in size that would be really bad.
That's true; that patch is not the best approach. One alternative approach builds a new compound "File-like" stream object that is the lazy concattenation of many smaller streams, including the ones held in any FieldStorage
objects that remain in the stream.
comment:3 by , 17 years ago
Here's an even simpler and better approach:
-
trac/web/api.py
152 152 '_inheaders': Request._parse_headers 153 153 } 154 154 155 if 'parsed_args' in self.environ: 156 self.args = self.environ['parsed_args'] 157 155 158 self.base_url = self.environ.get('trac.base_url') 156 159 if not self.base_url: 157 160 self.base_url = self._reconstruct_url() … … 461 464 else: 462 465 args[name] = value 463 466 467 self.environ['parsed_args'] = args 464 468 return args 465 469 466 470 def _parse_cookies(self):
follow-up: 8 comment:4 by , 17 years ago
Does the parsed_args
name correspond to some sort of standard? If not, we should probably name this WSGI environment variable trac.parsed_args
or something. Also, adding a comment before the if
about the use cases for the variable would be helpful.
follow-up: 6 comment:5 by , 17 years ago
cmlenz and I agreed that this should be worked around in TracForge, not Trac. I believe I have a viable mechanism to do it, just need to test it.
follow-up: 7 comment:6 by , 17 years ago
Replying to nkantrowitz:
cmlenz and I agreed that this should be worked around in TracForge, not Trac.
The problem AFAICT is that the current design of Trac makes it nearly impossible to do that in a plugin. The only reasons I can see for not changing Trac in this case are:
- It's not only possible, but actually easy to do in a plugin
- This sort of thing is considered evil and Trac does not want to support plugins doing it.
I believe I have a viable mechanism to do it, just need to test it.
Well then, I'm eager to see the details, because we tried lots of approaches to doing it in TracForge, none of which ultimately worked. And the solution ought to be prominently documented (unless of course Trac doesn't want to support plugins doing this) because of its non-obviousness.
comment:7 by , 17 years ago
Replying to Dave Abrahams <dave@boost-consulting.com>:
Replying to nkantrowitz:
cmlenz and I agreed that this should be worked around in TracForge, not Trac.
The problem AFAICT is that the current design of Trac makes it nearly impossible to do that in a plugin. The only reasons I can see for not changing Trac in this case are:
- It's not only possible, but actually easy to do in a plugin
- This sort of thing is considered evil and Trac does not want to support plugins doing it.
I believe I have a viable mechanism to do it, just need to test it.
Well then, I'm eager to see the details, because we tried lots of approaches to doing it in TracForge, none of which ultimately worked. And the solution ought to be prominently documented (unless of course Trac doesn't want to support plugins doing this) because of its non-obviousness.
Its not so much not wanting to support it as that this kind of WSGI re-serving is clearly a special case. The workaround I devised is to do the secondary WSGI request in a pre-request filter. req.args
is evaluated lazily, and as long as nothing touches it you should be fine.
comment:8 by , 17 years ago
Replying to cboos:
Does the
parsed_args
name correspond to some sort of standard?
Not that we know of :-)
If not, we should probably name this WSGI environment variable
trac.parsed_args
or something.
Fine with us.
Also, adding a comment before the
if
about the use cases for the variable would be helpful.
If nkantrowitz' idea doesn't pan out, we'll be happy to submit a new patch, but right now it sounds like you guys want to use his approach, whatever that is.
comment:9 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
CodeRanger says this is now handled properly in TracForge. It was broken "because the sub-request was being processed much later in the cycle."
comment:10 by , 17 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:11 by , 17 years ago
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
comment:12 by , 17 years ago
For the record, nothing was changed specifically for this. Just running the sub-request in the match_request
was enough.
I don't know the problem you are trying to solve, but a few comments.
First note that the result of calling read() on 'wsgi.input' with no argument will have an undefined result. For some WSGI adapters it will return all data from the request, but on others it will hang indefinitely if HTTP/1.1 keep alive is enabled.
When calling read() you should always be using the CONTENT_LENGTH from the WSGI environment. Ie.,
That said, one shouldn't be doing what you are wanting to do anyway as it will have the result when doing POST with very large input files of reading the complete files into memory. If you are trying to upload files which are GBs in size that would be really bad.