Edgewall Software
Modify

Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#5697 closed defect (worksforme)

Ephemeral wsgi.input stream makes it nearly impossible for plugins to change URL structure

Reported by: Dave Abrahams <dave@…> Owned by: jonas
Priority: normal Milestone:
Component: general Version: devel
Severity: normal Keywords:
Cc: daniel@…
Release Notes:
API Changes:

Description

See, for example, the TracForge project list.

Some information is lost as req.args is constructed, and all of that information needs to be reassembled exactly as-is for, e.g. file upload (which uses a multipart-formdata request), even though the plugin may not even be interested in touching it (as in the case of TracForge). Python gives us no facility for undoing the parsing of cgi.FieldStorage, so anyone attempting this sort of hook will have to repeat an extremely complicated series of steps; it would be better to capture the request in a string that can be reused.

The following simple patch remedies the problem:

  • api.py

    old new  
    434434        args = _RequestArgs()
    435435
    436436        fp = self.environ['wsgi.input']
     437        # We need to be able to read the data more than once.
     438        from StringIO import StringIO
     439        data = fp.read()
     440        fp = StringIO(data)
     441        self.environ['wsgi.input'] = StringIO(data)
    437442
    438443        # Avoid letting cgi.FieldStorage consume the input stream when the
    439444        # request does not contain form data

Attachments (0)

Change History (12)

comment:1 follow-up: Changed 7 years ago by Graham.Dumpleton@…

I don't know the problem you are trying to solve, but a few comments.

First note that the result of calling read() on 'wsgi.input' with no argument will have an undefined result. For some WSGI adapters it will return all data from the request, but on others it will hang indefinitely if HTTP/1.1 keep alive is enabled.

When calling read() you should always be using the CONTENT_LENGTH from the WSGI environment. Ie.,

clen = int(environ.get('CONTENT_LENGTH', '0'))
data = fp.read(clen)

That said, one shouldn't be doing what you are wanting to do anyway as it will have the result when doing POST with very large input files of reading the complete files into memory. If you are trying to upload files which are GBs in size that would be really bad.

comment:2 in reply to: ↑ 1 Changed 7 years ago by Dave Abrahams <dave@…>

  • Cc daniel@… added

Replying to Graham.Dumpleton@gmail.com:

I don't know the problem you are trying to solve,

The problem of being able to, in a plugin, modify dispatching of URLs without completely taking it over. In other words, I need to be able to change some details of the Request and send it back to Trac for processing.

but a few comments.

First note that the result of calling read() on 'wsgi.input' with no argument will have an undefined result. For some WSGI adapters it will return all data from the request, but on others it will hang indefinitely if HTTP/1.1 keep alive is enabled.

Yes, I discovered as much, the hard way, a few days ago. The patch was actually produced by a colleague of mine and I overlooked that problem.

When calling read() you should always be using the CONTENT_LENGTH from the WSGI environment. Ie.,

clen = int(environ.get('CONTENT_LENGTH', '0'))
data = fp.read(clen)

Right.

That said, one shouldn't be doing what you are wanting to do anyway

One shouldn't want to write plugins that alter the URL dispatching structure? I think you should take that up with Noah Kantrowitz, then. TracForge is his plugin; I'm just trying to figure out how to make it work with 0.11.

as it will have the result when doing POST with very large input files of reading the complete files into memory. If you are trying to upload files which are GBs in size that would be really bad.

That's true; that patch is not the best approach. One alternative approach builds a new compound "File-like" stream object that is the lazy concattenation of many smaller streams, including the ones held in any FieldStorage objects that remain in the stream.

comment:3 Changed 7 years ago by Dave Abrahams <dave@…>

Here's an even simpler and better approach:

  • trac/web/api.py

     
    152152            '_inheaders': Request._parse_headers
    153153        }
    154154
     155        if 'parsed_args' in self.environ:
     156            self.args = self.environ['parsed_args']
     157
    155158        self.base_url = self.environ.get('trac.base_url')
    156159        if not self.base_url:
    157160            self.base_url = self._reconstruct_url()
     
    461464                    else:
    462465                        args[name] = value
    463466
     467        self.environ['parsed_args'] = args
    464468        return args
    465469
    466470    def _parse_cookies(self):

comment:4 follow-up: Changed 7 years ago by cboos

Does the parsed_args name correspond to some sort of standard? If not, we should probably name this WSGI environment variable trac.parsed_args or something. Also, adding a comment before the if about the use cases for the variable would be helpful.

comment:5 follow-up: Changed 7 years ago by nkantrowitz

cmlenz and I agreed that this should be worked around in TracForge, not Trac. I believe I have a viable mechanism to do it, just need to test it.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 7 years ago by Dave Abrahams <dave@…>

Replying to nkantrowitz:

cmlenz and I agreed that this should be worked around in TracForge, not Trac.

The problem AFAICT is that the current design of Trac makes it nearly impossible to do that in a plugin. The only reasons I can see for not changing Trac in this case are:

  1. It's not only possible, but actually easy to do in a plugin
  2. This sort of thing is considered evil and Trac does not want to support plugins doing it.

I believe I have a viable mechanism to do it, just need to test it.

Well then, I'm eager to see the details, because we tried lots of approaches to doing it in TracForge, none of which ultimately worked. And the solution ought to be prominently documented (unless of course Trac doesn't want to support plugins doing this) because of its non-obviousness.

comment:7 in reply to: ↑ 6 Changed 7 years ago by nkantrowitz

Replying to Dave Abrahams <dave@boost-consulting.com>:

Replying to nkantrowitz:

cmlenz and I agreed that this should be worked around in TracForge, not Trac.

The problem AFAICT is that the current design of Trac makes it nearly impossible to do that in a plugin. The only reasons I can see for not changing Trac in this case are:

  1. It's not only possible, but actually easy to do in a plugin
  2. This sort of thing is considered evil and Trac does not want to support plugins doing it.

I believe I have a viable mechanism to do it, just need to test it.

Well then, I'm eager to see the details, because we tried lots of approaches to doing it in TracForge, none of which ultimately worked. And the solution ought to be prominently documented (unless of course Trac doesn't want to support plugins doing this) because of its non-obviousness.

Its not so much not wanting to support it as that this kind of WSGI re-serving is clearly a special case. The workaround I devised is to do the secondary WSGI request in a pre-request filter. req.args is evaluated lazily, and as long as nothing touches it you should be fine.

comment:8 in reply to: ↑ 4 Changed 7 years ago by Dave Abrahams <dave@…>

Replying to cboos:

Does the parsed_args name correspond to some sort of standard?

Not that we know of :-)

If not, we should probably name this WSGI environment variable trac.parsed_args or something.

Fine with us.

Also, adding a comment before the if about the use cases for the variable would be helpful.

If nkantrowitz' idea doesn't pan out, we'll be happy to submit a new patch, but right now it sounds like you guys want to use his approach, whatever that is.

comment:9 Changed 7 years ago by Dave Abrahams <dave@…>

  • Resolution set to fixed
  • Status changed from new to closed

CodeRanger? says this is now handled properly in TracForge?. It was broken "because the sub-request was being processed much later in the cycle."

comment:10 Changed 7 years ago by eblot

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:11 Changed 7 years ago by eblot

  • Resolution set to worksforme
  • Status changed from reopened to closed

comment:12 Changed 7 years ago by nkantrowitz

For the record, nothing was changed specifically for this. Just running the sub-request in the match_request was enough.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain jonas.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from jonas to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.