Opened 19 years ago
Closed 16 years ago
#2983 closed enhancement (fixed)
Proposal to add util.parseargs
Reported by: | Alec Thomas | Owned by: | Christian Boos |
---|---|---|---|
Priority: | normal | Milestone: | 0.11 |
Component: | wiki system | Version: | devel |
Severity: | normal | Keywords: | macro |
Cc: | martin@… | Branch: | |
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
It seems that almost every macro has to reimplement its own argument parser. This doesn't seem ideal.
I've implemented a generic argument parser for the tags plugin, available here. It parses arguments like this: kw=(a, b, c), foo, bar
, and returns a tuple of (positional[], keywords{})
.
If you think this is a good idea, I can clean it up for submission.
Attachments (0)
Change History (24)
comment:1 by , 19 years ago
Milestone: | → 0.10 |
---|---|
Priority: | normal → high |
comment:2 by , 19 years ago
I think having a utility function is preferable, so old-style macros can use it. The linked parseargs module does indeed handle quoted strings, though naively at the moment. eg.
3> from tractags.parseargs import parseargs 4> parseargs('''a="alec thomas", b=10''') <4 ([], {'a': 'alec thomas', 'b': '10'}) 5> parseargs('''a="ain't it nice", b=10''') <5 ([], {'a': "ain't it nice", 'b': '10'})
By naively I mean it doesn't handle escaped (\) characters. I can update it to do so.
comment:3 by , 19 years ago
Another function that I thought would be useful in a number of places is the getbool()
method from config. In fact, I have factored this out in workflow as trac.util.deserialize_bool()
.
comment:5 by , 19 years ago
Sure, I'm not fussed :)
I only chose deserialize primarily because there was no chance of ambiguity.
comment:6 by , 19 years ago
I renamed them in the latest workflow commit. It is much more convenient, can't deny it.
comment:7 by , 19 years ago
How about this implementation. It handles normal Python basic types and has the same calling convention and return type as the above.
def parse_args(args): import compiler def get_value(node): if isinstance(node, compiler.ast.Name): return node.name elif isinstance(node, compiler.ast.Const): return node.value elif isinstance(node, compiler.ast.Tuple) or \ isinstance(node, compiler.ast.List): return tuple([get_value(el) for el in node.nodes]) elif isinstance(node, compiler.ast.Dict): return dict([(get_value(k), get_value(v)) for k, v in node.items]) raise ValueError('invalid arguments near %s' % str(node)) largs = [] kwargs = {} expr = compiler.parse('mock(%s)' % args, 'eval') for node in expr.getChildNodes()[0].args: if isinstance(node, compiler.ast.Keyword): kwargs[node.name] = get_value(node.expr) else: largs.append(get_value(node)) return largs, kwargs
comment:8 by , 19 years ago
… and handles escaped characters and looks much cleaner & shorter than the first proposal. I like it.
But should we really put that in util
?
What other use than for macros do you foresee?
I think we should rather put it in trac/wiki/api.py
,
so that it can be imported together with IWikiMacroProvider
(and maybe WikiMacroBase
could be moved there too?)
comment:10 by , 19 years ago
Keywords: | macro added |
---|---|
Type: | defect → enhancement |
So what about committing the above?
comment:12 by , 19 years ago
Owner: | changed from | to
---|
You really should :)
In the meantime, I'll take care of this one for you.
comment:13 by , 19 years ago
Priority: | high → normal |
---|
Well, I did some additional testing, and it appears that this needs some more work before it can really be used on a bigger scale:
-
macros.py
54 54 raise NotImplementedError 55 55 56 56 57 def parse_args(args): 58 """Utility for parsing the `content` parameter in `render_macro`. 59 60 This follows the usual Python convention for parsing function 61 arguments, including named arguments: 62 63 >>> parse_args('') 64 ([], {}) 65 >>> parse_args('1, text') 66 ([1, 'text'], {}) 67 >>> parse_args('option=yes') 68 ([], {'option': 'yes'}) 69 >>> parse_args('1, text, option=yes') 70 ([1, 'text'], {'option': 'yes'}) 71 72 List and tuple arguments are supported too: 73 74 >>> parse_args('[1,2,3], list=(a,b)') 75 ([(1, 2, 3)], {'list': ('a', 'b')}) 76 77 It's also very useful for handling quotes the expected way: 78 79 >>> parse_args('"foo, bar", color="rgb(ff,00,00)", separator=","') 80 (['foo, bar'], {'color': 'rgb(ff,00,00)', 'separator': ','}) 81 82 '''However''' 83 84 A big limitation of this approach is that any unquoted text argument 85 has to be a proper Python identifier. 86 i.e. it can't contain space, start with a digit, etc. 87 88 A lot of expected usage of macros wouldn't work. 89 90 >>> parse_args('Hello world!') 91 Traceback (most recent call last): 92 ... 93 SyntaxError: invalid syntax 94 95 >>> parse_args('Prefix%%') # NB: second % for doctest's sake 96 Traceback (most recent call last): 97 ... 98 SyntaxError: invalid syntax 99 100 >>> parse_args('TracDev/') 101 Traceback (most recent call last): 102 ... 103 SyntaxError: unexpected EOF while parsing 104 105 >>> parse_args('OtherPage:foo.bmp') 106 Traceback (most recent call last): 107 ... 108 SyntaxError: invalid syntax 109 110 >>> parse_args('photo.jpg, align=right') 111 Traceback (most recent call last): 112 ... 113 ValueError: invalid arguments near Getattr(Name('photo'), 'jpg') 114 """ 115 import compiler 116 117 def get_value(node): 118 if isinstance(node, compiler.ast.Name): 119 return node.name 120 elif isinstance(node, compiler.ast.Const): 121 return node.value 122 elif isinstance(node, compiler.ast.Tuple) or \ 123 isinstance(node, compiler.ast.List): 124 return tuple([get_value(el) for el in node.nodes]) 125 elif isinstance(node, compiler.ast.Dict): 126 return dict([(get_value(k), get_value(v)) for k, v in node.items]) 127 raise ValueError('invalid arguments near %s' % str(node)) 128 129 largs = [] 130 kwargs = {} 131 expr = compiler.parse('mock(%s)' % args, 'eval') 132 for node in expr.getChildNodes()[0].args: 133 if isinstance(node, compiler.ast.Keyword): 134 kwargs[node.name] = get_value(node.expr) 135 else: 136 largs.append(get_value(node)) 137 return largs, kwargs 138 139 57 140 class TitleIndexMacro(WikiMacroBase): 58 141 """Inserts an alphabetic list of all wiki pages into the output.
comment:14 by , 19 years ago
Yes of course. The rewritten version was not meant as a general backwards compatible solution, but for future macros that require Python-esque arguments.
The issues you ran into are exactly why the parseargs from the tags plugin is so much more complex: it handles the parsing itself rather than relying on the Python parser, and consequently allows for more free-form arguments:
1> from tractags.parseargs import parseargs 2> parseargs('alec-thomas, thomas') <2 (['alec-thomas', 'thomas'], {}) 3> parseargs('alec!!!, thomas') <3 (['alec!!!', 'thomas'], {}) 4> parseargs('alec!!!=(1, 2, 3), thomas') <4 (['thomas'], {'alec!!!': ['1', '2', '3']})
It doesn't support spaces in arguments, but could be modified to do so.
comment:17 by , 18 years ago
Component: | general → wiki |
---|---|
Owner: | changed from | to
What about this simpler parse_args()
?
The rules would be:
- every "," character inside a macro call is an argument separator, unless escaped by a "\"
- every identifier directly followed by an "=" corresponds to an assignement to a named argument
The point being that Wiki authors won't necessarily want to follow Python syntactic rules when writing macros, and that the simple rules above are backward compatible, unambiguous, and simple to implement.
def parse_args(args): largs, kwargs = [], {} if args: for arg in re.split(r'(?<!\\),', args): m = re.match(r'\s*[a-zA-Z_]\w=', arg) if m: kwargs[arg[:m.end()-1]] = arg[m.end():] else: largs.append(arg) return largs, kwargs
Typical use, e.g. for the last arguments of the TicketQuery macro (the first one being always the query itself):
>>> parse_args('format=compact, order=priority, title=Priority List') ([], {' title': 'Priority List', ' order': 'priority', 'format': 'compact'})
comment:18 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Hm, well, the simple parse_args
from above slipped in with r3694.
I think that's enough to cover most needs.
follow-ups: 21 24 comment:19 by , 16 years ago
Cc: | added |
---|---|
Milestone: | 0.11 → 0.11.2 |
Resolution: | fixed |
Status: | closed → reopened |
I like to suggest the following enhancement to parse_args
, which allows non-escaped commas inside double-quotes, e.g.:
[[Macro(address="My street, city, county",telephon=12345)]]
This can be done by changing the split regex (see patch below). I found a suitable one on this website.
I tested it a little in the console with different non-trivial sets, but further testing should be done.
There is also the possibility to introduce a new parameter to parse_args
which controls if the old or the new regex is used to ensure 100% backward compatibility for unforeseen cases where the new regex might fail, e.g.:
largs, kwargs = parse_args(content, quotes=True)
-
api.
old new 139 139 """ 140 140 largs, kwargs = [], {} 141 141 if args: 142 for arg in re.split(r'(?<!\\), ', args):142 for arg in re.split(r'(?<!\\),(?!(?:[^",]|[^"],[^"])+")/', args): 143 143 arg = arg.replace(r'\,', ',') 144 144 if strict: 145 145 m = re.match(r'\s*[a-zA-Z_]\w+=', arg)
follow-up: 22 comment:21 by , 16 years ago
Replying to martin@…:
+ for arg in re.split(r'(?<\\),(?!(?:[",]|["],["])+")/', args):
I just figured out that this regex actually doesn't work with python in the current form. It does work with Ruby, PHP and Perl, however. The '[",]' seems to be the problem. Sorry, but I used Perl in my earlier test, because it's easier for me and I thought the regex syntax is the same.
Anyway, I googled a little more and would like now to suggest the use of pythons csv
module:
import csv for arg in csv.reader([args],delimiter=',',escapechar='\\').next():
See http://www.python.org/doc/2.5.2/lib/module-csv.html for an description.
comment:22 by , 16 years ago
I checked now the csv
module more deeply and found out that while it supports the handling of comma-separated, possible quoted 'values' very well it doesn't work with partially quoted content, i.e. both key and value must be quoted together, not just the value part:
"key=value" instead of key="value", which isn't what we want. I rechecked the previous posted RegEx and found out that it fails at the same issue.
comment:23 by , 16 years ago
Some friends of mine and I now spend some time over this and we came to the solution that an own small parser, not a regex, is needed for the split operation to allow (non-escaped) commas in quoted values. Also the case were a back-slash should be the last character value of an argument (e.g. path=c:\windows\,other=pair
) can be handled correctly by the parser, which isn't the case in the current implementation.
The below parser ignores commas in double quotes and lets the backslash escape any following character (e.g. path=c:\\windows\\,other=pair
would result in { 'path': r'c:\windows\', 'other':'pair }
).
def parse_args (args,strict=True): """ parses a comma separated string were the values can include multiple quotes """ esc = 0 # last char was escape char quote = 0 # inside quote start = 0 # start of current field pos = 0 # current position largs = [] kwargs = {} def checkkey (arg): import re arg = arg.replace(r'\,', ',') if strict: m = re.match(r'\s*[a-zA-Z_]\w+=', arg) else: m = re.match(r'\s*[^=]+=', arg) if m: kw = arg[:m.end()-1].strip() if strict: kw = unicode(kw).encode('utf-8') kwargs[kw] = arg[m.end():] else: largs.append(arg) if args: # Small parser to split on commas # * Ignores commas in double quotes # * Backslash escapes any following character for char in args: if esc: esc = 0 elif char == '"': quote = not quote elif char == '\\': esc = 1 elif char == ',' and not quote: checkkey( args[start:pos] ) start = pos + 1 pos = pos + 1 checkkey( args[start:] ) return largs, kwargs
In addition the quote, escape and delimiter characters could be made optional arguments to further generalize the parser. Some special macro might need to re-parse single values with other delimiters which might also include quotes (for example my th:GoogleMapMacro macro for the marker
argument).
comment:24 by , 16 years ago
Milestone: | 0.11.3 → 0.11 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
Replying to martin@…:
I like to suggest the following enhancement to
parse_args
, which allows non-escaped commas inside double-quotes, e.g.: [[Macro(address="My street, city, county",telephon=12345)]]
If you still like to get this enhancement, then please create a new ticket for it (with the code in comment:23 as a patch), together with a bunch of unit-tests for it.
Re-closing this ticket as a 0.11 one.
+1 on this.
But access to the unprocessed "arguments" (i.e. content) should be preserved, in case the arguments are not really arguments… e.g.
I wonder if we couldn't distinguish this kind of macro (as well as stay compatible with older macros) simply by looking at the arity of the
render_macro
method:(first form has an arity of 3, second of 2)
Alternatively, we could keep the
render_macro(req, name, content)
form and make use of an utility function, e.g.Last comment on this: this ideally should also take care of quote/double quotes, like in:
I'm setting the prio/milestone for this, as it would really help me to rework a bit the Image macro…