Edgewall Software
Modify

Opened 19 years ago

Closed 16 years ago

#2983 closed enhancement (fixed)

Proposal to add util.parseargs

Reported by: Alec Thomas Owned by: Christian Boos
Priority: normal Milestone: 0.11
Component: wiki system Version: devel
Severity: normal Keywords: macro
Cc: martin@… Branch:
Release Notes:
API Changes:
Internal Changes:

Description

It seems that almost every macro has to reimplement its own argument parser. This doesn't seem ideal.

I've implemented a generic argument parser for the tags plugin, available here. It parses arguments like this: kw=(a, b, c), foo, bar, and returns a tuple of (positional[], keywords{}).

If you think this is a good idea, I can clean it up for submission.

Attachments (0)

Change History (24)

comment:1 by Christian Boos, 19 years ago

Milestone: 0.10
Priority: normalhigh

+1 on this.

But access to the unprocessed "arguments" (i.e. content) should be preserved, in case the arguments are not really arguments… e.g.

[[TODO(Do it, or do it not)]]

I wonder if we couldn't distinguish this kind of macro (as well as stay compatible with older macros) simply by looking at the arity of the render_macro method:

class IWikiMacroProvider(Interface):
    ...
    def render_macro(req, name, content):
        """Return the HTML output of the macro."""

    def render_macro(req, name, *args, **kwd):
        """Same as above, but use prepared arguments"""

(first form has an arity of 3, second of 2)

Alternatively, we could keep the render_macro(req, name, content) form and make use of an utility function, e.g.

    def render_macro(req, name, content):
        args, kwd = parse_args(content)

Last comment on this: this ideally should also take care of quote/double quotes, like in:

[[Image(coolpic.PNG, title="Ain't that nice, dude?")]]

I'm setting the prio/milestone for this, as it would really help me to rework a bit the Image macro…

comment:2 by Alec Thomas, 19 years ago

I think having a utility function is preferable, so old-style macros can use it. The linked parseargs module does indeed handle quoted strings, though naively at the moment. eg.

3> from tractags.parseargs import parseargs
4> parseargs('''a="alec thomas", b=10''')
<4 ([], {'a': 'alec thomas', 'b': '10'})
5> parseargs('''a="ain't it nice", b=10''')
<5 ([], {'a': "ain't it nice", 'b': '10'})

By naively I mean it doesn't handle escaped (\) characters. I can update it to do so.

comment:3 by Alec Thomas, 19 years ago

Another function that I thought would be useful in a number of places is the getbool() method from config. In fact, I have factored this out in workflow as trac.util.deserialize_bool().

comment:4 by Christian Boos, 19 years ago

Yes, but why not simpler names :to_bool and to_list?

comment:5 by Alec Thomas, 19 years ago

Sure, I'm not fussed :)

I only chose deserialize primarily because there was no chance of ambiguity.

comment:6 by Alec Thomas, 19 years ago

I renamed them in the latest workflow commit. It is much more convenient, can't deny it.

comment:7 by Alec Thomas, 19 years ago

How about this implementation. It handles normal Python basic types and has the same calling convention and return type as the above.

def parse_args(args):
    import compiler

    def get_value(node):
        if isinstance(node, compiler.ast.Name):
            return node.name
        elif isinstance(node, compiler.ast.Const):
            return node.value
        elif isinstance(node, compiler.ast.Tuple) or \
             isinstance(node, compiler.ast.List):
            return tuple([get_value(el) for el in node.nodes])
        elif isinstance(node, compiler.ast.Dict):
            return dict([(get_value(k), get_value(v)) for k, v in node.items])
        raise ValueError('invalid arguments near %s' % str(node))

    largs = []
    kwargs = {}
    expr = compiler.parse('mock(%s)' % args, 'eval')
    for node in expr.getChildNodes()[0].args:
        if isinstance(node, compiler.ast.Keyword):
            kwargs[node.name] = get_value(node.expr)
        else:
            largs.append(get_value(node))
    return largs, kwargs

comment:8 by Christian Boos, 19 years ago

… and handles escaped characters and looks much cleaner & shorter than the first proposal. I like it.

But should we really put that in util? What other use than for macros do you foresee?

I think we should rather put it in trac/wiki/api.py, so that it can be imported together with IWikiMacroProvider (and maybe WikiMacroBase could be moved there too?)

comment:9 by Alec Thomas, 19 years ago

Yes, agreed. Chris and I had come to the same conclusion in #trac.

comment:10 by Christian Boos, 19 years ago

Keywords: macro added
Type: defectenhancement

So what about committing the above?

comment:11 by Alec Thomas, 19 years ago

Sounds good to me, but I don't have commit access to trunk :)

comment:12 by Christian Boos, 19 years ago

Owner: changed from Jonas Borgström to Alec Thomas

You really should :)

In the meantime, I'll take care of this one for you.

comment:13 by Christian Boos, 19 years ago

Priority: highnormal

Well, I did some additional testing, and it appears that this needs some more work before it can really be used on a bigger scale:

  • macros.py

     
    5454        raise NotImplementedError
    5555
    5656
     57def parse_args(args):
     58    """Utility for parsing the `content` parameter in `render_macro`.
     59
     60    This follows the usual Python convention for parsing function
     61    arguments, including named arguments:
     62
     63    >>> parse_args('')
     64    ([], {})
     65    >>> parse_args('1, text')
     66    ([1, 'text'], {})
     67    >>> parse_args('option=yes')
     68    ([], {'option': 'yes'})
     69    >>> parse_args('1, text, option=yes')
     70    ([1, 'text'], {'option': 'yes'})
     71
     72    List and tuple arguments are supported too:
     73
     74    >>> parse_args('[1,2,3], list=(a,b)')
     75    ([(1, 2, 3)], {'list': ('a', 'b')})
     76
     77    It's also very useful for handling quotes the expected way:
     78
     79    >>> parse_args('"foo, bar", color="rgb(ff,00,00)", separator=","')
     80    (['foo, bar'], {'color': 'rgb(ff,00,00)', 'separator': ','})
     81
     82    '''However'''
     83
     84    A big limitation of this approach is that any unquoted text argument
     85    has to be a proper Python identifier.
     86    i.e. it can't contain space, start with a digit, etc.
     87
     88    A lot of expected usage of macros wouldn't work.
     89
     90    >>> parse_args('Hello world!')
     91    Traceback (most recent call last):
     92         ...
     93    SyntaxError: invalid syntax
     94
     95    >>> parse_args('Prefix%%')             # NB: second % for doctest's sake
     96    Traceback (most recent call last):
     97         ...
     98    SyntaxError: invalid syntax
     99
     100    >>> parse_args('TracDev/')
     101    Traceback (most recent call last):
     102         ...
     103    SyntaxError: unexpected EOF while parsing
     104
     105    >>> parse_args('OtherPage:foo.bmp')
     106    Traceback (most recent call last):
     107         ...
     108    SyntaxError: invalid syntax
     109
     110    >>> parse_args('photo.jpg, align=right')
     111    Traceback (most recent call last):
     112         ...
     113    ValueError: invalid arguments near Getattr(Name('photo'), 'jpg')
     114    """
     115    import compiler
     116       
     117    def get_value(node):
     118        if isinstance(node, compiler.ast.Name):
     119            return node.name
     120        elif isinstance(node, compiler.ast.Const):
     121            return node.value
     122        elif isinstance(node, compiler.ast.Tuple) or \
     123             isinstance(node, compiler.ast.List):
     124            return tuple([get_value(el) for el in node.nodes])
     125        elif isinstance(node, compiler.ast.Dict):
     126            return dict([(get_value(k), get_value(v)) for k, v in node.items])
     127        raise ValueError('invalid arguments near %s' % str(node))
     128
     129    largs = []
     130    kwargs = {}
     131    expr = compiler.parse('mock(%s)' % args, 'eval')
     132    for node in expr.getChildNodes()[0].args:
     133        if isinstance(node, compiler.ast.Keyword):
     134            kwargs[node.name] = get_value(node.expr)
     135        else:
     136            largs.append(get_value(node))
     137    return largs, kwargs
     138
     139
    57140class TitleIndexMacro(WikiMacroBase):
    58141    """Inserts an alphabetic list of all wiki pages into the output.

comment:14 by Alec Thomas, 19 years ago

Yes of course. The rewritten version was not meant as a general backwards compatible solution, but for future macros that require Python-esque arguments.

The issues you ran into are exactly why the parseargs from the tags plugin is so much more complex: it handles the parsing itself rather than relying on the Python parser, and consequently allows for more free-form arguments:

1> from tractags.parseargs import parseargs
2> parseargs('alec-thomas, thomas')
<2 (['alec-thomas', 'thomas'], {})
3> parseargs('alec!!!, thomas')
<3 (['alec!!!', 'thomas'], {})
4> parseargs('alec!!!=(1, 2, 3), thomas')
<4 (['thomas'], {'alec!!!': ['1', '2', '3']})

It doesn't support spaces in arguments, but could be modified to do so.

comment:15 by Christian Boos, 18 years ago

Ok to postpone this to 0.11?

comment:16 by Alec Thomas, 18 years ago

Milestone: 0.100.11
Version: 0.9.4devel

Yeah I agree.

comment:17 by Christian Boos, 18 years ago

Component: generalwiki
Owner: changed from Alec Thomas to Christian Boos

What about this simpler parse_args() ? The rules would be:

  • every "," character inside a macro call is an argument separator, unless escaped by a "\"
  • every identifier directly followed by an "=" corresponds to an assignement to a named argument

The point being that Wiki authors won't necessarily want to follow Python syntactic rules when writing macros, and that the simple rules above are backward compatible, unambiguous, and simple to implement.

def parse_args(args):
    largs, kwargs = [], {}
    if args:
        for arg in re.split(r'(?<!\\),', args):
            m = re.match(r'\s*[a-zA-Z_]\w=', arg)
            if m:
                kwargs[arg[:m.end()-1]] = arg[m.end():]
            else:
                largs.append(arg)
    return largs, kwargs

Typical use, e.g. for the last arguments of the TicketQuery macro (the first one being always the query itself):

>>> parse_args('format=compact, order=priority, title=Priority List')
([], {' title': 'Priority List', ' order': 'priority', 'format': 'compact'})

comment:18 by Christian Boos, 18 years ago

Resolution: fixed
Status: newclosed

Hm, well, the simple parse_args from above slipped in with r3694. I think that's enough to cover most needs.

comment:19 by martin@…, 16 years ago

Cc: martin@… added
Milestone: 0.110.11.2
Resolution: fixed
Status: closedreopened

I like to suggest the following enhancement to parse_args, which allows non-escaped commas inside double-quotes, e.g.:

[[Macro(address="My street, city, county",telephon=12345)]]

This can be done by changing the split regex (see patch below). I found a suitable one on this website.

I tested it a little in the console with different non-trivial sets, but further testing should be done.

There is also the possibility to introduce a new parameter to parse_args which controls if the old or the new regex is used to ensure 100% backward compatibility for unforeseen cases where the new regex might fail, e.g.:

 largs, kwargs = parse_args(content, quotes=True)
  • api.

    old new  
    139139    """   
    140140    largs, kwargs = [], {}
    141141    if args:
    142         for arg in re.split(r'(?<!\\),', args):
     142        for arg in re.split(r'(?<!\\),(?!(?:[^",]|[^"],[^"])+")/', args):
    143143            arg = arg.replace(r'\,', ',')
    144144            if strict:
    145145                m = re.match(r'\s*[a-zA-Z_]\w+=', arg)

comment:20 by Christian Boos, 16 years ago

Milestone: 0.11.20.11.3

Will have a look at it, but not for 0.11.2.

in reply to:  19 ; comment:21 by martin@…, 16 years ago

Replying to martin@…:

+ for arg in re.split(r'(?<\\),(?!(?:[",]|["],["])+")/', args):

I just figured out that this regex actually doesn't work with python in the current form. It does work with Ruby, PHP and Perl, however. The '[",]' seems to be the problem. Sorry, but I used Perl in my earlier test, because it's easier for me and I thought the regex syntax is the same.

Anyway, I googled a little more and would like now to suggest the use of pythons csv module:

import csv
for arg in csv.reader([args],delimiter=',',escapechar='\\').next():

See http://www.python.org/doc/2.5.2/lib/module-csv.html for an description.

in reply to:  21 comment:22 by martin@…, 16 years ago

I checked now the csv module more deeply and found out that while it supports the handling of comma-separated, possible quoted 'values' very well it doesn't work with partially quoted content, i.e. both key and value must be quoted together, not just the value part:

"key=value" instead of key="value", which isn't what we want. I rechecked the previous posted RegEx and found out that it fails at the same issue.

comment:23 by martin@…, 16 years ago

Some friends of mine and I now spend some time over this and we came to the solution that an own small parser, not a regex, is needed for the split operation to allow (non-escaped) commas in quoted values. Also the case were a back-slash should be the last character value of an argument (e.g. path=c:\windows\,other=pair) can be handled correctly by the parser, which isn't the case in the current implementation.

The below parser ignores commas in double quotes and lets the backslash escape any following character (e.g. path=c:\\windows\\,other=pair would result in { 'path': r'c:\windows\', 'other':'pair }).

def parse_args (args,strict=True):
    """ parses a comma separated string were the values can include multiple
    quotes """
    esc   = 0   # last char was escape char
    quote = 0   # inside quote
    start = 0   # start of current field
    pos   = 0   # current position
    largs  = []
    kwargs = {}

    def checkkey (arg):
        import re
        arg = arg.replace(r'\,', ',')
        if strict:
            m = re.match(r'\s*[a-zA-Z_]\w+=', arg)
        else:
            m = re.match(r'\s*[^=]+=', arg)
        if m:
            kw = arg[:m.end()-1].strip()
            if strict:
                kw = unicode(kw).encode('utf-8')
            kwargs[kw] = arg[m.end():]
        else:
            largs.append(arg)

    if args:
        # Small parser to split on commas
        #  * Ignores commas in double quotes
        #  * Backslash escapes any following character
        for char in args:
            if esc:
                esc = 0
            elif char == '"':
                quote = not quote
            elif char == '\\':
                esc = 1
            elif char == ',' and not quote:
                checkkey( args[start:pos] )
                start = pos + 1
            pos = pos + 1
        checkkey( args[start:] )
    return largs, kwargs

In addition the quote, escape and delimiter characters could be made optional arguments to further generalize the parser. Some special macro might need to re-parse single values with other delimiters which might also include quotes (for example my th:GoogleMapMacro macro for the marker argument).

in reply to:  19 comment:24 by Christian Boos, 16 years ago

Milestone: 0.11.30.11
Resolution: fixed
Status: reopenedclosed

Replying to martin@…:

I like to suggest the following enhancement to parse_args, which allows non-escaped commas inside double-quotes, e.g.: [[Macro(address="My street, city, county",telephon=12345)]]

If you still like to get this enhancement, then please create a new ticket for it (with the code in comment:23 as a patch), together with a bunch of unit-tests for it.

Re-closing this ticket as a 0.11 one.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Christian Boos.
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from Christian Boos to the specified user.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.