#6930 closed defect (fixed)

showworkflow script can't handle accented characters in workflow states

Reported by:	abli@…	Owned by:	Eli Carter
Priority:	normal	Milestone:	0.12.3
Component:	ticket system	Version:	0.11b1
Severity:	minor	Keywords:	workflow unicode
Cc:		Branch:
Release Notes:
API Changes:
Internal Changes:

Description (last modified by Tim Hatch)

Trac seems to be able to handle accented characters in state names and transition names. The showworkflow script, however, fails with the following exception when run on a .ini file that Trac itself can handle:

Traceback (most recent call last):
  File "./workflow_parser.py", line 109, in ?
    main(args[0], show_ops, show_perms)
  File "./workflow_parser.py", line 76, in main
    sys.stdout.write(''.join(digraph_lines))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 189: ordinal not in range(128)
Failed to parse "inventory-workflow.ini", exiting.

The bug is actually in workflow_parser.py (and Python's handling of sys.stdout): showworkflow runs workflow_parser.py and redirects its output. Because sys.stdout is redirected, its encoding is set to None, which means that the ascii codec is used. ASCII can't represent accented characters, resulting in the exception.

A possible fix is to set encoding of sys.stdout in workflow_parser.py, by replacing

sys.stdout.write(''.join(digraph_lines))

with

    import locale, codecs
    # sys.stdout.encoding is None when output is redirected, so wrap it explicitly
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
    sys.stdout.write(''.join(digraph_lines))

(see, for example http://wiki.python.org/moin/PrintFails and http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/)
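An alternative to rebinding sys.stdout can be sketched as follows: encode the text explicitly and write the resulting bytes. This is only an illustration, not the patch that was committed; the helper name write_text and its parameters are hypothetical.

```python
import io
import locale


def write_text(stream, text, encoding=None):
    """Encode `text` explicitly and write the bytes to `stream`.

    `write_text` is a hypothetical helper, not part of workflow_parser.py.
    If no encoding is given, fall back to the stream's own encoding and,
    when that is missing or None (the redirected case described above),
    to the locale's preferred encoding.
    """
    if encoding is None:
        encoding = (getattr(stream, 'encoding', None)
                    or locale.getpreferredencoding())
    data = text.encode(encoding)
    # Text streams on newer Pythons expose the byte stream as `.buffer`;
    # plain byte streams are written to directly.
    binary = getattr(stream, 'buffer', stream)
    binary.write(data)


# Usage with an in-memory byte stream standing in for a redirected stdout.
buf = io.BytesIO()
write_text(buf, u'\xe1llapot', 'utf-8')
```

Passing the encoding explicitly keeps the output independent of how the script is invoked, at the cost of no longer following the user's locale.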

After this, the showworkflow script runs and produces correct .png output. The .ps output, however, will be wrong, as graphviz doesn't appear to be able to handle non-latin-1 characters in PostScript output (see "More generally, how do I use non-ASCII character sets?" in http://www.graphviz.org/doc/FAQ.html)

The .pdf output, however, appears to work, so I think that instead of using ps2pdf, the .pdf should be generated separately with

dot -T pdf -o ...filenames...
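As a rough sketch of that suggestion, the per-format dot invocations could be built like this. The function name dot_commands and the file names are hypothetical, and whether dot can emit pdf depends on how graphviz was built.

```python
def dot_commands(dot_file, basename, formats=('png', 'pdf')):
    """Return one `dot` argument list per output format.

    Hypothetical helper: each format is rendered directly by dot,
    so the .pdf no longer goes through .ps and ps2pdf.
    """
    return [
        ['dot', '-T' + fmt, '-o', '%s.%s' % (basename, fmt), dot_file]
        for fmt in formats
    ]


# Example: commands for a workflow graph written to workflow.dot
for argv in dot_commands('workflow.dot', 'workflow'):
    print(' '.join(argv))
```

Each argument list could then be handed to subprocess.call, which avoids shell quoting problems with unusual file names.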

Attachments (1)

ticket-6930-v1.patch (3.4 KB) - added by Eli Carter 14 years ago: proposed fix

Change History (9)

follow-up: 2 comment:1 by Tim Hatch, 17 years ago

Description:	modified (diff)

First, make sure your $LANG is set correctly, and that Python is picking it up for stdout.

>>> import sys
>>> sys.stdout.encoding
'UTF-8'

Then, if you change the ''.join to u''.join, does it work correctly? I've never had to resort to codecs.getwriter just to print Unicode.

in reply to: 1 comment:2 by abli@…, 17 years ago

Thanks for fixing the markup of the exception in my report.

Replying to thatch:

> First, make sure your $LANG is set correctly, and that Python is picking it up for stdout.
>
> Then, if you change the ''.join to u''.join, does it work correctly? I've never had to resort to codecs.getwriter just to print Unicode.

As noted in the links I included in my report, the problem is that if stdout is redirected (i.e. is not a terminal), Python ignores $LANG. As such, using u''.join makes no difference.

On my system (debian lenny on amd64):

abeld@csik:0:~$ echo $LANG
en_US.UTF-8
abeld@csik:0:~$ python -c "import sys; print sys.stdout.encoding"
UTF-8
abeld@csik:0:~$ python -c "import sys; print sys.stdout.encoding" | cat
None

which means that:

abeld@csik:0:~$  python -c "print u'\\N{LATIN SMALL LETTER O WITH ACUTE}'"
ó
abeld@csik:0:~$ python -c "print u'\\N{LATIN SMALL LETTER O WITH ACUTE}'" | cat
Traceback (most recent call last):
  File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 0: ordinal not in range(128)

And this is "working as designed", i.e. there is nothing broken on my system.
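The same failure can be reproduced without any redirection by encoding the character by hand, which is effectively what happens when the ascii codec is picked for stdout. This is a minimal sketch; the helper name try_encode is made up for the illustration.

```python
text = u'\N{LATIN SMALL LETTER O WITH ACUTE}'


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent `value`."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# The ascii codec has no representation for the accented letter,
# while UTF-8 encodes it as two bytes.
ascii_result = try_encode(text, 'ascii')
utf8_result = try_encode(text, 'utf-8')
```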

comment:3 by Piotr Kuczynski <piotr.kuczynski@…>, 17 years ago

Component:	general → ticket system
Keywords:	workflow added
Milestone:	→ 0.11.1

comment:4 by Christian Boos, 17 years ago

Keywords:	unicode added
Milestone:	0.11.2 → 0.11.3
Owner:	changed from Jonas Borgström to Christian Boos
Severity:	normal → minor

I'll look into this.

by Eli Carter, 14 years ago

Attachment:	ticket-6930-v1.patch added

proposed fix

comment:5 by Eli Carter, 14 years ago

Owner:	changed from Christian Boos to Eli Carter
Status:	new → assigned

comment:6 by Eli Carter, 14 years ago

Ah, ignore the delta on the .ini file; that was just to create a testcase I could work against, and is not intended to be committed.

comment:7 by Eli Carter, 14 years ago

Resolution:	→ fixed
Status:	assigned → closed

Fixed for 0.12-stable in [10646] and 0.13dev in [10647].

comment:8 by Remy Blank, 14 years ago

Milestone:	next-minor-0.12.x → 0.12.3
