Another Python 2 Unicode Mystery Solved


June 8, 2012

The challenges of Unicode with Python 2.x are decried throughout the internet. This little devil had me scratching my head for quite some time. The test script is simple enough:

#!/usr/bin/python
desc = u"The Quick Brown Fox Jumped over the lazy Dog\u2019s Back!"
print desc

A Unicode string is being created that contains one special character, \u2019, the right single quotation mark, used here as an apostrophe. Python handles this just fine when the script is run from the command line.

$ python test.py

The Quick Brown Fox Jumped over the lazy Dog’s Back!

No mystery here. Let’s redirect standard output to a file:

$ python test.py >test.log

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print desc
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 44: ordinal not in range(128)

Gah! What gives here? I never asked for the ‘ascii’ codec to do anything!

Unfortunately, Python did!

When standard output is redirected to a file, Python 2.x can no longer take the output encoding from the terminal, so print falls back to the default encoding, which is ascii. The Unicode string must therefore be converted to a sequence of bytes whose values are all between 0 and 127. Since \u2019 is outside the 0 to 127 range, the encoder cannot encode it. Ergo the incredibly helpful exception text.
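The failure can be reproduced without any redirection by asking for the encoding explicitly. What follows is a minimal sketch, not part of the original test script: the helper name try_encode is made up for illustration, and the snippet is written so that it should run unchanged on Python 2.7 and Python 3.

```python
import sys

desc = u"The Quick Brown Fox Jumped over the lazy Dog\u2019s Back!"


def try_encode(text, codec):
    """Hypothetical helper: return (bytes, None) on success, (None, error) on failure."""
    try:
        return text.encode(codec), None
    except UnicodeEncodeError as exc:
        return None, exc


# ascii can only represent code points 0..127, so \u2019 makes this fail.
ascii_bytes, ascii_error = try_encode(desc, "ascii")

# UTF-8 can represent every code point; \u2019 becomes the three bytes e2 80 99.
utf8_bytes, utf8_error = try_encode(desc, "utf-8")

# Report on stderr with ASCII-only text so the diagnostic cannot trip the same error.
sys.stderr.write("stdout encoding: %r\n" % (getattr(sys.stdout, "encoding", None),))
sys.stderr.write("ascii failed at position: %r\n" % (ascii_error.start,))
sys.stderr.write("utf-8 byte length: %d\n" % (len(utf8_bytes),))
```

The position reported for the ascii attempt is the same position 44 that appears in the traceback, which suggests that print is doing this same conversion behind the scenes.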

To rectify this, the default ascii encoding can be overridden by setting a Python environment variable, PYTHONIOENCODING:
$ export PYTHONIOENCODING=UTF-8
$ python test.py >test.log
$ cat test.log

The Quick Brown Fox Jumped over the lazy Dog’s Back!
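The effect of the variable can also be checked from Python itself by launching a child interpreter with its output captured through a pipe, which, like a file, is not a terminal. This is only a sketch under a few assumptions: the names CHILD and run_child are invented for the illustration, the child prints a shortened version of the test string, and PYTHONIOENCODING is set to ascii explicitly to stand in for the Python 2 default, so that the comparison behaves the same way under Python 3.

```python
import os
import subprocess
import sys

# One-line child program; the doubled backslash keeps \u2019 as an escape
# for the child interpreter to decode.
CHILD = 'print(u"Dog\\u2019s Back!")'


def run_child(io_encoding):
    """Hypothetical helper: run CHILD with PYTHONIOENCODING set, return (exit code, stdout bytes)."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,  # a pipe, so the child's stdout is not a terminal
        stderr=subprocess.PIPE,  # capture the traceback instead of showing it
        env=env,
    )
    out, _err = proc.communicate()
    return proc.returncode, out


# Stand-in for the Python 2 default: the child should fail with UnicodeEncodeError.
ascii_code, ascii_out = run_child("ascii")

# The fix from this post: the child should succeed and emit UTF-8 bytes.
utf8_code, utf8_out = run_child("UTF-8")

sys.stderr.write("ascii exit code: %d\n" % ascii_code)
sys.stderr.write("UTF-8 exit code: %d, output: %r\n" % (utf8_code, utf8_out))
```

Setting the variable only in the child's environment, as done here, also works when exporting it for the whole shell session is not desirable.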

This is handy to know if, like me, you use print statements and redirect the output to a log file as a way of debugging code.

A list of helpful Python command-line switches and environment variables is available at: http://docs.python.org/using/cmdline.html

Don Zickefoose

Don is a seasoned consultant whose purpose is to use his drive and creativity to make Findaway World a global leader in the production and distribution of digital media.

© Copyright 2017 Findaway. All rights reserved.