Another Python 2 Unicode Mystery Solved

June 8, 2012

The challenges of Unicode with Python 2.x are decried throughout the internet. This little devil had me scratching my head for quite some time. The test script is simple enough:

desc = u"The Quick Brown Fox Jumped over the lazy Dog\u2019s Back!"
print desc

One can easily see that a Unicode string is being created containing a special character, \u2019, the right single quotation mark (a typographic apostrophe). Python handles this just fine when the script is run from the command line:

$ python

The Quick Brown Fox Jumped over the lazy Dog’s Back!

No mystery here. Now let's redirect standard output to a file:

$ python >test.log

Traceback (most recent call last):
  File "", line 4, in <module>
    print desc
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 44: ordinal not in range(128)

Gah! What gives here? I never asked for the 'ascii' codec to do anything!

Unfortunately, Python did!

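When standard output is a terminal, Python 2.x encodes Unicode strings using the terminal's encoding (sys.stdout.encoding). When output is redirected to a file, sys.stdout.encoding is None, so print falls back to the default encoding, which is ascii. The Unicode string must then be converted to a sequence of bytes between 0 and 127. Since \u2019 (code point 8217) lies outside that range, the encoder cannot encode it. Ergo the incredibly helpful exception text.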
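The failing conversion can be reproduced directly with str.encode, without any redirection involved. The sketch below uses Python 3 syntax, where string literals are already Unicode so the u prefix is unnecessary. The helper name try_encode is hypothetical. It raises the same UnicodeEncodeError at the same position 44 that the traceback reports.

```python
# Python 3 syntax: "..." is already a Unicode string (Python 2 needs u"...").
desc = "The Quick Brown Fox Jumped over the lazy Dog\u2019s Back!"


def try_encode(text, codec):
    """Hypothetical helper: return encoded bytes, or the UnicodeEncodeError raised."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


# ascii only covers code points 0-127, so U+2019 (8217) at index 44 fails.
err = try_encode(desc, "ascii")
print(type(err).__name__, err.start, ord(desc[err.start]))  # UnicodeEncodeError 44 8217

# utf-8 can represent every code point; U+2019 becomes three bytes.
data = try_encode(desc, "utf-8")
print(data[44:47])  # b'\xe2\x80\x99'
```

Every character before the apostrophe is plain ASCII, so the character index and the byte offset coincide at 44.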

To rectify this, override the default ascii encoding by setting the PYTHONIOENCODING environment variable (available since Python 2.6) before running the script:

$ python >test.log
$ cat test.log

The Quick Brown Fox Jumped over the lazy Dog’s Back!
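Here is a minimal sketch of both behaviours side by side, assuming a Python 3 interpreter (which also honours PYTHONIOENCODING). It launches a child interpreter with its standard output sent to a temporary file, which stands in for test.log. The first run forces PYTHONIOENCODING=ascii to reproduce the failure, and the second uses utf-8. The helper run_redirected and the constant CHILD_SRC are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys
import tempfile

# The post's script, with \u2019 kept as an escape so the command line is pure ASCII.
CHILD_SRC = 'print("The Quick Brown Fox Jumped over the lazy Dog\\u2019s Back!")'


def run_redirected(io_encoding):
    """Hypothetical helper: run CHILD_SRC with stdout redirected to a temp file.

    Returns (exit_code, bytes_written_to_stdout_file, stderr_bytes).
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    with tempfile.TemporaryFile() as log:  # stands in for test.log
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_SRC],
            stdout=log,
            stderr=subprocess.PIPE,
            env=env,
        )
        log.seek(0)
        return proc.returncode, log.read(), proc.stderr


# Forcing ascii reproduces the post's failure: non-zero exit, traceback on stderr.
code, out, err = run_redirected("ascii")
print(code != 0, b"UnicodeEncodeError" in err)  # True True

# With utf-8 the apostrophe is written to the file as the bytes E2 80 99.
code, out, err = run_redirected("utf-8")
print(code, out[44:47])  # 0 b'\xe2\x80\x99'
```

The parent prints only ASCII-safe values (booleans, exit codes, byte reprs). This keeps the demonstration from tripping over the same problem if the parent's own output is redirected.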

This is handy to know if, like me, you use print statements and redirect their output to a log file as a way of debugging code.

A list of helpful Python command line switches and environment variables is available at:

Don Zickefoose

Don is a seasoned consultant whose purpose is to use his drive and creativity to make Findaway World a global leader in the production and distribution of digital media.

© Copyright 2017 Findaway. All rights reserved.