#11 MediaInfo.__unicode__ can throw UnicodeDecodeError

open
nobody
None
5
2005-02-01
2005-02-01
Tom Tobin
No

When the MediaInfo class's __unicode__ method is called
(e.g., when an instance is passed to unicode()), an uncaught
UnicodeDecodeError can occur if one of MediaInfo's attributes
is a byte string encoded as latin_1 that contains a byte with
an ordinal value of 128 or higher. Specifically, on line 156
of mediainfo.py, unicode(self[b]) apparently raises the
exception before its result can be substituted into the
format string on the previous line.

To reproduce, one can run mminfo (or run the parse()
function) on "donnie_hi.mov", a trailer for "The Sims
2". This file should be available through the website
http://thesims2.ea.com/ or via a Google search for
"donnie_hi.zip".

Extracted, the example amounts to:

>>> unicode("'\xa9 2004 Electronic Arts Inc. All rights reserved.'")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 1: ordinal not in range(128)
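
For readers on Python 3, where unicode() no longer exists, the same failure can be reproduced by writing out the implicit ASCII decode explicitly. The sketch below is an illustration only, not mmpython code; the names RAW, QUOTED and failing_position are hypothetical. It also suggests why the extracted example reports position 1 while the pdb traceback reports position 0: the extracted string carries an extra leading quote character.

```python
# Python 3 sketch: Python 2's unicode(some_str) decodes with the 'ascii'
# codec by default; here that decode is spelled out explicitly.

# The raw attribute value as seen in pdb (0xa9 is the latin-1 copyright sign).
RAW = b"\xa9 2004 Electronic Arts Inc. All rights reserved."

# The extracted example wraps the value in an extra pair of single quotes.
QUOTED = b"'" + RAW + b"'"


def failing_position(data):
    """Return the index of the first byte the ASCII codec rejects, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(failing_position(RAW))
print(failing_position(QUOTED))

# Naming the encoding avoids the error: latin-1 maps 0xa9 to U+00A9.
print(RAW.decode("latin-1")[0] == "\u00a9")
```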

A complete traceback via pdb follows:

korpios@nemesis ~ $ python -m pdb ~/bin/mminfo donnie_hi.mov
> /home/korpios/bin/mminfo(43)?()
-> import sys
(Pdb) c
mmpython media info

filename : donnie_hi.mov
Traceback (most recent call last):
File "/usr/lib/python2.4/pdb.py", line 1057, in main
pdb._runscript(mainpyfile)
File "/usr/lib/python2.4/pdb.py", line 982, in _runscript
self.run(statement, globals=globals_, locals=locals_)
File "/usr/lib/python2.4/bdb.py", line 366, in run
exec cmd in globals, locals
File "<string>", line 1, in ?
File "/home/korpios/bin/mminfo", line 87, in ?
print unicode(medium).encode('latin-1', 'replace')
File "/home/korpios/py3p/mmpython/mediainfo.py", line 350, in __unicode__
result += MediaInfo.__unicode__(self)
File "/home/korpios/py3p/mmpython/mediainfo.py", line 156, in __unicode__
(a, unicode(b), unicode(self[b])) or a, keys, u'' )
File "/home/korpios/py3p/mmpython/mediainfo.py", line 155, in <lambda>
result += reduce( lambda a,b: self[b] and b != u'url' and u'%s\n %s: %s' % \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 0: ordinal not in range(128)
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/korpios/py3p/mmpython/mediainfo.py(155)<lambda>()
-> result += reduce( lambda a,b: self[b] and b != u'url' and u'%s\n %s: %s' % \
(Pdb) print `b`
'copyright'
(Pdb) whatis b
<type 'str'>
(Pdb) print `self[b]`
'\xa9 2004 Electronic Arts Inc. All rights reserved.'
(Pdb) whatis self[b]
<type 'str'>
(Pdb) print `ord(self[b][0])`
169
(Pdb) print `hex(ord(self[b][0]))`
'0xa9'
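
The session shows that self[b] is a plain byte string whose first byte is 0xa9, so any conversion that assumes ASCII will fail on it. One possible defensive approach, sketched below in Python 3 syntax, is to decode byte strings with an explicitly named encoding before formatting them. This is not a fix taken from mmpython; the helper names to_text and format_attributes, the latin-1 default, and the sample dictionary are all assumptions made for illustration.

```python
def to_text(value, encoding="latin-1"):
    """Convert an attribute value to text without assuming ASCII.

    Hypothetical helper: the latin-1 default is an assumption based on
    the encoding named in this report.
    """
    if isinstance(value, bytes):
        # latin-1 maps every byte, so errors="replace" only matters when
        # a stricter encoding such as utf-8 is passed in.
        return value.decode(encoding, errors="replace")
    return str(value)


def format_attributes(info, keys):
    """List 'key: value' pairs, skipping empty values and the 'url' key,
    roughly mirroring the conditions in the failing reduce() expression."""
    lines = []
    for key in keys:
        value = info.get(key)
        if value and key != "url":
            lines.append("   %s: %s" % (to_text(key), to_text(value)))
    return "\n".join(lines)


# Sample data (hypothetical) using the copyright value from the pdb session.
info = {
    "copyright": b"\xa9 2004 Electronic Arts Inc. All rights reserved.",
    "url": b"http://example.invalid/",
    "title": "",
}
listing = format_attributes(info, ["copyright", "url", "title"])
print(ascii(listing))
```

Whether latin-1 is the right assumption for every file is an open question; the point of the sketch is only that the encoding is stated rather than left to the ASCII default.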

Discussion