#111 Excel converter doesn't work with special chars

TextIndexNG
closed-wont-fix
nobody
Indexer (49)
5
2009-12-08
2009-09-17
Tom
No

The default Excel converter cannot handle special characters such as German umlauts correctly. I got good results by changing the xls2csv call to:

result = (self.execute('xls2csv -s 8859-1 -d utf-8 -q 0 "%s" 2> %s' % (tmp_name, str(err))), 'utf-8')
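For readers unfamiliar with the flags: `-s` sets the charset xls2csv assumes for the spreadsheet's 8-bit strings, `-d` sets the charset of its output, and `-q 0` selects quote mode 0. A minimal standalone sketch of the same call follows. It is written for Python 3 and the standard library only, not for the Zope environment of the converter; the function names `build_xls2csv_argv` and `xls_to_text` are made up for illustration and are not part of TextIndexNG.

```python
import shutil
import subprocess


def build_xls2csv_argv(path, source_charset="8859-1"):
    """Return the argument list for the xls2csv call proposed in this ticket."""
    return [
        "xls2csv",
        "-s", source_charset,  # charset assumed for 8-bit strings in the .xls
        "-d", "utf-8",         # charset of the text xls2csv writes to stdout
        "-q", "0",             # quote mode 0, as in the ticket's command
        path,
    ]


def xls_to_text(path, source_charset="8859-1"):
    """Run xls2csv and return (text, encoding), or None if xls2csv is missing."""
    if shutil.which("xls2csv") is None:
        return None
    argv = build_xls2csv_argv(path, source_charset)
    # Passing a list avoids shell quoting issues with the temporary file name.
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return proc.stdout.decode("utf-8", errors="replace"), "utf-8"
```

The source charset is a parameter here because `8859-1` is an assumption about the spreadsheets being indexed, not something xls2csv can verify.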

The change can be tested with the following test case (assuming there is a test Excel file containing the two terms 'Ölpest' and 'Mängelprotokoll'):

# -*- coding: utf-8 -*-
import unittest
from os.path import join, dirname
from Products.PortalTransforms.libtransforms.utils import bin_search
from Products.PortalTransforms.libtransforms.utils import MissingBinary

class ConverterTestCase(unittest.TestCase):

    def test_xlsconverter(self):
        from fhnw.adjustments.txng.xls import Converter

        # Skip the test when the external xls2csv binary is not installed.
        try:
            bin_search('xls2csv')
        except MissingBinary:
            print "'xls2csv' utility not available in sys-path. skipping test."
            return

        # Read the spreadsheet as binary data.
        xls = open(join(dirname(__file__), 'data', 'testexcel.xls'), 'rb').read()
        conv = Converter()
        text, encoding = conv.convert(xls, None, '')
        self.assertEqual(encoding, 'utf-8')
        self.failUnless(u'Ölpest' in unicode(text, encoding))
        self.failUnless(u'Mängelprotokoll' in unicode(text, encoding))

Discussion

  • Andreas Jung
    2009-12-08

    Making other assumptions about the encoding will cause problems for other people. I suggest you create your own converter (subclassing should not be that complicated) and re-register it as the CSV converter.

     
  • Andreas Jung
    2009-12-08

    • status: open --> closed-wont-fix
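
The subclassing route suggested in the discussion might look roughly like the sketch below. It is a rough illustration only: `BaseXlsConverter` is a hypothetical stand-in for the stock Excel converter, and the `converters` dictionary is a hypothetical stand-in for however the installation registers converters. Neither is the real TextIndexNG API. The point is that the charset assumption lives in a site-specific subclass instead of the shared default.

```python
class BaseXlsConverter(object):
    """Hypothetical stand-in for the stock Excel converter (not the real class)."""

    source_charset = None  # None: do not pass -s, keep xls2csv's own default

    def command(self, tmp_name):
        """Build the xls2csv argument list for the given temporary file."""
        parts = ["xls2csv"]
        if self.source_charset:
            parts += ["-s", self.source_charset]
        parts += ["-d", "utf-8", "-q", "0", tmp_name]
        return parts


class Latin1XlsConverter(BaseXlsConverter):
    """Site-specific converter that assumes ISO 8859-1 for 8-bit strings."""

    source_charset = "8859-1"


# Hypothetical registry keyed by MIME type; the real registration mechanism differs.
converters = {"application/vnd.ms-excel": BaseXlsConverter()}

# Re-register: replace the default converter with the site-specific one.
converters["application/vnd.ms-excel"] = Latin1XlsConverter()
```

With this layout, other installations keep the default behaviour, and only the site that knows its spreadsheets are Latin-1 encoded opts in.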