Hi,
r2w so far does a great job of dealing with non-ASCII encodings. But I
found a glitch: I have a user-value in a page that contains a non-ASCII
character. Python isn't too happy about that: take the line
print node_title + ": " + title
Now if node_title contains, say, "ë", I get this:
Traceback (most recent call last):
[err] File "./r2w.py", line 185, in ?
[err] count = main(options, config)
[err] File "./r2w.py", line 94, in main
[err] return processor.walk()
[err] File "/home/varoquau/www/rest2web/rest2web/restprocessor.py", line 385, in walk
[err] self.buildsection()
[err] File "/home/varoquau/www/rest2web/rest2web/restprocessor.py", line 1325, in buildsection
[err] uservalues = enc_uni_dict(page['uservalues'], final_encoding)
[err] File "/home/varoquau/www/rest2web/rest2web/restutils.py", line 227, in enc_uni_dict
[err] val = uni_dict[entry].encode(encoding)
[err] File "/usr/lib/python2.4/encodings/iso8859_1.py", line 18, in encode
[err] return codecs.charmap_encode(input,errors,encoding_map)
[err] UnicodeDecodeError: 'ascii' codec can't decode byte 0xeb in position 2: ordinal not in range(128)
[err]
The encoding of this string is most probably the encoding of the page,
therefore Latin1. A fix would probably be to have the user-value
translated from Latin1 to unicode when it is read, as I reckon the
parser knows what the encoding of the page is at this point.
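
A minimal sketch of that idea, not rest2web's actual code: the helper
name decode_uservalues and the sample dictionary are hypothetical, and
the sketch is written for current Python, where undecoded input is of
type bytes (on Python 2.4 the check would be against str instead).
Byte values are decoded once with the page encoding, so that a later
.encode() call operates on real text.

```python
def decode_uservalues(uservalues, page_encoding):
    """Return a copy of uservalues with byte strings decoded to text.

    Hypothetical helper: values that are already text are left alone.
    """
    decoded = {}
    for key, value in uservalues.items():
        if isinstance(value, bytes):
            # Decode with the encoding the page was read in.
            value = value.decode(page_encoding)
        decoded[key] = value
    return decoded


# A user-value read from a Latin1 page, still as raw bytes.
raw = {'node_title': b'No\xebl'}
values = decode_uservalues(raw, 'latin-1')

# Encoding to the final output encoding now works on text.
encoded = values['node_title'].encode('utf-8')
```
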
On a side note, when r2w fails with such an error it still returns an
exit status of 0, which means success in the Unix world and is relied
on a lot in makefiles and build scripts. On my website I check the
return value of r2w before propagating the website, but I cannot trap
the error, as it is not reported.
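
One way this could be handled, sketched under assumptions: the wrapper
name run is hypothetical and nothing here is taken from r2w.py itself.
The wrapper prints the traceback and converts any uncaught exception
into a non-zero status that the script can hand to sys.exit().

```python
import traceback


def run(main, *args):
    """Call main(*args); return 0 on success, 1 if it raised.

    Hypothetical wrapper, not part of r2w.
    """
    try:
        main(*args)
    except Exception:
        # Report the failure on stderr, then signal it to the caller.
        traceback.print_exc()
        return 1
    return 0


# In a script the result would be passed on to the shell, e.g.:
#     import sys
#     sys.exit(run(main, options, config))
```

With something like this in place, a makefile rule or shell script
that tests the exit status would stop before propagating the website.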
Cheers,
Gaël