majix

Tracker: Bugs

5 It can't deal with double byte content - ID: 1224653
Last Update: Comment added ( brandelune )

It can't process CJK (Chinese, Japanese, Korean) characters,
and I believe it can't process other double-byte
characters either. It also can't process some special
characters, such as '×' and '…'. I briefly browsed the source
code and found that the string returned by token.getData is
in fact a single char. A double-byte character is therefore
split into two parts, and each part is displayed as a hex value.
For example, it can't process the attached simple RTF file.
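
To illustrate the splitting problem described above: RTF writes non-ASCII bytes as \'hh hex escapes, and a double-byte character shows up as two such escapes in a row. The following is a minimal sketch, not majix code. The class name RtfHexRunDecoder and its decode method are hypothetical, and it assumes the code page is already known (for example from the RTF \ansicpg control word). It collects consecutive escapes into one byte run and decodes the run as a whole, instead of turning each byte into its own char.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of majix: decodes RTF \'hh escapes
// by grouping adjacent escapes into one byte run per code page.
final class RtfHexRunDecoder {

    static String decode(String rtfText, Charset codePage) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < rtfText.length()) {
            if (isHexEscape(rtfText, i)) {
                // Keep the raw byte; do not convert it to a char yet.
                pending.write(Integer.parseInt(rtfText.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // A plain character ends the current byte run.
                flush(pending, out, codePage);
                out.append(rtfText.charAt(i));
                i++;
            }
        }
        flush(pending, out, codePage);
        return out.toString();
    }

    // True if the four characters starting at i look like \'hh.
    private static boolean isHexEscape(String s, int i) {
        return i + 3 < s.length()
                && s.charAt(i) == '\\'
                && s.charAt(i + 1) == '\''
                && Character.digit(s.charAt(i + 2), 16) >= 0
                && Character.digit(s.charAt(i + 3), 16) >= 0;
    }

    // Decodes the collected bytes in one step so multi-byte characters stay whole.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset codePage) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), codePage));
            pending.reset();
        }
    }
}
```

With a GBK code page, for instance, the two escapes \'d6\'d0 would decode to the single character 中 rather than to two separate hex values. This sketch ignores all other RTF control words, and it does not handle writers that emit the trail byte of a double-byte character as a literal ASCII character; a real fix would have to cover both.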


Submitted: Stone ( stonegump ) - 2005-06-21 07:19
Priority: 5
Status: Open
Resolution: None
Assigned: Nobody/Anonymous
Category: XML-Generator
Group: v2.0
Visibility: Public


Comment ( 1 )




Date: 2005-07-05 17:01
Sender: brandelune

I get exactly the same result. I tried rtf2xml, which had the same
issue too.

There is also the problem that the proper code page does not seem to be
set. It seems to me it would be safer to set the XML encoding to UTF-8
and parse the RTF accordingly.
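
As a rough illustration of the UTF-8 suggestion above, the sketch below declares UTF-8 in the XML prolog and encodes the output bytes with the same charset, so the declaration and the actual bytes agree. It is an assumption-laden example, not majix's generator: the class Utf8XmlSketch, its methods, and the <p> element are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not majix code: emits XML whose declared
// encoding matches the bytes actually written.
final class Utf8XmlSketch {

    // Wraps already-decoded text in a placeholder <p> element and
    // returns the document as UTF-8 bytes.
    static byte[] toUtf8Xml(String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<p>" + escape(text) + "</p>\n";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Escapes the characters that are markup-significant in element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The point of the design is that the text is decoded from the RTF code page into Java strings first, and only then encoded once, as UTF-8, on output.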


Attached File ( 1 )

Filename        Description
doubleTest.rtf  contains double-byte characters and special characters

Change ( 1 )

Field       Old Value               Date              By
File Added  139195: doubleTest.rtf  2005-06-21 07:19  stonegump