While Unicode characters beyond OT1 (e.g., accented characters) are handled properly in the current (Linux) version of openccg, they are discarded in the TeX/DVI output. This is the output you get, for instance, if you parse sentences after activating :vison in openccg.
However, a very simple patch solves this problem: just add the following line of code below line 137 of Visualizer.java (in src/opennlp/ccg/util):
(It should be inserted directly below another \usepackage directive.)
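To make the idea concrete, here is a minimal Java sketch of a preamble writer that emits an inputenc directive right after the existing \usepackage lines. The class PreambleSketch, its methods buildPreamble/writeTex, and the amssymb package are hypothetical illustrations, not the actual Visualizer.java code. The encoding option is a parameter, so either utf8 or the utf8x variant from the follow-up message can be passed. The sketch also assumes the .tex file must be written as UTF-8, since inputenc only tells LaTeX how to interpret the bytes it reads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT the real openccg Visualizer code.
class PreambleSketch {

    // Builds a LaTeX preamble; the inputenc line goes directly after
    // the other \usepackage directives, mirroring the patch described above.
    static String buildPreamble(List<String> packages, String inputEncoding) {
        StringBuilder sb = new StringBuilder();
        sb.append("\\documentclass{article}\n");
        for (String pkg : packages) {
            sb.append("\\usepackage{").append(pkg).append("}\n");
        }
        // The added directive: "utf8", or "utf8x" for extended Unicode blocks
        sb.append("\\usepackage[").append(inputEncoding).append("]{inputenc}\n");
        return sb.toString();
    }

    // Writes the .tex content explicitly as UTF-8 so the bytes match the
    // inputenc declaration, whatever the platform's default charset is.
    static void writeTex(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.print(buildPreamble(Arrays.asList("amssymb"), "utf8"));
    }
}
```

Running main prints the three preamble lines, with the inputenc directive last; the real Visualizer.java may structure its output quite differently.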
Recompile (by calling ccg-build in the main openccg folder) and everything works like a charm. Of course, the developers may choose to add this patch to the main openccg distribution if they deem it useful and harmless.
Hope this helps,
A small update: to include characters from extended Unicode blocks (e.g., Latin Extended Additional), utf8 in my previous message should be changed to utf8x (this option is provided by the ucs package).
A possible drawback of this patch is that the rendering engine hangs if characters from some non-Latin scripts (e.g., Arabic) are included. This is a problem for me, since I am especially interested in Arabic: :vison has to be disabled when a non-Latin script is included, although it can be enabled when a transcription with non-Latin-1 characters is needed. Perhaps the older version should stay in the standard distribution, since it simply omits unknown characters rather than hanging.
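Following the workaround above (disable :vison when a non-Latin script is present), a caller could check the input before visualizing. This is a minimal sketch using the standard Character.UnicodeScript API (Java 7+); the class ScriptCheck and the method containsNonLatinScript are hypothetical, not part of openccg, and treating only Latin text as safe is an assumption based on what this thread reports working.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: NOT part of openccg.
class ScriptCheck {

    // Scripts assumed safe for the patched :vison output: Latin letters,
    // COMMON (digits, spaces, punctuation) and INHERITED (combining marks).
    private static final Set<Character.UnicodeScript> SAFE = EnumSet.of(
            Character.UnicodeScript.LATIN,
            Character.UnicodeScript.COMMON,
            Character.UnicodeScript.INHERITED);

    // Returns true if any code point belongs to a script outside SAFE.
    static boolean containsNonLatinScript(String s) {
        return s.codePoints()
                .mapToObj(Character.UnicodeScript::of)
                .anyMatch(script -> !SAFE.contains(script));
    }

    public static void main(String[] args) {
        // Latin transcription with a macron: no non-Latin script found
        System.out.println(containsNonLatinScript("q\u0101la"));
        // Arabic letters qaf, alef, lam: non-Latin script found
        System.out.println(containsNonLatinScript("\u0642\u0627\u0644"));
    }
}
```

A caller would skip visualization whenever the check returns true; this only avoids triggering the hang and does not fix the rendering of Arabic itself.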
Thanks for this suggestion! It seems we should hold off on changing the main code base until we find a way to avoid the hang with non-Latin scripts.