We are working on an interface between a legacy system and DSpace 1.5.1, and I keep running into problems with special characters in the text. NASA research documents contain many different special characters: some are common, such as the degree symbol (°), and some are less common, such as “right ceiling” (⌉). See http://myhandbook.info/codes_htmlchr.html for a pretty good list of symbols and their equivalent “character references”. The interface is fairly new, and so far, as we identify each special character or symbol, we’ve just been adding code to the extract program (which outputs an XML file) to replace it with the equivalent character reference. Inevitably, though, the program is going to abend when it hits a symbol we haven’t coded for, and we’ll have to keep changing it to handle new symbols.
I did some Googling today, hoping to find an existing Java method or class that replaces symbols with the equivalent character reference so that I don’t have to write one myself, but so far I have not found one. Does anyone know of one?
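To make the question concrete, here is a rough sketch of the kind of generic method I have in mind. It assumes numeric character references (for example &#176; rather than &deg;) are acceptable in the XML, and that &, < and > are already escaped elsewhere in the extract program. The class and method names are made up for illustration; this is not taken from DSpace or any existing library.

```java
// Hypothetical helper: the names CharRefEscaper and toNumericCharRefs are illustrative only.
final class CharRefEscaper {

    private CharRefEscaper() {}

    /**
     * Replaces every code point above U+007E with a decimal numeric
     * character reference (e.g. U+00B0 becomes "&#176;").
     * Characters up to U+007E, including &, < and >, pass through unchanged.
     */
    static String toNumericCharRefs(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            // Read a full code point so surrogate pairs yield one reference, not two.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp > 0x7E) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With something like this, CharRefEscaper.toNumericCharRefs("45°") would give "45&#176;" and the right ceiling symbol would come out as "&#8969;", without needing a lookup table entry for each new symbol.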
Thanks in advance,
Software Developer/Database Administrator
NASA Langley Research Center|LITES Contract
SGT, Inc.|130 Research Drive
Hampton, Va. 23666
Office: (757) 224-4074
Mobile: (757) 506-9903
Fax: (757) 224-4001