Multivalent / Discussion / Help: Phonetic characters

Chris von See - 2005-05-04

Hi -

I'm trying to extract the text from a document which contains phonetic spellings of words. The phonetics display fine in the Multivalent browser, but when I extract the text I get garbage. Can you tell me how to correctly extract these characters? I'd even be happy to have them encoded as numeric character entities...

Thanks
Chris

- Tom Phelps - 2005-05-04
  
  The text may be extracted correctly as Unicode while the receiving application either does not interpret Unicode correctly or lacks the fonts to display it. Alternatively, the PDF may use a special font that draws ordinary character codes as phonetic-looking glyphs. Without looking at the PDF I can't say.
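
One way to tell these two cases apart is to look at the code points in the extracted text rather than at how an application displays them. The following is a minimal Python sketch, not part of Multivalent; the `sample` string is a made-up stand-in for extracted text, and the helper names `to_entities` and `describe` are hypothetical. It also shows the numeric-character-entity encoding mentioned in the question.

```python
import unicodedata


def to_entities(text):
    """Replace every non-ASCII character with a decimal numeric character entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Hypothetical extracted text; a real check would read the ExtractText output instead.
sample = "f\u0259\u02c8n\u025bt\u026ak"

print(to_entities(sample))
for row in describe(sample):
    print(row)
```

If the listed names are IPA or other phonetic letters, the extraction is fine and the problem is on the display side. If they are plain uppercase Latin letters, the text in the PDF itself maps to the wrong characters.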
  
- Chris von See - 2005-05-04
  
  The viewing application is Multivalent. The one doing the extracting is Multivalent's ExtractText tool.
  
  If you give me an FTP site I'll send you the PDF...
  
  Thanks
  Chris
  
- Tom Phelps - 2005-05-05
  
  You can email it to an address I only look at when I know there is something to retrieve: usenetposter atsign comcast period net, and I'll take a look.
  
- Tom Phelps - 2005-05-05
  
  I looked at the PDF you emailed. The font with the phonetic glyphs is named Unispell. With such a name you would expect Unicode compatibility, and furthermore the PDF has an explicit ToUnicode mapping for it. Inexplicably, however, the Unicode values of the phonetic glyphs are given as various Latin-1 uppercase letters. The PDF's producer, "Creo Normalizer JTP", must have a bug.
  
- Chris von See - 2005-05-05
  
  Hmmm. If there is indeed a bug in the PDF producer software, how would the Multivalent viewer generate the correct characters? I'm thoroughly confused.
  
  Thanks
  Chris
  
- Tom Phelps - 2005-05-05
  
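  All of the letters in the PDF are just numbers. When one of these numbers is looked up in the embedded font to get a shape, it corresponds to a glyph that looks like a phonetic character, which is why the viewer displays it correctly. When the same number is mapped to Unicode or ASCII for text extraction, it corresponds to an uppercase Latin character.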
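
The two lookups can be modeled as two independent tables keyed by the same character code. This is an illustrative Python sketch only: the codes, glyph descriptions, and table entries below are invented and are not read from the PDF discussed in this thread.

```python
# Hypothetical data, invented for illustration.

# What the embedded font draws for each character code (used for display).
FONT_GLYPHS = {0x41: "schwa glyph", 0x42: "esh glyph", 0x43: "eng glyph"}

# What a faulty ToUnicode table claims each code means (used for extraction).
BAD_TO_UNICODE = {0x41: "A", 0x42: "B", 0x43: "C"}

# What a correct ToUnicode table would say for the same codes.
GOOD_TO_UNICODE = {0x41: "\u0259", 0x42: "\u0283", 0x43: "\u014b"}


def render(codes):
    """Display path: character code -> glyph shape in the embedded font."""
    return [FONT_GLYPHS[c] for c in codes]


def extract(codes, to_unicode):
    """Extraction path: character code -> Unicode character via a mapping table."""
    return "".join(to_unicode[c] for c in codes)


codes = [0x41, 0x42, 0x43]
print(render(codes))                    # shapes look phonetic on screen
print(extract(codes, BAD_TO_UNICODE))   # extracted text comes out as Latin letters
print(extract(codes, GOOD_TO_UNICODE))  # what extraction would give with a correct table
```

Display never consults the Unicode table, so a wrong table leaves the rendering intact and only affects extracted text.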
  
