[Refdb-users] character encoding stuff


On Fri, 21 Mar 2003, ref...@li... wrote:

> Message: 6
> Date: Fri, 21 Mar 2003 12:30:11 -0500
> From: "Bruce D'Arcus" <bd...@fa...>
> To: ref...@li...
> Subject: [Refdb-users] character encoding stuff
>
> More setting up issues:
>
> I have a variety of characters that -- for whatever reason -- are not
> making it through the translation from Endnote to RIS.  The most common
> problem is that curly single and double quotes get replaced by ?.  So I
> have notes with ?quotes like this?.  I also have words like "don?t."
>
> Beyond figuring out how to fix this in a huge file (am not sure the
> regular expression code to use to find this in jEdit, because the ?
> character has a special meaning), will I need to worry about this in
> the future?  Ideally I'd like a clean database where I can move the
> data in and out without worrying about these encoding issues.  I
> understand MySQL supports Latin-1 encoding by default, so I assume
> there's no problem there.  Is that right?
>

In case you do not already know, be aware that Microsoft software
(Word etc.) frequently uses non-Latin-1 characters such as the
(in)famous "smart quotes", which also infest many web pages.
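A small Python sketch (not from the original post) of why these characters cause trouble. The curly quotes are Unicode U+2018/U+2019/U+201C/U+201D, which Windows-1252 ("cp1252") stores as bytes 0x91-0x94. ISO-8859-1 (Latin-1) has no printable characters at those positions, so a converter that forces text into Latin-1 with a "replace" policy emits "?". That this is what happened in the Endnote-to-RIS export is an assumption; the post does not say which tool produced the "?". The helper name `describe` is hypothetical.

```python
# Curly quotes produced by Word's "smart quotes" feature (Unicode code points).
SMART_QUOTES = {
    "\u2018": "left single quote",
    "\u2019": "right single quote / apostrophe",
    "\u201c": "left double quote",
    "\u201d": "right double quote",
}

def describe(ch: str) -> str:
    """Report ch's Windows-1252 byte and whether Latin-1 can encode it."""
    cp1252_byte = ch.encode("cp1252")[0]
    try:
        ch.encode("latin-1")
        in_latin1 = True
    except UnicodeEncodeError:
        in_latin1 = False
    return f"U+{ord(ch):04X} -> cp1252 0x{cp1252_byte:02X}, in Latin-1: {in_latin1}"

for ch, name in SMART_QUOTES.items():
    print(f"{name:32} {describe(ch)}")

# Forcing the text into Latin-1 with errors="replace" substitutes '?',
# which matches the "don?t" symptom described above.
print("don\u2019t".encode("latin-1", errors="replace"))
```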

Extract from: <http://www.fourmilab.ch/webtools/demoroniser/>

  You see, "state of the art" Microsoft Office applications sport a
  nifty feature called "smart quotes." (Rule of thumb--every time
  Microsoft use the word "smart," be on the lookout for something
  dumb). This feature is on by default in both Word and PowerPoint, and
  can be disabled only by finding the little box buried among the dozens
  of bewildering option panels these products contain. If enabled, and
  you type the string,

      "Halt," he cried, "this is the police!"

  "smart quotes" transforms the ASCII quote characters automatically
  into the incompatible Microsoft opening and closing quotes.
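The demoroniser tool linked above is a Perl script; the following is only a rough Python sketch in the same spirit, not its actual code or mapping table. It rewrites the Windows-1252 bytes for smart punctuation into plain ASCII before the data reaches a Latin-1 database. The table `CP1252_TO_ASCII` and the function `demoronise` are hypothetical names, and the table covers only a small illustrative subset. This only helps on the original Word/Endnote data: once a converter has already turned a curly quote into "?", it can no longer be told apart from a genuine question mark.

```python
# Windows-1252 bytes for Microsoft "smart" punctuation, mapped to ASCII.
# Illustrative subset only; not demoroniser's real table.
CP1252_TO_ASCII = {
    0x91: b"'",    # left single quote
    0x92: b"'",    # right single quote / apostrophe
    0x93: b'"',    # left double quote
    0x94: b'"',    # right double quote
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
    0x85: b"...",  # horizontal ellipsis
}

def demoronise(data: bytes) -> bytes:
    """Replace cp1252 smart-punctuation bytes with plain ASCII equivalents."""
    out = bytearray()
    for b in data:  # iterating over bytes yields ints 0..255
        out += CP1252_TO_ASCII.get(b, bytes([b]))
    return bytes(out)

sample = "\u201cHalt,\u201d he cried, \u201cthis is the police!\u201d".encode("cp1252")
print(demoronise(sample).decode("ascii"))
```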