Re: [Treebase-devel] db2->pg data transfer

On May 29, 2009, at 11:12 AM, Jon Auman wrote:

> If the SQL inserts all contain UTF8 characters, then there should be
> no problem with the import into a UTF8 postgresql database. If there
> are non-UTF8 characters in the SQL file, they can be stripped out with
> iconv or converted with a shell program called "recode".
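The two options in the quoted text (strip versus convert) can be illustrated with a small Python sketch. It is not the iconv or recode tools themselves. Stripping corresponds to decoding with errors="ignore", which is roughly what `iconv -c` does. Converting reinterprets the bytes in a known legacy encoding. Here Mac OS Roman (Python's `mac_roman` codec) is *assumed* as the source encoding for Mac-entered records, and the `author` table in the sample line is hypothetical.

```python
def strip_invalid_utf8(raw: bytes) -> str:
    # Silently drop byte sequences that are not valid UTF-8 (comparable to iconv -c).
    return raw.decode("utf-8", errors="ignore")


def convert_from_mac_roman(raw: bytes) -> str:
    # Reinterpret the bytes as Mac OS Roman and return Unicode text.
    # Assumption: the whole line really is MacRoman; this must be verified per record.
    return raw.decode("mac_roman")


if __name__ == "__main__":
    # "é" in Mac OS Roman is the single byte 0x8E, which is not valid UTF-8 on its own.
    legacy = b"INSERT INTO author VALUES ('Jos\x8e');"  # hypothetical table and row
    print(strip_invalid_utf8(legacy))      # INSERT INTO author VALUES ('Jos');
    print(convert_from_mac_roman(legacy))  # INSERT INTO author VALUES ('José');
```

The first function loses the accented character entirely. The second recovers it only when the guessed source encoding is correct.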

Given the history of the legacy TreeBASE data, I believe that the vast  
majority of diacriticals will be properly formed in utf8, but there  
will be some malformed ones (1) dating from when we were entering data  
through a Mac application (Apple8 characters) and (2) as a result of  
people submitting data via web browsers that don't comply with our  
meta tags regarding character encodings. I think it's fine to leave  
these malformed ones in (rather than auto-stripping them out) because  
we will want to fix them by hand later on, and they help alert us to  
where things need fixing.
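If the malformed characters are left in for later hand-fixing, a report of where they occur might be useful. The following is a minimal Python sketch. It assumes the transfer is a line-oriented SQL dump file, and the file name in the usage comment is hypothetical. The script lists each line that fails UTF-8 decoding without stripping or altering anything.

```python
import sys
from typing import Iterable


def find_malformed_lines(lines: Iterable[bytes]) -> list[tuple[int, int, str]]:
    """Return (line number, byte offset, preview) for every line that is not valid UTF-8."""
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the byte offset of the first malformed sequence in the line.
            # The preview marks bad bytes with U+FFFD; the input itself is not modified.
            preview = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            bad.append((lineno, exc.start, preview))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name hypothetical): python find_malformed.py treebase_dump.sql
    with open(sys.argv[1], "rb") as dump:  # binary mode: bytes are read unchanged
        for lineno, offset, preview in find_malformed_lines(dump):
            print(f"line {lineno}, byte {offset}: {preview}")
```

Reading in binary mode keeps the check independent of the platform's default text encoding. Reporting only the first bad offset per line keeps the output short, and each flagged line can then be inspected by hand.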

bp