I forgot to copy everyone on my original reply to Doug. Doug provided several good patches that will be incorporated. The Unicode/Latin-1 problem concerns me, so here is my extended response on the character encoding problem with Python 2.0.

---------------

Doug,

I thought I'd give you a better explanation of what I think is wrong with the Unicode/Latin-1 issue. While your patches work, I think they treat the symptom and not the source of the problem.

All internal strings in gramps should be Latin-1 encoded. The fact that you have to translate them to Latin-1 indicates that something is allowing non-Latin-1 characters into the data. There are three sources of input at this time: entry through the interface, the gramps input file, and GEDCOM import.
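
To show what this kind of leak typically looks like, here is a small Python 3 sketch (not gramps code; the helper name looks_double_encoded is made up for illustration). When UTF-8 bytes are treated as Latin-1, "ü" turns into "Ã¼". A heuristic check like this might help track down which input path lets such strings in, though it can misfire on genuine Latin-1 text that happens to contain sequences like "Ã¼".

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if `text` appears to be UTF-8 bytes mis-read as Latin-1.

    Hypothetical helper for illustration; not part of gramps.
    """
    try:
        # Undo the mis-read: back to the raw bytes, then decode them as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or the bytes are not valid UTF-8.
        return False
    # Pure ASCII survives unchanged, so only report a real difference.
    return repaired != text


original = "Müller"
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)                         # MÃ¼ller
print(looks_double_encoded(garbled))   # True
print(looks_double_encoded(original))  # False
```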

Entry through the interface should always return Latin-1, since GNOME does not currently handle Unicode, so I think we can probably eliminate this one. Under Python 2.0, the gramps input file is read through an encoded file wrapper that is supposed to translate from UTF-8 to Latin-1 as the data is read in. In ReadXML.py, around line 71, you should see the following line:

       xml_file = EncodedFile(gzip.open(filename,"rb"),'utf-8','latin-1')

It is possible that this is not doing what I think it should.
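
One thing worth checking is the argument order. If I read the codecs documentation correctly, EncodedFile(file, data_encoding, file_encoding) decodes what it reads from the file using the second encoding and hands it back in the first, so the line above would read the file as Latin-1 and return UTF-8, which is the reverse of what we want. Here is a rough Python 3 sketch of the intended direction; open_utf8_as_latin1 is a made-up name, and an in-memory gzip buffer stands in for a real gramps data file.

```python
import codecs
import gzip
import io


def open_utf8_as_latin1(raw):
    """Wrap a binary file of UTF-8 bytes so that reads return Latin-1 bytes."""
    # EncodedFile(file, data_encoding, file_encoding): bytes read from `file`
    # are decoded with file_encoding, then re-encoded with data_encoding.
    return codecs.EncodedFile(raw, data_encoding="latin-1", file_encoding="utf-8")


# Build a small gzipped UTF-8 file in memory to stand in for a gramps data file.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write("<name>Müller</name>".encode("utf-8"))
buf.seek(0)

with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    data = open_utf8_as_latin1(gz).read()
print(data)  # b'<name>M\xfcller</name>'
```

Swapping the two encodings (as in the quoted line) turns the UTF-8 "ü" into the four bytes of "Ã¼" instead of the single Latin-1 byte 0xFC.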

The third possibility is the GEDCOM import, and my bet is on this one. It looks as if, under Python 2.0, I am not decoding Unicode properly. My guess is that changing lines 32-36 of latin_utf8.py to:

   def utf8_to_latin(s):
       return s.encode('latin-1')

   def latin_to_utf8(s):
       return s.encode('utf-8')

might do the trick. I think this patch probably needs to be made.
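
The key point is that a UTF-8 byte string has to be decoded before it can be re-encoded. As far as I can tell, in Python 2.0 calling .encode('latin-1') directly on a plain byte string first decodes it implicitly as ASCII, which fails on accented characters, so the version above only works if s already arrives as a Unicode object. Here is a sketch of the same two helpers in Python 3 terms, where bytes and text are separate types and the decode step is explicit. The names mirror latin_utf8.py, but the bytes-in/bytes-out signatures and the errors parameter are assumptions for illustration.

```python
def utf8_to_latin(data: bytes, errors: str = "strict") -> bytes:
    # Decode the UTF-8 bytes to text first, then re-encode that text as Latin-1.
    # Characters with no Latin-1 form raise UnicodeEncodeError unless
    # errors="replace" (substitute "?") or another codecs error handler is given.
    return data.decode("utf-8").encode("latin-1", errors)


def latin_to_utf8(data: bytes) -> bytes:
    # Every byte is valid Latin-1, so decoding cannot fail; UTF-8 can encode any text.
    return data.decode("latin-1").encode("utf-8")


print(utf8_to_latin("Müller".encode("utf-8")))           # b'M\xfcller'
print(latin_to_utf8(b"M\xfcller").decode("utf-8"))       # Müller
print(utf8_to_latin("Łódź".encode("utf-8"), "replace"))  # b'?\xf3d?'
```

Note that Latin-1 cannot hold every character a UTF-8 GEDCOM file may contain (Polish "Ł", for example), so some policy for those characters is needed either way.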

A couple of questions:

1. Did you originally import your data from a GEDCOM file?
2. Was the GEDCOM file encoded as ASCII, ANSEL, UNICODE, or UTF-8? (Check the CHAR line near the top of the file.)
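
For question 2, the CHAR value can also be pulled out of the header with a few lines of code. This is a rough Python 3 sketch, assuming the header lines are already available as text (a UNICODE file, which I take to mean UTF-16, would need decoding first). gedcom_char and the codec table are made-up names for illustration, not gramps code.

```python
import re
from typing import Iterable, Optional

# A level-1 CHAR record inside the GEDCOM header, e.g. "1 CHAR ANSEL".
CHAR_LINE = re.compile(r"^\s*1\s+CHAR\s+(\S+)")

# Assumed mapping from GEDCOM CHAR values to Python codec names.
# ANSEL has no codec in the Python standard library, hence None.
GEDCOM_CODECS = {"ASCII": "ascii", "UTF-8": "utf-8", "UNICODE": "utf-16", "ANSEL": None}


def gedcom_char(lines: Iterable[str]) -> Optional[str]:
    """Return the CHAR value from a GEDCOM header, or None if there is none."""
    for line in lines:
        match = CHAR_LINE.match(line)
        if match:
            return match.group(1).upper()
        if line.startswith("0 ") and not line.startswith("0 HEAD"):
            break  # reached the first record after the header
    return None


header = ["0 HEAD", "1 SOUR PAF", "1 CHAR ANSEL", "0 @I1@ INDI"]
charset = gedcom_char(header)
print(charset, GEDCOM_CODECS.get(charset))  # ANSEL None
```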

-- Don

Don Allingham
donaldallingham@home.com