From: Doug H. <djh...@te...> - 2001-05-28 05:49:22
|
Here are some patches which I applied to gramps 0.1.5 in order to get it to start working with python 2.0. I have not tested it extensively - just started adding people. In a day or two, I hope to try importing my main GEDCOM file.

1. Exporting a GEDCOM with the living filter enabled fails: the Date class is missing a getYear() method.

2. Python 2.0 supports unicode strings in a very transparent fashion. The PyGTK binding does not, and perhaps cannot at this time, translate them to something gtk can display. There appear to be one or more projects underway to provide internationalization for gnome and gtk, but I couldn't get a clear idea of how it works. So, my solution has been to patch the code where unicode strings get sent to the gtk routines. This has been a trial and error process, and is by no means complete. (A quick grep suggests that there are many more possible locations for these patches.)

The general form of my patch is to translate the unicode 16-bit string to a latin-1 encoded 8-bit string just before display. This will fail when an imported GEDCOM contains non-latin-1 unicode characters, e.g. cyrillic or greek. Without this patch, gramps simply will not work on my system. Here is one example:

    $ diff ListColors.py.ORIG ListColors.py
    49c49,50
    < self.clist.append(list)
    ---
    > my_list = map(lambda x: x.encode('latin-1'), list)
    > self.clist.append(my_list)

3. Force RelLib objects to be accessed through their get/set functions. This shows a couple of places where encapsulation was bypassed.

My initial idea for solving the unicode problem involved fixing the data as it was loaded, or perhaps as it was retrieved from the objects. I don't really like that idea, as it throws away information. From a philosophical point of view, I would rather keep the data in its proper form but display it incorrectly.
I have not included the diffs for this change, as it is large and basically just the result of applying the vi command ":%s/self\./self._/g" and cleaning up the compare function with "s/other\./other._/g". I also added setBookmark() and getBookmark() functions to the RelDataBase class.

4. Some of the plugin modules reference a function called '_' which is not defined. I defined a simple one, "def _(x): return x", which seems to do the trick. I found this in WriteGedcom.py and HtmlDoc.py, but there may be more locations.
|
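The encode-just-before-display approach from item 2 can be sketched as follows. This is a minimal illustration written for modern Python 3, not the Python 2.0 discussed in this thread, and the helper names to_display and row_to_display are hypothetical (they are not part of gramps). It assumes that showing "?" for characters outside latin-1 is acceptable, which avoids the failure on cyrillic or greek input while leaving the stored data untouched.

```python
def to_display(text, encoding="latin-1"):
    """Return text encoded for a toolkit that only accepts 8-bit strings.

    Hypothetical helper, not part of gramps. Characters that the target
    encoding cannot represent are replaced with '?' instead of raising
    UnicodeEncodeError; the original text object is never modified.
    """
    return text.encode(encoding, errors="replace")


def row_to_display(row, encoding="latin-1"):
    """Encode every cell of a list row, in the spirit of the ListColors.py patch."""
    return [to_display(cell, encoding) for cell in row]


if __name__ == "__main__":
    # A latin-1 name survives intact; a cyrillic name degrades to '?' marks.
    print(row_to_display(["Müller", "Иван"]))
```

Keeping the conversion in one helper, rather than repeating the encode call at each call site, would also make it easier to remove once the toolkit can display unicode directly.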
From: Don A. <dal...@us...> - 2001-05-28 16:43:28
|
I forgot to copy everyone on my original reply to Doug. Doug provided several good patches that will be incorporated. The unicode/latin-1 problem concerns me, so here is my extended response on the character encoding problem with python 2.0.

---------------

Doug,

I thought I'd give you a better explanation of what I think is wrong with the unicode/latin-1 issue. While your patches work, I think they are treating the symptom, and not the source of the problem.

All internal strings in gramps should be latin-1 encoded. The fact that you have to translate them to latin-1 indicates that there is a problem somewhere that is allowing non-latin-1 characters to get into the data. There are three sources of input at this time: entry into the interface, the gramps input file, and GEDCOM.

Entry into the interface should always return latin-1, since gnome does not currently handle unicode. I think we can probably eliminate this one.

The gramps input file under python 2.0 is read through an encoded file wrapper that is supposed to translate from unicode to latin-1 as the data is read in. In ReadXML.py you should see around line 71 the following line:

    xml_file = EncodedFile(gzip.open(filename,"rb"),'utf-8','latin-1')

It is possible that this is not doing what I think it should.

The third possibility is the GEDCOM import. My bet is on this one. It looks as if under 2.0 I am not decoding unicode properly. My guess is that changing lines 32-36 of latin_utf8.py to:

    def utf8_to_latin(s):
        return s.encode('latin-1')

    def latin_to_utf8(s):
        return s.encode('utf-8')

might do the trick. I think this patch probably needs to be made.

A couple of questions:

1. Did you originally import your data from a GEDCOM file?

2. Was the GEDCOM file encoded as ASCII, ANSEL, UNICODE, or UTF-8? (Check the CHAR line towards the top of the file.)

-- Don
________________________________________________________________________
Don Allingham
don...@ho...
|
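For reference when checking the ReadXML.py line, here is a small sketch of how codecs.EncodedFile treats its two encoding arguments. It is written for modern Python 3 rather than the Python 2.0 of this thread, reads from an in-memory buffer instead of a gzip file, and uses hypothetical helper names. The signature is EncodedFile(file, data_encoding, file_encoding): bytes read from the file are decoded with file_encoding and handed to the caller re-encoded as data_encoding. Whether the gramps call passes them in the intended order is something to verify against the actual source; the byte-oriented utf8_to_latin and latin_to_utf8 shown here are likewise only one possible reading of the proposed patch.

```python
import codecs
import io


def read_as_latin1(raw_utf8_bytes):
    """Read UTF-8 bytes through EncodedFile so the caller receives latin-1 bytes.

    Hypothetical helper. The second argument (data_encoding) is what the
    caller sees; the third (file_encoding) is what is stored in the file.
    """
    wrapped = codecs.EncodedFile(io.BytesIO(raw_utf8_bytes), "latin-1", "utf-8")
    return wrapped.read()


def utf8_to_latin(data):
    """Convert UTF-8 bytes to latin-1 bytes; raises on non-latin-1 characters."""
    return data.decode("utf-8").encode("latin-1")


def latin_to_utf8(data):
    """Convert latin-1 bytes to UTF-8 bytes."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    stored = "Müller".encode("utf-8")
    print(read_as_latin1(stored))
    print(utf8_to_latin(stored))
```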