From: Nick H. <nic...@ho...> - 2010-04-09 22:08:18
Peter,

I had a look at your patches.

In clidbman, you convert paths back to strings when they are stored in
self.current_names - this means we have a mix of string and unicode
types for paths. Would it be better to keep them as unicode until
inserted into the TreeStore? You also don't need the string conversions
when using encode because it returns a string.

I also noticed that the database path comes from a configuration
setting. I looked to see why this was not stored in unicode. I followed
the code through from config.py to const.py and found the following:

# Tried also coversion of HOME_DIR, but that caused a lot of problems
# in Windows. Leave it unconverted for now.
#HOME_DIR = unicode(HOME_DIR, sys.getfilesystemencoding())

This is interesting because if you uncomment the line, then the
conversions are not necessary in clidbman. Perhaps the problems in
Windows were the ones that you are now fixing?

In Utils, I can't see why both Windows and Linux shouldn't use the same
code in get_unicode_path. You were quite right to fix this, but is it
correct not to use the file system encoding for Linux?

I also looked at some other functions in Utils:

fix_encoding seems to have a similar use to get_unicode_path and is only
used in one file.

force_unicode appears to be unused. Can we remove it?

find_folder and find_file try a number of encodings. I don't understand
why they can't just use the file system encoding.

There are also 3 functions that are used for media only: relative_path,
media_path and media_path_full.

It would be nice to tidy this up, but it might turn into a big job.

Regards,

Nick.

Peter Landgren wrote:
> Nick,
>
> I see the logic in this.
>
> I have uploaded four patches to
> http://www.gramps-project.org/bugs/view.php?id=3800
> What I did:
> I arranged so that pathnames are of type 'str' when they are put into
> a TreeStore and converted back to unicode when they are fetched from
> the TreeStore.
> I can now create, rename and delete in both Win and Linux with and
> without non ascii characters in path, but I have a problem with
> archiving in Linux. Archiving does not work in Windows, not for me
> with RCS installed.
>
> /Peter
>
>> Peter,
>>
>> The problem is that the ListStore and TreeStore store strings, not
>> unicode. The C representation is a char array.
>>
>> When adding unicode to a string column it is converted using utf-8
>> encoding. When you read the column it is not converted back to unicode.
>>
>> In my opinion we should always store filenames and paths as unicode
>> within Gramps. This would involve converting to unicode as soon as
>> possible when reading in paths and converting back to a string with the
>> file system encoding only when needed (for example when opening a
>> file). When reading from a ListStore or TreeStore we would also have to
>> convert back to unicode.
>>
>> There is a useful howto:
>> http://docs.python.org/release/3.0.1/howto/unicode.html
>>
>> The tips section,
>> http://docs.python.org/release/3.0.1/howto/unicode.html#tips-for-writing-unicode-aware-programs
>>
>> says: "The most important tip is:
>>
>> Software should only work with Unicode strings internally,
>> converting to a particular encoding on output."
>>
>> Another good reference is: http://www.python.org/dev/peps/pep-0100/
>>
>> The sections "Unicode output" and "Coercion" are interesting and explain
>> the behaviour of the str(), repr() and the print statement.
>>
>> For your error message problem earlier, perhaps it would be better to
>> convert to unicode before the error dialog is called rather than in the
>> dialog.
>>
>> Regards,
>>
>> Nick.
>>
>> Peter Landgren wrote:
>>
>>> Think I have found a solution:
>>>
>>> Convert from unicode to str before putting path name into the
>>> gtkTreeStore
>>>
>>> Can create, import and delete with path containing Pär now.
>>>
>>> /Peter
>>>
>>>> Nick,
>>>> Thanks for the hint where the table is populated.
>>>>
>>>> Create a new family tree without problem.
>>>>
>>>> Try to load that family tree:
>>>> prints in clidbman.py function "_populate_cli":
>>>> name= 'Sl\xc3\xa4kttr\xc3\xa4d 1'
>>>> dbdir= u'C:\\Documents and Settings\\P\xe4r\\Application Data\\gramps\\grampsdb'
>>>> (Correct coding of "ä" for Windows to understand the path.)
>>>>
>>>> When I press the load button:
>>>> prints in dbman.py function "run", where information in the populated
>>>> "store" is fetched:
>>>> current_names:
>>>> store.get_value(node, NAME_COL):
>>>> 'Sl\xc3\xa4kttr\xc3\xa4d 1'
>>>> store.get_value(node, PATH_COL):
>>>> 'C:\\Documents and Settings\\P\xc3\xa4r\\Application Data\\gramps\\grampsdb\\4bbee5ff'
>>>>
>>>> Note now that string type has changed from unicode to str and also the
>>>> coding has changed. Think this is done somewhere in the handling of
>>>> "store" by the command similar to
>>>> "xxxx.decode(sys.getfilesystemencoding())"
>>>>
>>>> Example:
>>>> I have
>>>> dbdir= u'C:\\Documents and Settings\\P\xe4r\\Application Data\\gramps\\grampsdb'
>>>> and do
>>>> dbdir1 = dbdir.decode(sys.getfilesystemencoding())
>>>> I get
>>>> dbdir1= u'C:\\Documents and Settings\\P\xc3\xa4r\\Application Data\\gramps\\grampsdb'
>>>> Here the coding has changed but NOT the string type.
>>>>
>>>> If I do
>>>> print repr(store.get_value(node, PATH_COL))
>>>> print repr(unicode(store.get_value(node, PATH_COL), sys.getfilesystemencoding()))
>>>> I get
>>>> 'C:\\Documents and Settings\\P\xc3\xa4r\\Application Data\\gramps\\grampsdb\\4bbee5ff'
>>>> u'C:\\Documents and Settings\\P\xc3\xa4r\\Application Data\\gramps\\grampsdb\\4bbee5ff'
>>>>
>>>> I have a memory of 'gtk.TreeStore' having problems with unicode in
>>>> another part of Gramps.
>>>>
>>>> /Peter
>>>>
>>>>> Peter Landgren wrote:
>>>>>
>>>>>> Need some assistance.
>>>>>>
>>>>>> 1. Allow user name in Windows to have non ascii characters
>>>>>> I have four patches for that. There is one little problem though,
>>>>>> which I don't understand. The patches do not cause any problem in
>>>>>> Linux.
>>>>>>
>>>>>> Create a family tree under the user "Pär". This is "P\xe4r" for
>>>>>> correct path in Windows.
>>>>>>
>>>>>> First time "Load Family Tree" button is pressed:
>>>>>> "Nothing" happens, I can't do anything with the family tree.
>>>>>> I have a print statement in "dbman.py" function "run" where I print
>>>>>> store.get_value(node,PATH_COL):
>>>>>> 'C:\\Documents and Settings\\P\xc3\xa4r\\Application Data\\gramps\\grampsdb\\4bbdb502'
>>>>>>                               ^^^^^^ wrong coding for Win to understand path name.
>>>>>> This is the utf-8 sequence.
>>>>>> Where is the value set for "PATH_COL"?
>>>>>> (Can't find it as my knowledge of gui programming is limited.)
>>>>>>
>>>>> PATH_COL is a constant defined at the top of the dbman.py file.
>>>>>
>>>>> The more interesting parts of the code are the methods in clidbman.py
>>>>> which populate the current_names variable. This is used to populate
>>>>> the liststore which holds the data to be displayed.
>>>>>
>>>>> I would suggest converting the paths and filenames to unicode as soon
>>>>> as possible in _populate_cli and create_new_db_cli.
>>>>>
>>>>>> However, the second time "Load Family Tree" button is pressed:
>>>>>> 'C:\\Documents and Settings\\P\xe4r\\Application Data\\gramps\\grampsdb\\4bbdb502'
>>>>>>                               ^^^^ correct coding for Win to understand path name
>>>>>>
>>>>>> And now I can do anything with the family tree.
>>>>>>
>>>>>> This difference in coding leads later to:
>>>>>> C:\Program\gramps320\RecentFiles.py:106: UnicodeWarning: Unicode equal
>>>>>> comparison failed to convert both arguments to Unicode - interpreting
>>>>>> them as being unequal
>>>>>>   if item.get_path() == item2add.get_path():
>>>>>>
>>>>>> where
>>>>>> item.get_path() = u'C:\\Documents and Settings\\P\xe4r\\Application Data\\gramps\\grampsdb\\4bbdb502'
>>>>>> and
>>>>>> item2add.get_path() = 'C:\\Documents and Settings\\P\xe4r\\Application Data\\gramps\\grampsdb\\4bbdb502'
>>>>>>
>>>>>> But they ARE equal!!!
>>>>>>
>>>>> One is unicode and the other is a string. The comparison will attempt
>>>>> to coerce the string to unicode using the default encoding which is
>>>>> ascii. This will fail.
>>>>>
>>>>> For the conversion to work you will need to specify the encoding.
>>>>>
>>>>> s = 'P\xe4r'
>>>>> print unicode(s) # Fails
>>>>> print s.decode('iso_8859_1') # OK
>>>>>
>>>>>> 2. Allow user name and database directory in Windows to have non ascii
>>>>>> characters
>>>>>> If I add a directory "Åke" and change Gramps to look there
>>>>>> for the database, I get other coding problems. Like:
>>>>>> 12343: ERROR: gramps.py: line 138: Unhandled exception
>>>>>> Traceback (most recent call last):
>>>>>>   File "C:\Program\gramps320\gui\dbman.py", line 693, in __new_db
>>>>>>     self._create_new_db()
>>>>>>   File "C:\Program\gramps320\gui\dbman.py", line 703, in _create_new_db
>>>>>>     new_path, title = self.create_new_db_cli(title)
>>>>>>   File "C:\Program\gramps320\cli\clidbman.py", line 228, in create_new_db_cli
>>>>>>     new_path = unicode(new_path, sys.getfilesystemencoding())
>>>>>> TypeError: decoding Unicode is not supported
>>>>>>
>>>>>> I have not worked on this yet.
>>>>>> /Peter
>>>>>>
>>>>> Try:
>>>>>
>>>>> new_path = new_path.decode(sys.getfilesystemencoding())
>>>>>
>>>>> Perhaps find_next_db_dir should return unicode?
>>>>>
>>>>> Nick.
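
The encoding mismatch discussed above can be reproduced in a few lines. This is only a sketch, written for Python 3 so that it runs today: there, bytes stands in for the Python 2 'str' type used in the thread and str stands in for 'unicode'. The helper to_text is hypothetical and is not part of Gramps. Its pass-through branch is an assumption based on the quoted TypeError, which suggests the value was already unicode when it was decoded again.

```python
# Python 3 sketch: bytes plays the role of Python 2 'str',
# and str plays the role of Python 2 'unicode'.

def to_text(raw, encoding):
    """Decode bytes with an explicit encoding; return text unchanged.

    Hypothetical helper for illustration, not part of Gramps.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


name = "P\xe4r"                          # the user name from the thread, as text
latin1_raw = name.encode("iso_8859_1")   # b'P\xe4r'
utf8_raw = name.encode("utf-8")          # b'P\xc3\xa4r', the form read back from the store

# The same name has two different byte representations.
print(latin1_raw == utf8_raw)            # False

# Decoded with the matching codec, both give the same text.
print(to_text(latin1_raw, "iso_8859_1") == to_text(utf8_raw, "utf-8"))  # True

# Decoding the UTF-8 bytes with the wrong codec keeps both bytes as
# separate characters, which is the 'P\xc3\xa4r' pattern seen in the thread.
print(ascii(utf8_raw.decode("iso_8859_1")))   # 'P\xc3\xa4r'

# Text that is already decoded passes through untouched.
print(to_text(name, "utf-8") is name)    # True
```

Comparing values only after both sides have gone through one such conversion avoids the mixed-type comparison that triggered the UnicodeWarning.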
>>>>>
>>>>>>> 2010/4/5 Peter Landgren <pet...@te...>
>>>>>>>
>>>>>>>> My question is simple:
>>>>>>>> Shall I fix this only when Gramps is running in Windows? It's not a
>>>>>>>> problem in Linux.
>>>>>>>>
>>>>>>> I think the code works on both, so no need to make a distinction for
>>>>>>> windows.
>>>>>>> So after test, commit.
>>>>>>>
>>>>>>> About the problem of repair greyed out: if a user could never close
>>>>>>> the family tree, there is also no saved backup, so there is nothing
>>>>>>> to do a repair ==> repair not active.
>>>>>>>
>>>>>>> Combine this with the problem I posted about a while back that repair
>>>>>>> is a very dangerous thing. Best would be to update the error message
>>>>>>> to link to the http page on our wiki on how to proceed on database
>>>>>>> error. A user should really read that before clicking repair.
>>>>>>>
>>>>>>> Benny
>>>>>>>
>>>>>>> /Peter
>>>>>>>
>>>>>>>>> I have been working a little on:
>>>>>>>>> http://www.gramps-project.org/bugs/view.php?id=3800
>>>>>>>>>
>>>>>>>>> and hopefully found the cause. When there is a database error and
>>>>>>>>> the path to the Gramps database directory contains a non ascii
>>>>>>>>> character this type of error will pop up. With this code:
>>>>>>>>> class DBErrorDialog(ErrorDialog):
>>>>>>>>>     def __init__(self, msg, parent=None):
>>>>>>>>>         print msg # Added by me for this test
>>>>>>>>>         ErrorDialog.__init__(
>>>>>>>>>             self,
>>>>>>>>>             _("Low level database corruption detected"),
>>>>>>>>>             _("Gramps has detected a problem in the underlying "
>>>>>>>>>               "Berkeley database. This can be repaired by from "
>>>>>>>>>               "the Family Tree Manager. Select the database and "
>>>>>>>>>               'click on the Repair button') + '\n\n' + str(msg),
>>>>>>>>>             parent)
>>>>>>>>>
>>>>>>>>> No "ErrorDialog" pops up and I get this error message when I force
>>>>>>>>> DB error, after I deleted all log files:
>>>>>>>>>
>>>>>>>>> C:\Program\gramps320>python.exe -O gramps.py
>>>>>>>>> Invalid argument -- C:\Documents and Settings\Pär\Application Data\gramps\grampsdb\4b6eaf19\meta_data.db: unexpected file type or format
>>>>>>>>> 22984: ERROR: grampsgui.py: line 356: Gramps failed to start.
>>>>>>>>> ....
>>>>>>>>>   File "C:\Program\gramps320\QuestionDialog.py", line 215, in __init__
>>>>>>>>>     'click on the Repair button') + '\n\n' + str(msg), parent)
>>>>>>>>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position
>>>>>>>>> 47-49: invalid data
>>>>>>>>>
>>>>>>>>> As can be seen pos 48 is "ä" in "Pär".
>>>>>>>>> This is the same as in the original bug report.
>>>>>>>>>
>>>>>>>>> With this code:
>>>>>>>>> class DBErrorDialog(ErrorDialog):
>>>>>>>>>     def __init__(self, msg, parent=None):
>>>>>>>>>         msg = unicode(msg.decode(sys.getfilesystemencoding()))
>>>>>>>>>         ErrorDialog.__init__(
>>>>>>>>>             self,
>>>>>>>>>             _("Low level database corruption detected"),
>>>>>>>>>             _("Gramps has detected a problem in the underlying "
>>>>>>>>>               "Berkeley database. This can be repaired by from "
>>>>>>>>>               "the Family Tree Manager. Select the database and "
>>>>>>>>>               'click on the Repair button') + '\n\n' + msg, parent)
>>>>>>>>>
>>>>>>>>> The correct error message is now shown in the "ErrorDialog".
>>>>>>>>>
>>>>>>>>> But the repair button is grayed out.
>>>>>>>>>
>>>>>>>>> This type of error will pop up for DB errors when there is a non
>>>>>>>>> ascii character in the msg string. This happens only if the msg
>>>>>>>>> string contains a path name.
>>>>>>>>>
>>>>>>>>> How shall I proceed?
>>>>>>>>> /Peter
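
For the error-dialog problem at the bottom of the thread, the suggestion made earlier was to convert the message to unicode before the dialog is called rather than inside it. A minimal sketch of that idea follows, again in Python 3 (bytes in place of Python 2 'str'). The names message_to_text and build_dialog_text are hypothetical and do not exist in Gramps. Using errors="replace" is an assumption made here so that a wrong encoding guess degrades the text instead of raising.

```python
import sys


def message_to_text(msg, encoding=None):
    """Return msg as text, decoding raw bytes with an explicit encoding.

    Hypothetical helper for illustration. If no encoding is given, the
    file system encoding is assumed, since the message usually carries a
    path name.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(msg, bytes):
        # Undecodable bytes become U+FFFD instead of raising an error.
        return msg.decode(encoding, errors="replace")
    # Exceptions and other objects are converted via str().
    return str(msg)


def build_dialog_text(intro, msg, encoding=None):
    """Join the fixed dialog text and the already-decoded message."""
    return intro + "\n\n" + message_to_text(msg, encoding)


# The caller converts first, so the dialog only ever sees text.
text = build_dialog_text("Problem detected", b"C:\\P\xe4r", "iso_8859_1")
print(ascii(text))
```

With this arrangement the dialog class itself would not need to know which encoding the message arrived in.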