From: Doug B. <dou...@gm...> - 2009-09-26 15:45:08

Gerald (et al),

I'm looking at some code in trunk/src/plugins/tool/ChangeNames.py:

    with self.db.get_person_cursor(update=True, commit=True) as cursor:
        for handle, data in cursor:
            person = Person(data)
            change = False
            for name in [person.get_primary_name()] + person.get_alternate_names():
                sname = name.get_surname()
                if sname in changelist:
                    change = True
                    sname = self.name_cap(sname)
                    name.set_surname(sname)
            if change:
                cursor.update(handle, person.serialize())

and it looks like the cursor is skipping around the order as the names change, and missing some of the people, and thus names.

To test this out, you could download the following zipped GEDCOM of the Tudor royal family of England, and try to fix the capital letters by running Tools -> Database Processing -> Fix Capitalization of Family Names. There are 9882 names in the file, FYI. When I go through the above loop, I only hit 6758 people the first time, 7367 the second time, 7846 the third time, 8278 the fourth time, ... it takes 9 passes to go through all 9882.

Do you think what I describe is the problem? Is there a way to get a cursor that won't change order as you update?

Thanks for any insight,

-Doug

http://www.genealogyforum.com/gedcom/gedcom2a/gedr2090.zip

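One common way to sidestep the question of cursor order during updates is to take a snapshot of the handles first and then write each record back by handle. The sketch below only illustrates that pattern; `ToyStore` and `capitalize_surnames` are hypothetical names, the store is a plain in-memory dict, and none of this is the Gramps or bsddb API.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a keyed table (not Gramps/bsddb)."""

    def __init__(self, records):
        self._records = dict(records)

    def handles(self):
        # Snapshot of the keys, taken once, before any update is written.
        return sorted(self._records)

    def get(self, handle):
        return self._records[handle]

    def put(self, handle, data):
        self._records[handle] = data


def capitalize_surnames(store, changelist):
    """Two-pass update: iterate over a key snapshot, write back by handle."""
    visited = 0
    changed = 0
    for handle in store.handles():
        visited += 1
        surname = store.get(handle)
        if surname in changelist:
            store.put(handle, surname.capitalize())
            changed += 1
    return visited, changed


store = ToyStore({"h1": "TUDOR", "h2": "Stuart", "h3": "YORK"})
visited, changed = capitalize_surnames(store, {"TUDOR", "YORK"})
print(visited, changed)
```

Because the loop walks a list that was built before the first write, the number of records visited cannot depend on where an update leaves a live cursor. The cost is holding every handle in memory at once.
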
From: Gerald B. <ger...@gm...> - 2009-09-26 16:41:14

Not sure I know what you mean. Accessing by cursor means accessing in key (that is handle) order, not in surname order. That means that the cursor shouldn't skip over records since the name is just a piece of data in the record and not related to the order in which records are processed. So the order (that is, handle order) won't change as updates are made. There must be something else going on.

--
Gerald Britton

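The expectation described here, that key order is independent of the data stored under each key, can be shown with a toy example. This is only a sketch using a plain dict with made-up handles; it says nothing about how bsddb orders or repositions a real cursor.

```python
# Made-up handles mapped to surnames; a real table would hold serialized people.
records = {"b02": "TUDOR", "a01": "YORK", "c03": "stuart"}

order_before = sorted(records)          # key (handle) order before any update

for handle in order_before:
    # Only the value changes; the key is untouched.
    records[handle] = records[handle].capitalize()

order_after = sorted(records)           # key order after the updates
print(order_before == order_after)
```
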
From: Doug B. <dou...@gm...> - 2009-09-26 17:34:16

On Sat, Sep 26, 2009 at 12:40 PM, Gerald Britton <ger...@gm...> wrote:
> Not sure I know what you mean. Accessing by cursor means accessing in
> key (that is handle) order, not in surname order. That means that the
> cursor shouldn't skip over records since the name is just a piece of
> data in the record and not related to the order in which records are
> processed. So the order (that is, handle order) won't change as
> updates are made. There must be something else going on.

I'm not sure what order they are in, but updating the cursor appears to be making it so that not all of the people are iterated through. I'll dig a little more...

-Doug

From: Doug B. <dou...@gm...> - 2009-09-26 19:47:57

On Sat, Sep 26, 2009 at 1:27 PM, Doug Blank <dou...@gm...> wrote:
> I'm not sure what order they are in, but updating the cursor appears
> to be making it so that not all of the people are iterated through.
> I'll dig a little more...

Looks like the update... just tried this variation:

    >>> count = 0
    >>> with db.get_person_cursor(update=True, commit=True) as cursor:
    ...     for handle, data in cursor:
    ...         count += 1
    ...         person = Person(data)
    ...         name = person.get_primary_name()
    ...         name.set_surname(name.get_surname().upper())
    ...         cursor.update(handle, person.serialize())
    ...
    >>> count
    8731

Should have been 9882. The only thing I'm doing is changing the name. I'm using:

    Python version: 2.6 (r26:66714, Jun 8 2009, 16:07:26) [GCC 4.4.0 20090506 (Red Hat 4.4.0-4)]
    BSDDB version: 4.7.3
    Gramps version: 3.2.0-0.SVN13094M
    LANG: C
    OS: Linux
    Distribution: 2.6.29.6-217.2.3.fc11.i586

-Doug

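A bare count shows that records are missing but not which ones, or whether any were visited twice. A small audit along the following lines could narrow that down. It is a sketch only: `audit_pass` is a hypothetical helper, and the handle lists are made-up stand-ins for "every handle in the table" and "every handle the loop yielded".

```python
from collections import Counter


def audit_pass(all_handles, visited_handles):
    """Compare what a pass yielded against what the table holds."""
    seen = Counter(visited_handles)
    skipped = sorted(set(all_handles) - set(seen))      # never yielded
    repeated = sorted(h for h, n in seen.items() if n > 1)  # yielded more than once
    return skipped, repeated


# Made-up data: five records, one pass that misses two and repeats one.
all_handles = ["h1", "h2", "h3", "h4", "h5"]
visited = ["h1", "h2", "h2", "h5"]

skipped, repeated = audit_pass(all_handles, visited)
print(skipped, repeated)
```

In the real loop, the visited list would be filled by appending `handle` next to the `count += 1` line.
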
From: Gerald B. <ger...@gm...> - 2009-09-26 22:23:06

Now that's just weird. The 'update' method just calls DBCursor.put, which is a standard bsddb method. Nuttin' fancy. Is the behaviour consistent (same counts every time)? If not, it will surely be harder to pin down. Still, I don't believe that the implementation is faulty. There's just not that many variables. You can read the doc here:

http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/dbcput.html

--
Gerald Britton

From: Doug B. <dou...@gm...> - 2009-09-26 22:55:14

On Sat, Sep 26, 2009 at 6:22 PM, Gerald Britton <ger...@gm...> wrote:
> Now that's just weird. The 'update' method just calls DBCursor.put,
> which is a standard bsddb method. Nuttin' fancy. Is the behaviour
> consistent (same counts every time)? If not, it will surely be harder
> to pin down. Still, I don't believe that the implementation is
> faulty. There's just not that many variables. You can read the doc
> here:
>
> http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/dbcput.html

Ok, thanks for the link. I see that it says "If DBcursor->put() succeeds and an item is inserted into the database, the cursor is always positioned to refer to the newly inserted item."

I can't tell what order the data is in, but the behavior looks like the position is changing. I'll have to do some more tests to see if it deterministically gives the same results...

-Doug

From: Doug B. <dou...@gm...> - 2009-09-26 23:42:46

On Sat, Sep 26, 2009 at 6:55 PM, Doug Blank <dou...@gm...> wrote:
> I can't tell what order the data is in, but the behavior looks like
> the position is changing. I'll have to do some more tests to see if it
> deterministically gives the same results...

The for loop over the cursor processes a different number of rows each time I loop over a new database cursor after a fresh import from the GEDCOM:

1. 7015
2. 6848
3. 6942

This indicates that the cursor is changing in a non-deterministic manner. I suspect that after an update, the cursor moves to a new location, sometimes skipping some items in the cursor.

Hope that helps!

-Doug

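The three counts can be put next to the expected total of 9882 to see how many records each fresh-import pass missed and whether the runs agree. The numbers below are taken from the thread; `summarize` is a hypothetical helper name used only for this sketch.

```python
TOTAL_PEOPLE = 9882                      # expected count reported in the thread
fresh_import_counts = [7015, 6848, 6942]  # rows processed on three fresh imports


def summarize(counts, total):
    """Return the records missed per run and whether all runs agree."""
    missed = [total - c for c in counts]
    deterministic = len(set(counts)) == 1
    return missed, deterministic


missed, deterministic = summarize(fresh_import_counts, TOTAL_PEOPLE)
print(missed, deterministic)
```

Each run misses roughly 2900 to 3000 records, and no two runs miss the same number, which matches the description of the behaviour as non-deterministic.
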
From: Doug B. <dou...@gm...> - 2009-09-27 02:39:07

This may be a bsddb bug, but I've now confirmed the behavior on the following two different computer systems and they don't have much in common:

Machine 1:

    Python version: 2.6 (r26:66714, Jun 8 2009, 16:07:26) [GCC 4.4.0 20090506 (Red Hat 4.4.0-4)]
    BSDDB version: 4.7.3
    Gramps version: 3.2.0-0.SVN13094M
    LANG: C
    OS: Linux
    Distribution: 2.6.29.6-217.2.3.fc11.i586

Machine 2:

    Python version: 2.5.2 (r252:60911, Sep 30 2008, 15:41:38) [GCC 4.3.2 20080917 (Red Hat 4.3.2-4)]
    BSDDB version: 4.4.5.3
    Gramps version: 3.2.0-0.SVN13263
    LANG: C
    OS: Linux
    Distribution: 2.6.27.24-170.2.68.fc10.i686

If someone wants to test this, here is one way (not the fastest):

1. Download http://www.genealogyforum.com/gedcom/gedcom2a/gedr2090.zip
2. Load the BUELL001.GED into an empty database
3. Run Tools -> Database Processing -> Fix Capitalization of Family Names (if you are running a Python prior to 2.6, you'll need "from __future__ import with_statement" at the top of src/plugins/tool/ChangeNames.py)
4. Select all names, and apply changes
5. If there are any names that are still all caps, then it skipped them

I'd be very interested to see if this works on Python 2.6.2. We may need a workaround in any event.

-Doug

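Step 5 can be checked mechanically rather than by eye. The following is a minimal sketch that assumes the surnames have already been pulled out of the database into a plain list; `still_all_caps` is a hypothetical helper, not part of Gramps, and the sample names are made up.

```python
def still_all_caps(surnames):
    """Return the surnames that are non-empty and entirely upper case."""
    return [s for s in surnames if s and s.isupper()]


# Made-up sample: two names were fixed, two were skipped, one is empty.
after_tool = ["Tudor", "YORK", "de la Pole", "STUART", ""]
leftovers = still_all_caps(after_tool)
print(leftovers)
```

A surname that is legitimately written in capitals would show up here as well, so a non-empty result is a hint that records were skipped rather than proof.
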
From: Gerald B. <ger...@gm...> - 2009-09-27 12:56:52

Those are old bsddb versions (currently at 4.7.3 iirc). I'll try it later today on one of my machines. Remember that trunk requires Python 2.6.

--
Sent from my mobile device

Gerald Britton

From: Jérôme <rom...@ya...> - 2009-09-27 16:06:45

> To test this out, you could download the following zipped GEDCOM of
> the Tudor royal family of England, and try to fix the capital letters
> by running Tools -> Database Processing -> Fix Capitalization of
> Family Names. There are 9882 names in the file, FYI. When I go through
> the above loop, I only hit 6758 people the first time, 7367 the second
> time, 7846 the third time, 8278 the fourth time, ... it takes 9 passes
> to go through all 9882.

Same result under Ubuntu 9.04 (Python 2.6.2, bsddb 4.7.3). No performance issues, except for disk space.

Running python -3 gramps.py gives:

    lib/python2.6/site-packages/gramps/gen/db/write.py:1694: DeprecationWarning: comparing unequal types not supported in 3.x
      old_version = map(int, db.__version__.split(".",2)[:2]) < (4, 7)
    /usr/lib/python2.6/bsddb/dbshelve.py:237: DeprecationWarning: apply() not supported in 3.x; use func(*args, **kwargs)
      data = apply(self.db.get, args, kw)
    lib/python2.6/site-packages/gramps/BasicUtils/_NameDisplay.py:352: DeprecationWarning: the cmp argument is not supported in 3.x
      d_keys.sort(_make_cmp) # reverse sort by ikeyword
    lib/python2.6/site-packages/gramps/BasicUtils/_NameDisplay.py:368: DeprecationWarning: the cmp argument is not supported in 3.x
      d_keys.sort(_make_cmp) # reverse sort by keyword
    lib/python2.6/site-packages/gramps/plugins/tool/ChangeNames.py:183: RuntimeWarning: missing handler 'on_delete_event'
      "on_help_clicked" : self.on_help_clicked,

Jérôme

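The three DeprecationWarnings each have a form that behaves the same on Python 2 and 3. The sketch below shows those forms with hypothetical helper names (`version_older_than`, `call_with`, `by_length_desc`); it is not the Gramps code. One point worth checking against the original line: in Python 2 a list compared with a tuple is, as far as I know, ordered by type rather than by content, so comparing the result of `map(int, ...)` with `(4, 7)` may not test the version at all. Note also that `functools.cmp_to_key` only exists from Python 2.7 and 3.2 onward, so it would not be available on 2.6.

```python
from functools import cmp_to_key


def version_older_than(version_string, minimum=(4, 7)):
    """Compare (major, minor) as a tuple so both sides have the same type."""
    major_minor = tuple(int(part) for part in version_string.split(".", 2)[:2])
    return major_minor < minimum


def call_with(func, args, kwargs):
    """Replacement for apply(func, args, kwargs)."""
    return func(*args, **kwargs)


def by_length_desc(a, b):
    """Hypothetical old-style cmp function: longer strings sort first."""
    return len(b) - len(a)


keys = ["ab", "abcd", "a"]
keys.sort(key=cmp_to_key(by_length_desc))  # instead of keys.sort(by_length_desc)

print(version_older_than("4.4.5.3"), version_older_than("4.7.3"))
print(call_with(int, ("ff",), {"base": 16}))
print(keys)
```
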
From: Peter L. <pet...@te...> - 2009-09-27 13:03:58

Hi,

I have a Windows box with Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] and a Gramps trunk version 3.2.0-0.SVN12762M.

I also have 2.5.1 on that box.

I created an empty database and imported it with:

    2.5.1 took about 3 minutes
    2.6.2 took about one hour

Tried to run "Tools -> Database Processing -> Fix Capitalization of Family Names". After 50 minutes I had to kill it with 2.6.2. I killed the 2.5.1 version after 20 minutes.

When I tested on my Linux box with 2.6.1 and svn13257 I had to run the tool 10 times before all surnames were converted.

/Peter

From: Peter L. <pet...@te...> - 2009-09-27 14:43:50

When I tried a later trunk SVN, I got errors:

    1969: ERROR: grampsgui.py: line 347: Gramps failed to start.
    Traceback (most recent call last):
      File "C:\Program\grampstrunk\gui\grampsgui.py", line 327, in __startgramps
        Gramps(argparser)
      File "C:\Program\grampstrunk\gui\grampsgui.py", line 240, in __init__
        from viewmanager import ViewManager
      File "C:\Program\grampstrunk\gui\viewmanager.py", line 58, in <module>
        from cli.grampscli import CLIManager
      File "C:\Program\grampstrunk\cli\grampscli.py", line 56, in <module>
        from Utils import get_researcher
      File "C:\Program\grampstrunk\Utils.py", line 50, in <module>
        from const import TEMP_DIR, USER_HOME, WINDOWS
    ImportError: cannot import name WINDOWS

/Peter

--
Peter Landgren
Talken Hagen
671 94 BRUNSKOG
0570-530 21
070-635 4719
pet...@te...
Skype: pgl4820.2

_______________________________________________
Gramps-devel mailing list
Gra...@li...
https://lists.sourceforge.net/lists/listinfo/gramps-devel

From: Doug B. <dou...@gm...> - 2009-09-27 15:48:31
|
On Sun, Sep 27, 2009 at 9:33 AM, Peter Landgren <pet...@te...> wrote:
> When I tried a later trunk SVN, I got errors:
> 1969: ERROR: grampsgui.py: line 347: Gramps failed to start.
> Traceback (most recent call last):
>   File "C:\Program\grampstrunk\gui\grampsgui.py", line 327, in __startgramps
>     Gramps(argparser)
>   File "C:\Program\grampstrunk\gui\grampsgui.py", line 240, in __init__
>     from viewmanager import ViewManager
>   File "C:\Program\grampstrunk\gui\viewmanager.py", line 58, in <module>
>     from cli.grampscli import CLIManager
>   File "C:\Program\grampstrunk\cli\grampscli.py", line 56, in <module>
>     from Utils import get_researcher
>   File "C:\Program\grampstrunk\Utils.py", line 50, in <module>
>     from const import TEMP_DIR, USER_HOME, WINDOWS
> ImportError: cannot import name WINDOWS
>
> /Peter
>

I think you need to run ./autogen.sh

-Doug

>> Hi,
>>
>> I have a windows box with Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02)
>> [MSC v.1500 32 bit (Intel)]
>> and a gramps trunk version 3.2.0-0.SVN12762M
>>
>> I also have 2.5.1 on that box.
>>
>> I created an empty database and imported it with:
>> 2.5.1 took about 3 minutes
>> 2.6.2 took about one hour.
>>
>> Tried to run "Tools -> Database Processing -> Fix Capitalization of
>> Family Names"
>> After 50 minutes I had to kill it with 2.6.2. I killed the 2.5.1 version
>> after 20 minutes.
>>
>> When I tested in my Linux box with 2.6.1 and svn13257 I had to run the
>> tool 10 times before all surnames were converted.
>>
>> /Peter
|
From: Peter L. <pet...@te...> - 2009-09-27 17:54:33
|
Not on Windows XP?

I think I made mistakes when I copied files from Linux to Windows.
Done more carefully, it takes 36 minutes to import the GEDCOM file
into an empty database.
The Fix Capitalization tool shows the same behavior as before.
I used gramps svn 13257 and python 2.6.2 on Windows XP.

/Peter

> > I think you need to run ./autogen.sh
> >
> > -Doug

--
Peter Landgren
Talken Hagen
671 94 BRUNSKOG
0570-530 21
070-635 4719
pet...@te...
Skype: pgl4820.2
|
From: Jérôme <rom...@ya...> - 2009-09-28 07:04:15
|
> File "C:\Program\grampstrunk\Utils.py", line 50, in <module>
> from const import TEMP_DIR, USER_HOME, WINDOWS
> ImportError: cannot import name WINDOWS

const.py : line 202
WINDOWS = ["Windows", "win32"]

Maybe there is a wrong value?
Windows? win32? nt?

python
>>> import platform
>>> platform.system()
>>> platform.machine()
>>> platform.win32_ver()
>>> import os
>>> os.name
>>> os.environ
... etc ...
|
From: Gerald B. <ger...@gm...> - 2009-09-28 11:30:45
|
Folks - you're mixing up three or four unrelated issues here. This is
not helpful. Let's stay on topic. You can open bug reports for the
other things.

--
Sent from my mobile device

Gerald Britton
|
From: Gerald B. <ger...@gm...> - 2009-09-28 13:22:55
|
Just got around to trying this and I couldn't reproduce the error on
the sample database (3 changes) nor on my own database (28 changes, all
over the alphabet, more than 5000 individuals).

Could you please tell me more about the database you started with?

--
Gerald Britton
|
From: Doug B. <dou...@gm...> - 2009-09-28 14:05:03
|
On Mon, Sep 28, 2009 at 9:22 AM, Gerald Britton <ger...@gm...> wrote:
> Just got around to trying this and I couldn't reproduce the error on
> the sample database (3 changes) nor my own database (28 changes, all
> over the alphabet, more than 5000 individuals).
>
> Could you please tell me more about the database you started with?

The link to it is here:

http://www.genealogyforum.com/gedcom/gedcom2a/gedr2090.zip

I don't know anything else about it. I imported the GEDCOM and then
applied the Fix Capitalization of Family Names tool.

But perhaps you're on to something: maybe the original handle is
changing (maybe unicode vs string or something) so that it isn't
merely an update? In any event, can you try out this database?

-Doug
|
From: Gerald B. <ger...@gm...> - 2009-09-28 15:49:55
|
Interesting! So here is a case where every surname must change. I
don't yet understand why it causes some records to be skipped. In
some cases it converts some surnames (John DOE becomes John Doe) while
ignoring others with the same original surname (Jane DOE remains Jane
DOE).

I can't see that this is a fault in the cursor.update method since it
just calls DBCursor.put with the DB_CURRENT flag. I'll see if I can
find some answers from the maintainers and maybe work up a simple test
case.

In the meantime I put it back to the old way -- retrieve a list of
handles first, then work on that list. I hope I can restore the
cursor.update approach as it is more performant and uses less storage,
but until that time we have a working ChangeNames again.

--
Gerald Britton
|
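A minimal sketch of the "old way" described above (take a list of handles first, then work through that list), for readers who want to see the shape of the workaround. This is not the ChangeNames code: the dict-based `table`, the record layout, and the `fix_surnames` helper are hypothetical stand-ins, and the real tool goes through the Gramps database API rather than a plain dict.

```python
# Hypothetical stand-in for the person table: handle -> record.
# A plain dict is an assumption made for illustration only.
def fix_surnames(table, changelist, name_cap):
    """Visit every record exactly once, updating surnames in changelist."""
    handles = list(table.keys())  # snapshot of the keys, taken before any write
    visited = 0
    for handle in handles:
        record = table[handle]
        visited += 1
        surname = record["surname"]
        if surname in changelist:
            updated = dict(record)  # copy, then write back under the same handle
            updated["surname"] = name_cap(surname)
            table[handle] = updated
    return visited


people = {
    "h1": {"first": "John", "surname": "DOE"},
    "h2": {"first": "Jane", "surname": "DOE"},
    "h3": {"first": "Philippa", "surname": "SOWTER"},
}
visited = fix_surnames(people, {"DOE", "SOWTER"}, str.capitalize)
print(visited, people["h2"]["surname"])
```

Because the list of handles is fixed before the first write, the iteration order cannot be affected by the updates, at the cost of holding every handle in memory at once.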
From: Doug B. <dou...@gm...> - 2009-10-01 02:39:50
|
On Mon, Sep 28, 2009 at 11:49 AM, Gerald Britton <ger...@gm...> wrote:
> Interesting! So here is a case where every surname must change. I
> don't yet understand why it causes some records to be skipped. In
> some cases it does some surnames (John DOE becomes John Doe) while
> ignoring others with the same original surname (Jane DOE remains Jane
> DOE)
>
> I can't see that this is a fault in the cursor.update method since it
> just calls DBCusor.put with the DB_CURRENT flag. I'll see if I can
> find some answers from the maintainers and maybe work up a simple test
> case.
>
> In the meantime I put it back to the old way -- retrieve a list of
> handles first, then work on that list. I hope I can restore the
> cursor.update approach as it is more performant and uses less storage,
> but until that time we have a working ChangeNames again.
>

I think I have an idea of what is causing the mysterious jumping
cursor. It looks like a problem using a binary protocol of
pickle.dumps in the bsddb. It looks like that might be causing some
undefined behavior.

The low-level string representation in bsddb looks like this for a
person record after the GEDCOM import:

'\x80\x02(U\x13b6888def2f339e6a84bq\x01X\x04\x00\x00\x00I226q\x02K\x00(\x89]]q\x03NX\x08\x00\x00\x00Philippaq\x04X\x06\x00\x00\x00SOWTERq\x05X\x00\x00\x00\x00U\x00K\x02X\x00\x00\x00\x00\x86U\x00U\x00U\x00K\x00K\x00X\x00\x00\x00\x00t]J\xff\xff\xff\xffJ\xff\xff\xff\xff]]q\x06U\x13b6888def17102a28c2aq\x07a]q\x08U\x13b6888def2f7189807b0q\ta]]]q\n(\x89]]q\x0bK\x00X\x04\x00\x00\x00REFNq\x0c\x86X\x01\x00\x00\x00+tq\ra]]]]q\x0eJ\x11\x04\xc4JJ\xff\xff\xff\xffX\x00\x00\x00\x00\x86\x89]tq\x0f.'

However, if you merely load and dump the data it then looks like:

"(S'b6888def2f339e6a84b'\nVI226\nI0\n(I00\n(l(lNVPhilippa\nVSOWTER\nV\nS''\n(I2\nV\ntS''\nS''\nS''\nI0\nI0\nV\nt(lI-1\nI-1\n(l(lp1\nS'b6888def17102a28c2a'\np2\na(lp3\nS'b6888def2f7189807b0'\np4\na(l(l(lp5\n(I00\n(l(l(I0\nVREFN\ntV+\ntp6\na(l(l(l(lI1254360081\n(I-1\nV\ntI00\n(lt."

That update code might just look like:

    for handle, data in cursor:
        person = Person(data)
        cursor.update(handle, person.serialize())

Of course, that causes the cursor to jump around. But, if you force it
to update all the records then that seems to fix the weird jumping
cursor.

If you pickle.dumps the data with protocol 2, then you get data that
looks very similar to the original imported data:

>>> pickle.dumps(('b6888def2f339e6a84b', u'I226', 0, (False, [], [], None, u'Philippa', u'SOWTER', u'', '', (2, u''), '', '', '', 0, 0, u''), [], -1, -1, [], ['b6888def17102a28c2a'], ['b6888def2f7189807b0'], [], [], [(False, [], [], (0, u'REFN'), u'+')], [], [], [], [], 1254360081, (-1, u''), False, []), 2)
'\x80\x02(U\x13b6888def2f339e6a84bq\x01X\x04\x00\x00\x00I226q\x02K\x00(\x89]]NX\x08\x00\x00\x00Philippaq\x03X\x06\x00\x00\x00SOWTERq\x04X\x00\x00\x00\x00U\x00K\x02X\x00\x00\x00\x00\x86q\x05U\x00U\x00U\x00K\x00K\x00X\x00\x00\x00\x00t]J\xff\xff\xff\xffJ\xff\xff\xff\xff]]q\x06U\x13b6888def17102a28c2aq\x07a]q\x08U\x13b6888def2f7189807b0q\ta]]]q\n(\x89]]K\x00X\x04\x00\x00\x00REFNq\x0b\x86q\x0cX\x01\x00\x00\x00+tq\ra]]]]J\x11\x04\xc4JJ\xff\xff\xff\xffX\x00\x00\x00\x00\x86q\x0e\x89]t.'

The funny thing is, I can't find where GRAMPS might be saving the data
with protocol 2!?

Anyone see a difference in the way that GEDCOM saves the data, and how
the update saves it?

-Doug
|
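A small sketch of the observation above, using only the standard `pickle` module: the same tuple serialized with protocol 0 and with protocol 2 produces different byte strings that unpickle to equal values. The shortened `record` tuple is made up for illustration and is not the full Gramps person record, and the sketch says nothing about whether this difference is related to the skipped records.

```python
import pickle

# Shortened, made-up record; the real person tuple has many more fields.
record = ("b6888def2f339e6a84b", "I226", 0, ("Philippa", "SOWTER"))

text_form = pickle.dumps(record, 0)    # protocol 0: printable ASCII opcodes
binary_form = pickle.dumps(record, 2)  # protocol 2: begins with b"\x80\x02"

# Both encodings round-trip to the same Python value.
same_data = pickle.loads(text_form) == pickle.loads(binary_form) == record

print(binary_form[:2])
print(len(text_form), len(binary_form))
print(same_data)
```

So a record that was written with one protocol and rewritten with the other changes its stored bytes, and usually its length, even when none of its fields changed.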
From: Gerald B. <ger...@gm...> - 2009-10-01 04:12:53
|
Sorry Doug, but what you describe cannot cause the cursor to jump
around. The actual cursor is nothing more than a record pointer in the
underlying C code. The cursor.update call only ever updates the data
portion of the record -- never the key (in fact, the key portion of the
update call is ignored and I'll probably just remove the parameter
altogether). The pickle protocol is also not an issue, since the records
that ARE updated this way are updated correctly. If it were the pickle
protocol, the data would be corrupted.

I'm currently pursuing this issue with the maintainers, though it
looks like I might have to rewrite the proof code in C to get their
attention. Sigh...

On Wed, Sep 30, 2009 at 10:39 PM, Doug Blank <dou...@gm...> wrote:
> On Mon, Sep 28, 2009 at 11:49 AM, Gerald Britton
> <ger...@gm...> wrote:
>> Interesting! So here is a case where every surname must change. I
>> don't yet understand why it causes some records to be skipped. In
>> some cases it does some surnames (John DOE becomes John Doe) while
>> ignoring others with the same original surname (Jane DOE remains Jane
>> DOE)
>>
>> I can't see that this is a fault in the cursor.update method since it
>> just calls DBCusor.put with the DB_CURRENT flag. I'll see if I can
>> find some answers from the maintainers and maybe work up a simple test
>> case.
>>
>> In the meantime I put it back to the old way -- retrieve a list of
>> handles first, then work on that list. I hope I can restore the
>> cursor.update approach as it is more performant and uses less storage,
>> but until that time we have a working ChangeNames again.
>>
>
> I think I have an idea of what is causing the mysterious jumping
> cursor. It looks like a problem using a binary protocol of
> pickle.dumps in the bsddb. It looks like that might be causing some
> undefined behavior.
> > The low-level string representation in bsddb looks like this for a > person record after the GEDCOM import: > > '\x80\x02(U\x13b6888def2f339e6a84bq\x01X\x04\x00\x00\x00I226q\x02K\x00(\x89]]q\x03NX\x08\x00\x00\x00Philippaq\x04X\x06\x00\x00\x00SOWTERq\x05X\x00\x00\x00\x00U\x00K\x02X\x00\x00\x00\x00\x86U\x00U\x00U\x00K\x00K\x00X\x00\x00\x00\x00t]J\xff\xff\xff\xffJ\xff\xff\xff\xff]]q\x06U\x13b6888def17102a28c2aq\x07a]q\x08U\x13b6888def2f7189807b0q\ta]]]q\n(\x89]]q\x0bK\x00X\x04\x00\x00\x00REFNq\x0c\x86X\x01\x00\x00\x00+tq\ra]]]]q\x0eJ\x11\x04\xc4JJ\xff\xff\xff\xffX\x00\x00\x00\x00\x86\x89]tq\x0f.' > > However, if you merely load and dump the data it then looks like: > > "(S'b6888def2f339e6a84b'\nVI226\nI0\n(I00\n(l(lNVPhilippa\nVSOWTER\nV\nS''\n(I2\nV\ntS''\nS''\nS''\nI0\nI0\nV\nt(lI-1\nI-1\n(l(lp1\nS'b6888def17102a28c2a'\np2\na(lp3\nS'b6888def2f7189807b0'\np4\na(l(l(lp5\n(I00\n(l(l(I0\nVREFN\ntV+\ntp6\na(l(l(l(lI1254360081\n(I-1\nV\ntI00\n(lt." > > That update code might just look like: > > for handle, data in cursor: > person = Person(data) > cursor.update(handle, person.serialize()) > > Of course, that causes the cursor to jump around. But, if you force it > to update all the records then that seems to fix the weird jumping > cursor. 
> > If you pickle.dumps the data with protocol 2, then you get data that > looks very similar to the original imported data: > >>>> pickle.dumps(('b6888def2f339e6a84b', u'I226', 0, (False, [], [], None, u'Philippa', u'SOWTER', u'', '', (2, u''), '', '', '', 0, 0, u''), [], -1, -1, [], ['b6888def17102a28c2a'], ['b6888def2f7189807b0'], [], [], [(False, [], [], (0, u'REFN'), u'+')], [], [], [], [], 1254360081, (-1, u''), False, []), 2) > > '\x80\x02(U\x13b6888def2f339e6a84bq\x01X\x04\x00\x00\x00I226q\x02K\x00(\x89]]NX\x08\x00\x00\x00Philippaq\x03X\x06\x00\x00\x00SOWTERq\x04X\x00\x00\x00\x00U\x00K\x02X\x00\x00\x00\x00\x86q\x05U\x00U\x00U\x00K\x00K\x00X\x00\x00\x00\x00t]J\xff\xff\xff\xffJ\xff\xff\xff\xff]]q\x06U\x13b6888def17102a28c2aq\x07a]q\x08U\x13b6888def2f7189807b0q\ta]]]q\n(\x89]]K\x00X\x04\x00\x00\x00REFNq\x0b\x86q\x0cX\x01\x00\x00\x00+tq\ra]]]]J\x11\x04\xc4JJ\xff\xff\xff\xffX\x00\x00\x00\x00\x86q\x0e\x89]t.' > > The funny thing is, I can't find where GRAMPS might be saving the data > with protocol 2!? > > Anyone see a difference in the way that GEDCOM saves the data, and how > the update saves it? > > -Doug > >> >> >> On Mon, Sep 28, 2009 at 9:37 AM, Doug Blank <dou...@gm...> wrote: >>> On Mon, Sep 28, 2009 at 9:22 AM, Gerald Britton >>> <ger...@gm...> wrote: >>>> Just got around to trying this and I couldn't reproduce the error on >>>> the sample database (3 changes) nor my own database (28 changes, all >>>> over the alphabet, more than 5000 individuals). >>>> >>>> Could you please tell me more about the data base you started with? >>> >>> The link to it is here: >>> >>> http://www.genealogyforum.com/gedcom/gedcom2a/gedr2090.zip >>> >>> I don't know anything else about it. It is GEDCOM imported, and then >>> applied the Fix case of Surname tool. >>> >>> But perhaps you're on to something: maybe the original handle is >>> changing (maybe unicode vs string or something) so that it isn't >>> merely an update? In any event, can you try out this database? 
>>> >>> -Doug >>> >>>> >>>> On Sat, Sep 26, 2009 at 11:44 AM, Doug Blank <dou...@gm...> wrote: >>>>> Gerald (et al), >>>>> >>>>> I'm looking at some code in trunk/src/plugins/tool/ChangeNames.py: >>>>> >>>>> with self.db.get_person_cursor(update=True, commit=True) as cursor: >>>>> for handle, data in cursor: >>>>> person = Person(data) >>>>> change = False >>>>> for name in [person.get_primary_name()] + >>>>> person.get_alternate_names(): >>>>> sname = name.get_surname() >>>>> if sname in changelist: >>>>> change = True >>>>> sname = self.name_cap(sname) >>>>> name.set_surname(sname) >>>>> if change: >>>>> cursor.update(handle, person.serialize()) >>>>> >>>>> and it looks like the cursor is skipping around the order as the names >>>>> change, and missing some of the people, and thus names. >>>>> >>>>> To test this out, you could download the following zipped GEDCOM of >>>>> the Tudor royal family of England, and try to fix the capital letters >>>>> by running Tools -> Database Processing -> Fix Capitalization of >>>>> Family Names. There are 9882 names in the file, FYI. When I go through >>>>> the above loop, I only hit 6758 people the first time, 7367 the second >>>>> time, 7846 the third time, 8278 the fourth time, ... it takes 9 passes >>>>> to go through all 9882. >>>>> >>>>> Do you think what I describe is the problem? Is there a way to get a >>>>> cursor that won't change order as you update? >>>>> >>>>> Thanks for any insight, >>>>> >>>>> -Doug >>>>> >>>>> http://www.genealogyforum.com/gedcom/gedcom2a/gedr2090.zip >>>>> >>>> >>>> >>>> >>>> -- >>>> Gerald Britton >>>> >>> >> >> >> >> -- >> Gerald Britton >> > -- Gerald Britton |
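The point about the pickle protocol can be checked without a database. The following is a minimal sketch, assuming a shortened stand-in for the person record (the tuple is illustrative, not a complete GRAMPS record). It suggests that protocol 0 and protocol 2 produce different byte strings while both load back to an equal object, so a change of protocol alters the stored bytes without corrupting the data.

```python
import pickle

# Shortened, illustrative stand-in for a serialized person record
# (an assumption for this sketch, not the full GRAMPS tuple).
record = ("b6888def2f339e6a84b", "I226", 0,
          (False, [], [], None, "Philippa", "SOWTER"), [], -1, -1)

text_form = pickle.dumps(record, protocol=0)    # ASCII-based protocol
binary_form = pickle.dumps(record, protocol=2)  # binary protocol

# The stored bytes differ between the two protocols...
same_bytes = text_form == binary_form

# ...but both decode to an object equal to the original record.
round_trip_ok = pickle.loads(text_form) == record == pickle.loads(binary_form)

print(same_bytes, round_trip_ok)
```

Comparing same_bytes with round_trip_ok separates "the bytes changed" from "the data changed", which is the distinction the message above relies on.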
From: Doug B. <dou...@gm...> - 2009-10-02 14:19:48
|
On Thu, Oct 1, 2009 at 12:12 AM, Gerald Britton <ger...@gm...> wrote:
> Sorry Doug, but what you describe cannot cause the cursor to jump
> around. The actual cursor is nothing more than a record pointer in
> the underlying C code. The cursor.update call only ever updates the
> data portion of the record, never the key (in fact, the key portion
> of the update call is ignored, and I'll probably just remove the
> parameter altogether). The pickle protocol is also not an issue, since
> the records that ARE updated this way are updated correctly. If it
> were the pickle protocol, the data would be corrupted.
>
> I'm currently pursuing this issue with the maintainers, though it
> looks like I might have to rewrite the proof code in C to get their
> attention. Sigh...
>

I understand that what I am saying cannot cause the error, and yet I don't see anything that could possibly cause the error, if we have everything correct. I notice that even if you replace the data with an exact copy, it will still move the cursor:

    with self.db.get_person_cursor(update=True, commit=True) as cursor:
        count = 0
        for handle, data in cursor:
            cursor.update(handle, data)
            count += 1

In addition, it appears that this is the case with our code and many past versions of BSDDB.

Either they have a low-level cursor bug (and they have had it for a while) or we have a bug. I'm still looking at the possibility that we are changing the key in some manner (perhaps unicode vs str, pickle protocol, key function, etc.) that causes the update to move to the wrong record. But if that were the case, you'd expect the number of records to change. Perhaps this is a difference between string types from C to Python. In other words, perhaps C uses some equality function to see if the keys are the same and reports that they are not, but when it actually saves the data it puts the data over the same key.

I assume that this is your conversation with Oracle:

http://forums.oracle.com/forums/thread.jspa?threadID=964075&tstart=0

So far, I guess that whatever this is, it only affects traversing a cursor if we change items? If we only use cursors for read-only tasks, we appear to be OK?

-Doug |
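The interim workaround mentioned earlier in the thread (retrieve a list of handles first, then work on that list) can be sketched as follows. A plain dict stands in for the person table, and capitalize_surname and fix_surnames are hypothetical names, so this shows only the shape of the loop and not the GRAMPS database API.

```python
def capitalize_surname(surname):
    # Hypothetical helper for this sketch: "DOE" -> "Doe".
    return surname.capitalize()


def fix_surnames(table, changelist):
    """Fix surnames in `table` (handle -> record dict); return records visited."""
    # Phase 1: snapshot the handles, so later writes cannot disturb
    # the order or completeness of the traversal.
    handles = list(table.keys())

    visited = 0
    # Phase 2: read-modify-write each record by handle.
    for handle in handles:
        record = dict(table[handle])
        if record["surname"] in changelist:
            record["surname"] = capitalize_surname(record["surname"])
            table[handle] = record
        visited += 1
    return visited


# Illustrative data only.
people = {
    "h1": {"first": "John", "surname": "DOE"},
    "h2": {"first": "Jane", "surname": "DOE"},
    "h3": {"first": "Philippa", "surname": "Sowter"},
}
visited = fix_surnames(people, {"DOE"})
```

The cost of this approach is holding every handle in memory at once, which is the storage overhead the cursor.update approach was meant to avoid.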
From: Gerald B. <ger...@gm...> - 2009-10-02 14:30:31
|
You have it in a nutshell. If we don't update anything, traversal with a cursor produces the same results each time. (In fact, under the covers, the Python wrappers for bsddb use cursor traversal to implement the keys() method and the cursor functions next(), prev(), etc.) If we update anything (even replacing with the same data), it screws up the cursor position. That's why I'm discussing it in the bsddb forum. There has to be a bug in there, or some undocumented way to do read-modify-write while keeping the cursor consistent.

Note that, according to the docs, the DBCursor.put() function ignores the key parameter for DB_CURRENT operations. So these are equivalent:

    cursor.update(handle, data)
    cursor.update('foobar', data)
    cursor.update(None, data)

etc.

--
Gerald Britton |
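The behaviour that the documentation leads one to expect can be written down as a toy model. The ToyCursor class below is purely illustrative: it is not bsddb and does not reproduce the bug. Its update() ignores the key argument and overwrites the data at the current position, so a read-modify-write pass should visit every record exactly once. A count like this is the check that the real cursor appears to fail.

```python
class ToyCursor:
    """Illustrative in-memory cursor; NOT the bsddb API."""

    def __init__(self, table):
        self._table = table
        self._keys = sorted(table)  # traversal is in key order
        self._pos = -1              # index of the current record

    def __iter__(self):
        return self

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._keys):
            raise StopIteration
        key = self._keys[self._pos]
        return key, self._table[key]

    def update(self, key, data):
        # Mimics the documented DB_CURRENT semantics: `key` is ignored
        # and the data of the record under the cursor is replaced.
        self._table[self._keys[self._pos]] = data


# Illustrative data: 100 records keyed by made-up handles.
table = {"h%03d" % i: "record %d" % i for i in range(100)}

cursor = ToyCursor(table)
count = 0
for handle, data in cursor:
    cursor.update("foobar", data.upper())  # the key argument is ignored
    count += 1
```

Running the same counting loop against a real cursor and comparing count with the number of records in the table is one way to turn the symptom into a small, repeatable test case.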