From: Alex R. <sh...@gr...> - 2007-01-15 22:30:17
On Mon, 2007-01-15 at 21:58 +0100, ben...@ug... wrote:
> This should not really be the case. If you filter on a .gramps xml text
> file of 20 Mb with good regex construction, it should not take more than
> a couple of seconds. I need 3 sec to regex a flat text file of 300 Mb.
> It all depends on how you design search and view. Filter now loops over
> all handles, I have the impression, but this can be written for some to
> loop over indices first, then records.

Yes, for some filters. But not many, unless we want to increase the
number of secondary indices. Besides, filter rules differ in complexity
as follows:

1. The simplest rule is one that can yield a yes/no answer based only on
   the data stored in the db for the handle. E.g. the gender is an integer,
   and when the person is stored in the db, the gender is the nth component
   of the tuple corresponding to the person. So to work, the rule could
   just do:

       value = db.get(handle)
       gender = value[n]

   (I don't remember the n now :-) An important thing is that one does not
   even need to instantiate the Person() object, and that's a huge gain in
   performance. Again, the DB access is really fast, but the overheads in
   python code add up.

2. More complex rules, e.g. "Name contains...": we need to instantiate the
   Person object to gain access to the Name class. Then we work with the
   attributes of the Name (and the items of the alternate-name list), but
   we still retrieve only one record per person to determine yes/no with
   this rule. More expensive than (1), but still bearable.

3. Yet more complex rules, e.g. "Has Birth...": we retrieve the person
   object to find the reference to the birth event. Then we retrieve the
   birth event from the event table (maybe several events, if the person
   has more than one). So in the end we need to retrieve several db
   records from two tables.

4. Yet more complex rules, e.g. "Has Family Event": we retrieve the
   person and find out which families it belongs to. Then we retrieve
   each family from the family table.
   Each family has a list of events. Then we retrieve each event from
   the event table. So it's three tables, and potentially many more DB
   lookups, to reach the yes/no decision.

This can go on and on. Wait until we get to "matches the ancestor of the
filter match" and the like. My point is, we're asking gramps to really do
a lot of work. I doubt this can be done with grep, except for the type 1
and 2 rules.

> On a database implementation there should be no reason to have a slow
> application. Let's be honest, 100,000 records should be peanuts; you
> never see more than 50 on a single screen.

But you would need to process all 100,000 to filter them, or have some
pre-processing done beforehand, such as the secondary indices.

Alex

-- 
Alexander Roitman   http://www.gramps-project.org
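[Editor's sketch] The "type 1" rule described above can be illustrated with a few lines of Python. This is not GRAMPS code: the tuple layout, `GENDER_POS`, and the integer gender codes are all hypothetical stand-ins for whatever the real schema uses. The point it demonstrates is that the yes/no answer comes from a single raw lookup, without ever building a `Person()` object:

```python
# Hypothetical layout of the raw person tuple; GRAMPS's actual schema differs.
GENDER_POS = 2          # assumed position of the gender field in the tuple
MALE, FEMALE = 1, 0     # assumed integer gender codes

def is_male(db, handle):
    """A 'type 1' rule: decide yes/no from the stored tuple alone,
    without instantiating any Person object."""
    value = db.get(handle)          # one fast lookup returns the raw tuple
    return value[GENDER_POS] == MALE

# Stand-in for the handle -> raw-tuple database:
db = {
    "h1": ("I0001", "John", MALE),
    "h2": ("I0002", "Jane", FEMALE),
}

# Filtering is then a single pass of cheap tuple indexing:
matches = [h for h in db if is_male(db, h)]
```

The cost per handle is one dict lookup and one tuple index, which is why rules of this shape stay fast even over large databases.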
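[Editor's sketch] The "loop over indices first, then records" idea from the quoted message amounts to building a secondary index in a pre-processing pass. A minimal illustration, with made-up record layout and field positions (not GRAMPS's actual tables):

```python
from collections import defaultdict

def build_surname_index(db):
    """One pre-processing pass over all records: surname -> list of handles.
    Afterwards, a surname filter consults the index instead of scanning
    every record."""
    index = defaultdict(list)
    for handle, record in db.items():
        surname = record[1]     # assumed position of the surname field
        index[surname].append(handle)
    return index

# Stand-in database of raw person records:
db = {
    "h1": ("I0001", "Smith"),
    "h2": ("I0002", "Jones"),
    "h3": ("I0003", "Smith"),
}

idx = build_surname_index(db)
# The filter now touches only the matching handles, not all records:
smiths = idx.get("Smith", [])
```

The trade-off is exactly the one discussed above: each such index must be built and kept up to date, which is why adding secondary indices helps only the rules whose lookup key the index covers.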