From: Alex R. <sh...@gr...> - 2007-01-15 22:30:17
On Mon, 2007-01-15 at 21:58 +0100, ben...@ug... wrote:
> This should not really be the case. If you filter on a .gramps xml text
> file of 20 Mb with good regex construction, it should not take more than
> a couple of seconds. I need 3 sec to regex a flat text file of 300 Mb.
> It all depends on how you design search and view. Filter now loops over
> all handles, I have the impression, but this can be written for some to
> loop over indices first, then records.

Yes, for some filters. But not many, unless we want to increase the
number of secondary indices. Besides, filter rules differ in complexity
as follows:

1. The simplest rule is one that can yield a yes/no answer based only on
   the data stored in the db for the handle. E.g. the gender is an integer,
   and when the person is stored in the db, the gender is the nth component
   of the tuple corresponding to the person. So to work, the rule could
   just do:

       value = db.get(handle)
       gender = value[n]

   (I don't remember the n now :-) An important thing is that one does not
   even need to instantiate the Person() object, and that's a huge gain in
   performance. Again, the DB access is really fast, but the overheads in
   python code add up.

2. More complex rules, e.g. "Name contains...": we need to instantiate the
   Person object to gain access to the Name class. Then we work with the
   attributes of the Name (and the items of the alternate-name list), but
   we still retrieve only one record per person to determine yes/no with
   this rule. More expensive than (1), but still bearable.

3. Yet more complex rules, e.g. "Has Birth...": we retrieve the person
   object to find the reference to the birth event. Then we retrieve the
   birth event from the event table (maybe several events, if the person
   has more than one). So in the end we need to retrieve several db
   records from two tables.

4. Yet more complex rules, e.g. "Has Family Event": we retrieve the
   person and find out which families it belongs to. Then we retrieve
   each family from the family table.
   Each family has a list of events. Then we retrieve each event from
   the event table. So it's three tables, and potentially many more DB
   lookups, to reach the yes/no decision.

This can go on and on. Wait until we get to "matches the ancestor of the
filter match" and the like. My point is, we're asking gramps to really do
a lot of work. I doubt this can be done with grep, except for the type 1
and 2 rules.

> On a database implementation there should be no reason to have a slow
> application. Let's be honest, 100,000 records should be peanuts; you
> never see more than 50 on a single screen.

But you would need to process all 100,000 to filter them, or have some
pre-processing done beforehand, such as the secondary indices.

Alex

-- 
Alexander Roitman   http://www.gramps-project.org
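[Editor's sketch] The "type 1" rule described above can be illustrated with a few lines of Python. This is not GRAMPS code: the tuple layout, `GENDER_POS`, and the integer gender codes are all hypothetical stand-ins for whatever the real schema uses. The point it demonstrates is that the yes/no answer comes from a single raw lookup, without ever building a `Person()` object:

```python
# Hypothetical layout of the raw person tuple; GRAMPS's actual schema differs.
GENDER_POS = 2          # assumed position of the gender field in the tuple
MALE, FEMALE = 1, 0     # assumed integer gender codes

def is_male(db, handle):
    """A 'type 1' rule: decide yes/no from the stored tuple alone,
    without instantiating any Person object."""
    value = db.get(handle)          # one fast lookup returns the raw tuple
    return value[GENDER_POS] == MALE

# Stand-in for the handle -> raw-tuple database:
db = {
    "h1": ("I0001", "John", MALE),
    "h2": ("I0002", "Jane", FEMALE),
}

# Filtering is then a single pass of cheap tuple indexing:
matches = [h for h in db if is_male(db, h)]
```

The cost per handle is one dict lookup and one tuple index, which is why rules of this shape stay fast even over large databases.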
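[Editor's sketch] The "loop over indices first, then records" idea from the quoted message amounts to building a secondary index in a pre-processing pass. A minimal illustration, with made-up record layout and field positions (not GRAMPS's actual tables):

```python
from collections import defaultdict

def build_surname_index(db):
    """One pre-processing pass over all records: surname -> list of handles.
    Afterwards, a surname filter consults the index instead of scanning
    every record."""
    index = defaultdict(list)
    for handle, record in db.items():
        surname = record[1]     # assumed position of the surname field
        index[surname].append(handle)
    return index

# Stand-in database of raw person records:
db = {
    "h1": ("I0001", "Smith"),
    "h2": ("I0002", "Jones"),
    "h3": ("I0003", "Smith"),
}

idx = build_surname_index(db)
# The filter now touches only the matching handles, not all records:
smiths = idx.get("Smith", [])
```

The trade-off is exactly the one discussed above: each such index must be built and kept up to date, which is why adding secondary indices helps only the rules whose lookup key the index covers.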