
#357 Speed of Relationship calculator

open
None
5
2004-11-29
2004-11-28
Ron
No

Perhaps the feature I like best is the relationship
calculator. Unfortunately, it's also the one feature
that IMHO is a little slow. I have a 70,000+
individual gedcom, and it often takes upwards of 6
minutes to determine a relationship on a 3 GHz machine
with a gig of memory and RAID drives.

Also, unless I have missed something, whenever I hit
"Relationship to me", it goes off looking for a direct
relationship. There is no way to avoid this, unless
one goes to the relationship chart and keys in the
identifiers for each of the individuals. It may take
two or three minutes to come back and respond that
there are no direct relationships, at which time one
can then check for relationships by marriage.

Legacy in a couple of seconds can determine all direct
relationships to an individual. Why so long in
phpgedview? Is it the difference between php and C++?
Even with 70,000+ individuals, the database is still
quite small as compared to physical memory, so I can't
believe it's disk time.

Last time I checked, looking for a relationship by
marriage does not detect direct relationships. Is that
what is intended?

My suggestion on the "Relationship to me" is that it
either gives you the option on which type of
relationship you are interested in prior to searching,
or left mouse click = one type and Ctrl+click = another.

Just my thoughts on improving what is already a great
product.
Ron

Discussion

  • John Finlay

    John Finlay - 2004-11-29
    • assigned_to: nobody --> yalnifj
     
  • John Finlay

    John Finlay - 2004-11-29

    Logged In: YES
    user_id=300048

    Hi Ron,

    I agree that the relationship calculator is slow. The greatest
    bottleneck is the database. If you turn on the statistics, you
    will see that it takes thousands of database queries to try
    and find the relationship. This is because every person that
    it checks has to be retrieved from the database and every
    family that it passes through also has to be retrieved.

    In a normal application you don't have this problem because
    the database is in memory. So you can go searching through
    memory to follow relationships without waiting for a database
    query.
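
To illustrate the in-memory approach described above, here is a minimal Python sketch (not PhpGedView's actual code, which is PHP). It assumes all parent/child links have already been loaded in one pass as `(parent_id, child_id)` pairs; the function names and the `I1`..`I5` identifiers are hypothetical. The search is a plain breadth-first search, one of several algorithms that would fit.

```python
from collections import deque


def build_graph(links):
    """Build an undirected adjacency map from (parent_id, child_id) pairs."""
    graph = {}
    for parent, child in links:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set()).add(parent)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of ids from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # remembers how each person was reached
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = person
            if neighbour == goal:
                # Walk back from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical data: I1 is the parent of I2 and I3, and so on.
links = [("I1", "I2"), ("I1", "I3"), ("I2", "I4"), ("I3", "I5")]
graph = build_graph(links)
print(shortest_path(graph, "I4", "I5"))
```

Every step of the search is a dictionary lookup, so no database query is issued once the links are loaded.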

    My algorithm is probably not the greatest. I based it on
    search algorithms that I learned in an artificial intelligence
    class I took in college. I wonder if other genealogy
    relationship algorithms are out there somewhere in the vast
    internet. I haven't ever looked.

    What I would like to do, and where I think we would find the
    greatest benefit in this area is to have some sort of
    numbering scheme for the people in the gedcom. An example
    of how we might go about this is to do something like this:
    Father = 1
    Child 1 = 1.1
    Child 2 = 1.2
    Child 3 = 1.3
    Child 1 of Child 1 = 1.1.1
    Child 2 of Child 1 = 1.1.2
    Child 1 of Child 2 = 1.2.1

    With this I could get all of the descendants of Father just by
    finding all of the people whose numbers start with "1." which
    also means that anyone who starts with a "1." is related.

    Anyway, this particular scheme has a lot of holes in it and
    generating the numbers to start with would be difficult.
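
The numbering idea above can be sketched in Python as follows. This is only an illustration under assumptions: the `children` map, the function names, and the person names are hypothetical, and each person receives a single number. That is one way the scheme falls short, since someone who descends from the root through more than one line can only keep one of their numbers here.

```python
def number_descendants(children, root, label="1"):
    """Assign dotted numbers: the root gets `label`, its n-th child gets label.n."""
    numbers = {root: label}
    for index, child in enumerate(children.get(root, []), start=1):
        numbers.update(number_descendants(children, child, f"{label}.{index}"))
    return numbers


def descendants_of(numbers, person):
    """Everyone whose number starts with the person's number followed by a dot."""
    prefix = numbers[person] + "."  # the dot keeps "1." from matching "10"
    return sorted(p for p, n in numbers.items() if n.startswith(prefix))


# Hypothetical family, mirroring the example in the post.
children = {
    "Father": ["Child1", "Child2", "Child3"],
    "Child1": ["Grandchild1", "Grandchild2"],
    "Child2": ["Grandchild3"],
}
numbers = number_descendants(children, "Father")
print(numbers["Grandchild3"])
print(descendants_of(numbers, "Child1"))
```

If the numbers were stored in an indexed database column, the same prefix test could be done in a single query rather than one query per person.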

    --John

     
  • Ron

    Ron - 2005-01-12

    Logged In: YES
    user_id=998633

    Hi John,
    Sorry for the delay - I thought I had responded.

    I run Windows, and although it does a lot of MySQL accesses, I
    notice that most of the CPU time, and it is CPU-bound, is
    taken up in Apache. I believe that means that most of the
    time is taken up in the PHP code.

    I guess PHP does not generate object code, but isn't there a
    way of distributing PHP projects in pseudo code that works a
    lot quicker? If so, perhaps full dot revs could be compiled
    and distributed in that manner. It means that users won't be
    able to avail themselves of CVS, however speed has its
    advantages, especially with large gedcoms.
    regards,
    Ron

     
  • Stephen Arnold

    Stephen Arnold - 2005-02-17

    Logged In: YES
    user_id=1061833

    It would also be a nice addition to this feature if a
    summary of the relationship appeared at the top. GEDCOMIT
    reports the following (not bragging you understand) and the
    calculation took about 12 seconds whereas it timed out after
    120 seconds in PGV:

    "George Walker Bush is your sixth cousin once removed." when
    reporting the relationship to me.
    stephen
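
A summary line like the one quoted above can be derived from two generation counts. The Python sketch below is an assumption-laden illustration, not GEDCOMIT's or PGV's code: `up` and `down` are hypothetical names for the number of generations from each person to their nearest common ancestor, and it follows the cousin-naming convention common in North America. It only covers cousins.

```python
ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth", "tenth"]
REMOVALS = {1: "once", 2: "twice"}


def describe_cousins(up, down):
    """Name a cousin relationship from generations to the common ancestor."""
    if min(up, down) < 2:
        # Direct lines, siblings, aunts/uncles etc. are not handled here.
        raise ValueError("not a cousin relationship")
    degree = min(up, down) - 1   # first cousins share a grandparent (2 up each)
    removed = abs(up - down)     # difference in generations
    # Crude numeric fallback beyond "tenth"; a real version needs proper ordinals.
    ordinal = ORDINALS[degree - 1] if degree <= len(ORDINALS) else f"{degree}th"
    text = f"{ordinal} cousin"
    if removed:
        text += f" {REMOVALS.get(removed, f'{removed} times')} removed"
    return text


print(describe_cousins(7, 8))
```

With 7 generations on one side and 8 on the other, this yields the "sixth cousin once removed" wording from the example.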

     
  • FamilyConnections

    Logged In: YES
    user_id=1463438

    I also found this feature slow, and decided to do something
    about it. Turns out the algorithm to find direct-line
    relatives can be very speedy - even with 34,000+ names in a
    file. I would share my patch - if I knew how.
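
The patch itself is not shown in this thread, so the following Python sketch is only a guess at how a fast direct-line search might work, not the poster's algorithm. It assumes a hypothetical in-memory `parents` map (person id to list of parent ids), collects each person's ancestors with their generation depth, and intersects the two sets. The ids are made up.

```python
def ancestor_depths(parents, person):
    """Map each ancestor (and the person, at depth 0) to its nearest generation."""
    depths = {}
    frontier = [person]
    depth = 0
    while frontier:
        next_frontier = []
        for current in frontier:
            if current in depths:
                continue  # already reached by a shorter line
            depths[current] = depth
            next_frontier.extend(parents.get(current, ()))
        frontier = next_frontier
        depth += 1
    return depths


def closest_common_ancestors(parents, a, b):
    """Return (ancestor, generations_from_a, generations_from_b) for the nearest links."""
    depths_a = ancestor_depths(parents, a)
    depths_b = ancestor_depths(parents, b)
    shared = set(depths_a) & set(depths_b)
    if not shared:
        return []
    best = min(depths_a[x] + depths_b[x] for x in shared)
    return sorted((x, depths_a[x], depths_b[x])
                  for x in shared if depths_a[x] + depths_b[x] == best)


# Hypothetical data: I5 and I6 are grandchildren of the couple I1 and I2.
parents = {"I3": ["I1", "I2"], "I4": ["I1", "I2"], "I5": ["I3"], "I6": ["I4"]}
print(closest_common_ancestors(parents, "I5", "I6"))
```

Each person is visited at most once per search, so the cost grows with the number of ancestors rather than with the size of the whole file.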

    See it in action at http://www.ournetwork.net/FamilyConnections
    using pids I409 and I5994, or try other combinations.

    For a real treat, try I409 with himself, or with his wife I410!

    Haven't got all the kinks worked out yet, but for me this is
    waaaaay better than what is in the 3.3.8 release.

     
  • Paul King

    Paul King - 2006-08-07

    Logged In: YES
    user_id=1566627

    Would be good if there actually WAS a "relationship to me"
    function in 4.01

    Best I can find is a chart naming the relationship at each
    intermediate connecting link between two individuals - no
    textual description of the overall relationship between the
    two people, which is a BIG omission!

     
  • KosherJava

    KosherJava - 2007-04-19

    Logged In: YES
    user_id=634811
    Originator: NO

    Can this be closed?

     
  • Stephen Arnold

    Stephen Arnold - 2007-04-19

    Logged In: YES
    user_id=1061833
    Originator: NO

    While my comment was resolved with the now existent relationship verbiage, the speed issue still exists. On a VERY fast machine with our 46,000 INDIs, many calculations still time out or take more than 3 minutes. Practically useless.
    -Stephen

     
  • FamilyConnections

    Logged In: YES
    user_id=1463438
    Originator: NO

    I have a fix I use in 3.3.8, and am working on adding it to the project's current dev release. My fix finds up to 200 or more direct-line relationships in seconds, and names the overall relationship when each one is displayed.

    Will need help with translations of the relationship descriptions. Also - I am aware that there are (at least) 2 ways of describing a relationship; one used widely in North America and one used in France (and perhaps elsewhere). Any ideas on how to tackle this?

    Barry

     
