Limited entries in "Most Common Surnames" statistic
Installation: PGV 4.1.3
Within the GEDCOM Statistics on my Welcome Page, I want to show all surnames of my GEDCOM file. Therefore, I set the minimum limit for surnames to be shown in the "Most Common Surnames" statistic to a value of "1". With a minimum of "1", I expect PGV to show ALL surnames. However, the number of surnames shown seems to be limited to about 100. In my earlier 3.3.8 installation, all surnames (about 1000) were listed.
Logged In: YES
user_id=1466942
Originator: NO
In file functions_name.php, in function get_common_surnames(), the list is populated using:
$surnames = get_top_surnames(100);
Thus the limit is hard-coded. Apparently by design.
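To illustrate what removing the hard-coded cap could look like, here is a minimal sketch. It is not the actual PGV code: the function names top_surnames_sketch() and common_surnames_sketch(), the $limit parameter, and the plain surname-to-count array are all assumptions made for illustration. The idea is only that the cap becomes an optional argument, with 0 meaning "no cap", so a minimum of 1 really returns every surname.

```php
<?php
// Hypothetical sketch, NOT the real PGV functions.
// $counts maps surname => number of individuals (assumed data shape).

// Return the $limit most frequent surnames; $limit <= 0 means "no cap".
function top_surnames_sketch(array $counts, $limit = 0) {
    arsort($counts); // sort by count, highest first, keeping the surname keys
    if ($limit > 0) {
        $counts = array_slice($counts, 0, $limit, true);
    }
    return $counts;
}

// Keep only surnames that occur at least $min times.
// The cap is passed through instead of being fixed at 100.
function common_surnames_sketch(array $counts, $min, $limit = 0) {
    $result = array();
    foreach (top_surnames_sketch($counts, $limit) as $surname => $count) {
        if ($count >= $min) {
            $result[$surname] = $count;
        }
    }
    return $result;
}

// Example data (made up for this sketch).
$counts = array('Smith' => 12, 'Meyer' => 3, 'Okafor' => 1);
print_r(common_surnames_sketch($counts, 1));    // no cap: every surname qualifies
print_r(common_surnames_sketch($counts, 1, 2)); // capped at the two most frequent
```

With a threshold of 1 and no cap, the result contains every surname in the input, which is the behaviour the original report expected.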
Logged In: YES
user_id=1161917
Originator: YES
Thanks for this valuable hint. The hard-coded parameter "100" in "get_top_surnames(100)" seems to be a straightforward explanation of the behaviour.
So, I tried to patch this line with smaller and larger parameters instead of "100". However, this did not seem to have any influence on the number of surnames shown in the GEDCOM statistics. The number of surnames shown is still about 100, regardless of the parameter. Maybe the number of surnames is also hard-coded to 100 in other parts of the code.
Logged In: YES
user_id=1910459
Originator: NO
You might need to clear your cache before changing '100' to anything else has any effect.
Logged In: YES
user_id=1161917
Originator: YES
I have already tried to clear the cache, but it did not help, neither with IE nor with Firefox.
Some further tests:
a) I commented out the function definition, i.e.
"//function get_common_surnames($min) {"
This directly resulted in an error message when the file "functions_name.php" was included. This also shows that it is not a cache problem.
b) I changed the name of the function, i.e.
"function forget_common_surnames($min) {"
This did not result in an error message but in the already known behaviour. For me, this clearly indicates that the discussed function is not called in the "Most common names" statistic. Maybe another function is used instead.
Logged In: YES
user_id=1198414
Originator: NO
The "clear cache" that's referred to here is the Index Page cache. You clear this by means of a button on the Configure Index Page page.
Logged In: YES
user_id=1198414
Originator: NO
The list of common surnames is stored in the "index/gedcoms.php" file. It is NOT re-calculated very often.
To make PGV re-calculate the list of common surnames, you have to change the GEDCOM configuration to set a different threshold. Change the threshold to something bigger than the desired "1", save the configuration, and then change the threshold to your desired "1" and save the configuration again.
You should, of course, also change the function that calculates the list so that it doesn't stop at 100.
Are you SURE you want to list every surname in your GEDCOM? I wouldn't do this: the list becomes VERY large VERY quickly, and thus gets to be totally useless to the casual observer. What's your purpose in listing EVERY surname in your database? Are you trying to convince search engines to look at your site? If so, it's not necessary to list all those surnames.
Logged In: YES
user_id=1161917
Originator: YES
O.k., thank you very much canajun2eh!
With your hints, I was now able to regenerate the most common surname list.
Regarding your question about the sense of my request, I can illustrate my use case:
My PGV runs as a sub-site within a Joomla wrapper. In this setup, the inclusion of the most common surname list within the meta tags apparently does not work. Therefore, I was looking for an alternative way to make the surnames known to the search engines. Another important point is that I don't want the search engines to crawl the whole PGV site. Firstly, this produces a lot of pointless traffic. Secondly, I do not want all data to be searchable. Therefore, I have added the following lines to my robots.txt file:
Disallow: /pgv4/
Allow: /pgv4/index.php
The idea is that the welcome page is found and the rest is left alone.
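One caveat on those two lines: robots.txt rules only apply inside a group that begins with a User-agent line, and crawlers do not all resolve an Allow/Disallow conflict the same way. Some pick the most specific (longest) matching rule, while simpler parsers take the first matching rule in file order. A minimal sketch that should behave the same under both approaches lists the Allow rule first. The "User-agent: *" line is an assumption here (the original post does not say which crawlers the rules target):

```
# Assumed: rules apply to all crawlers
User-agent: *
# Allow comes first so first-match parsers also let the welcome page through
Allow: /pgv4/index.php
Disallow: /pgv4/
```

Crawlers that do not support Allow at all will simply treat everything under /pgv4/ as disallowed.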
One last point. Regardless of unusual use cases like mine, I do not consider hard-coded limits like "get_top_surnames(100)" good coding style. Such a limit is not transparent, and the user cannot anticipate it. Therefore, I would recommend changing this line of code.
(1) In the latest SVN code, this value is calculated each time it is needed (we now have this information in a table that can be quickly searched). It will therefore be updated immediately following the update of any names.
(2) This limit has been removed in the latest SVN code.
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 100 days (the time period specified by
the administrator of this Tracker).