Stephen P. Morse has enhanced his Hebrew Soundex code
since we defined the PGV algorithm. Our algorithm was
based on his code, which is used, for instance, in the
Ellis Island application and on his site
http://stevemorse.org/ . We should make similar changes
in PGV so that our results are as correct as possible.
Stephen has confirmed our understanding of the required
changes.
A. If a letter does not exist in the Soundex rules
array, strip its accents and umlauts to obtain the base
letter, and use that base letter in the DM Soundex
calculation, in the same way the PGV Dictionary Sort
does.
(This will let us remove the accented letters from
our dmarray.full.utf-8.php file.)
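The fallback in point A can be sketched as follows. This is
only an illustration in Python, not PhpGedView code: the
names SOUNDEX_LETTERS, base_letter and lookup_letter are
hypothetical, and the letter set is a stand-in for the real
rules array. It assumes that Unicode decomposition (NFD) is
an acceptable way to find the base letter.

```python
import unicodedata
from typing import Optional

# Hypothetical stand-in for the keys of the Soundex rules array.
SOUNDEX_LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def base_letter(ch: str) -> str:
    """Return ch with its combining marks (accents, umlauts) removed."""
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch


def lookup_letter(ch: str) -> Optional[str]:
    """Return the letter to use in the rules lookup, or None if unknown."""
    if ch in SOUNDEX_LETTERS:
        return ch
    base = base_letter(ch)
    return base if base in SOUNDEX_LETTERS else None
```

Letters that have no Unicode decomposition, such as the
Polish l with stroke, are not reduced by this approach and
would still need their own entries in the rules array.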
B. In Hebrew, and at least in Arabic as well, vowels are
generally not written. This makes it impossible to
define correct Soundex codes for all letter combinations
that may or may not have a vowel between adjacent
letters. Stephen has added processing that runs before
the Soundex code calculation and improves the results.
We should do the same in PhpGedView whenever we add DM
Soundex codes to the DB tables or calculate the DM
Soundex codes of a search value.
In PhpGedView, if RtL processing is activated and the
name contains Hebrew characters, use the following
algorithm to prepare the name for the DM Soundex code
calculation (something similar is probably needed for
Arabic as soon as we get definitions):
a. Force a single Vav (ו) to be a vowel by replacing it
with an Ayin (ע) in the following cases:
* it is preceded by Bet (ב) or by Pei (פ)
* it is followed by Mem (מ, ם) or by Nun (נ, ן)
b. Force a double Vav (וו) to be a consonant by
replacing it with a Bet (ב)
c. Handle a double Yud (יי) as follows:
* if the two Yuds precede a Hei (ה) or an Ayin (ע) at
the end of the word, leave them as they are
* otherwise force the two Yuds to be a vowel by
replacing them with an Ayin
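Rules a. to c. can be sketched as a single left-to-right
pass over the word. This is an illustration in Python, not
PhpGedView code, and the function name prepare_hebrew_name
is hypothetical. It makes assumptions the rules above do not
settle: neighbouring letters are tested against the original
spelling (not the partly rewritten one), a "single" Vav is
one not immediately followed by another Vav, and "at the end
of the word" means that the Hei or Ayin is the last letter.
The letters are written as Unicode escapes to avoid
right-to-left display problems in the source.

```python
VAV, YUD = "\u05D5", "\u05D9"
AYIN, BET, PEI, HEI = "\u05E2", "\u05D1", "\u05E4", "\u05D4"
MEM_NUN = "\u05DE\u05DD\u05E0\u05DF"  # Mem, final Mem, Nun, final Nun


def prepare_hebrew_name(word: str) -> str:
    """Apply the Vav and Yud rules (a. to c.) to one Hebrew word."""
    out = []
    i, n = 0, len(word)
    while i < n:
        ch = word[i]
        nxt = word[i + 1] if i + 1 < n else ""
        if ch == VAV and nxt == VAV:
            # Rule b: a double Vav becomes the consonant Bet.
            out.append(BET)
            i += 2
        elif ch == VAV:
            # Rule a: a single Vav becomes Ayin after Bet or Pei,
            # or before Mem or Nun (including final forms).
            prev = word[i - 1] if i > 0 else ""
            if prev in (BET, PEI) or (nxt and nxt in MEM_NUN):
                out.append(AYIN)
            else:
                out.append(VAV)
            i += 1
        elif ch == YUD and nxt == YUD:
            # Rule c: keep the double Yud only before a word-final
            # Hei or Ayin; otherwise replace it with one Ayin.
            rest = word[i + 2:]
            out.append(YUD + YUD if rest in (HEI, AYIN) else AYIN)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A name with several words would be split first and each
word passed through this function separately.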
d. Hebrew names are written without most vowels.
This causes the Soundex rules for combinations of
consecutive letters to match, although a vowel may
actually exist between the letters.
We should calculate the Soundex code for the name as
spelled, and also create alternate names and codes by
adding an intervening Ayin (vowel) between any two
consecutive letters, except between two Vavs. In the
future we may want to add further letter combinations
for which the Ayin should not be inserted, or to define
explicitly the letter combinations for which it should
be inserted.
e. All the resulting DM Soundex codes should be
combined and any duplicates removed.
f. Sample:
The original name is זהב
The possible alternate names are זהב, זהעב, זעהב, זעהעב
The DM Soundex codes of this name are: 457000 470000
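Steps d. and e. can be sketched like this. It is an
illustration in Python, not PhpGedView code; the names
alternate_spellings and combined_codes are hypothetical, and
the DM Soundex function itself is not implemented here but
passed in as the parameter dm_soundex, which is assumed to
return the codes for one spelling.

```python
from itertools import product
from typing import Callable, Iterable, List

AYIN = "\u05E2"
VAV = "\u05D5"


def alternate_spellings(name: str) -> List[str]:
    """Return the name as spelled plus every variant with an Ayin
    inserted between consecutive letters (never between two Vavs)."""
    gaps = len(name) - 1
    if gaps < 1:
        return [name]
    # For each gap: may we insert an Ayin there or not?
    choices = [
        (False,) if name[i] == VAV and name[i + 1] == VAV else (False, True)
        for i in range(gaps)
    ]
    results = []
    for combo in product(*choices):
        parts = []
        for i, ch in enumerate(name):
            parts.append(ch)
            if i < gaps and combo[i]:
                parts.append(AYIN)
        results.append("".join(parts))
    return results


def combined_codes(
    name: str, dm_soundex: Callable[[str], Iterable[str]]
) -> List[str]:
    """Collect the codes of all spellings and drop duplicates (step e.)."""
    codes = set()
    for spelling in alternate_spellings(name):
        codes.update(dm_soundex(spelling))
    return sorted(codes)
```

For the three-letter sample name this produces the same four
spellings listed above. The number of spellings doubles with
every additional gap, so long names may need a limit.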
g. In PhpGedView, when the user searches by surname,
the result list of names should (at least optionally,
probably even by default) be sorted by given name.
(I think this was already included in a previous posting.)
Logged In: YES
user_id=959928
In the original request, the last point, g., should be C.
D. I would also return Family data, perhaps optionally,
if the search is by Place.
E. It would be useful if we could also search on the
father's first name soundex code(s).
F. We should either print the soundex codes in our
results as well, or at least allow printing them when
&DEBUG=1. We should also print the codes when there is
only one result and we are taken directly to the
individual page.
G. We should replace the cached DM Soundex array when the
file changes.
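One way to do G. is to remember the file's modification
time together with the cached array and reload when it
differs. The sketch below is an illustration in Python, not
PhpGedView code: load_rules and _rules_cache are
hypothetical names, and the rules are assumed to be stored
as JSON purely to keep the example self-contained.

```python
import json
import os
from typing import Any, Dict, Tuple

# path -> (modification time in ns, parsed rules); hypothetical cache.
_rules_cache: Dict[str, Tuple[int, Any]] = {}


def load_rules(path: str) -> Any:
    """Return the parsed rules, re-reading the file if it has changed."""
    mtime = os.stat(path).st_mtime_ns
    cached = _rules_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # file unchanged, reuse the cached array
    with open(path, encoding="utf-8") as fh:
        rules = json.load(fh)
    _rules_cache[path] = (mtime, rules)
    return rules
```

A cache that lives in the session or in the DB would store
the modification time next to the array in the same way.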