2009/9/10 Benny Malengier <email@example.com>
2009/9/10 Frederico Muñoz <firstname.lastname@example.org>:
> I'll leave the implementation details to the developers. In general
> terms the proposal to add a "general" field to enter the surnames is
> something that I find sensible, with the following requirements:
> - GEDCOM export should be straightforward. The flattening of the
> surnames should have very clear and simple rules
> - The data entry dialog should be easy to use
> Additionally I'm unsure how GEDCOM import would work out.
As GEDCOM has no provision for it, there is no way it can be included
and made to work.
What GEDCOM has support for is multiple surnames. They are separated by commas in the value of the SURN tag. No such provision is there for the NAME tag, so leaving the commas there is arguably not permitted by GEDCOM.
So, no matter how it is arranged in fields in GRAMPS, export to GEDCOM should use commas in SURN and expect them on import. GRAMPS may optionally support other things, but that is what GEDCOM requires.
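To make the export rule concrete, here is a minimal sketch in Python. The function names (surn_value, parse_surn, name_lines) are hypothetical and not part of GRAMPS; the space after each comma and the space-flattened NAME value are assumptions of this sketch, not something GEDCOM is claimed to mandate.

```python
def surn_value(surnames):
    """Join individual surnames into one comma-separated SURN value."""
    return ", ".join(s.strip() for s in surnames if s.strip())


def parse_surn(value):
    """Split a SURN value back into individual surnames (import side)."""
    return [part.strip() for part in value.split(",") if part.strip()]


def name_lines(given, surnames):
    """Build NAME/GIVN/SURN lines for one name.

    Assumption: NAME carries no commas, so the surnames are flattened
    with plain spaces there; only SURN keeps the comma separation.
    """
    flat = " ".join(surnames)
    return [
        f"1 NAME {given} /{flat}/",
        f"2 GIVN {given}",
        f"2 SURN {surn_value(surnames)}",
    ]
```

Whatever fields GRAMPS uses internally, a round trip through parse_surn(surn_value(...)) should give back the same list of surnames.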
There is no reason to do GRAMPS --> GEDCOM --> GRAMPS
This was the only way to copy a set of people (defined with a filter) from one GRAMPS database to another, since XML export did not use filters. Does it now? I have myself done that gazillions of times.
So adding new fields to GEDCOM that only GRAMPS knows adds no value. Time
is better spent cleaning up the GEDCOM import, which would just keep the
names as stored and add new ones present in the GEDCOM.
As a consequence, the way to support it, is to import names as now
from GEDCOM (with addition for support of nickname that is present in
GEDCOM) and offer a plugin tool 'Cleanup names' that can batch process
names so as to split up surnames.
That tool should, e.g., allow for:
* split multiple surnames as <secondary surname> <primary surname>
* split multiple surnames as <primary surname> <secondary surname>
It may help in some cases. But it is in general undecidable. Each surname may contain multiple words and even humans can't get it right every time without context. Humans should be able to express the correct way to parse names.
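A rough sketch of why a batch split cannot be fully automatic, in Python. The helper names (candidate_splits, split_two) are hypothetical and not an existing GRAMPS tool; the sketch only splits on whitespace and deliberately refuses to guess when more than one split is possible.

```python
def candidate_splits(full_surname):
    """List every way to cut the words into two non-empty surnames."""
    words = full_surname.split()
    return [
        (" ".join(words[:i]), " ".join(words[i:]))
        for i in range(1, len(words))
    ]


def split_two(full_surname, primary_first=True):
    """Return (primary, secondary) only if the split is unambiguous.

    Returns None when there is no split or more than one candidate,
    so that a human can make the decision instead.
    """
    candidates = candidate_splits(full_surname)
    if len(candidates) != 1:
        return None
    first, second = candidates[0]
    return (first, second) if primary_first else (second, first)
```

A two-word value such as "Pérez López" splits cleanly in either order, while "Álvarez de Toledo y Carrillo de Toledo" yields six candidates and is left for the user to resolve.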
An example, in a name like "García Álvarez de Toledo y Carrillo de Toledo":
- How should it be parsed? How many surnames are in it? Well, García is a given name that is now seen only as a surname, but things were different in the past. It seems that "Carrillo de Toledo" would be a second surname, but there are other problems.
- Is "Álvarez" a real patronymic? This means García is the child of some Álvaro (possibly "de Toledo", but no guarantee). His children will probably use "Garcés" (or even "García") as patronymic and, possibly, "de Toledo" as surname. This scheme was valid until ca. 1200. In this case, Álvarez should go into the patronymic field or, as is common, as part of the given names. "de" would go into the surname prefix field.
- Is "Álvarez" a false patronymic? In this case, his father was not Álvaro, but he got his name (given plus patronymic) from some ancestor or relative of notice named "García Álvarez" (not necessarily "de Toledo"). In this case, the best coding is probably the same as for real patronymics except that context does not help a lot. This is after ca. 1200.
- Is it a non-fixed part of the surname? It is a dark period where patronymics start being inherited but they have not solidified. In this case, García's father was "Álvarez de Toledo" but some of García's siblings would be just "de Toledo", as would some of his children. Some of them might even have a different patronymic. In this case, I think it is best to put all of "Álvarez de" as surname prefix. Otherwise it would sort García under "A" and only an amateur would want that.
- Is it a solidified part of the surname? In this case "Álvarez de Toledo" is uniformly inherited as a block and exceptions are rare. All of it should go into the surname (and sort under "A").
This is not a made-up name, artificially obfuscated. It is the name of a real person, namely, the first Duke of Alva. Unskilled humans are bad at parsing many of these names, so you get to hear a lot of silliness.
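The difference between the last two cases can be sketched with a small Python data structure. The Surname class and its fields are hypothetical, loosely modelled on a surname plus surname-prefix pair, and are not the actual GRAMPS classes; casefold() stands in for a real locale-aware collation.

```python
from dataclasses import dataclass


@dataclass
class Surname:
    surname: str
    prefix: str = ""

    def display(self):
        """Printed form: prefix followed by surname."""
        return f"{self.prefix} {self.surname}".strip()

    def sort_key(self):
        """Sort on the surname only, ignoring the prefix."""
        return self.surname.casefold()


# Non-fixed patronymic: "Álvarez de" is stored as prefix, sorts under "T".
non_fixed = Surname(surname="Toledo", prefix="Álvarez de")

# Solidified surname: the whole block is the surname, sorts under "A".
solidified = Surname(surname="Álvarez de Toledo")
```

Both objects print as "Álvarez de Toledo"; only the sort position differs, and which one is right is a decision the program cannot make on its own.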
I think I have now vigorously scratched the surface of the Spanish name parsing problem. Problem is that most people who understand the issues know very little of computers. And those who understand computers have never seen a facsimile of a document as recent as the 19th century.
I am only unclear on how to handle the binding y. Is it OK to consider
this a prefix? Probably not, but then the y should be part of the
secondary surname field, so as to make printing work correctly. I
would think it is not too much of a bother to have that there if you
want it; after all, you know what it means.
Adding a field (surname combine field) just for the 'y' looks like
overkill to me. The usage will be so specific per culture, with
probably some exceptions too.....
The earlier idea that a surname stored as 'name, nome' would
automatically be printed as 'name y nome' would not work in general either.
Well, I said 'y' as just an example. It can be 'e' too, depending on context. And in Catalan names, 'i' is used instead. The connector may also be a comma, especially if there are more than two surnames. Did this get weird enough already?
If you never try to do anything intelligent with the second surname, like indexing, sorting or exploiting them in some way, you can just leave the connector and prefixes with the following surname. But as soon as you try to do something smart with them... If your program does smart things with later surnames, you have no choice but leaving out the connector completely.
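As an illustration of leaving the connector out of the stored data, here is a minimal Python sketch. The function join_surnames and its connector parameter are hypothetical; the idea is only that the surnames are kept as a clean list and the connector is supplied at display time.

```python
def join_surnames(surnames, connector="y"):
    """Join stored surnames for display with the chosen connector.

    The connector is a display choice ("y", "e", "i", or a comma);
    it is never stored with the surnames themselves.
    """
    if connector == ",":
        return ", ".join(surnames)
    return f" {connector} ".join(surnames)


stored = ["Álvarez de Toledo", "Carrillo de Toledo"]
print(join_surnames(stored))        # Spanish-style "y"
print(join_surnames(stored, "i"))   # Catalan-style "i"
print(join_surnames(stored, ","))   # comma connector
```

Because the list holds only the surnames, each one can be indexed or sorted on its own without first stripping a connector.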
The beauty of what they have done in PGV is how simple it is and how much mileage you can get from something this simple. BTW, they did not do it specifically for Spanish; there are pathological cases in other cultures where this is a win, but I can't remember the particulars.