From: Tom S. <gra...@mo...> - 2016-03-15 20:52:23
The story:

I want to use Ancestry hints in my research. My research is relatively new and incomplete, and there are likely many records I haven't yet cited. I use Gramps as my primary tree management program. In using Ancestry hints, I'd like to be guided toward records I haven't yet used, but I have no interest in managing a tree on Ancestry. Any new records that Ancestry helps me find, I will attach to my tree in Gramps the same way I have been. So any data transfer between Gramps and Ancestry need only be one-way; I don't need to round-trip any data out of Ancestry. I'd just like Ancestry to analyze my data and guide me, not become my platform.

The problem:

I can export a GEDCOM from Gramps and upload it to Ancestry. Doing so will give me hints, but most of those hints will be noise. That's because Ancestry has no way of knowing that my citation to a source titled "United States Census, 1920" is the same as a specific record in their database. So most of the hints that Ancestry gives will be duplicates of what I already have.

Ancestry can get that information through a GEDCOM upload, though. It uses a proprietary tag, _APID, that references the database ID and the record ID. So if we could somehow enter that data into Gramps and get it exported into the GEDCOM, we'd be good.

One problem is where in the Gramps object hierarchy to store that info. The way the information is traditionally organized (e.g. through the census/forms addons) is n people in the same event, with that event having one citation to the specific source. The APID value is distinct for each person in the event, but each person can be in multiple cited events.

The proof of concept:

I've created a proof of concept to tackle this problem. It consists of some changes to the GEDCOM exporter, and an optional gramplet to make data entry easier. Each Gramps source should map to a given Ancestry database ID, so each source has an attribute with that ID.
Then, each person has attributes that reference their record number for a given database ID. For example (attribute names are likely to change):

=============================
Source: US Census, 1920
    attribute: Ancestry DBID = 6061
Citation: Pennsylvania, Allegheny County, Pittsburgh City, Pittsburgh, Ward 18, sheet 10A, family 227, Henry Watzlaf household
Event: Census event
Person: Henry George Watzlaf
    attribute: Ancestry APID H:6061 = 49733277
=============================

Henry George Watzlaf will have other similar attributes for his records in other databases, and the other people who appear in the 1920 census will have an attribute of "Ancestry APID H:6061" with different values.

This will result in the CENS census GEDCOM record having a SOUR source record which contains a line of _APID 1,6061::49733277. When this GEDCOM is used to create a tree on Ancestry, that line will be a reference to the Ancestry record at http://search.ancestry.com/cgi-bin/sse.dll?indiv=1&dbid=6061&h=49733277 and the hint will no longer be given, since Ancestry will understand that I've already cited that record.

My gramplet will, for the active citation (if its source has the attribute), enumerate the events cited and list the people. It sorts them according to the order attribute used by the census/forms addon. Each person can have a record ID entered. For convenience, you can also paste in a URL to the record, and it pulls out the record ID (from the "h" parameter).

The takeaways:

I've been going through the process of generating a GEDCOM, uploading it to Ancestry, going through the hints to attribute my citations, then deleting the Ancestry tree and repeating. In doing so, using only hints (not actively finding records on Ancestry), it has succeeded at suggesting the majority of records in my test set of sources [1]. Once these attributes are in place, the remaining hints are either false positives or records that I genuinely hadn't yet cited in my tree.
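
For anyone who wants to see the URL handling and the _APID line from the proof of concept in concrete terms, here is a rough sketch. It is not the code from my exporter or gramplet: the function names are made up for illustration, a plain dict stands in for Gramps person attributes, and the leading "1," is simply copied from the example line above.

```python
from urllib.parse import urlparse, parse_qs

# Attribute name taken from the example above; it is likely to change.
APID_ATTR_PREFIX = "Ancestry APID H:"


def record_id_from_url(url):
    """Return the value of the "h" query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("h")
    return values[0] if values else None


def apid_line(dbid, record_id):
    """Build an _APID line in the form shown above: 1,<dbid>::<record id>."""
    return "_APID 1,{}::{}".format(dbid, record_id)


def apid_for_person(person_attrs, dbid):
    """Look up a person's record ID for one database and format the line.

    person_attrs is a plain dict used here as a stand-in for the
    attributes stored on a Gramps person (an assumption of this sketch).
    """
    record_id = person_attrs.get(APID_ATTR_PREFIX + str(dbid))
    return apid_line(dbid, record_id) if record_id else None


url = "http://search.ancestry.com/cgi-bin/sse.dll?indiv=1&dbid=6061&h=49733277"
henry = {"Ancestry APID H:6061": record_id_from_url(url)}
print(apid_for_person(henry, 6061))
```

With the example URL, this prints the same _APID line as in the census example. A person without an attribute for the given database simply produces no line.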

One issue I've realized exists is when the same person appears in multiple records within the same database, for instance in one marriage license as the groom and in another as the father of the bride. I haven't tested this yet. I think the back-end will work, but there isn't enough information for the UI to be well behaved.

So if you've read this far, thanks! I'd like any feedback you may have to offer. I'll package up the code later tonight, but as another warning, it's still very much first-attempt quality.

[1] US census sources, PA birth and death certificates, United States Social Security Death Index