Census data entry format (like Quick Update)
FHUG member and contributor Nick Walker has written a
most cool Gedcom Census prog to work with FH. It
assists with the transcription of census data into a
format that is widely accepted and is well defined.
Some "convention" and input format within PGV would be
cool too.
We've attached a screen shot of one of his templates,
but obviously the prog itself is C/R.
-Stephen
Nick Walker's Gedcom Census program for FH
Logged In: YES
user_id=300048
My plan has been to do this as part of the research log, the
research log being a workflow to help you take your research
from a todo item all the way through to data in your gedcom.
--John
Logged In: YES
user_id=1061833
Sounds like a great place for it!
Gosh, we've now transcribed over 1000 census family reports
in just one year. Maybe I should be doing something else
like trying to earn a living.
Anything to make it easier.
Stephen
Logged In: YES
user_id=623181
The picture as shown doesn't have fields for "marital
status", "sex", & "disability".
Nor does it give the ED (Enumeration District) number ... is
the "157" shown a folio number or a page number or a
schedule number or some other number? There's not enough
info shown for a reader to then go and locate the census
entry in a source.
Most Census projects in the UK at least tend to use
spreadsheets in CSV format for inputting the data, e.g.
http://www.freecen.org.uk
http://freepages.genealogy.rootsweb.com/~kayhin/ukocp.html
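As a rough illustration of the spreadsheet approach, here is a minimal Python sketch that reads census rows from CSV text. The column names and the two sample people are invented for illustration; they are not the FreeCEN layout or any other project's actual format.

```python
import csv
import io

# Hypothetical column layout and invented sample data, for illustration only.
SAMPLE = """folio,page,schedule,surname,forename,relationship,marital_status,sex,age,occupation
7,9,157,Sample,John,Head,M,M,40,Ag Lab
7,9,157,Sample,Mary,Wife,M,F,38,
"""


def read_census_rows(text):
    """Parse CSV text into a list of dicts, one per enumerated person."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_census_rows(SAMPLE)
for row in rows:
    print(row["forename"], row["surname"], "-", row["relationship"])
```

Keeping folio, page and schedule as separate columns is what lets a reader go back and locate the entry in the source.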
Mark
Logged In: YES
user_id=623181
And even more glaringly omitted is "Relationship to Head of
Household"
Mark
Logged In: YES
user_id=1061833
dxradio
-perhaps you need to reread our RFE as it appears you've
taken the attached screenshot too literally. We suggested
PGV development of a "convention" and "input format",
similar in operation to the Quick Update concept. We did NOT
suggest using Nick's work as a template or format, only that
the development of such a method of entry within PGV would
be, perhaps, helpful.
Actually, the USGenWeb transcription project has an EXCEL
spreadsheet "model" for transcribers, but we thought that
John would assist with the introduction of this type of
device within PGV, and that the names, as listed, might end
up linking to the INDI, as entered. The HUSB/WIFE already
show the census data when entered into the FAM records
(except when the bug arises as reported under future bugs).
Each day we transcribe over 20 census reports and thought it
might be helpful for PGV users to settle on a style/method
of including the information contained therein and reported
in a consistent fashion, compliant with v5.5 standards and
the notes capabilities. For relatives listed in the
1850-1930 U.S. census data, the information has proven
invaluable. For the earlier census reports, interpretation
of the BD information, and household sexual structure has
provided important clues to discovering the names of those
inhabitants, particularly using the 1820/40 reports combined
with the 1850 report.
As we have reviewed hundreds of GEDs as listed on Rootsweb,
we find that when Census data is noted, it is usually
haphazard and incomplete and always listed differently
within each GED. That's why we suggested a data entry method
(read convention) be developed within PGV that would assure
the maximum retention of the reported data, like the things
you criticize as missing from Nick's British census effort
for FHUG.
-stephen
Logged In: YES
user_id=623181
Preventing the CENS facts appearing on a dead individual's INDI page was
a feature introduced a few versions ago.
Census has always proved difficult to implement ... do you
a) enter it once in the FAM record, and one or more times for the INDI's
who were Servants, Lodgers, Boarders, Visitors, Sisters-in-law,
Mothers-in-law, etc.
b) enter it only in INDI records which of course ends up with massive
amounts of duplication.
I use (a) above ... and I've got about 350-450 censused families in each of
the UK censuses entered into my GEDCOM.
And in the past two weeks I've transcribed 2 whole Pieces of the 1901 as
well as checking several 1861 Pieces.
Mark
Logged In: YES
user_id=1061833
Mark
Tnx for the inquiry.
Yes, we tend to do as you do, Option A. We sometimes enter a
census still under the FAM record when one member is
deceased and the remaining is widowed, usually when there
are surviving offspring still living as a FAM unit with the
surviving spouse, as this seems to us to be a better reflection.
We then usually enter a more "reduced" information set for
each of the other contributing individuals, although we have
rarely added a servant or farm hand into the actual gedcom
(guess we could, as an ASSOC, but just never have).
We're pretty new at this, having only started a year ago. We
do have a 28000+ GED file with about 1729 census records
annotated. We add about 4-8 per day.
Sounds like you're busy at it like we are.
stephen
Logged In: YES
user_id=634811
Does the RA module resolve this request?
Logged In: YES
user_id=1061833
kosher
Not really, particularly based on john's reply in
discussions that the RA can/does not apply to FAM records,
only INDI's. We have now entered over 3500 "1 CENS" records
transcribed and would love to have some help with more, but
fear that "fixing" other cousins' entries may require as
much effort as entering the data ourselves without some data
entry formats.
Still think that a form could be developed to represent the
blanks on each Census year's designs, both US and other
countries, that could be accessed through the Edit Interface
when the ADD a CENSUS is selected.
thanks for the inquiry - Stephen
Logged In: YES
user_id=1254634
It's a shame no-one (with better technical skills than I)
has yet picked this one up. There are some great ideas
here so far.
For the record, Nick Walker's latest version has addressed
some of the earlier criticisms (not all).
For me, the biggest improvement is an option to store the
transcription data in a shared NOTE record (e.g. 0 @N33@
NOTE). This is then cross-referenced from each INDI. It
reduces the duplication of transcription data across the
gedcom, without the limitations imposed by putting it in
the FAM record. (see
http://www.our-families.info/famtree/individual.php?pid=I10019&ged=Osborne.ged)
If there's a skilled developer with the time to devote to
this I for one would happily assist with the planning and
testing work. Any takers?
Logged In: YES
user_id=300048
This is being worked on as part of the research assistant.
There is *SO MUCH MORE* that we can do with these census
forms than just have a simplified entry UI. There is a
wealth of data on those forms.
Here are the requirements that I am going off of:
1. The system needs to be modular so that any entry form can
be created. I don't want to hard-code up a census form and
then have to duplicate the work for all of the other census
forms and any other form that can be thought of. The forms
should plug-in to a framework that allows for easily
creating the forms without over-burdening the developer with
all of the editing things.
2. It should encourage proper source citations and
automatically do as much work as possible. One of the
biggest weaknesses of GEDCOM (and almost all genealogy
applications regardless of the underlying datamodel) is that
source citations are not global records but have to be
manually entered for every person they apply to. I don't
like the idea of putting the citation in a global note
because that breaks the source citation format. The
solution I would like to see is a global source citation
(CITE) record separate from the current SOUR record. This
would be a major revision to the GEDCOM standard.
On the other hand, in order to make these forms work, we
have to go and touch all of the person's records. Since we
have to modify all of the records anyway, and since we are
doing it programmatically, it is not a big deal to also
duplicate the citation. If the UI can support only having
to enter it once, does it really matter what the model looks
like underneath? Well, of course altering the model gives
us more efficient storage, and we should move in that
direction. But moving in that direction will take time, and
since we have to provide a backwards compatibility anyway,
we might as well code to GEDCOM 5.5 now and when GEDCOM 5.6
comes along the changes to support it on the editing should
be fairly minimal.
3. Any data entered through the forms should also be
editable through the same form. And any changes should
automatically filter through to all of the individuals
without having to go back and manually edit all of the
individual entries.
4. There is a ton of data on those census forms and a lot of
factual data can be inferred from them. So we should be
able to attach each row of the census forms to the PGV
individuals they relate to, and then automatically compare
the indi records with the census data and suggest facts to
the user that they might want to infer from the census data
and automatically have it cited to the census.
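To make point 4 concrete, here is a minimal Python sketch of one such inference: suggesting an approximate birth year from the age recorded on a census row. This is only an illustration and not the RA module's code; the function name and the dictionary layout are assumptions made up for this example.

```python
def suggest_birth_fact(indi, census_row, census_year):
    """Suggest an approximate BIRT fact from a census age.

    `indi` and `census_row` are plain dicts with a hypothetical layout.
    Returns None when the individual already has a BIRT fact or when the
    census row carries no age.
    """
    if indi.get("BIRT"):
        return None  # nothing to suggest, a birth fact already exists
    age = census_row.get("age")
    if age is None:
        return None
    return {
        "tag": "BIRT",
        "date": f"ABT {census_year - int(age)}",  # GEDCOM approximate date
        "cited_page": census_row["page"],         # cite the census entry
    }


# A child aged 8 in the 1841 census was born about 1833.
row = {"age": "8", "page": "HO107/471/7/9"}
print(suggest_birth_fact({}, row, 1841))
```

The suggestion is returned rather than written to the record, so the user decides whether to accept the inferred fact.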
This is why I haven't wanted to link the census records to
families. Support will probably be added in the future
since we need to be able to create marriage record forms and
enter other family related facts. But current efforts are
being directed towards individuals and not families.
I have more to say about this area but I am out of time now.
Please feel free to comment or make suggestions.
--John
Logged In: YES
user_id=1254634
John, it is SO good to hear this is being worked on. I
hope you'll excuse the impatience of users, but the better
PGV gets the more we want!!! :)))
I would like to re-iterate though, Stephen's original
words. Nick Walker's programme does much of what you are
proposing - including your point 4 above, which is
extremely useful. Although his programme is C/R, it is
also free - so I wonder how he would react to an approach
from somebody of your standing to access the details of
his code. At the very least, your students would find it
invaluable to gain a working knowledge of what his
software can do.
The only issue I might have with your requirements is in
point 2 "If the UI can support only having to enter it
once, does it really matter what the model looks like
underneath?" Given the size some users' gedcoms are
reaching, I think it does matter, very much. It is also
relevant in the transfer of gedcom files between users and
systems. I don't have a quick answer, but hope that we can
find a solution somewhere between maintaining strong
adherence to GEDCOM standards (which I fully support) and
practicality (pragmatism?). Are you, for example,
absolutely sure that global notes are a problem? I think
that evidence to date suggests that waiting for a revision
to the standard is not going to get us anywhere soon.
As I type this, I wonder if I correctly understand the
terminology - particularly the word "citation".
My reference to the use of global notes is as a store for
the transcribed data from the census form, in a way that
can be viewed at every INDI page, but without repeating it.
This can be a substantial note, hence the concern for
GEDCOM "bloat". I think of the "citation" as a brief
reference to the page etc within the source that the data
came from. That will be much smaller, and could therefore
be repeated for each INDI.
Nigel
Logged In: YES
user_id=1447380
Most of what's been said here is Greek to me, but I know
there are census forms in the modules/ra/forms folder.
How can I use them?
Logged In: YES
user_id=300048
To use the forms, create a task in the research assistant,
edit the task and click the complete button. Then choose
the census form you want.
--John
Logged In: YES
user_id=300048
Many of the new things I talked about earlier have been
implemented and are now in the SVN. There is also now the
option of linking the data from these census forms to families.
Nigel,
Re: your comments about point 2 and the data model... you
are right that there are times when the data model makes a
difference. Unfortunately, in genealogy the best data model
depends on the view you want of the data.
In this case though, the lack of a full data model requires
the duplication of data throughout the gedcom. For example
if we want to cite the same census as the source of
information for a person's birth and for a census on the
family the following source citation has to be duplicated:
1 BIRT
2 SOUR @S1@
3 PAGE Twp, Co. State, Page 51
3 DATA
4 TEXT The note/comments/extracted text about this citation
1 CENS
2 SOUR @S1@
3 PAGE Twp, Co. State, Page 51
3 DATA
4 TEXT The note/comments/extracted text about this citation
The 2 SOUR line and its subrecords make up a source citation.
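Since the duplication is done programmatically, the citation only has to be entered once. A minimal Python sketch of that idea follows; the function names are hypothetical and this is not the actual PGV code.

```python
def citation_lines(source_id, page, text, level=2):
    """Build the GEDCOM lines of one source citation at the given level."""
    return [
        f"{level} SOUR @{source_id}@",
        f"{level + 1} PAGE {page}",
        f"{level + 1} DATA",
        f"{level + 2} TEXT {text}",
    ]


def facts_with_citation(tags, source_id, page, text):
    """Attach the same citation to every level-1 fact tag in `tags`."""
    lines = []
    for tag in tags:
        lines.append(f"1 {tag}")
        lines.extend(citation_lines(source_id, page, text))
    return lines


gedcom = facts_with_citation(
    ["BIRT", "CENS"],
    "S1",
    "Twp, Co. State, Page 51",
    "The note/comments/extracted text about this citation",
)
print("\n".join(gedcom))
```

Called with the values above, this reproduces the ten lines of the BIRT/CENS example.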
--John
Logged In: YES
user_id=1254634
Hi John - thanks. The key is in the 4 TEXT line. That's
where I put a global note instead, which at least
partially reduces the 'bloat' - like this:
1 CENS
2 DATE 6 JUN 1841
2 PLAC Biddenden, Kent, England
2 ADDR Bigg Place, Standen
2 AGE 8y
2 SOUR @S4@
3 PAGE HO107/471/7/9
3 DATA
4 DATE 6 JUN 1841
3 NOTE @N22@
There's still plenty of duplication - but the bulk is
stored in the note.
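For anyone wanting to script the shared-note approach, here is a minimal Python sketch. It assumes the GEDCOM 5.5 form of a note record (0 @N22@ NOTE ...) with CONT lines for additional text lines; the function names and the placeholder transcription text are made up for illustration.

```python
def shared_note_record(note_id, text):
    """Build one level-0 NOTE record; extra text lines become CONT lines."""
    first, *rest = text.split("\n")
    lines = [f"0 @{note_id}@ NOTE {first}"]
    lines += [f"1 CONT {line}" for line in rest]
    return lines


def census_fact(date, place, source_id, page, note_id):
    """Build a CENS fact whose citation points at the shared note."""
    return [
        "1 CENS",
        f"2 DATE {date}",
        f"2 PLAC {place}",
        f"2 SOUR @{source_id}@",
        f"3 PAGE {page}",
        f"3 NOTE @{note_id}@",
    ]


# Placeholder text standing in for a full household transcription.
note = shared_note_record("N22", "Household transcription line 1\nline 2")
fact = census_fact("6 JUN 1841", "Biddenden, Kent, England",
                   "S4", "HO107/471/7/9", "N22")
print("\n".join(note + fact))
```

The note record is written once, while the short CENS fact is repeated under each individual in the household.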
That said - I'm sure the way you are going will be the
best result once fully implemented. At least I won't have
to do all my census entry off-line any more.
Is someone writing a wiki entry or similar for the whole
RA module? As you may see from various queries on the
forums, a lot of people are playing with it, and having
trouble figuring out how to get the best from it.
nigel
Logged In: YES
user_id=300048
There is a readme file in the SVN now which has some info
about installing. Putting this information on a wiki is a
great idea. But, I think it would be better to wait until
the RA is out of its beta state before a great deal of
effort is spent on end documentation. However, I feel like
we are getting close.
--John
Logged In: YES
user_id=1254634
John - I've just tried using the 1930 census form in RA,
unsuccessfully. Not sure if I'm missing something, or
moving ahead of development.
I entered a task, then clicked 'completed'. I added all
details for 4 people on the 1930 census, and got all the
way to the "Success" logo.
Now, I can see the people linked on the task - but I can't
see any of the data entered through the form (where is it
stored?), and there's no census data connected to INDI or
FAM records. Where have I gone wrong?
Nigel
Logged In: YES
user_id=300048
Hi Nigel,
I just tested the 1930 census and it worked correctly for me.
Can you email me a link to your site and a test username and
password so that I can see the problem?
--John
Logged In: YES
user_id=634811
Originator: NO
Can this be closed?
Logged In: YES
user_id=1061833
Originator: YES
The RA works fine, but leaves me cold and still in the dark. If I am correct, rather than going to a family ID and using the CENS fact, SOUR and NOTE fields, I must go to the RA, create a task, then proceed to attach a SOUR and a person or persons. The SOUR field is lookup only, allowing no manual entry of the Sxxx ID; the same goes for the INDI. The INDI was not pre-populated when I went from the INDI page to the RA, so another lookup was necessary. There was no note field, so I could not embellish the facts included in the form with a brief list of neighbors or other comments. Maybe it is just me, but it seemed cumbersome and very time consuming. I've got well over 300 census transcriptions already in place and add several each week.
- Stephen
Logged In: YES
user_id=300048
Originator: NO
Many of these items have been addressed in the most recent version of the RA.
Now the idea behind the RA is that you would create the task to find the census at the time it is determined that you need to look it up, before you actually find it, not after it has already been found. But to better accommodate the need to move more quickly through the workflow, you can now create a task and go right to the next step of choosing a census form. The process has been streamlined quite a bit, but I know that it can still be improved.
I have yet to decide how (and more importantly "if") we should connect the attempt to add a CENS fact through the normal PGV method directly to the RA forms.
You can add notes to the facts after you fill out the census form. That way the notes only appear on the CENS fact and not on any other facts such as BIRT, OCCU, etc. that might also be derived from the same census citation.