
#437 Census data entry format (like Quick Update)

open
5
2006-06-20
2005-05-30
No

FHUG member and contributor Nick Walker has written a
really cool GEDCOM census program that works with FH. It
assists with transcribing census data into a format that
is widely accepted and well defined. A similar
"convention" and input format within PGV would be
useful too.

We've attached a screenshot of one of his templates,
but obviously the program itself is copyrighted (C/R).

-Stephen

Discussion

  • Stephen Arnold

    Stephen Arnold - 2005-05-30

    Nick Walker's Gedcom Census program for FH

     
  • John Finlay

    John Finlay - 2005-06-02

    Logged In: YES
    user_id=300048

    My plan has been to do this as part of the research log, the
    research log being a workflow to help you take your research
    from a todo item all the way through to data in your gedcom.

    --John

     
  • Stephen Arnold

    Stephen Arnold - 2005-06-02

    Logged In: YES
    user_id=1061833

    Sounds like a great place for it!
    Gosh, we've now transcribed over 1000 census family reports
    in just one year. Maybe I should be doing something else
    like trying to earn a living.

    Anything to make it easier.
    Stephen

     
  • Mark Hattam

    Mark Hattam - 2005-06-19

    Logged In: YES
    user_id=623181

    The picture as shown doesn't have fields for "marital
    status", "sex", & "disability".

    Nor does it give the ED (Ecclesiastical District) number ... is
    the "157" shown a folio number or a page number or a
    schedule number or some other number? There's not enough
    info shown for a reader to then go and locate the census
    entry in a source.

    Most census projects, in the UK at least, tend to use
    spreadsheets in CSV format for inputting the data, e.g.
    http://www.freecen.org.uk
    http://freepages.genealogy.rootsweb.com/~kayhin/ukocp.html

    Mark

     
  • Mark Hattam

    Mark Hattam - 2005-06-19

    Logged In: YES
    user_id=623181

    And even more glaringly omitted is "Relationship to Head of
    Household"

    Mark

     
  • Stephen Arnold

    Stephen Arnold - 2005-06-19

    Logged In: YES
    user_id=1061833

    dxradio
    -perhaps you need to reread our RFE, as it appears you've
    taken the attached screenshot too literally. We suggested
    PGV development of a "convention" and "input format",
    similar in operation to the Quick Update concept. We did NOT
    suggest using Nick's work as a template or format, only that
    developing such a method of entry within PGV might be
    helpful.

    Actually, the USGenWeb transcription project has an Excel
    spreadsheet "model" for transcribers, but we thought that
    John would assist with the introduction of this type of
    device within PGV, and that the names, as listed, might end
    up linking to the INDI, as entered. The HUSB/WIFE already
    show the census data when entered into the FAM records
    (except when the bug arises as reported under future bugs).

    Each day we transcribe over 20 census reports and thought it
    might be helpful for PGV users to settle on a style/method
    of including the information contained therein, reported
    in a consistent fashion, compliant with the GEDCOM 5.5
    standard and the notes capabilities. For relatives listed in the
    1850-1930 U.S. census data, the information has proven
    invaluable. For the earlier census reports, interpretation
    of the BD information, and household sexual structure has
    provided important clues to discovering the names of those
    inhabitants, particularly using the 1820/40 reports combined
    with the 1850 report.

    As we have reviewed hundreds of GEDs as listed on Rootsweb,
    we find that when Census data is noted, it is usually
    haphazard and incomplete and always listed differently
    within each GED. That's why we suggested a data entry method
    (read convention) be developed within PGV that would assure
    the maximum retention of the reported data, like the things
    you criticize as missing from Nick's British census effort
    for FHUG.
    -stephen

     
  • Mark Hattam

    Mark Hattam - 2005-06-19

    Logged In: YES
    user_id=623181

    Preventing the CENS facts appearing on a dead individual's INDI page was
    a feature introduced a few versions ago.

    Census has always proved difficult to implement ... do you
    a) enter it once in the FAM record, and one or more times for the INDIs
    who were servants, lodgers, boarders, visitors, sisters-in-law,
    mothers-in-law, etc.
    b) enter it only in INDI records which of course ends up with massive
    amounts of duplication.

    I use (a) above ... and I've got about 350-450 censused families in each of
    the UK censuses entered into my GEDCOM.

    And in the past two weeks I've transcribed 2 whole Pieces of the 1901 as
    well as checking several 1861 Pieces.

    Mark

     
  • Stephen Arnold

    Stephen Arnold - 2005-06-20

    Logged In: YES
    user_id=1061833

    Mark
    Tnx for the inquiry.
    Yes, we tend to do as you do, Option A. We sometimes enter a
    census still under the FAM record when one spouse is
    deceased and the other is widowed, usually when there
    are surviving offspring still living as a FAM unit with the
    surviving spouse, as this seems to us to be a better reflection.

    We then usually enter a more "reduced" information set for
    each of the other contributing individuals, although we have
    rarely added a servant or farm hand into the actual gedcom
    (guess we could, as an ASSOC, but just never have).

    We're pretty new at this, having only started a year ago. We
    do have a 28000+ GED file with about 1729 census records
    annotated. We add about 4-8 per day.

    Sounds like you're busy at it like we are.
    stephen

     
  • KosherJava

    KosherJava - 2006-06-20
    • assigned_to: nobody --> yalnifj
     
  • KosherJava

    KosherJava - 2006-06-20

    Logged In: YES
    user_id=634811

    Does the RA module resolve this request?

     
  • Stephen Arnold

    Stephen Arnold - 2006-06-21

    Logged In: YES
    user_id=1061833

    kosher

    Not really, particularly based on John's reply in
    discussions that the RA can/does not apply to FAM records,
    only INDIs. We have now entered over 3500 "1 CENS" records
    transcribed and would love to have some help with more, but
    fear that "fixing" other cousins' entries may require as
    much effort as entering the data ourselves without some data
    entry formats.

    Still think that a form could be developed to represent the
    blanks on each Census year's designs, both US and other
    countries, that could be accessed through the Edit Interface
    when the ADD a CENSUS is selected.

    thanks for the inquiry - Stephen

     
  • Anonymous

    Anonymous - 2006-07-27

    Logged In: YES
    user_id=1254634

    It's a shame no-one (with better technical skills than I)
    has yet picked this one up. There are some great ideas
    here so far.
    For the record, Nick Walker's latest version has addressed
    some of the earlier criticisms (not all).
    For me, the biggest improvement is an option to store the
    transcription data in a shared NOTE record (e.g. 0 @N33@
    NOTE), which is then cross-referenced from each INDI. This
    reduces the duplication of transcription data across the
    GEDCOM, without the limitations imposed by putting it in
    the FAM record. (See
    http://www.our-families.info/famtree/individual.php?pid=I10019&ged=Osborne.ged)

    If there's a skilled developer with the time to devote to
    this I for one would happily assist with the planning and
    testing work. Any takers?

     
  • John Finlay

    John Finlay - 2006-07-27

    Logged In: YES
    user_id=300048

    This is being worked on as part of the research assistant.

    There is *SO MUCH MORE* that we can do with these census
    forms than just have a simplified entry UI. There is a
    wealth of data on those forms.

    Here are the requirements that I am going off of:

    1. The system needs to be modular so that any entry form can
    be created. I don't want to hard-code up a census form and
    then have to duplicate the work for all of the other census
    forms and any other form that can be thought of. The forms
    should plug-in to a framework that allows for easily
    creating the forms without over-burdening the developer with
    all of the editing things.

    2. It should encourage proper source citations and
    automatically do as much work as possible. One of the
    biggest weaknesses of GEDCOM (and almost all genealogy
    applications regardless of the underlying datamodel) is that
    source citations are not global records but have to be
    manually entered for every person they apply to. I don't
    like the idea of putting the citation in a global note
    because that breaks the source citation format. The
    solution I would like to see is a global source citation
    (CITE) record separate from the current SOUR record. This
    would be a major revision to the GEDCOM standard.

    On the other hand, in order to make these forms work, we
    have to go and touch all of the person's records. Since we
    have to modify all of the records anyway, and since we are
    doing it programmatically, it is not a big deal to also
    duplicate the citation. If the UI can support only having
    to enter it once, does it really matter what the model looks
    like underneath? Well, of course altering the model gives
    us more efficient storage, and we should move in that
    direction. But moving in that direction will take time, and
    since we have to provide backwards compatibility anyway,
    we might as well code to GEDCOM 5.5 now and when GEDCOM 5.6
    comes along the changes to support it on the editing should
    be fairly minimal.

    3. Any data entered through the forms should also be
    editable through the same form. And any changes should
    automatically filter through to all of the individuals
    without having to go back and manually edit all of the
    individual entries.

    4. There is a ton of data on those census forms and a lot of
    factual data can be inferred from them. So we should be
    able to attach each row of the census forms to the PGV
    individuals they relate to, and then automatically compare
    the indi records with the census data and suggest facts to
    the user that they might want to infer from the census data
    and automatically have it cited to the census.

    This is why I haven't wanted to link the census records to
    families. Support will probably be added in the future
    since we need to be able to create marriage record forms and
    enter other family related facts. But current efforts are
    being directed towards individuals and not families.

    I have more to say about this area but I am out of time now.
    Please feel free to comment or make suggestions.

    --John
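Requirements 1 and 4 can be illustrated with a minimal sketch. It assumes a hypothetical Python design: `CensusForm`, `register_form` and `suggest_birth` are invented names, the column list is simplified, and none of this is the RA module's actual code. Each census form is plain data added to a registry, and one generic helper turns a transcribed row into a suggested BIRT fact that is already cited to the census.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CensusForm:
    """Hypothetical plug-in form definition: a new census is data, not code."""

    key: str
    year: int  # year the census was taken
    columns: tuple[str, ...]  # simplified, hypothetical column headings


# Hypothetical registry that a generic editing UI would read from.
FORMS: dict[str, CensusForm] = {}


def register_form(form: CensusForm) -> None:
    """Make a form available to the generic editor."""
    FORMS[form.key] = form


register_form(CensusForm("UK1841", 1841, ("Name", "Age", "Occupation")))


def suggest_birth(form: CensusForm, row: dict[str, str],
                  source_xref: str, page: str) -> list[str]:
    """Suggest a BIRT fact inferred from the row's Age column, cited to the census."""
    missing = [c for c in form.columns if c not in row]
    if missing:
        raise ValueError(f"row is missing columns: {missing}")
    # Anyone aged N on any day of census year Y was born in year Y-N-1 or Y-N.
    # Recorded ages were not always exact (the 1841 England and Wales census
    # rounded most adult ages down to a multiple of 5), so the user decides.
    latest = form.year - int(row["Age"])
    return [
        "1 BIRT",
        f"2 DATE BET {latest - 1} AND {latest}",
        f"2 SOUR {source_xref}",
        f"3 PAGE {page}",
    ]


if __name__ == "__main__":
    row = {"Name": "...", "Age": "8", "Occupation": ""}
    print("\n".join(suggest_birth(FORMS["UK1841"], row, "@S4@", "HO107/471/7/9")))
```

Keeping the form definitions as data is what lets the same editing code serve every census year. The suggestion is returned rather than written, so the user can still accept or reject it.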

     
  • Anonymous

    Anonymous - 2006-07-27

    Logged In: YES
    user_id=1254634

    John, it is SO good to hear this is being worked on. I
    hope you'll excuse the impatience of users, but the better
    PGV gets the more we want!!! :)))

    I would like to reiterate Stephen's original words,
    though. Nick Walker's programme does much of what you are
    proposing - including your point 4 above, which is
    extremely useful. Although his programme is C/R, it is
    also free - so I wonder how he would react to an approach
    from somebody of your standing to access the details of
    his code. At the very least, your students would find it
    invaluable to gain a working knowledge of what his
    software can do.

    The only issue I might have with your requirements is in
    point 2 "If the UI can support only having to enter it
    once, does it really matter what the model looks like
    underneath?" Given the size some users' GEDCOMs are
    reaching, I think it does matter, very much. It is also
    relevant in the transfer of gedcom files between users and
    systems. I don't have a quick answer, but hope that we can
    find a solution somewhere between maintaining strong
    adherence to GEDCOM standards (which I fully support) and
    practicality (pragmatism?). Are you, for example,
    absolutely sure that global notes are a problem? I think
    that evidence to date suggests that waiting for a revision
    to the standard is not going to get us anywhere soon.

    As I type this, I wonder if I correctly understand the
    terminology - particularly the word "citation".
    My reference to the use of global notes is as a store for
    the transcribed data from the census form, in a way that
    can be viewed at every INDI page, but without repeating it.
    This can be a substantial note, hence the concern for
    GEDCOM "bloat". I think of the "citation" as a brief
    reference to the page etc within the source that the data
    came from. That will be much smaller, and could therefore
    be repeated for each INDI.

    Nigel

     
  • Thomas52

    Thomas52 - 2006-08-12

    Logged In: YES
    user_id=1447380

    Most of what's been said here is Greek to me, but I know
    there are census forms in the modules/ra/forms folder.
    How can I use them?

     
  • John Finlay

    John Finlay - 2006-08-28

    Logged In: YES
    user_id=300048

    To use the forms, create a task in the research assistant,
    edit the task and click the complete button. Then choose
    the census form you want.

    --John

     
  • John Finlay

    John Finlay - 2006-08-28

    Logged In: YES
    user_id=300048

    Many of the new things I talked about above have been
    implemented and are now in the SVN. There is also now the
    option of linking the data from these census forms to families.

    Nigel,

    Re: your comments about point 2 and the data model... you
    are right that there are times when the data model makes a
    difference. Unfortunately, in genealogy the best data model
    depends on the view you want of the data.

    In this case though, the lack of a full data model requires
    the duplication of data throughout the gedcom. For example
    if we want to cite the same census as the source of
    information for a person's birth and for a census on the
    family the following source citation has to be duplicated:
    1 BIRT
    2 SOUR @S1@
    3 PAGE Twp, Co. State, Page 51
    3 DATA
    4 TEXT The note/comments/extracted text about this citation

    1 CENS
    2 SOUR @S1@
    3 PAGE Twp, Co. State, Page 51
    3 DATA
    4 TEXT The note/comments/extracted text about this citation

    The 2 SOUR line and its subrecords make up a source citation.

    --John
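John's point that duplicating a citation costs little when it is done programmatically can be sketched in a few lines. This assumes hypothetical helpers (`citation` and `cite_facts` are invented for illustration): the citation is entered once and written out under every fact it supports, reproducing the BIRT/CENS example in the previous comment.

```python
from __future__ import annotations


def citation(source_xref: str, page: str, text: str, level: int = 2) -> list[str]:
    """Build one GEDCOM 5.5 source citation starting at the given level."""
    return [
        f"{level} SOUR {source_xref}",
        f"{level + 1} PAGE {page}",
        f"{level + 1} DATA",
        f"{level + 2} TEXT {text}",
    ]


def cite_facts(tags: list[str], source_xref: str, page: str, text: str) -> list[str]:
    """Emit each level-1 fact followed by its own copy of the same citation."""
    lines: list[str] = []
    for tag in tags:
        lines.append(f"1 {tag}")
        lines.extend(citation(source_xref, page, text))
    return lines


if __name__ == "__main__":
    for line in cite_facts(["BIRT", "CENS"], "@S1@", "Twp, Co. State, Page 51",
                           "The note/comments/extracted text about this citation"):
        print(line)
```

The user types the citation once; the storage cost of the copies remains, which is the trade-off debated in this thread.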

     
  • Anonymous

    Anonymous - 2006-08-28

    Logged In: YES
    user_id=1254634

    Hi John - thanks. The key is in the 4 TEXT line. That's
    where I put a global note instead, which at least
    partially reduces the 'bloat' - like this:

    1 CENS
    2 DATE 6 JUN 1841
    2 PLAC Biddenden, Kent, England
    2 ADDR Bigg Place, Standen
    2 AGE 8y
    2 SOUR @S4@
    3 PAGE HO107/471/7/9
    3 DATA
    4 DATE 6 JUN 1841
    3 NOTE @N22@

    There's still plenty of duplication - but the bulk is
    stored in the note.

    That said - I'm sure the way you are going will be the
    best result once fully implemented. At least I won't have
    to do all my census entry off-line any more.

    Is someone writing a wiki entry or similar for the whole
    RA module? As you may see from various queries on the
    forums, a lot of people are playing with it, and having
    trouble figuring out how to get the best from it.

    nigel
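Nigel's shared-note approach can be sketched the same way. This assumes hypothetical helpers (`note_record` and `cens_fact` are invented for illustration, the transcription lines are placeholders, and the CENS fact is trimmed from his example). The full transcription is written once as a level-0 NOTE record, and each person's CENS fact carries only a short citation that points at it.

```python
from __future__ import annotations


def note_record(xref: str, text_lines: list[str]) -> list[str]:
    """Shared GEDCOM 5.5 NOTE record: first line on the NOTE line, the rest as CONT.

    Long lines would also need splitting with CONC to stay within GEDCOM 5.5's
    line-length limit; that is omitted in this sketch.
    """
    first, *rest = text_lines
    return [f"0 {xref} NOTE {first}"] + [f"1 CONT {line}" for line in rest]


def cens_fact(date: str, place: str, source_xref: str, page: str,
              note_xref: str) -> list[str]:
    """Per-person CENS fact whose citation points at the shared note."""
    return [
        "1 CENS",
        f"2 DATE {date}",
        f"2 PLAC {place}",
        f"2 SOUR {source_xref}",
        f"3 PAGE {page}",
        f"3 NOTE {note_xref}",
    ]


# The transcription is stored once, however long it is...
shared = note_record("@N22@", ["<transcribed row 1>", "<transcribed row 2>",
                               "<transcribed row 3>"])
# ...while each household member's fact stays the same small size.
fact = cens_fact("6 JUN 1841", "Biddenden, Kent, England", "@S4@",
                 "HO107/471/7/9", "@N22@")
```

Only the six-line fact is repeated per person, so the per-person cost no longer grows with the length of the transcription.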

     
  • John Finlay

    John Finlay - 2006-08-28

    Logged In: YES
    user_id=300048

    There is a readme file in the SVN now which has some info
    about installing. Putting this information on a wiki is a
    great idea. But, I think it would be better to wait until
    the RA is out of its beta state before a great deal of
    effort is spent on end documentation. However, I feel like
    we are getting close.

    --John

     
  • Anonymous

    Anonymous - 2006-08-30

    Logged In: YES
    user_id=1254634

    John - I've just tried using the 1930 census form in RA,
    unsuccessfully. Not sure if I'm missing something, or
    moving ahead of development.

    I entered a task, then clicked 'completed'. I added all
    details for 4 people on the 1930 census, and got all the
    way to the "Success" logo.

    Now, I can see the people linked on the task - but I can't
    see any of the data entered through the form (where is it
    stored?), and there's no census data connected to INDI or
    FAM records. Where have I gone wrong?

    Nigel

     
  • John Finlay

    John Finlay - 2006-08-30

    Logged In: YES
    user_id=300048

    Hi Nigel,

    I just tested the 1930 census and it worked correctly for me.
    Can you email me a link to your site and a test username and
    password so that I can see the problem?

    --John

     
  • KosherJava

    KosherJava - 2007-04-19

    Logged In: YES
    user_id=634811
    Originator: NO

    Can this be closed?

     
  • Stephen Arnold

    Stephen Arnold - 2007-04-19

    Logged In: YES
    user_id=1061833
    Originator: YES

    The RA works fine, but leaves me cold and still in the dark. If I understand correctly, rather than going to a family ID and using the CENS fact, SOUR and NOTE fields, I must go to the RA, create a task, and then attach a SOUR and one or more persons. The SOUR field is lookup only, allowing no manual entry of the Sxxx ID, and the same is true of the INDI field. The INDI was not pre-filled when I went from the INDI page to the RA, so another lookup is necessary. There was no note field, so I could not supplement the facts included in the form with a brief list of neighbors or other comments. Maybe it is just me, but it seemed cumbersome and very time-consuming. I've got well over 300 census transcriptions already in place and add several each week.
    - Stephen

     
  • John Finlay

    John Finlay - 2007-04-20

    Logged In: YES
    user_id=300048
    Originator: NO

    Many of these items have been addressed in the most recent edition of the RA.

    Now the idea behind the RA is that you would create the task to find the census at the time it is determined that you need to look it up, before you actually find it, not after it has already been found. But to better accommodate the need to move more quickly through the workflow, you can now create a task and go right to the next step of choosing a census form. The process has been streamlined quite a bit, but I know that it can still be improved.

    I have yet to decide how (and more importantly "if") we should connect the attempt to add a CENS fact through the normal PGV method directly to the RA forms.

    You can add notes to the facts after you fill out the census form. That way the notes only appear on the CENS fact and not on any other facts such as BIRT, OCCU, etc. that might also be derived from the same census citation.

     
