Hello,
I have authored a lexer plugin (GedcomLexer.dll) for GEDCOM files, the standard file format used by genealogy applications to exchange data.
There are user-defined languages for GEDCOM that treat tags as keywords. However, syntax checking and level folding require a lexer.
The lexer follows the data representation grammar of GEDCOM specification version 5.5.1. It recognizes the possible tokens in a line: level, xref_id, tag, user tag, pointer, value, and escape. Each of these tokens has a default style supplied by GedcomLexer.xml and can be customized by the Style Configurator. When an invalid character in a token is detected, the lexer enters the Invalid state and outputs the remainder of the line in the Invalid style (default red). The Invalid state is reset when the end of line is reached.
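To make the token list concrete, here is a minimal Python sketch of splitting one GEDCOM line into those tokens. It is not the plugin's code: the regular expressions and the names `LINE_RE`, `POINTER_RE`, and `tokenize` are assumptions made for illustration, escape sequences are not handled, and a malformed line is reported as invalid as a whole rather than from the offending character onward.

```python
import re

# Simplified line pattern (an assumption, not the plugin's grammar tables):
# level, optional xref_id, tag, optional value.
LINE_RE = re.compile(
    r"^(?P<level>0|[1-9][0-9]?)"               # level 0-99, no leading zero
    r"(?: (?P<xref_id>@[A-Za-z0-9][^@]*@))?"   # optional xref_id such as @I1@
    r" (?P<tag>[A-Za-z0-9_]+)"                 # tag; leading underscore = user tag
    r"(?: (?P<value>.*))?$"                    # optional line value
)
POINTER_RE = re.compile(r"^@[A-Za-z0-9][^@]*@$")


def tokenize(line):
    """Return a list of (token_kind, text) pairs for one GEDCOM line."""
    m = LINE_RE.match(line)
    if m is None:
        # Simplification: the whole line is flagged, not just the remainder.
        return [("invalid", line)]
    tokens = [("level", m.group("level"))]
    if m.group("xref_id"):
        tokens.append(("xref_id", m.group("xref_id")))
    tag = m.group("tag")
    tokens.append(("user_tag" if tag.startswith("_") else "tag", tag))
    value = m.group("value")
    if value:
        kind = "pointer" if POINTER_RE.match(value) else "value"
        tokens.append((kind, value))
    return tokens
```

For example, `tokenize("1 FAMS @F1@")` yields a level, a tag, and a pointer, while a line that starts with a non-digit comes back as a single invalid token.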
In the current release, folding is based on the line level. In GEDCOM files, logical records begin at line level 0. Subordinate lines with levels 1 or higher belong to the logical record started by the preceding level 0 line. So, folding lets a user see only level 0 lines (logical record starts) or level 0 lines plus selected additional levels, giving the user some control over the amount of detail displayed.
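The effect of level-based folding can be sketched in a few lines of Python. This is only an illustration of the idea, not how the plugin talks to the editor: the helper names `line_level`, `visible_lines`, and `fold_headers` are hypothetical, and every line is assumed to be well formed (it starts with a level number followed by a space).

```python
def line_level(line):
    """Return the level number at the start of a well-formed GEDCOM line."""
    return int(line.split(" ", 1)[0])


def visible_lines(lines, max_level):
    """Lines that stay visible when everything deeper than max_level is folded."""
    return [ln for ln in lines if line_level(ln) <= max_level]


def fold_headers(lines):
    """Indexes of lines that start a fold: the next line has a deeper level."""
    levels = [line_level(ln) for ln in lines]
    return [i for i in range(len(levels) - 1) if levels[i + 1] > levels[i]]
```

Calling `visible_lines(lines, 0)` leaves only the record starts, and raising `max_level` to 1 adds the first layer of detail under each record.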
The plugin has been tested with Notepad++ 6.5.2, on Win 7, 8, and XP.
It has been tested with a variety of GEDCOM files (*.ged), including UTF-8, UTF-16, ANSI, and ASCII.
In this release the ANSEL character set is not supported.
This plugin project is hosted at SourceForge: https://sourceforge.net/projects/gedcomlexer/
where the DLL and source files can be found.
To find some GEDCOM files for testing, perform a Google search for:
I welcome any feedback.
-Stan
Thank you for this, it is far better than the User-defined attempt I made.
I fear there is too much folding: when completely folded, the file means nothing to anyone, as all you're left with is a list of meaningless IDs.
One thing I would like is some way to make the NAME tag field stand out. At the moment I've made the ID stand out, as that is unique and related to an individual, but having the surnames highlighted would be great so I can scan down the page looking for people more easily. Could the NAME tag and its contents be highlighted somehow?
all the best
Thanks for your comments Mark.
I will look into some ways of exposing NAMEs to make navigation through the file easier. Highlighting the NAME tag and its contents is certainly a possibility.
-Stan
Brilliant - thanks Stan.
Mark,
GedcomLexer v0.2 was released a couple of days ago.
With this release, there is support for using the Function List to display NAMEs from INDI records and locate persons more easily.
I have a detailed description of how to set it up at the project website:
http://www.genapps.net/2014/02/new-in-gedcom-plugin-v02-navigating-by.html
-Stan
Great idea, very clever, well done!
Unfortunately, sorting is on the first name in the field, which obviously isn't the surname. Is that the fault of Function List, or can you tweak it in the parser element?
all the best
Mark
Yes, the sorting on given name bothered me too.
I believe that is a limitation of having a parser based purely on a pair of regular expressions - you get a match string and cannot rearrange the parts of it, to make the surname come first. There are a lot of things that Perl regular expressions can do that surprised me, so someone with more expertise might find a way!
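Outside the Function List, where the match can be post-processed, the rearrangement is straightforward. The Python sketch below is a standalone workaround under stated assumptions, not something the plugin or Function List does: it relies on the GEDCOM convention of writing the surname between slashes (for example `John /Smith/`), it treats every level 1 NAME line as a person's name without checking that it sits inside an INDI record, and the names `NAME_LINE_RE`, `SURNAME_RE`, `surname_first`, and `names_sorted_by_surname` are hypothetical.

```python
import re

NAME_LINE_RE = re.compile(r"^1 NAME (.*)$")   # level 1 NAME line, value in group 1
SURNAME_RE = re.compile(r"/([^/]*)/")         # surname is written between slashes


def surname_first(name):
    """Turn 'John /Smith/' into 'Smith, John'; leave slash-less names alone."""
    m = SURNAME_RE.search(name)
    if m is None:
        return name.strip()
    surname = m.group(1).strip()
    # Everything outside the slashes, with runs of whitespace collapsed.
    rest = " ".join((name[:m.start()] + " " + name[m.end():]).split())
    return f"{surname}, {rest}" if rest else surname


def names_sorted_by_surname(lines):
    """Collect NAME values from GEDCOM lines and sort them surname first."""
    names = []
    for line in lines:
        m = NAME_LINE_RE.match(line)
        if m:
            names.append(surname_first(m.group(1)))
    return sorted(names, key=str.casefold)
```

Run over the lines of a .ged file, this gives a surname-ordered list that can be searched alongside the editor, at the cost of losing the click-to-jump behaviour of the Function List panel.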
-Stan