On 7/3/2015 7:05 PM, Kevin Routley wrote:
Hi, Stan -
I've got another bug for you. The next GED file I tried has various "extended" punctuation characters in some NOTE lines, probably pasted in from Doc, Wordpad, etc. The plugin highlights them as invalid. The attached smallest.ged shows the problem. The "-" character in the NOTE line is not a standard hyphen-minus (0x2D) but something else: DebugTrace tells me it is 0xE2, followed by 0x80.
At that character, the state machine switches from LS_VALUE to LS_ERROR because isInvalidControlChar() returns true for 0xE2.
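Here is a likely explanation for that true result, offered only as a sketch. The thread does not show the body of isInvalidControlChar(), so the version below is a hypothetical reconstruction that assumes it rejects bytes below 0x20 other than tab, CR and LF. On compilers where plain char is signed (the default for MSVC and for GCC on x86), the byte 0xE2 arrives as -30, which passes a "< 0x20" test. The sketch uses explicit signed char and unsigned char so the difference does not depend on the compiler. Both function names are illustrative, not the plugin's.

```cpp
// Hypothetical reconstruction: the plugin's real isInvalidControlChar() body is
// not shown. Assume it flags C0 control bytes (below 0x20) except tab, CR and LF.

// With a signed parameter, 0xE2 arrives as -30, and -30 < 0x20 is true.
constexpr bool isInvalidControlCharSigned(signed char ch) {
    return ch < 0x20 && ch != '\t' && ch != '\r' && ch != '\n';
}

// With an unsigned parameter, 0xE2 stays 226, so the same test is false.
constexpr bool isInvalidControlCharUnsigned(unsigned char ch) {
    return ch < 0x20 && ch != '\t' && ch != '\r' && ch != '\n';
}

// Every byte of a multibyte UTF-8 sequence (here 0xE2 0x80 0x94) is >= 0x80,
// so each one becomes negative when stored in a signed char.
static_assert(isInvalidControlCharSigned(static_cast<signed char>(0xE2)), "0xE2 misflagged when signed");
static_assert(!isInvalidControlCharUnsigned(0xE2), "0xE2 accepted when unsigned");
static_assert(isInvalidControlCharUnsigned(0x07), "BEL is still rejected");
```

This matches the effect of switching the parameter to unsigned char: bytes 0x80 and above can no longer be mistaken for control characters, while genuine control bytes are still caught.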
I changed the declaration of isInvalidControlChar() from

    bool isInvalidControlChar(char ch)

to

    bool isInvalidControlChar(unsigned char ch)

and all is as I desire.

I stumbled across your blog entry about the Function List capability, and that looks useful: something new to explore!
My thanks again for this useful tool!
Kevin
Anonymous
7/6/2015
Hi Kevin,
The example GEDCOM you provided has "CHAR ASCII" in the header, so its contents should use only 7-bit ASCII. In that case, embedding a UTF-8 character is an error. However, if the header is changed to "CHAR UTF-8", the EM DASH (0xe2 0x80 0x94) is still flagged as an error, and that is a bug.
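For reference, those three bytes are the UTF-8 encoding of U+2014 EM DASH: the lead byte 1110xxxx announces a three-byte sequence, and each 10xxxxxx continuation byte contributes six more bits. The C++17 sketch below decodes one sequence at a time. Decoded and decodeUtf8 are illustrative names, not code from the plugin, and for brevity the sketch does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <string_view>

// Result of decoding one UTF-8 sequence (illustrative type).
struct Decoded {
    char32_t codePoint;  // decoded Unicode code point
    std::size_t length;  // bytes consumed; 0 means malformed or truncated
};

// Decode the sequence starting at s[i]; the caller must ensure i < s.size().
// Simplification: overlong encodings and surrogates are not rejected.
constexpr Decoded decodeUtf8(std::string_view s, std::size_t i) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    char32_t cp = 0;
    std::size_t len = 0;
    if (lead < 0x80)                       { cp = lead;        len = 1; }  // 0xxxxxxx
    else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }  // 110xxxxx
    else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; len = 3; }  // 1110xxxx
    else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }  // 11110xxx
    else return {0, 0};                      // stray continuation or invalid lead byte
    if (i + len > s.size()) return {0, 0};   // sequence cut off by end of input
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return {0, 0};  // continuation must be 10xxxxxx
        cp = (cp << 6) | (b & 0x3F);            // append six payload bits
    }
    return {cp, len};
}

// E2 = 1110 0010, 80 = 10 000000, 94 = 10 010100
// payload bits: 0010 000000 010100 = 0x2014 (EM DASH)
static_assert(decodeUtf8("\xE2\x80\x94", 0).codePoint == 0x2014, "EM DASH is U+2014");
static_assert(decodeUtf8("\xE2\x80\x94", 0).length == 3, "EM DASH takes three bytes");
```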
Currently, no attempt is made to check the buffer encoding when validating characters, nor is the CHAR tag setting taken into consideration. Doing so would require closer integration with Scintilla (the editor component), and I didn't want to make that extra effort.
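Taking the CHAR tag into account could look roughly like the following C++17 sketch. Everything in it is hypothetical: CharSetting, isValidValue and the helpers are illustrative names, the CHAR value is assumed to have been read from the header already, the buffer encoding reported by Scintilla is not consulted, and only the byte structure of UTF-8 is checked (no overlong or surrogate rejection). It also assumes that control bytes below 0x20 other than tab are invalid in either encoding.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical: which rule to apply, taken from the header's CHAR tag.
enum class CharSetting { Ascii, Utf8 };

// Assumption: C0 control bytes other than tab are invalid in a line value.
constexpr bool isControl(unsigned char b) { return b < 0x20 && b != '\t'; }

// Expected length of a UTF-8 sequence with this lead byte, or 0 if the byte
// cannot start a sequence (continuation byte, 0xC0/0xC1, or 0xF5 and above).
constexpr std::size_t utf8Length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Hypothetical validator: checks a line value according to the CHAR setting.
constexpr bool isValidValue(std::string_view value, CharSetting charset) {
    std::size_t i = 0;
    while (i < value.size()) {
        const unsigned char b = static_cast<unsigned char>(value[i]);
        if (isControl(b)) return false;
        if (b < 0x80) { ++i; continue; }                   // plain 7-bit ASCII byte
        if (charset == CharSetting::Ascii) return false;   // CHAR ASCII: nothing >= 0x80
        const std::size_t len = utf8Length(b);
        if (len == 0 || i + len > value.size()) return false;  // bad lead or truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(value[i + k]);
            if ((c & 0xC0) != 0x80) return false;          // continuation must be 10xxxxxx
        }
        i += len;                                          // skip the whole sequence
    }
    return true;
}

// A sample value containing an EM DASH is an error under CHAR ASCII
// but acceptable under CHAR UTF-8.
static_assert(!isValidValue("a \xE2\x80\x94 b", CharSetting::Ascii), "rejected under ASCII");
static_assert(isValidValue("a \xE2\x80\x94 b", CharSetting::Utf8), "accepted under UTF-8");
```

Passing the setting in as a parameter keeps the validator itself free of any Scintilla dependency; only the code that reads the header would need to supply it.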
-Stan
Last edit: smitchell 2016-12-07