Menu

#1420 Updated X12 lexer for better bad terminator handling

Committed
closed
nobody
5
2021-12-06
2021-11-04
Iain Clarke
No

I've updated to the latest Sctintilla/Lexilla, and brought over some recent changes I made to handle badly formatted X12, especially segment terminators followed by whitespace.

1 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2021-11-05

    That appears better to me. Line ends are more consistently handled.

    The lexers from Scintilla have been moved into a separate project called Lexilla.
    https://github.com/ScintillaOrg/lexilla

    Lexilla is trying to improve the testing of lexers through including test files for lexers (in lexilla/test/examples/<lexer>). The TestLexers program described by lexilla/test/README runs each lexer over its test files in various ways. Lexing is incremental and one source of bugs in lexers is not handling changes to different positions within a file correctly even if lexing the whole file works well.

    The patched lexer (and probably the original) shows problems with incremental folding on the attached file with the message C:\u\hg\lexilla\test\examples\x12\x.x12:1: per-line has different folds. Make a meaningless change to a line that is a fold header and its likely the fold structure will be disrupted. In SciTE, its easier to see the folding structure data by turning on View | Line Numbers and adding these settings:

    fold.flags=64
    line.margin.width=10
    lexer.*.x12=x12
    

    There's one warning for a shadowed variable that should be fixed by you as this is an ambiguity that is easy to 'fix' incorrectly by someone unfamiliar with the X12 language.

    C:\u\hg\lexilla\lexers\LexX12.cxx(262): warning C6246: Local declaration of 'c' hides declaration of the same name in outer scope. For additional information, see previous declaration at line '223' of 'c:\u\hg\lexilla\lexers\lexx12.cxx'.
        for (auto& c : m_SeparatorSegment)
    

    To be more similar to other lexers (and not trip up TestLexers), DescribeProperty and DescribeWordListSets should return "" instead of NULL.

    Since you know the language, you can probably improve the test file to include more important cases and to cover all styles output by the lexer.

     
  • Iain Clarke

    Iain Clarke - 2021-11-09

    I hadn't spotted the test files - that's a really nice feature.

    The folding was doing a fine job of keeping track of the indent as it went on - but a bad job of getting the previous lines indent when it was folding from a later point. That caused the problem you mentioned of disappearing folds - and it was from an earlier version as you suspected.

    The NULL->""s are fixed, as is the variable shadowing.

    To the test file, I've added an extra function group, and each function group gets a varying number of message. The resulting X12 package is very unlikely to match any message specification [+] , but as far as the lexer / folding is concerned, it's grammatically correct.

    I can only add one file, so I've zipped the lexer test case, along with the CPP file. I suspect you can tell which is which!

    My compliments on the test suite for the lexers!

    /Iain.

    [+] Similarly, you can have a file that is valid XML, and lex-able, but it might not match an XSD.

     
  • Neil Hodgson

    Neil Hodgson - 2021-11-13
    • labels: x12 lexilla --> x12, lexilla
    • status: open --> accepted
    • Group: Initial --> Committed
     
  • Neil Hodgson

    Neil Hodgson - 2021-12-06
    • status: accepted --> closed
     

Log in to post a comment.