Menu

#1166 EDIFACT Lexer

Committed
closed
5
2018-05-04
2016-11-22
Iain Clarke
No

I've had a SCLEX_CONTAINER lexer for Scintilla for a while now, and thought it would be nice to give back, as I've got a lot of benefit from Scintilla over the years. It's fairly simple, but it gives a different style to segment headers, element and composite punctuation, and to special segment headers. Lastly, if the edifact file is not closed properly (with a ') it will show an error style.

Attached is a diff of the necessary files (the sce_edi defines, the calalogue addition, and a new LexEDIFSCT.cxx file).

I hope this benefits others!

Iain.

1 Attachments

Discussion

  • Neil Hodgson

    Neil Hodgson - 2016-11-23
    • labels: lexer --> lexer, edifact, scintilla
    • assigned_to: Neil Hodgson
     
  • Neil Hodgson

    Neil Hodgson - 2016-11-23

    Thanks for the contribution. There is a good chance the lexer will be useful to some people in its current form.

    There are a few warnings and other problems with the code currently which means it can't be included in the standard distribution.

    There should be a reference to the license as is present in other lexers. Without any indication of a license, it is dangerous to include code this in a project.

    Scintilla has to build with old compilers across several perating systems so C++11 features are not available, including auto, nullptr and override. This should change in 2017.

    Version is the version of the ILexer interface supported which should be lvOriginal. See LexerBasic.

    ../lexers/LexEDIFACT.cxx: In member function 'virtual void LexerEDIFACT::Release()':
    ../lexers/LexEDIFACT.cxx:34:10: warning: deleting object of polymorphic class type 'LexerEDIFACT' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
       delete this;
              ^
    

    Add a virtual destructor.

    ../lexers/LexEDIFACT.cxx: In member function 'virtual void LexerEDIFACT::Lex(Sci_PositionU, Sci_Position, int, IDocument*)':
    ../lexers/LexEDIFACT.cxx:103:40: warning: overflow in implicit constant conversion [-Woverflow]
      pAccess->StartStyling(posCurrent, 0xFF);
                                            ^
    

    Use '\377' which is a char.

    ../lexers/LexEDIFACT.cxx: In member function 'Sci_Position LexerEDIFACT::InitialiseFromUNA(IDocument*, Sci_PositionU)':
    ../lexers/LexEDIFACT.cxx:180:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (startPos >= MaxLength)
                    ^
    
    ..\lexers\LexEDIFACT.cxx(40): warning C4100: 'name': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(44): warning C4100: 'name': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(48): warning C4100: 'val': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(48): warning C4100: 'key': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(56): warning C4100: 'wl': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(56): warning C4100: 'n': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(62): warning C4100: 'pointer': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(62): warning C4100: 'operation': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(103): warning C4309: 'argument': truncation of constant value
    ..\lexers\LexEDIFACT.cxx(96): warning C4100: 'initStyle': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(168): warning C4100: 'pAccess': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(168): warning C4100: 'initStyle': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(168): warning C4100: 'lengthDoc': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(168): warning C4100: 'startPos': unreferenced formal parameter
    ..\lexers\LexEDIFACT.cxx(180): warning C4018: '>=': signed/unsigned mismatch
    

    VS2008 is particularly fun:

    ..\lexers\LexEDIFACT.cxx(29) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(33) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(37) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(41) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(45) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(49) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(53) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(57) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(60) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(61) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(63) : warning C4481: nonstandard extension used: override specifier 'override'
    ..\lexers\LexEDIFACT.cxx(38) : error C2065: 'nullptr' : undeclared identifier
    ..\lexers\LexEDIFACT.cxx(46) : error C2065: 'nullptr' : undeclared identifier
    ..\lexers\LexEDIFACT.cxx(54) : error C2065: 'nullptr' : undeclared identifier
    ..\lexers\LexEDIFACT.cxx(64) : error C2065: 'nullptr' : undeclared identifier
    ..\lexers\LexEDIFACT.cxx(103) : warning C4309: 'argument' : truncation of constant value
    ..\lexers\LexEDIFACT.cxx(124) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
    ..\lexers\LexEDIFACT.cxx(180) : warning C4018: '>=' : signed/unsigned mismatch
    ..\lexers\LexEDIFACT.cxx(199) : warning C4127: conditional expression is constant
    ..\lexers\LexEDIFACT.cxx(213) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
    ..\lexers\LexEDIFACT.cxx(213) : error C2440: 'initializing' : cannot convert from 'const char *' to 'int'
            There is no context in which this conversion is possible
    ..\lexers\LexEDIFACT.cxx(215) : error C2109: subscript requires array or pointer type
    ..\lexers\LexEDIFACT.cxx(215) : fatal error C1903: unable to recover from previous error(s); stopping compilation
    

    Calling pAccess->BufferPointer() can be expensive on large files as it copies (on average) half the document with each call to make all the text contiguous when there has been a change. So, if the user is typing, the copy occurs after each character is typed. Most lexer code should use LexAccessor::operator[] or LexAccessor::SafeGetCharAt which don't collapse the gap.

     
  • Iain Clarke

    Iain Clarke - 2016-11-23

    I've sorted out the above issues. I don't have access to GCC etc, but the new cpp file compiles cleanly in vs2015 and 2005. I think it's safe to assume it compiles cleanly in inbetween versions too!

    Feel free to edit the comments at the start or suggest changes. I am happy for licensing to bewide open, though vanity makes me want my name to stay up there.

     
  • Iain Clarke

    Iain Clarke - 2016-11-23

    And here's one with folding operational too. It folds at UNH segments, headers for messages inside the overall package.

     
  • Neil Hodgson

    Neil Hodgson - 2016-11-23

    OK, that's better.

    However there is a warning from VS 2015:

    ..\lexers\LexEDIFACT.cxx(244) : warning C4127: conditional expression is constant
    

    The code is a little weird, using a single iteration loop to be able to break instead of using ifs but it also appears wrong. The call to memcpy will always succeed since memcpy returns its destination argument. Perhaps you want memcmp?

    FindPreviousEnd can be marked const.

    LexerModule.h should be at the end of the list of #includes. Scintilla has a canonical header order defined in scripts/HeaderOrder.txt and a checking script scripts/HeaderCheck.py to ensure there aren't header ordering bugs and precompiled headers work reasonably.

     
  • Iain Clarke

    Iain Clarke - 2016-11-24

    The memcpy was a stupid typo - I had already fixed it, but had to go back in time to an earlier version when I messed up folding, and missed that memcpy.

    I quite like the do { } while (0); with breaks when you have a lot of failure conditions to jump out from, as it prevents a lot of if nesting. but as there's not much of an endurance test here, I've changed to a couple of if's instead.

    All my helper functions have been const'd, except for InitialiseFromUNA (). I am a big fan of const, and the new override modifiers - anything that makes it harder to do silly mistakes must be good.

    The include order has been fixed, and a comment added to remind future Iain not to mess it up later.

     
  • Neil Hodgson

    Neil Hodgson - 2016-11-25
    • Group: Completed --> Committed
     
  • Neil Hodgson

    Neil Hodgson - 2016-11-25

    Committed as [fa9991].

     

    Related

    Commit: [fa9991]

  • Neil Hodgson

    Neil Hodgson - 2016-11-27

    Additional changes committed to enable building on macOS: [94f0d4].

     

    Related

    Commit: [94f0d4]

  • Neil Hodgson

    Neil Hodgson - 2016-12-05
    • status: open --> closed
     
  • Kalle Kloak

    Kalle Kloak - 2018-04-23

    Thanks for implementing this Lexer. I'd like to suggest some improvements...
    Currently UNA line is colored in one color and most of the tags are same colors.
    If you look at TextWrangler editor, they have made it a bit more flexible. Screenshot can be found here for inspiration; https://github.com/pliegl/textwrangler-edi

     
  • Iain Clarke

    Iain Clarke - 2018-04-23

    Hello Kalle,
    The main difference that I can see is that "I" return three segment styles - normal (shared with segment separator), UNA and UNH. Whereas they have: Normal, UNA and UN*.

    It would not be hard to change the match for UNH to UN. The question is whether that's the right thing to do.

    I suppose we do have options, like "fold", so "make UN special" woud not be hard.... I shall have a think.

    Iain.

     
  • Iain Clarke

    Iain Clarke - 2018-04-23

    I've made the changes, and attached a patch and new cxx file. Give them a play, and comment, before Neil need to have an opinion. - Iain.

     
  • Iain Clarke

    Iain Clarke - 2018-04-23

    And the cxx file.

     
  • Neil Hodgson

    Neil Hodgson - 2018-04-23

    OK. The name of the property should be namespaced with a "lexer.edifact." prefix which segregates each lexer's properties except for some of the earliest.

    As well as runtime property discovery, there is also a script, scintilla/scripts/ScintillaData.py, that can discover all the properties for documentation or code generation. Some of SciTE's documentation is created by generating HTML from the property documentation. This is mostly built around the OptionSet class for new lexers or recognizing calls to GetProperty and some commenting conventions for old lexers but it can be triggered by placing a comment above the retrieval of the property plus a "// GetProperty" comment at the line end. Something like:

    // property lexer.edifact.highlight.un.all
    //  Whether to apply UN* highlighting to all UN segments, or just to UNH
    if (!strcmp(key, "lexer.edifact.highlight.un.all")) // GetProperty
    

    Be sure to run the script to make sure your lexer turns up in the list of lexers, the property name is in the list of property names and the documentation is there too.

     

    Last edit: Neil Hodgson 2018-04-23
  • Iain Clarke

    Iain Clarke - 2018-05-04

    Here's an updated patch, using the better name above, and some // property comments added. I could not fine any // GetProperty comments in other lexers to compare against, but I did find the // property ones.
    I currently have 4.0.3 - if they've been added later, I apologise.

     
    • Neil Hodgson

      Neil Hodgson - 2018-05-06

      There are 2 separate elements found by the script: documentation and property names. The documentation is found from your addition but the property name isn't - the script doesn't look for "strcmp" as that is quite common. It looks for lines that contain GetProperty or DefineProperty and with a quoted string. I added "// GetProperty" to the change set. No other lexers use this since they call either GetProperty* or DefineProperty.

      The "fold" property is common to almost all lexers so adding many slightly different pieces of documentation would not be useful. Removing the comment above the declaration m_bFold omits that documentation.

      The updated change set is committed as [5f511d].

       

      Related

      Commit: [5f511d]


Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.