Menu

#1072 Add lexer for GNU as

Completed
open
nobody
lexer (116)
5
2022-04-09
2014-09-07
gwr
No

I wrote a lexer for GNU assembler as.
Could it be integrated ?

1 Attachments

Related

Bugs: #1830
Feature Requests: #1314

Discussion

  • gwr

    gwr - 2014-09-07

    Here a screenshot

     
  • Neil Hodgson

    Neil Hodgson - 2014-09-07

    The SCE_GAS_* constants need definitions.

     
  • Neil Hodgson

    Neil Hodgson - 2014-09-08

    Some warnings were found by compilers and static checkers.

    ../lexers/LexGas.cxx: In member function 'virtual void LexerGas::Fold(unsigned int, int, int, IDocument*)':
    ../lexers/LexGas.cxx:1015:33: warning: variable 'lcd' set but not used [-Wunused-but-set-variable]
    int lcd = 0; // line count display ( = lc + 1 )
    ^
    ../lexers/LexGas.cxx:1023:33: warning: variable 'lco' set but not used [-Wunused-but-set-variable]
    int lco = 0;
    ^
    ..\lexers\LexGas.cxx(988) : warning C4100: '_initStyle' : unreferenced formal parameter
    c:\u\hg\scintilla\lexers\lexgas.cxx(942) : warning C4701: potentially uninitialized local variable 'sstr' used
    ..\lexers\LexGas.cxx(559) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
    ..\lexers\LexGas.cxx(568) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
    ..\lexers\LexGas.cxx(577) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data

    scintilla\lexers\LexGas.cxx:1020: style: The scope of the variable 'chNext' can be reduced.
    scintilla\lexers\LexGas.cxx:1023: style: The scope of the variable 'lco' can be reduced.
    scintilla\lexers\LexGas.cxx:1024: style: The scope of the variable 'atEol' can be reduced.
    scintilla\lexers\LexGas.cxx:1117: style: Variable 'lcd' is assigned a value that is never used.
    scintilla\lexers\LexGas.cxx:1120: style: Variable 'lvp' is assigned a value that is never used.
    scintilla\lexers\LexGas.cxx:1058: style: Variable 'lco' is assigned a value that is never used.
    scintilla\lexers\LexGas.cxx:823: warning: Array index -1 is out of bounds. Otherwise there is useless condition at line 824.

     
  • gwr

    gwr - 2014-09-08

    Interesting, I wasn't expecting that much.

    "scintilla\lexers\LexGas.cxx:823: warning: Array index -1 is out of bounds.
    Otherwise there is useless condition at line 824."
    ->Effectively the test is useless ; but I will keep it because of CppCheck

    What version of CppCheck do you use ? I have 1.54 and I don't have the same outputs.

    Are the corrections of CppCheck "style:scope can be reduced" messages mandatory ?

    Can you try with this one ? should be better.

     

    Last edit: gwr 2014-09-08
    • Neil Hodgson

      Neil Hodgson - 2014-09-08

      The test isn't actually useless - its just that the assignment to gas_extension_char needs to be inside the block guarded by the length check. The variable is only used inside that check.

      The point of the 3 scope reduction messages is that the variables have a valid value only inside each loop iteration. The values assigned do not get seen by the next iteration or by code after the loop has finished. Declaring the variables inside the loop makes it easier for the reader to reason about the behaviour. I run the check-everything script after every change so will be seeing the "style:scope can be reduced" many times and ignoring which means I am less likely to see other issues.

      CppCheck 1.66 was used for the checks.

      There are more warnings with the new version since printf is visiblewith type mismatches in format strings
      scintilla\lexers\LexGas.cxx:501: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.
      scintilla\lexers\LexGas.cxx:1075: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.
      scintilla\lexers\LexGas.cxx:1075: warning: %i in format string (no. 3) requires 'int' but the argument type is 'unsigned int'.
      scintilla\lexers\LexGas.cxx:1129: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.
      scintilla\lexers\LexGas.cxx:1134: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.
      scintilla\lexers\LexGas.cxx:1157: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.
      scintilla\lexers\LexGas.cxx:1162: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.

       
  • gwr

    gwr - 2014-09-09

    )Declaring the variables inside the loop makes it easier for )the reader to >reason about the behaviour
    I prefer having all the variables at the beginning of the function with a commentary at the end of the line.

    )The test isn't actually useless
    Yes it is, the code was crappy. CppCheck's analysis is static, and can't simulate the finite state machine dynamic running.

    I Switched to CppCheck 1.66. I Was not convinced about CppCheck until it found the printf bad formats.

    Can we agree about correcting all but "style:scope" warnings for LexGas.cxx ??

    Thanks for the time you spend on this.

     

    Last edit: gwr 2014-09-09
    • Neil Hodgson

      Neil Hodgson - 2014-09-09

      Can we agree about correcting all but "style:scope" warnings for LexGas.cxx ??

      Alright.

       
  • gwr

    gwr - 2014-09-09

    Please try with this version.
    -----------------------------------------------------------
    I have 2 questions ( if you have some time )

    In LexCPP.cxx, ( version 450, line 455 ) there is :

    if (static_cast<int>((sc.currentPos+1)) >= lineEndNext)</int>

    where
    sc.currentPos is an unsigned int
    lineEndNext is an int

    I think this code is wrong : unsigned int is for the position in the document, which can be huge, and can't be compared to an int , which amplitude is lower.
    Am I right ?

    More, in StyleContext.h :
    public:
    unsigned int currentPos;
    ...
    int lineStartNext;

    Why this mix between unsigned int and int for currentPos and lineStartNext ?

     

    Last edit: gwr 2014-09-09
    • Neil Hodgson

      Neil Hodgson - 2014-09-10

      Scintilla only supports documents as large as INT_MAX so casting the position to int will never fail. The reason for the disparity is historic and is not normally important. Arithmetic is safer with signed int.

      Where the current LexGas.cxx still has problems is with size_t. On 64-bit builds, size_t is 64-bits but int (and unsigned int) are 32-bits. That is why there are several warnings where the result of strlen (size_t) is jammed into an int:
      ..\lexers\LexGas.cxx(585) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
      ..\lexers\LexGas.cxx(594) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
      ..\lexers\LexGas.cxx(603) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
      All of these are std::string so know their length (the length() method) so calling strlen on their c_str() is obtuse.
      http://www.cplusplus.com/reference/string/string/length/

      Line 517 has a format mismatch:
      scintilla\lexers\LexGas.cxx:517: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.

      For the _initStyle warning simply omit the parameter name or comment it:

      void SCI_METHOD LexerGas::Fold(unsigned int startPos, int length, int, IDocument *pAccess) {

      or

      void SCI_METHOD LexerGas::Fold(unsigned int startPos, int length, int / initStyle /, IDocument *pAccess) {

      There's some old code included in comments like
      //const bool userDefinedFoldMarkers = !options.foldExplicitStart.empty( ...
      This is quite thoroughly dead since the variables it references no longer exist. Retaining this code is just confusing.

       
  • Neil Hodgson

    Neil Hodgson - 2014-09-10

    Since GasC2C is only used by LXGLDB it should be wrapped in the same #if so that it is only defined when used.

    There are also some issues reported by clang analyze:

    /Users/neil/merc/scintilla/lexers/LexGas.cxx:726:9: warning: Value stored to 'state' is never read
    state = eGasAsEtiquette;
    ^ ~~~~~~~~~~~~~~~
    /Users/neil/merc/scintilla/lexers/LexGas.cxx:731:5: warning: Value stored to 'state' is never read
    state = eGasAsDirective; // else assume AS_DIRECTIVE
    ^ ~~~~~~~~~~~~~~~
    2 warnings generated.

    These must be wrapped in #ifndef clang_analyzer to prevent warnings:

    :::c++
        if ( ch == ':' )                                                            //  AS_ETIQUETTE
        {
    #ifndef __clang_analyzer__
            state           =   eGasAsEtiquette;
    #endif
            wi[wc].a_type   =   eWordAsEtiquette;
            goto lab_state_AS_ETIQUETTE_found;
        }
    
    #ifndef __clang_analyzer__
        state           =   eGasAsDirective;                                        //  else assume AS_DIRECTIVE
    #endif
        wi[wc].a_type   =   eWordAsDirective;
        goto lab_state_AS_DIRECTIVE_found;
    
     

    Last edit: Neil Hodgson 2014-09-10
  • gwr

    gwr - 2014-09-10

    warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data

    Sorry, just forgot to code it. In fact this tests were unuseful ( see code comments )

    All of these are std::string so know their length (the length() method) so calling strlen on their c_str() is obtuse.

    Done ( I never use std::string )

    Line 517 has a format mismatch:
    scintilla\lexers\LexGas.cxx:517: warning: %i in format string (no. 2) requires 'int' but the argument type is 'unsigned int'.

    CppCheck missed it, me too. I checked myself for all others.

    For the _initStyle warning simply omit the parameter name or comment it:

    Done, we learn everyday

    There's some old code included in comments like...

    Removed. I guessed they could be useful further.

    Since GasC2C is only used by LXGLDB it should be wrapped in the same #if so that it is only defined when used.

    Done.CppCheck 1.54 warned, CppCheck 1.66 did not, so I kept...

    There are also some issues reported by clang analyze: warning: Value stored to 'state' is never read
    state = eGasAsEtiquette;

    Lines commented. It was only helpful while coding the finite state machine.

     
  • Neil Hodgson

    Neil Hodgson - 2014-09-10

    That's mostly good.

    The property naming convention uses the prefix "fold.<lexer>" for properties that affect folding. Following the convention makes it more likely that applications will understand the role of the properties.</lexer>

    Do you want a particular name string in the credits?

     
  • gwr

    gwr - 2014-09-11

    The property naming convention uses the prefix "fold.<lexer>" for properties</lexer>

    Done.

    Do you want a particular name string in the credits?

    My name : Guillaume WARDAVOIR

    I made some ( very little ) changes for further handling of different Gnu as's targets ( currently, only implemented is i386 ).

    For example the language file gas_i386.properties targets x86 but use LexGas.cxx. The SCE_GAS_ constants will remain common to all targets.

    That's mostly good.

    What do you mean by "almost" ?

    And again, thank you.

     

    Last edit: gwr 2014-09-11
    • Neil Hodgson

      Neil Hodgson - 2014-09-12

      "Mostly good" because of the property naming convention.

      Adding extra modes for different processors appears incomplete. For example, there is a simple SCLEX_GAS with a 2 part gas_i386. How will the properties files differentiate between processor targets? There are about 50 targets supported by gas and adding a lexer for each will be unwieldy.

      There is a new warning from MSVC:
      ..\lexers\LexGas.cxx(229) : warning C4482: nonstandard extension used: enum 'LexerGas::eTarget' used in qualified name

       
  • gwr

    gwr - 2014-09-12

    The targets specializations implementation should be discussed.

    adding a lexer for each will be unwieldy

    It was the idea. One lexer LexGas.cxx, one SCLEX_GAS lexer id, one set of SCE_GAS_... constants, and many gas_....properties files, for :
    - putting in common : the directives, the comments, ...
    - specialize :registers,...

    If you prefer a particular implementation, please tell.

    Anyway, find here attached the "naming-convention-corrected" version of previous "branch" ( where gas <=> i386 ).We can stay on this for the moment.

     
    • Neil Hodgson

      Neil Hodgson - 2014-09-12

      Amending an existing structure to support multiple processors will be more trouble than working out how to do this first.

      There doesn't appear to be much difference between the syntax for each target processor. The main difference is the line comment character https://en.wikipedia.org/wiki/GNU_Assembler#Comments. Its likely to be less trouble to provide one lexer and add a property to set the line comment character.

      For projects with files for different processors there should be a way of matching the instructions and other settings to the file. SciTE isn't great here: you can override the settings for a directory with the local properties file and for each file with individual settings but that means adding 10-20 settings (fold., keywords, command.*) for each file. Other applications may have better ways of defining modes and submodes and applying them to files.

      For SciTE, please note that the individual properties files are not namespaces. If both gas.properties and asm.properties are included, then there will be two definitions of mem_instruction read and the one used is arbitrary. Thus, the gas lexer may not highlight movb because the setting of mem_instruction from asm.properties has taken precedence.

      If multiple processors are supported then there should be a way of separating their instructions, such as a processor name prefix to the property names.

      The value of SCLEX_GAS will be 117 as the BibTeX lexer took 116.

       
  • Zufu Liu

    Zufu Liu - 2022-04-09
    • labels: --> lexer
     

Log in to post a comment.