
#185 Stable output for distinct token fields

closed
None
1
2017-03-19
2016-12-15
No

For the moment, you're using a dict to store distinct fields and are not sorting them when outputting C/C++ code. As a result, the order of the fields in the generated code can change between runs. This is impractical because:

  1. Debugging is made harder
  2. Binding quex to non-C languages (here, Ada) is harder, because you cannot rely on the order of the fields being stable; e.g., you have to regenerate the token structure every time, or use an intermediate wrapper.

This could be solved either by using an ordered dict rather than a plain dict, or by sorting the fields before emitting them.

What do you think?

Discussion

  • Frank-Rene Schäfer

    Ticket moved from /p/quex/bugs/295/

     
  • Frank-Rene Schäfer

    Please provide an example. Do you mean that you want the token IDs sorted by name?

     
  • Frank-Rene Schäfer

    Ticket moved from /p/getpot/support-requests/2/

     
  • Raphael Amiard

    Raphael Amiard - 2016-12-19

    Hi!

    Sure, here is the definition of my token_type fields:

        standard {
            id         : uint16_t;
        }
    
        distinct {
            text       : const QUEX_TYPE_CHARACTER*;
            len        : uint32_t;
            offset     : uint32_t;
            end_line   : uint32_t;
            end_column : uint16_t;
            last_id    : uint16_t;
        }
    

    This produces the following token definition in C:

    typedef struct quex_Token_tag {
        QUEX_TYPE_TOKEN_ID    _id;
    
    #   line 21 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
        const QUEX_TYPE_CHARACTER* text;
    
    #   line 50 "quex_lexer-token.h"
    
    #   line 22 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
        uint32_t                   len;
    
    #   line 55 "quex_lexer-token.h"
    
    #   line 24 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
        uint32_t                   end_line;
    
    #   line 60 "quex_lexer-token.h"
    
    #   line 25 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
        uint16_t                   end_column;
    
    #   line 65 "quex_lexer-token.h"
    
    #   line 23 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
        uint32_t                   offset;
    
    #   line 70 "quex_lexer-token.h"
    
    #   line 26 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
        uint16_t                   last_id;
    
    #   line 75 "quex_lexer-token.h"
    
    #   ifdef     QUEX_OPTION_TOKEN_STAMPING_WITH_LINE_AND_COLUMN
    #       ifdef QUEX_OPTION_LINE_NUMBER_COUNTING
            QUEX_TYPE_TOKEN_LINE_N    _line_n;
    #       endif
    #       ifdef  QUEX_OPTION_COLUMN_NUMBER_COUNTING
            QUEX_TYPE_TOKEN_COLUMN_N  _column_n;
    #       endif
    #   endif
    
    } quex_Token;
    

    As you can see, the fields do not appear in the C code in the order in which they were declared, and no order is guaranteed: it can change from one compilation to the next. This is very problematic because it can break binary compatibility; e.g., if I recompile my lexer, I have to recompile every dependency, even if neither the lexer nor its fields changed.
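To make the binary-compatibility point concrete, here is a small Python `ctypes` sketch (an illustration, not part of quex) that lays out the distinct fields once in declaration order and once in the order emitted above. Assumptions: `QUEX_TYPE_TOKEN_ID` is taken to be `uint16_t`, matching the `standard` section; the character pointer is modelled as `c_void_p`; the optional `_line_n`/`_column_n` members are left out. The class and function names are hypothetical.

```python
import ctypes

class DeclaredOrder(ctypes.Structure):
    # Hypothetical mirror of the token struct, fields in .qx declaration order.
    _fields_ = [
        ("_id", ctypes.c_uint16),         # assumed QUEX_TYPE_TOKEN_ID = uint16_t
        ("text", ctypes.c_void_p),        # stands in for const QUEX_TYPE_CHARACTER*
        ("len", ctypes.c_uint32),
        ("offset", ctypes.c_uint32),
        ("end_line", ctypes.c_uint32),
        ("end_column", ctypes.c_uint16),
        ("last_id", ctypes.c_uint16),
    ]

class GeneratedOrder(ctypes.Structure):
    # Same fields, in the order quex happened to emit them in the output above.
    _fields_ = [
        ("_id", ctypes.c_uint16),
        ("text", ctypes.c_void_p),
        ("len", ctypes.c_uint32),
        ("end_line", ctypes.c_uint32),
        ("end_column", ctypes.c_uint16),
        ("offset", ctypes.c_uint32),
        ("last_id", ctypes.c_uint16),
    ]

def field_offset(struct_type, name):
    """Byte offset of a field, as laid out by the platform's C ABI."""
    return getattr(struct_type, name).offset

for name in ("end_line", "end_column", "offset", "last_id"):
    print(name, field_offset(DeclaredOrder, name), field_offset(GeneratedOrder, name))
print("sizeof:", ctypes.sizeof(DeclaredOrder), ctypes.sizeof(GeneratedOrder))
```

With the usual C alignment rules, every field after `len` sits at a different byte offset in the two layouts, so a foreign-language record (e.g. an Ada binding) written against one order reads the wrong bytes from the other.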

    Also, I'm binding to Ada, where this causes other complications.

    This all happens because you're using a regular dict in token_type.py:

    class TokenTypeDescriptorCore:
        """Object used during the generation of the TokenTypeDescriptor."""
        def __init__(self, Core=None):
            if Core is None:
                ...
                self.distinct_db = {}  # <---- Here
    

    And you are not ordering the fields before rendering them in token_class_maker.py (I believe):

    def get_distinct_members(Descr):
        # '0' to make sure, that it works on an empty sequence too.
        TL = Descr.type_name_length_max()
        NL = Descr.variable_name_length_max()
        txt = ""
        for name, type_code in Descr.distinct_db.items():
            txt += __member(type_code, TL, name, NL)
        #txt += Lng._SOURCE_REFERENCE_END()
        return txt
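Option 2 can be sketched in a few lines: iterate over `sorted(...)` instead of the raw dictionary. The descriptor class `FakeDescr` and the formatter `render_member` below are hypothetical stand-ins (quex's real `Descr` and `__member` are not shown here), and the sketch assumes Python 3.

```python
from typing import Dict

class FakeDescr:
    """Hypothetical stand-in for quex's token type descriptor."""
    def __init__(self, distinct_db: Dict[str, str]):
        self.distinct_db = distinct_db

    def type_name_length_max(self) -> int:
        # default=0 so that an empty field table still works
        return max((len(t) for t in self.distinct_db.values()), default=0)

def render_member(type_code: str, TL: int, name: str) -> str:
    # Hypothetical replacement for __member(): one aligned C declaration.
    return "    %s %s;\n" % (type_code.ljust(TL), name)

def get_distinct_members(Descr) -> str:
    TL = Descr.type_name_length_max()
    txt = ""
    # Iterate in a fixed (alphabetical) order, independent of how the
    # dictionary happens to order its keys.
    for name, type_code in sorted(Descr.distinct_db.items()):
        txt += render_member(type_code, TL, name)
    return txt

descr = FakeDescr({"text": "const QUEX_TYPE_CHARACTER*",
                   "len": "uint32_t", "offset": "uint32_t"})
print(get_distinct_members(descr))
```

The output is stable from run to run, but the members come out alphabetically (`len`, `offset`, `text`) rather than in declaration order.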
    

    The fix is relatively easy; either:

    1. Use an OrderedDict, so that the insertion order is preserved.
    2. Sort the fields before rendering, so that they always come out in the same order.

    In my opinion, option 1 is better, because it has the added benefit of preserving the order in which the user specified the fields.
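Option 1 amounts to swapping the container. Below is a minimal sketch assuming a trimmed-down descriptor; the class name `TokenTypeDescriptorCoreSketch` and its `add_distinct` method are hypothetical, not quex's API. `collections.OrderedDict` keeps insertion order on Python 2.7 as well as on Python 3; since Python 3.7, a plain `dict` is also guaranteed to preserve insertion order.

```python
from collections import OrderedDict

class TokenTypeDescriptorCoreSketch:
    """Hypothetical, trimmed-down stand-in for quex's TokenTypeDescriptorCore."""

    def __init__(self):
        # OrderedDict iterates in insertion order, i.e. in the order the
        # user wrote the fields in the `distinct { ... }` section.
        self.distinct_db = OrderedDict()

    def add_distinct(self, name, type_code):
        # Hypothetical helper: record one field as it is parsed.
        self.distinct_db[name] = type_code

core = TokenTypeDescriptorCoreSketch()
for name, type_code in [
    ("text", "const QUEX_TYPE_CHARACTER*"),
    ("len", "uint32_t"),
    ("offset", "uint32_t"),
    ("end_line", "uint32_t"),
    ("end_column", "uint16_t"),
    ("last_id", "uint16_t"),
]:
    core.add_distinct(name, type_code)

print(list(core.distinct_db))
# → ['text', 'len', 'offset', 'end_line', 'end_column', 'last_id']
```

Any renderer that simply iterates over `distinct_db.items()` then emits the members in exactly the user's order, with no sorting step needed.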

     
  • Frank-Rene Schäfer

    Ticket moved from /p/quex/support-requests/7/

     
  • Frank-Rene Schäfer

    I see. This is a good point.
    I will include this in the next release.
    Thanks for the note.

     
  • Raphael Amiard

    Raphael Amiard - 2016-12-19

    Well, thank you :)

     
  • Frank-Rene Schäfer

    • status: open --> pending
    • Group: v1.0_(example) --> Next_Release_(example)
     
  • Frank-Rene Schäfer

    Implemented since 0.67.1. Please confirm!

     
  • Frank-Rene Schäfer

    • status: pending --> closed
     
  • Frank-Rene Schäfer

    Closed due to lack of objection. Assuming the task is accomplished.

     
