#185 Stable output for distinct token fields

Milestone: Next_Release_(example)

Status: closed

Owner: Frank-Rene Schäfer

Labels: None

Priority: 1

Updated: 2017-03-19

Created: 2016-12-15

Creator: Raphael Amiard

Private: No

For the moment, you're using a dict to store the distinct fields, and you're not sorting them when outputting them to C/C++ code. As a result, the order of the fields is not stable. This is impractical because:

Debugging is made harder.
Binding quex to non-C languages (here, Ada) is harder, because you cannot rely on the order of the fields being stable; e.g., you have to regenerate the token structure every time, or use an intermediate wrapper.

This could be solved either by using an ordered dict rather than a dict, or by sorting the fields before emitting them.

What do you think?

Discussion

Frank-Rene Schäfer - 2016-12-15

Ticket moved from /p/quex/bugs/295/

Frank-Rene Schäfer - 2016-12-15

Please provide an example. Do you mean that you want to have the token-ids sorted by name?

Frank-Rene Schäfer - 2016-12-15

Ticket moved from /p/getpot/support-requests/2/

Hi!

Sure, here is the definition of my token_type fields:

    standard {
        id         : uint16_t;
    }

    distinct {
        text       : const QUEX_TYPE_CHARACTER*;
        len        : uint32_t;
        offset     : uint32_t;
        end_line   : uint32_t;
        end_column : uint16_t;
        last_id    : uint16_t;
    }

Which will produce this token definition in C:

typedef struct quex_Token_tag {
    QUEX_TYPE_TOKEN_ID    _id;

#   line 21 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
    const QUEX_TYPE_CHARACTER* text;

#   line 50 "quex_lexer-token.h"

#   line 22 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
    uint32_t                   len;

#   line 55 "quex_lexer-token.h"

#   line 24 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
    uint32_t                   end_line;

#   line 60 "quex_lexer-token.h"

#   line 25 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
    uint16_t                   end_column;

#   line 65 "quex_lexer-token.h"

#   line 23 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
    uint32_t                   offset;

#   line 70 "quex_lexer-token.h"

#   line 26 "/home/amiard/libadalang/src/libadalang/build/include/libadalang/ada.qx"
    uint16_t                   last_id;

#   line 75 "quex_lexer-token.h"

#   ifdef     QUEX_OPTION_TOKEN_STAMPING_WITH_LINE_AND_COLUMN
#       ifdef QUEX_OPTION_LINE_NUMBER_COUNTING
        QUEX_TYPE_TOKEN_LINE_N    _line_n;
#       endif
#       ifdef  QUEX_OPTION_COLUMN_NUMBER_COUNTING
        QUEX_TYPE_TOKEN_COLUMN_N  _column_n;
#       endif
#   endif

} quex_Token;

As you can see, the fields are not emitted in the order in which they were declared (here, offset ends up after end_column), and no order is guaranteed; i.e., it can change from one compilation to the next. This is very problematic because it can break binary compatibility: if I recompile my lexer, I have to recompile every dependency, even if neither the lexer nor the fields changed.

Also, I'm binding to Ada, where this causes other complications.
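
To make the binary-compatibility point concrete, here is a minimal sketch using Python's ctypes to compare the member offsets of the declared order with the order that was emitted above. The type mapping is an assumption for illustration only (QUEX_TYPE_CHARACTER* modeled as an opaque pointer, QUEX_TYPE_TOKEN_ID as uint16_t), and make_struct, offsets and moved_fields are hypothetical helper names, not part of quex.

```python
import ctypes

# Assumed type mapping, for illustration only: the character pointer is
# modeled as an opaque pointer, the token id as uint16_t.
TYPES = {
    "_id": ctypes.c_uint16,
    "text": ctypes.c_void_p,
    "len": ctypes.c_uint32,
    "offset": ctypes.c_uint32,
    "end_line": ctypes.c_uint32,
    "end_column": ctypes.c_uint16,
    "last_id": ctypes.c_uint16,
}

# Order in which the fields were declared in the .qx file.
DECLARED_ORDER = ["_id", "text", "len", "offset",
                  "end_line", "end_column", "last_id"]
# Order in which they appeared in the generated header quoted above.
EMITTED_ORDER = ["_id", "text", "len", "end_line",
                 "end_column", "offset", "last_id"]


def make_struct(name, field_order):
    """Build a ctypes.Structure whose members follow field_order."""
    fields = [(field, TYPES[field]) for field in field_order]
    return type(name, (ctypes.Structure,), {"_fields_": fields})


def offsets(struct):
    """Map each member name to its byte offset inside the struct."""
    return {name: getattr(struct, name).offset for name, _ in struct._fields_}


def moved_fields(struct_a, struct_b):
    """Return the names of members whose byte offset differs."""
    a, b = offsets(struct_a), offsets(struct_b)
    return sorted(name for name in a if a[name] != b[name])


Declared = make_struct("Declared", DECLARED_ORDER)
Emitted = make_struct("Emitted", EMITTED_ORDER)

if __name__ == "__main__":
    for name in moved_fields(Declared, Emitted):
        print(name, offsets(Declared)[name], "->", offsets(Emitted)[name])
```

Any code compiled against one layout (a C dependency or an Ada record mapped onto the struct) would read the moved members from the wrong offsets when linked against the other layout.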

This all happens because you're using a regular dict in token_type.py:

class TokenTypeDescriptorCore:
    """Object used during the generation of the TokenTypeDescriptor."""
    def __init__(self, Core=None):
        if Core is None:
            ...
            self.distinct_db = {}  # <---- Here

And you are not ordering the fields before rendering them in token_class_maker.py (I believe):

def get_distinct_members(Descr):
    # '0' to make sure, that it works on an empty sequence too.
    TL = Descr.type_name_length_max()
    NL = Descr.variable_name_length_max()
    txt = ""
    for name, type_code in Descr.distinct_db.items():
        txt += __member(type_code, TL, name, NL)
    #txt += Lng._SOURCE_REFERENCE_END()
    return txt

The fix is relatively easy. Either:

1. Use an OrderedDict, so that the insertion order is preserved.
2. Sort the fields before rendering, so that they always have the same order.

In my opinion, 1. is better because it has the added benefit of preserving the order in which the user has specified the fields.
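
Both options can be sketched in a few lines. This is not the actual quex code: DECLARED, render_members and render_members_sorted are hypothetical names standing in for the parsed distinct section and for get_distinct_members, and the member formatting is simplified.

```python
from collections import OrderedDict

# Hypothetical stand-in for the parsed 'distinct' section: (name, type)
# pairs in the order the user wrote them in the .qx file.
DECLARED = [
    ("text", "const QUEX_TYPE_CHARACTER*"),
    ("len", "uint32_t"),
    ("offset", "uint32_t"),
    ("end_line", "uint32_t"),
    ("end_column", "uint16_t"),
    ("last_id", "uint16_t"),
]


def render_members(distinct_db):
    """Emit one C member per field, in the mapping's iteration order."""
    return "".join("    %s %s;\n" % (type_code, name)
                   for name, type_code in distinct_db.items())


def render_members_sorted(distinct_db):
    """Option 2: sort by field name so the output never depends on dict order."""
    return "".join("    %s %s;\n" % (type_code, name)
                   for name, type_code in sorted(distinct_db.items()))


# Option 1: an OrderedDict iterates in insertion order, so the emitted
# members follow the order of declaration.
ordered_db = OrderedDict(DECLARED)

if __name__ == "__main__":
    print(render_members(ordered_db))
    print(render_members_sorted(dict(DECLARED)))
```

Note that since Python 3.7 a plain dict also preserves insertion order, but on older interpreters only OrderedDict or explicit sorting gives a stable result.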

Frank-Rene Schäfer - 2016-12-19

Ticket moved from /p/quex/support-requests/7/

Frank-Rene Schäfer - 2016-12-19

I see. This is a good point.
I will include this in the next release.
Thanks for the note.

Raphael Amiard - 2016-12-19

Well thank you :)

Frank-Rene Schäfer - 2017-03-14

status: open --> pending

Group: v1.0_(example) --> Next_Release_(example)

Frank-Rene Schäfer - 2017-03-14

Implemented since 0.67.1. Please confirm!

Frank-Rene Schäfer - 2017-03-19

status: pending --> closed

Frank-Rene Schäfer - 2017-03-19

Closed due to lack of objection. I assume the task is accomplished.
