Menu

Performance question about Quex

ollydbg
2010-08-01
2012-10-02
  • ollydbg

    ollydbg - 2010-08-01

    I would like to consider as a lexer(tokenizer) to support the "Codecompletion
    plugin in Codeblocks IDE".

    as the home page said, the code directed lexer is faster than the table driven
    lexer. then Quex is a code directed lexer.
    I just do a comparesion of the flex. I use the C/004 demo code, and I disabled
    the cout output in the while loop, and also disable the assert macro. by
    lexing a 8M cpp source code, it takes about 1.5s.

    I just take some extra test by using flex gramma, and the result is quite
    fast, it is only about 150 ms.
    See a post link of mine in the codeblocks' forum:

    http://forums.codeblocks.org/index.php/topic,13017.msg87659.html#msg87659

    Any ideas?

    Thanks.

     
  • Frank-Rene Schäfer

    Did you disable the asserts? Can you provide à Sample Application?

     
  • ollydbg

    ollydbg - 2010-08-01

    Thanks for your reply.

    I have disabled the "assert" by adding -DQUEX_OPTION_ASSERTS_DISABLED to the
    compiler options.
    I use MinGW 4.4.4 compiler Windows XP,python 2.6.5.

    The code is exact the same as minimal sample, see below:

    #include<fstream>
    #include<iostream>
    
    // (*) include lexical analyser header
    #include "tiny_lexer"
    
    #ifndef     ENCODING_NAME
    #    define ENCODING_NAME (0x0)
    #endif
    
    #define PRINT_TOKEN
    
    #ifdef _WIN32
    
    #include<Windows.h>
    
    class CTimer {
    private:
    
        LARGE_INTEGER m_base;
        LARGE_INTEGER m_temp;
        float m_resolution;
    
    public:
    
        CTimer() {
            LARGE_INTEGER t_freq;
            QueryPerformanceFrequency(&t_freq);
            m_resolution = (float) (1.0f / (double) t_freq.QuadPart);
            reset();
        }
    
        void reset() {
            QueryPerformanceCounter(&m_base);
        }
    
        inline float GetCurrentTime() {
            QueryPerformanceCounter(&m_temp);
            return (m_temp.QuadPart - m_base.QuadPart) * m_resolution * 1000.0f;
        }
    
        void SleepTime(float ms_val) {
            float ms_st = GetCurrentTime();
            while (GetCurrentTime()-ms_st < ms_val)
                continue;
        }
    
    };
    #else
    class CTimer {
    private:
        unsigned long m_base;
    public:
        CTimer() {
            reset();
        }
    
        void reset() {
            timeval t;
            gettimeofday(&t, NULL);
            m_base = t.tv_sec;
        }
    
        float GetCurrentTime() {
            timeval t;
            gettimeofday(&t, NULL);
            return 1000 * (t.tv_sec - m_base) + t.tv_usec * 0.001f;
        }
    };
    #endif
    
    int
    main(int argc, char** argv)
    {
        using namespace std;
    
        quex::Token*       token_p = 0x0;
        // (*) create the lexical analyser
        //     1st arg: input file, default = 'example.txt'
        //     2nd arg: input character encoding name, 0x0 --> no codec conversion
        quex::tiny_lexer    qlex(argc == 1 ? "example.txt" : argv[1], ENCODING_NAME);
    
        cout << ",-----------------------------------------------------------------\n";
        cout << "| [START]\n";
    
        CTimer aaa;
    
        int number_of_tokens = 0;
        // (*) loop until the 'termination' token arrives
        do {
            // (*) get next token from the token stream
            token_p = qlex.receive();
    
            // (*) print out token information
    #       ifdef PRINT_TOKEN
            //cout << string(*token_p) << endl;
    #       else
            //cout << token_p->type_id_name() << endl;
    #       endif
    
    #       ifdef SPECIAL_ACTION
            SPECIAL_ACTION(&qlex, &my_token);
    #       endif
    
            ++number_of_tokens;
    
            // (*) check against 'termination'
        } while( token_p->type_id() != QUEX_TKN_TERMINATION );
    
        int t = aaa.GetCurrentTime();
        cout<< "time" << t << "\n";
    
        cout << "| [END] number of token = " << number_of_tokens << "\n";
        cout << "`-----------------------------------------------------------------\n";
    
        return 0;
    }
    

    The C grammar is used in:
    C:\Program Files\quex\quex-0.50.1\demo\C\004\c.qx

    And I use the command to generate the code

    quex -i simple.qx -o tiny_lexer

    By the way, I found that there is a benchmark folder, there:

    C:\Program Files\quex\quex-0.50.1\demo\benchmark\in\flex

    So, can you show some result of comparing flex and quex??
    Thanks.

     
  • ollydbg

    ollydbg - 2010-08-02

    Hi, I found a post by you in GCC maillist
    http://gcc.gnu.org/ml/gcc/2007-05/msg00726.html
    You said that 200% performance, so I have more interests on this. I'm not sure
    how to run the benchmark on my WinXp and MinGW, I have also seen that all the
    result data in
    C:\Program Files\quex\quex-0.50.1\demo\benchmark
    is only about quex, there is no flex and re2c result.
    Can you show some of them?

    Thanks.

     
  • Frank-Rene Schäfer

    (*) demo/benchmark


    As you said in "demo/benchmark" there is already a setup
    to benchmark and compare with other systems. The Makefile
    contains targets to build a quex-based, a flex-based and a
    re2c-based parser.

    make run/lexer-flex

    builds a flex based lexical analyzer,

    make run/lexer-quex

    builds a quex based analyzer, etc.

    (*) Results


    Results on "code/linux-2.6.22.17-kernel-dir.c".

    (1) lexer-flex:

    Compiled with -Os (size optimized)

    clock_cycles_per_character = {35.004494}, // overhead eliminated

    Compiled with -O3 (speed optimized)

    clock_cycles_per_character = {43.224880}

    (2) lexer-quex

    Compiled with -Os (size optimized)

    clock_cycles_per_character = {17.738173},

    Compiled with -O3 (speed optimized)

    clock_cycles_per_character = {17.393938}

    As you can see, it is also a cache issue--smaller programs perform faster,
    cause
    lesser cache misses.

    In directory "demo/benchmark/run" you find some helper scripts to run your
    benchmark.

    Note, that the benchmark tries to isolate the cost for lexical analyzis. An
    inadvertent use of the std::string class can slow down the performance
    tremendously.

    (*) Your Case


    :w
    From what I understood, the thing you are trying to do is covered by the
    benchmark. If not, I would like from you the following:

    -- a .qx file that describes exactly the grammar and the
    pattern-action pairs.

    -- a .cpp file for your parser/lexical analyzer, where
    the lexical analyzis is the only thing that happens.

    -- an example text file on which the analyzer runs for
    benchmarking.

    -- some results from '> time my_lexer' as a reference.

    In this case, send me an email to fschaef at users dot sourceforge dot net,
    so that I have your private email and we can exchange attachments.

    It should definitely be possible to get a lexer for the task you
    describe with a clock cycles per character rate of 10 or better.

    Best Regards

    Frank

     
  • ollydbg

    ollydbg - 2010-08-02

    Thanks for your reply. As you said:

    Note, that the benchmark tries to isolate the cost for lexical analyzis. An
    inadvertent use of the std::string class can slow down the performance
    tremendously.

    That's may be the reason why my code is slow down, because In my test code, I
    guess it use "std::string".

    I'm so grad to see the result, your quex is quite good, I will choose this one
    as the lexer to refactor the " tokenizer of codecompletion plugins"

    thanks

     
  • Frank-Rene Schäfer

    Please, let me know about your benchmark results. Note, that with
    --path-compression, --template-compression, or --template-compression-uniform,
    you can achieve sometimes a big deal of program space savings.

     
  • ollydbg

    ollydbg - 2010-08-02

    I have tried in under MSYS and MinGW 4.4.4, but faild. I even can't build the
    bench mark.

    See the log:

    XXXXX /e/code/quex/quex-0.50.1/demo/benchmark
    $ make run/lexer-flex
    flex -o out/flex_scan.c in/flex/c.lex
    flex: could not create out/flex_scan.c
    make: *** Error 1

    XXXXX /e/code/quex/quex-0.50.1/demo/benchmark
    $ make run/lexer-quex
    quex -i in/quex/c.qx \
    --foreign-token-id-file in/token-ids.h \
    --output-directory out \
    --engine quex_scan \
    --no-string-accumulator \
    --token-offset 3 \
    --token-prefix TKN_ \
    --token-policy single
    make: quex: Command not found
    make: *** Error 127

    It is strange, I have all python, mingw correctly installed, I can run quex
    command in the windows cmd.
    I don't know how to find a solution...

     
  • Frank-Rene Schäfer

    echo $PATH

    in msys shell. In Makefile

    me_watch:
    echo $PATH
    echo $SHELL
    echo $QUEX_PATH

    Note that the thing before 'echo' must be a tabulator. Then,

    make me_watch

     
  • ollydbg

    ollydbg - 2010-08-02

    Dear fschaef:
    Thanks for your reply. but I'm sorry I can't catch your idea. I don't know how
    to follow your steps.

    running the "echo $PATH" will print all the search patch.

    But what does "me_watch: " mean??

    Sorry, I'm not an normal linux user, I don't know much about bash.....

    What does the last command "make me_watch" means?

    Sorry, sorry....

     
  • Frank-Rene Schäfer

    The lines above print out environment variables that tell how
    your environment find applications. The variable PATH contains
    a list of directories where executables can be found. The
    directory where quex is located must be one of them. Else,
    it must be added.

    Inside the make environment things may change. The make target
    'me_watch' allows to print out the variables SHELL, PATH, and QUEX_PATH.
    Dependent on the shell other variables may be activated. The variable
    QUEX_PATH must be set to the directory where quex is installed.

    Regards

    Frank

     
  • ollydbg

    ollydbg - 2010-08-02

    thanks. but I still failed.
    I add these code to the makefile under :
    E:\code\quex\quex-0.50.1\demo\benchmark

    me_watch: 
        echo $(PATH) 
        echo $(SHELL) 
        echo $(QUEX_PATH)
    

    The command make me_watch gives the result:

    $ make me_watch
    echo
    .:/usr/local/bin:/mingw/bin:/bin:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/
    System32/Wbem:/c/Program Files/ATI Technologies/ATI.ACE/Core-Static:/c/Program
    F
    iles/Intel/WiFi/bin/:/e/code/bin_path:/e/code/cb/MinGW/bin:/e/code/shortcut:/e
    /c
    ode/opencv/opencv_build/bin:/c/Program Files/Common Files/Thunder
    Network/KanKan
    /Codecs:/d/Program Files/TortoiseSVN/bin:/e/code/msys/bin:/e/code/quex
    .:/usr/local/bin:/mingw/bin:/bin:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/Sys
    te
    m32/Wbem:/c/Program Files/ATI Technologies/ATI.ACE/Core-Static:/c/Program
    Files/
    Intel/WiFi/bin/:/e/code/bin_path:/e/code/cb/MinGW/bin:/e/code/shortcut:/e/code
    /o
    pencv/opencv_build/bin:/c/Program Files/Common Files/Thunder
    Network/KanKan/Code
    cs:/d/Program Files/TortoiseSVN/bin:/e/code/msys/bin:/e/code/quex
    echo /bin/sh
    /bin/sh
    echo /e/code/quex/quex-0.50.1/
    /e/code/quex/quex-0.50.1/

    Also, I have QUEX_PATH variable defined in my windows environment
    The default value is: E:\code\quex\quex-0.50.1
    I have tried to change it to /e/code/quex/quex-0.50.1

    But both cases, when I run the command:
    make run/lexer-quex

    I get the same result as before: quex command not found.

     
  • ollydbg

    ollydbg - 2010-08-03

    by the way, I should you should add the \r to the c.qx
    like:

    P_WHITESPACE          [ \t\r\n]+
    ...
    
    <skip:       [ \n\r\t]>
    

    Also, I think the pdf manual is a bit hard for beginners. I think a mini
    example and tutorial is enough. Also, I'm not sure How to generate C stype
    EasyLexer. When I use the default command:

    quex -i xxx.qx -o EasyLexer
    

    It always generate a C++ code.
    thanks.

     
  • Frank-Rene Schäfer

    -- There is a minimalist example in the PDF Documentation. Do you mean
    there should be a mini docu that only contains the minimalist example?
    Well, go ahead write it in rst or plain text and I will include it in the
    download
    section.

    -- Good point with \r

    -- use command line option '--language C'. This becomes clear
    from the examples in demo/C/*

    Best Regards

     

Log in to post a comment.