Lexical Analyzer Generator Quex / Discussion / General Discussion: Performance question about Quex

ollydbg - 2010-08-01

I would like to consider as a lexer(tokenizer) to support the "Codecompletion
plugin in Codeblocks IDE".

as the home page said, the code directed lexer is faster than the table driven
lexer. then Quex is a code directed lexer.
I just do a comparesion of the flex. I use the C/004 demo code, and I disabled
the cout output in the while loop, and also disable the assert macro. by
lexing a 8M cpp source code, it takes about 1.5s.

I just take some extra test by using flex gramma, and the result is quite
fast, it is only about 150 ms.
See a post link of mine in the codeblocks' forum:

http://forums.codeblocks.org/index.php/topic,13017.msg87659.html#msg87659

Any ideas?

Thanks.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Frank-Rene Schäfer - 2010-08-01

Did you disable the asserts? Can you provide à Sample Application?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Thanks for your reply.

I have disabled the "assert" by adding -DQUEX_OPTION_ASSERTS_DISABLED to the
compiler options.
I use MinGW 4.4.4 compiler Windows XP,python 2.6.5.

The code is exact the same as minimal sample, see below:

#include<fstream>
#include<iostream>

// (*) include lexical analyser header
#include "tiny_lexer"

#ifndef     ENCODING_NAME
#    define ENCODING_NAME (0x0)
#endif

#define PRINT_TOKEN

#ifdef _WIN32

#include<Windows.h>

class CTimer {
private:

    LARGE_INTEGER m_base;
    LARGE_INTEGER m_temp;
    float m_resolution;

public:

    CTimer() {
        LARGE_INTEGER t_freq;
        QueryPerformanceFrequency(&t_freq);
        m_resolution = (float) (1.0f / (double) t_freq.QuadPart);
        reset();
    }

    void reset() {
        QueryPerformanceCounter(&m_base);
    }

    inline float GetCurrentTime() {
        QueryPerformanceCounter(&m_temp);
        return (m_temp.QuadPart - m_base.QuadPart) * m_resolution * 1000.0f;
    }

    void SleepTime(float ms_val) {
        float ms_st = GetCurrentTime();
        while (GetCurrentTime()-ms_st < ms_val)
            continue;
    }

};
#else
class CTimer {
private:
    unsigned long m_base;
public:
    CTimer() {
        reset();
    }

    void reset() {
        timeval t;
        gettimeofday(&t, NULL);
        m_base = t.tv_sec;
    }

    float GetCurrentTime() {
        timeval t;
        gettimeofday(&t, NULL);
        return 1000 * (t.tv_sec - m_base) + t.tv_usec * 0.001f;
    }
};
#endif

int
main(int argc, char** argv)
{
    using namespace std;

    quex::Token*       token_p = 0x0;
    // (*) create the lexical analyser
    //     1st arg: input file, default = 'example.txt'
    //     2nd arg: input character encoding name, 0x0 --> no codec conversion
    quex::tiny_lexer    qlex(argc == 1 ? "example.txt" : argv[1], ENCODING_NAME);

    cout << ",-----------------------------------------------------------------\n";
    cout << "| [START]\n";

    CTimer aaa;

    int number_of_tokens = 0;
    // (*) loop until the 'termination' token arrives
    do {
        // (*) get next token from the token stream
        token_p = qlex.receive();

        // (*) print out token information
#       ifdef PRINT_TOKEN
        //cout << string(*token_p) << endl;
#       else
        //cout << token_p->type_id_name() << endl;
#       endif

#       ifdef SPECIAL_ACTION
        SPECIAL_ACTION(&qlex, &my_token);
#       endif

        ++number_of_tokens;

        // (*) check against 'termination'
    } while( token_p->type_id() != QUEX_TKN_TERMINATION );

    int t = aaa.GetCurrentTime();
    cout<< "time" << t << "\n";

    cout << "| [END] number of token = " << number_of_tokens << "\n";
    cout << "`-----------------------------------------------------------------\n";

    return 0;
}

The C grammar is used in:
C:\Program Files\quex\quex-0.50.1\demo\C\004\c.qx

And I use the command to generate the code

quex -i simple.qx -o tiny_lexer

By the way, I found that there is a benchmark folder, there:

C:\Program Files\quex\quex-0.50.1\demo\benchmark\in\flex

So, can you show some result of comparing flex and quex??
Thanks.

ollydbg - 2010-08-02

Hi, I found a post by you in GCC maillist
http://gcc.gnu.org/ml/gcc/2007-05/msg00726.html
You said that 200% performance, so I have more interests on this. I'm not sure
how to run the benchmark on my WinXp and MinGW, I have also seen that all the
result data in
C:\Program Files\quex\quex-0.50.1\demo\benchmark
is only about quex, there is no flex and re2c result.
Can you show some of them?

Thanks.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Frank-Rene Schäfer - 2010-08-02

(*) demo/benchmark

As you said in "demo/benchmark" there is already a setup
to benchmark and compare with other systems. The Makefile
contains targets to build a quex-based, a flex-based and a
re2c-based parser.

make run/lexer-flex

builds a flex based lexical analyzer,

make run/lexer-quex

builds a quex based analyzer, etc.

(*) Results

Results on "code/linux-2.6.22.17-kernel-dir.c".

(1) lexer-flex:

Compiled with -Os (size optimized)

clock_cycles_per_character = {35.004494}, // overhead eliminated

Compiled with -O3 (speed optimized)

clock_cycles_per_character = {43.224880}

(2) lexer-quex

Compiled with -Os (size optimized)

clock_cycles_per_character = {17.738173},

Compiled with -O3 (speed optimized)

clock_cycles_per_character = {17.393938}

As you can see, it is also a cache issue--smaller programs perform faster,
cause
lesser cache misses.

In directory "demo/benchmark/run" you find some helper scripts to run your
benchmark.

Note, that the benchmark tries to isolate the cost for lexical analyzis. An
inadvertent use of the std::string class can slow down the performance
tremendously.

(*) Your Case

:w
From what I understood, the thing you are trying to do is covered by the
benchmark. If not, I would like from you the following:

-- a .qx file that describes exactly the grammar and the
pattern-action pairs.

-- a .cpp file for your parser/lexical analyzer, where
the lexical analyzis is the only thing that happens.

-- an example text file on which the analyzer runs for
benchmarking.

-- some results from '> time my_lexer' as a reference.

In this case, send me an email to fschaef at users dot sourceforge dot net,
so that I have your private email and we can exchange attachments.

It should definitely be possible to get a lexer for the task you
describe with a clock cycles per character rate of 10 or better.

Best Regards

Frank

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ollydbg - 2010-08-02

Thanks for your reply. As you said:

Note, that the benchmark tries to isolate the cost for lexical analyzis. An
inadvertent use of the std::string class can slow down the performance
tremendously.

That's may be the reason why my code is slow down, because In my test code, I
guess it use "std::string".

I'm so grad to see the result, your quex is quite good, I will choose this one
as the lexer to refactor the " tokenizer of codecompletion plugins"

thanks

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Frank-Rene Schäfer - 2010-08-02

Please, let me know about your benchmark results. Note, that with
--path-compression, --template-compression, or --template-compression-uniform,
you can achieve sometimes a big deal of program space savings.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ollydbg - 2010-08-02

I have tried in under MSYS and MinGW 4.4.4, but faild. I even can't build the
bench mark.

See the log:

XXXXX /e/code/quex/quex-0.50.1/demo/benchmark
$ make run/lexer-flex
flex -o out/flex_scan.c in/flex/c.lex
flex: could not create out/flex_scan.c
make: *** Error 1

XXXXX /e/code/quex/quex-0.50.1/demo/benchmark
$ make run/lexer-quex
quex -i in/quex/c.qx \
--foreign-token-id-file in/token-ids.h \
--output-directory out \
--engine quex_scan \
--no-string-accumulator \
--token-offset 3 \
--token-prefix TKN_ \
--token-policy single
make: quex: Command not found
make: *** Error 127

It is strange, I have all python, mingw correctly installed, I can run quex
command in the windows cmd.
I don't know how to find a solution...

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Frank-Rene Schäfer - 2010-08-02

echo $PATH

in msys shell. In Makefile

me_watch:
echo $PATH
echo $SHELL
echo $QUEX_PATH

Note that the thing before 'echo' must be a tabulator. Then,

make me_watch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ollydbg - 2010-08-02

Dear fschaef:
Thanks for your reply. but I'm sorry I can't catch your idea. I don't know how
to follow your steps.

running the "echo $PATH" will print all the search patch.

But what does "me_watch: " mean??

Sorry, I'm not an normal linux user, I don't know much about bash.....

What does the last command "make me_watch" means?

Sorry, sorry....

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Frank-Rene Schäfer - 2010-08-02

The lines above print out environment variables that tell how
your environment find applications. The variable PATH contains
a list of directories where executables can be found. The
directory where quex is located must be one of them. Else,
it must be added.

Inside the make environment things may change. The make target
'me_watch' allows to print out the variables SHELL, PATH, and QUEX_PATH.
Dependent on the shell other variables may be activated. The variable
QUEX_PATH must be set to the directory where quex is installed.

Regards

Frank

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ollydbg - 2010-08-02

thanks. but I still failed.
I add these code to the makefile under :
E:\code\quex\quex-0.50.1\demo\benchmark

me_watch: echo $(PATH) echo $(SHELL) echo $(QUEX_PATH)

The command make me_watch gives the result:

$ make me_watch
echo
.:/usr/local/bin:/mingw/bin:/bin:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/
System32/Wbem:/c/Program Files/ATI Technologies/ATI.ACE/Core-Static:/c/Program
F
iles/Intel/WiFi/bin/:/e/code/bin_path:/e/code/cb/MinGW/bin:/e/code/shortcut:/e
/c
ode/opencv/opencv_build/bin:/c/Program Files/Common Files/Thunder
Network/KanKan
/Codecs:/d/Program Files/TortoiseSVN/bin:/e/code/msys/bin:/e/code/quex
.:/usr/local/bin:/mingw/bin:/bin:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/Sys
te
m32/Wbem:/c/Program Files/ATI Technologies/ATI.ACE/Core-Static:/c/Program
Files/
Intel/WiFi/bin/:/e/code/bin_path:/e/code/cb/MinGW/bin:/e/code/shortcut:/e/code
/o
pencv/opencv_build/bin:/c/Program Files/Common Files/Thunder
Network/KanKan/Code
cs:/d/Program Files/TortoiseSVN/bin:/e/code/msys/bin:/e/code/quex
echo /bin/sh
/bin/sh
echo /e/code/quex/quex-0.50.1/
/e/code/quex/quex-0.50.1/

Also, I have QUEX_PATH variable defined in my windows environment
The default value is: E:\code\quex\quex-0.50.1
I have tried to change it to /e/code/quex/quex-0.50.1

But both cases, when I run the command:
make run/lexer-quex

I get the same result as before: quex command not found.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ollydbg - 2010-08-03

by the way, I should you should add the \r to the c.qx
like:

P_WHITESPACE [ \t\r\n]+ ... <skip: [ \n\r\t]>

Also, I think the pdf manual is a bit hard for beginners. I think a mini
example and tutorial is enough. Also, I'm not sure How to generate C stype
EasyLexer. When I use the default command:

quex -i xxx.qx -o EasyLexer

It always generate a C++ code.
thanks.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Frank-Rene Schäfer - 2010-08-03

-- There is a minimalist example in the PDF Documentation. Do you mean
there should be a mini docu that only contains the minimalist example?
Well, go ahead write it in rst or plain text and I will include it in the
download
section.

-- Good point with \r

-- use command line option '--language C'. This becomes clear
from the examples in demo/C/*

Best Regards

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Performance question about Quex

Generator of lexical analyzers in C and C++. Unicode Supported.

Forums

Help

Performance question about Quex

Performance question about Quex

Generator of lexical analyzers in C and C++. Unicode Supported.

Forums

Help

Performance question about Quex document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Performance question about Quex