1. Summary
  2. Files
  3. Support
  4. Report Spam
  5. Create account
  6. Log in

Ticket #3116 (closed enhancement: fixed)

Opened 20 months ago

Last modified 19 months ago

Tokenizer: Improve Token::isStandardType()

Reported by: stinsen Owned by: tjarosch
Priority: Milestone:
Component: Refactoring Keywords:
Cc:

Description

Hi all,

First of all, thank you for cppcheck, it's a lovely tool!

I don't know if this is a type of activity that you want in TRAC, but I thought I'd give it a shot at least. :)

The patch deals with a rater inefficiently (excessive string comparisons) implemented method in lib/token.cpp. I made some small modifications:

* Made it return directly if it finds a match.
* Rearranged the 'type' array by moving the common 'int' to the start of the array in hopes of getting an earlier match.
* Made the const array static.

Below are the execution counts from a run on lib/*.cpp:

Before:
=======

 83305:  731:bool Token::isStandardType() const
     -:  732:{
 83305:  733:    bool ret = false;
 83305:  734:    const char *type[] = {"bool", "char", "short", "int", "long", "float", "double", "size_t", 0};
749745:  735:    for (int i = 0; type[i]; i++)
666440:  736:        ret |= (_str == type[i]);
 83305:  737:    return ret;
     -:  738:}

After:
=======

83297:  731:bool Token::isStandardType() const
     -:  732:{
     -:  733:    static const char * const type[] = {"int", "char", "bool", "long", "short", "float", "double", "size_t", 0};
     -:  734:
523601:  735:    for (int i = 0; type[i]; i++)
472971:  736:        if(_str == type[i])
 32667:  737:            return true;
     -:  738:
 50630:  739:    return false;

Attachments

is_standard_type.diff (0.6 KB) - added by stinsen 20 months ago.

Change History

Changed 20 months ago by stinsen

follow-up: ↓ 7   Changed 20 months ago by hyd_danmar

sounds good.

If you tell me the author information you want me to use then I'll commit it, I need your full name and your email. The author information you give me will be shown in the log.

  Changed 20 months ago by hyd_danmar

I don't know if this is a type of activity that you want in TRAC, but I thought I'd give it a shot at least. :)

It is ok to submit patches through trac.

  Changed 20 months ago by seb777

It seems to be a really good patch.

Made it return directly if it finds a match

Maybe a new idea for cppcheck. How can cppcheck detect this kind of case ?

follow-up: ↓ 8   Changed 20 months ago by kimmov

It would be more interesting to see timings. Of course less rounds in loops is usually good but if the net effect is like one microsecond... So is this micro-optimization or real optimization? There is benchmark test in our tree already that gives some hint about it.

follow-up: ↓ 9   Changed 20 months ago by pkeus

If size_t (stddef.h) is part of the array, we should think about adding all stdint.h types as well.

  Changed 20 months ago by kimmov

If size_t (stddef.h) is part of the array, we should think about adding all stdint.h types as well.

Please lets not go there in this ticket. At least not until we have some outcome from tickets #2888 and #3105.

in reply to: ↑ 1   Changed 20 months ago by stinsen

Replying to hyd_danmar:

sounds good.

If you tell me the author information you want me to use then I'll commit it, I need your full name and your email. The author information you give me will be shown in the log.

Thank you, you can use "Johan Samuelson" (stinsen at gmail.com).

in reply to: ↑ 4   Changed 20 months ago by stinsen

Replying to kimmov:

It would be more interesting to see timings. Of course less rounds in loops is usually good but if the net effect is like one microsecond... So is this micro-optimization or real optimization? There is benchmark test in our tree already that gives some hint about it.

Yes, I should have included timings of course for this kind of activity. I'm at work at the moment, so I don't have the figures handy, but I used "time" on cppcheck on the lib/*.cpp run and if I recall correctly I observed an improvement of ~0.7s user CPU time so even if it wasn't a vast improvement, it was still measurable. I will look into that benchmark test when I get home, thanks for the tip. :)

in reply to: ↑ 5   Changed 20 months ago by robertreif

Replying to pkeus:

If size_t (stddef.h) is part of the array, we should think about adding all stdint.h types as well.

size_t has been removed from this array in the patch for #2888 specifically because it is not a standard type. In fact size_t should be replaced with the appropriate standard type in the preprocessor or early tokenization stage. No code in cppcheck after the substution should ever have to ever deal with size_t directly.

follow-up: ↓ 11   Changed 20 months ago by hyd_danmar

I compiled with -O2 and executed cppcheck 3 times both with and without the patch. These are the timings on my computer:

original
0m30.374s
0m30.510s
0m30.426s

patch
0m30.790s
0m30.558s
0m30.606s

It seems to me that your patch makes cppcheck a little slower on my computer. It's interesting that you got faster cppcheck.

in reply to: ↑ 10   Changed 20 months ago by stinsen

Replying to hyd_danmar:

I compiled with -O2 and executed cppcheck 3 times both with and without the patch. These are the timings on my computer:
{{{
original
0m30.374s
0m30.510s
0m30.426s

patch
0m30.790s
0m30.558s
0m30.606s
}}}

It seems to me that your patch makes cppcheck a little slower on my computer. It's interesting that you got faster cppcheck.

Yes, it seems I took my timing figures from my profiling build. I tested some builds without profiling enabled and with different optimization settings and data sets and it seems the patch has very minor impact if any, the compiler really does a good job. :)

I guess I should learn to better validate my results, sorry for the inconvenience.

  Changed 20 months ago by hyd_danmar

I guess I should learn to better validate my results, sorry for the inconvenience.

No problem.

  Changed 20 months ago by kimmov

With benchmark test I see without the patch:
********* Start testing of BenchmarkSimple? *********
Config: Using QTest library 4.7.2, Qt 4.7.2
PASS : BenchmarkSimple::initTestCase()
RESULT : BenchmarkSimple::tokenize():

0.0030 msecs per iteration (total: 99, iterations: 32768)

PASS : BenchmarkSimple::tokenize()
RESULT : BenchmarkSimple::simplify():

0.0050 msecs per iteration (total: 83, iterations: 16384)

PASS : BenchmarkSimple::simplify()
PASS : BenchmarkSimple::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped
********* Finished testing of BenchmarkSimple? *********

and with the patch:

********* Start testing of BenchmarkSimple? *********
Config: Using QTest library 4.7.2, Qt 4.7.2
PASS : BenchmarkSimple::initTestCase()
RESULT : BenchmarkSimple::tokenize():

0.0028 msecs per iteration (total: 93, iterations: 32768)

PASS : BenchmarkSimple::tokenize()
RESULT : BenchmarkSimple::simplify():

0.0050 msecs per iteration (total: 83, iterations: 16384)

PASS : BenchmarkSimple::simplify()
PASS : BenchmarkSimple::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped
********* Finished testing of BenchmarkSimple? *********

So there is a small measurable improvement in my Ubuntu 11.04 x64. Hence I'd take this patch in.

  Changed 20 months ago by kimmov

Forgot to mention that above numbers are from optimized (GCC -O2) release build.

  Changed 19 months ago by tjarosch

  • owner changed from noone to tjarosch
  • status changed from new to assigned

Good idea to optimize this! I'll also test another route the next days:

I'll try to determine the type information just once on Token::str("") invocation.
This might be a lot faster than doing it on Token::isStandardType() invocation over and over again.

I'll benchmark the proposed change and report back.
(I wanted to do this change anyway and just found this ticket...)

  Changed 19 months ago by tjarosch

  • status changed from assigned to closed
  • resolution set to fixed

Commited to git, thanks Johan!

https://github.com/danmar/cppcheck/commit/1a454256dc3f6a210bda9ccee7fe1936592f75bc

A little note on benchmarking: The turbo boost feature of modern Intel CPUs can give quite varying results. So it's best to run stuff three times in a row if you don't do it already.

  Changed 19 months ago by kimmov

A little note on benchmarking: The turbo boost feature of modern Intel CPUs can give quite varying results. So it's best to run stuff three times in a row if you don't do it already.

Make it 100 or 1000 and calculate median etc. Notice how Qt test runs 16 000 and 32 000 times...

Note: See TracTickets for help on using tickets.