Menu

[Crash] Uncontrolled Recursion in `simplifyAddBracesToCommand` / `simplifyAddBracesPair`

Yun
4 days ago
4 days ago
  • Yun

    Yun - 4 days ago

    Description

    Two unbounded recursive paths exist with no depth limit:

    Path 1 — Direct recursion via else if chains

    At lib/tokenize.cpp:6850-6852:

    if (tokEndNextNext->str() == "if")
        // do not change "else if ..." to "else { if ... }"
        tokEnd = simplifyAddBracesToCommand(tokEndNextNext);  // ← direct self-recursion
    

    When simplifyAddBracesToCommand processes an if statement followed by else if, it recurses into itself for the next else if. For a chain of N else if clauses, this produces N stack frames.

    Path 2 — Mutual recursion via nested control structures

    At lib/tokenize.cpp:6923 (simplifyAddBracesPair):

    } else {
        Token * tokEnd = simplifyAddBracesToCommand(tokStatement);  // ← mutual recursion
    

    For nested if(x) if(x) if(x) ... without braces, each nesting level calls simplifyAddBracesToCommandsimplifyAddBracesPairsimplifyAddBracesToCommand. This produces 2 stack frames per nesting level.

    No Guard

    • The AST depth limit (AST_MAX_DEPTH = 150) only applies to AST construction in list.createAst(), which runs after simplifyAddBraces.
    • No recursion depth counter or ulimit-style guard exists in either function.
    • The call chain is: simplifyTokenList1 (line 5755) → simplifyAddBraces (line 6802) → iterates tokens → simplifyAddBracesToCommand → recurse.

    Reproduction

    Prerequisites

    • A built cppcheck binary (with or without ASAN — both crash)
    • Python 3 (for the PoC generator script)
    • Linux/macOS with default 8 MB stack

    Method A: Using the PoC Script

    # 1. Navigate to the cppcheck repository root
    cd /path/to/cppcheck
    
    # 2. Run the PoC script
    python3 Tokenizer_simplifyAddBracesToCommand_poc.py
    

    Expected output (ASAN build):

    === Test 1: else-if chain (100,000 clauses) ===
    Return code: 1
    stderr: AddressSanitizer:DEADLYSIGNAL
    ==NNNNN==ERROR: AddressSanitizer: stack-overflow on address 0x...
    

    Expected output (non-ASAN build):

    === Test 1: else-if chain (100,000 clauses) ===
    CRASH: killed by signal 11 (SIGSEGV)
    VERIFIED: Stack overflow (SIGSEGV)  CWE-674 confirmed
    

    Method B: Manual Reproduction — else-if Chain (Path 1)

    # 1. Generate a C file with 100,000 else-if clauses
    python3 -c "
    lines = ['void f() {', '    if(x) a();']
    for i in range(100000):
        lines.append(f'    else if(x{i}) a();')
    lines.append('    else a();')
    lines.append('}')
    print('\n'.join(lines))
    " > /tmp/crash_elseif.c
    
    # 2. Verify the file is valid C (optional)
    wc -l /tmp/crash_elseif.c   # should be ~100004 lines
    
    # 3. Run cppcheck on it
    ./cppcheck /tmp/crash_elseif.c
    # → Expected: SIGSEGV (signal 11) or ASAN stack-overflow
    
    # 4. Confirm the crash signal
    echo "Exit code: $?"
    # → 139 (128 + 11 = SIGSEGV) on non-ASAN builds
    # → 1 on ASAN builds (ASAN catches and reports it)
    
    # 5. Clean up
    rm /tmp/crash_elseif.c
    

    Method C: Manual Reproduction — Nested if (Path 2)

    # 1. Generate a C file with 50,000 nested if statements
    python3 -c "
    code = 'void f() {\n'
    code += '    ' + 'if(x) ' * 50000 + 'a();\n'
    code += '}\n'
    print(code)
    " > /tmp/crash_nested.c
    
    # 2. Run cppcheck on it
    ./cppcheck /tmp/crash_nested.c
    # → Expected: SIGSEGV or ASAN stack-overflow
    
    # 3. Clean up
    rm /tmp/crash_nested.c
    

    Tuning the Trigger Depth

    The exact number of else if clauses needed depends on stack size and build configuration:

    Build Config Stack Size Approx. Trigger Depth (else-if)
    ASAN -O0 8 MB ~8,000–15,000 clauses
    Release -O2 8 MB ~50,000–100,000 clauses
    Debug -O0 8 MB ~20,000–40,000 clauses
    Any build ulimit -s 2048 (2 MB) ~2,000–5,000 clauses

    Nested if (Path 2) requires roughly half the count since each level uses 2 stack frames.

    ASAN Output (Captured)

    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==408149==ERROR: AddressSanitizer: stack-overflow on address 0x7ffde7ab8ee8
        #0 0x... in __interceptor_strlen
        #1 0x... in std::__cxx11::basic_string::compare
        ...  (frames in simplifyAddBracesToCommand  simplifyAddBracesPair mutual recursion)
    

    Root Cause Analysis

    The call graph for an else if chain:

    simplifyTokenList1 (line 5755)
      └→ simplifyAddBraces (line 6802)
           └→ [loop over tokens]  simplifyAddBracesToCommand(tok)  // tok = first "if"
                └→ simplifyAddBracesPair(tok, true)                  // process "if(x) a;"
                └→ tokEnd->strAt(1) == "else"  true
                └→ tokEndNextNext->str() == "if"  true
                └→ simplifyAddBracesToCommand(tokEndNextNext)         // ← RECURSE for next "else if"
                     └→ simplifyAddBracesPair(...)
                     └→ ...  simplifyAddBracesToCommand(...)         // ← RECURSE again
                          └→ ... (N times, one per else-if clause)
                               └→ STACK OVERFLOW
    

    Each recursion adds one frame of simplifyAddBracesToCommand (~200–400 bytes depending on build). With 100K clauses on a default 8 MB stack, total stack usage is ~20–40 MB, far exceeding the limit.

    Suggested Fix

    Add a recursion depth parameter with a sensible maximum.
    Alternatively, convert to an iterative approach using a worklist (as was done for isVarUsedInTree in commit c2df2d430).

    Commit Description
    c2df2d430 converted to iterative for isVarUsedInTree
     
  • CHR

    CHR - 4 days ago

    Seems like you have the issue figured out already, why not open a PR at https://github.com/danmar/cppcheck/pulls?

     
  • Oliver Stöneberg

    Probably because the used AI agent doesn't support creating PRs yet...

     
  • Yun

    Yun - 4 days ago

    Thanks for the suggestions. Just to clarify the context: I actually found this crash while fuzzing the harness. Since fuzzing drivers can sometimes produce false positives, I made sure to verify that this specific case is fully reproducible from the main entry point. I did use Claude to help structure and draft the PoC report for better readability ( comments are also polished by AI to improve my English writing ;).

    The main reason I haven't opened a PR yet is that I wanted to get the crash confirmed by the upstream maintainers first. I'm more than happy to create a PR, but I'd love to get your thoughts on the best path forward like whether limiting the recursion depth or moving to an iterative approach would be preferred.
    I will create a PR soon with simple regression tests soon.

     

    Last edit: Yun 4 days ago

Log in to post a comment.

MongoDB Logo MongoDB