#238 SEGV in JiebaSingleton::getInstance when running ctest for Chinese Tokenizer

Milestone: other
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2025-08-29
Created: 2025-08-29
Creator: Wang Yiming
Private: No

Environment:

OS: Ubuntu 24.10

Compiler: GCC 14 with AddressSanitizer enabled

Commit: a82acd50f9c2e1b38a189635d411b55f670dda8c (with the <memory> header fix applied to TestDocument.cpp)

Steps to Reproduce:

  1. Build the project successfully (make).

  2. Run the tests from the build directory (ctest --output-on-failure).

Observed Behavior:

ctest reports that SimpleTest has failed. AddressSanitizer catches a SEGV at address 0x000000000008, which strongly suggests a member access through a null pointer.
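
To illustrate why a small address such as 0x8 hints at a null object, here is a minimal sketch. FakeSingleton and wordsOffset are hypothetical names, and the layout is an assumption made for illustration; it is not the real layout of JiebaSingleton.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout, NOT the real JiebaSingleton: one pointer-sized
// member followed by a vector.
struct FakeSingleton {
    void* handle = nullptr;
    std::vector<std::string> words;
};

// Byte offset of `words` inside FakeSingleton, measured on a live object.
std::size_t wordsOffset() {
    FakeSingleton s;
    auto base = reinterpret_cast<std::uintptr_t>(&s);
    auto member = reinterpret_cast<std::uintptr_t>(&s.words);
    return static_cast<std::size_t>(member - base);
}
```

On a typical 64-bit build the offset is one pointer, i.e. 8 bytes. Reading `words` through a null FakeSingleton* would therefore touch address 0x8, which is the kind of address ASan reports here.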

The key error log and stack trace are as follows:

==105994==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x56c4a81a1bbb bp 0x7ffc71fc3700 sp 0x7ffc71fc3540 T0)
==105994==The signal is caused by a READ memory access.
==105994==Hint: address points to the zero page.
    #0 0x56c4a81a1bbb in std::vector<...>::operator[](...) const /usr/include/c++/14/bits/stl_vector.h:1147
    #1 0x56c4a81a1bbb in lucene::analysis::jieba::JiebaSingleton::getInstance(...) /home/user/about_doris/clucene-origin/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.h:35
    #2 0x56c4a818f42c in lucene::analysis::jieba::ChineseTokenizer::reset(...) /home/user/about_doris/clucene-origin/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp:62
    #3 0x56c4a818f59d in lucene::analysis::jieba::ChineseTokenizer::ChineseTokenizer(...) /home/user/about_doris/clucene-origin/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp:17
    #4 0x56c4a817d7e6 in lucene::analysis::LanguageBasedAnalyzer::tokenStream(...) /home/user/about_doris/clucene-origin/src/contribs-lib/CLucene/analysis/LanguageBasedAnalyzer.cpp:124
    #5 0x56c4a7e483e0 in testSimpleJiebaTokenizer2(CuTest*) /home/user/about_doris/clucene-origin/src/test/contribs-lib/analysis/testChinese.cpp:365
    ... (further calls)

SUMMARY: AddressSanitizer: SEGV /usr/include/c++/14/bits/stl_vector.h:1147 in std::vector<...>::operator[](...) const

Initial Analysis:

Frame #1 of the stack trace points to line 35 of ChineseTokenizer.h, inside JiebaSingleton::getInstance, as the likely source of the problem. Frame #0 shares the same pc (0x56c4a81a1bbb), so the std::vector operator[] call appears to be inlined into getInstance. Combined with the crash address 0x000000000008, it is highly likely that a singleton object that should have been initialized is actually a nullptr, and that the code crashes when it tries to read a member of this null object.
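
If the analysis above is right, one defensive option is to make the accessor fail loudly when the instance was never set up. The following is only a sketch under that assumption: FakeJieba, GuardedSingleton, init and dictDir are hypothetical names and do not reflect the actual code in ChineseTokenizer.h.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the real tokenizer backend.
struct FakeJieba {
    explicit FakeJieba(std::string dir) : dictDir(std::move(dir)) {}
    std::string dictDir;
};

class GuardedSingleton {
public:
    // Must be called once before getInstance().
    static void init(const std::string& dictDir) {
        instance() = std::make_unique<FakeJieba>(dictDir);
    }

    // Throws instead of dereferencing a null pointer.
    static FakeJieba& getInstance() {
        auto& ptr = instance();
        if (!ptr) {
            throw std::logic_error(
                "GuardedSingleton::getInstance called before init");
        }
        return *ptr;
    }

private:
    // Function-local static: constructed on first use, starts out null.
    static std::unique_ptr<FakeJieba>& instance() {
        static std::unique_ptr<FakeJieba> ptr;
        return ptr;
    }
};
```

With a guard like this, a test that forgets to initialize the tokenizer would get an exception naming the missing step rather than a SEGV in the zero page. Note that init itself is not synchronized in this sketch, so it would need to run before any concurrent use.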
