From: Shailesh B. <sbi...@gm...> - 2015-03-23 22:26:23
Hello,

I am observing strange behavior in CLucene with a large data set (though it's not that large). I have 40,000 HTML documents (around 5 GB of data), which I added to a Lucene index. When I search for a word in this index, I get zero results. If I take a subset of these documents (only 170 of them) and build an index from those, the same search works. Note that I used the same code to create both indexes.

Here is what I do to add a string to a document (I pass the document contents as a string):

    void LuceneLib::AddStringToDoc(Document *doc, const char *fieldName, const char *str)
    {
        wchar_t *wstr = charToWChar(fieldName);
        wchar_t *wstr2 = charToWChar(str);
        bool isHighlighted = false;
        bool isStoreCompressed = false;

        // Is this field configured for highlighting?
        for (size_t i = 0; i < highlightedFields.size(); i++) {
            if (highlightedFields.at(i).compare(fieldName) == 0) {
                isHighlighted = true;
                break;
            }
        }
        // Is this field configured for compressed storage?
        for (size_t i = 0; i < compressedFields.size(); i++) {
            if (compressedFields.at(i).compare(fieldName) == 0) {
                isStoreCompressed = true;
                break;
            }
        }

        cout << "Field : " << fieldName << " ";
        // Every field is indexed and tokenized
        int fieldConfig = Field::INDEX_TOKENIZED;
        if (isHighlighted) {
            // Highlighted fields also get term vectors with positions and offsets
            fieldConfig |= Field::TERMVECTOR_WITH_POSITIONS_OFFSETS;
            cout << " Highlighted";
        }
        if (isStoreCompressed) {
            fieldConfig |= Field::STORE_COMPRESS;
            cout << " Store Compressed";
        } else {
            fieldConfig |= Field::STORE_NO;
            cout << " Do not store";
        }
        cout << " : " << fieldConfig << endl;

        Field *field = _CLNEW Field((const TCHAR *) wstr, (const TCHAR *) wstr2, fieldConfig);
        doc->add(*field);
        delete[] wstr;
        delete[] wstr2;
    }

I checked the field config values, and they are as follows:

    Field : docName Do not store : 34
    Field : docPath Do not store : 34
    Field : docContent Highlighted Store Compressed : 3620
    Field : All Do not store : 34

The field I am querying is docContent. Please let me know if I have missed anything.

Thanks,
Shailesh
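For reference, the printed config values in the post can be decoded back into the flags that AddStringToDoc combines. The sketch below assumes the flag values from CLucene 2.3's Field.h (copied into a local enum so it compiles without CLucene; check them against your own headers). `buildConfig` and `hasFlag` are hypothetical helpers written for illustration. With these values, 34 is INDEX_TOKENIZED | STORE_NO (32 | 2), and 3620 is INDEX_TOKENIZED | STORE_COMPRESS | TERMVECTOR_WITH_POSITIONS_OFFSETS (32 | 4 | 3584).

```cpp
// Assumed copies of the CLucene 2.3 Field flag values (Field.h), redefined
// here so this sketch compiles on its own.
enum FieldFlags {
    STORE_YES = 1,
    STORE_NO = 2,
    STORE_COMPRESS = 4,
    INDEX_NO = 16,
    INDEX_TOKENIZED = 32,
    INDEX_UNTOKENIZED = 64,
    INDEX_NONORMS = 128,
    TERMVECTOR_NO = 256,
    TERMVECTOR_YES = 512,
    TERMVECTOR_WITH_POSITIONS = TERMVECTOR_YES | 1024,
    TERMVECTOR_WITH_OFFSETS = TERMVECTOR_YES | 2048,
    TERMVECTOR_WITH_POSITIONS_OFFSETS =
        TERMVECTOR_WITH_OFFSETS | TERMVECTOR_WITH_POSITIONS  // 3584
};

// Hypothetical helper: builds the same combination as AddStringToDoc.
int buildConfig(bool highlighted, bool compressed) {
    int config = INDEX_TOKENIZED;
    if (highlighted) {
        config |= TERMVECTOR_WITH_POSITIONS_OFFSETS;
    }
    config |= compressed ? STORE_COMPRESS : STORE_NO;
    return config;
}

// Hypothetical helper: true only if every bit of `flag` is set in `config`.
// A plain (config & flag) test would be wrong for multi-bit flags such as
// TERMVECTOR_WITH_POSITIONS_OFFSETS.
bool hasFlag(int config, int flag) {
    return (config & flag) == flag;
}
```

For example, `buildConfig(true, true)` reproduces the 3620 printed for docContent, and `hasFlag(3620, TERMVECTOR_WITH_POSITIONS_OFFSETS)` confirms that field requests full term vectors.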
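The post does not show charToWChar. Because its results are released with delete[], it presumably allocates with new[]. Below is a hypothetical sketch of such a helper, not the poster's actual code. It uses the standard std::mbsrtowcs and returns nullptr when the input is not valid in the current C locale, so a failed conversion is visible rather than silently producing an empty or truncated field value. It assumes the locale matches the input encoding. For UTF-8 HTML you would typically call std::setlocale with a UTF-8 locale at startup.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for charToWChar (the original is not shown).
// Returns a new[]-allocated, null-terminated wide string that the caller
// frees with delete[], or nullptr if `str` contains a byte sequence that is
// invalid in the current C locale.
wchar_t *charToWChar(const char *str) {
    std::mbstate_t state = std::mbstate_t();
    const char *src = str;
    // With a null destination, mbsrtowcs only counts the wide characters
    // needed (excluding the terminator) and ignores the length argument.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return nullptr;  // invalid multibyte sequence for this locale
    }
    wchar_t *wstr = new wchar_t[len + 1];
    state = std::mbstate_t();
    src = str;
    // Convert for real. len + 1 leaves room for the terminating L'\0'.
    std::mbsrtowcs(wstr, &src, len + 1, &state);
    return wstr;
}
```

A caller such as AddStringToDoc could then check for nullptr and log the offending document instead of indexing it with missing content.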