From: Shailesh B. <sbi...@gm...> - 2015-03-23 22:26:23
Hello,
I am seeing strange behavior from CLucene with a large data set (though it is
not really that large).
I have 40,000 HTML documents (around 5 GB of data). I added these documents
to a Lucene index. When I search for a word in this index, I get
zero results.
If I take a subset of these documents (only 170 documents) and create an index
from them, the same search works.
Note that I used the same code to create both indexes.
Here is what I am doing to add a string to the index (note that I am passing the
document contents as a string).
void LuceneLib::AddStringToDoc(Document *doc, const char *fieldName,
                               const char *str)
{
    // Convert the field name and the value to wide strings.
    wchar_t *wstr = charToWChar(fieldName);
    wchar_t *wstr2 = charToWChar(str);
    bool isHighlighted = false;
    bool isStoreCompressed = false;

    // Is this field configured for highlighting?
    for (int i = 0; i < highlightedFields.size(); i++)
    {
        if (highlightedFields.at(i).compare(fieldName) == 0) {
            isHighlighted = true;
            break;
        }
    }

    // Is this field configured to be stored compressed?
    for (int i = 0; i < compressedFields.size(); i++)
    {
        if (compressedFields.at(i).compare(fieldName) == 0) {
            isStoreCompressed = true;
            break;
        }
    }

    // Build the field flags; every field is tokenized and indexed.
    cout << "Field : " << fieldName << " ";
    int fieldConfig = Field::INDEX_TOKENIZED;
    if (isHighlighted) {
        fieldConfig = fieldConfig | Field::TERMVECTOR_WITH_POSITIONS_OFFSETS;
        cout << " Highlighted";
    }
    if (isStoreCompressed) {
        fieldConfig = fieldConfig | Field::STORE_COMPRESS;
        cout << " Store Compressed";
    }
    else {
        fieldConfig = fieldConfig | Field::STORE_NO;
        cout << " Do not store";
    }
    cout << " : " << fieldConfig << endl;

    Field *field = _CLNEW Field((const TCHAR *) wstr, (const TCHAR *) wstr2,
                                fieldConfig);
    doc->add(*field);

    // Free the temporary buffers (this assumes charToWChar allocates with
    // new[] and that Field keeps its own copies of the strings).
    delete[] wstr;
    delete[] wstr2;
}
I checked the field config values, and they are as follows:
Field : docName Do not store : 34
Field : docPath Do not store : 34
Field : docContent Highlighted Store Compressed : 3620
Field : All Do not store : 34
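For reference, the printed numbers can be read as bit flags. The following is only a minimal sketch: the constants are hypothetical local mirrors of the Field flags, with values I am assuming because they reproduce the 34 and 3620 printed above (34 = 32 | 2, 3620 = 32 | 4 | 3584). They are not copied from the CLucene headers, and describeFieldConfig is a made-up helper name.

```cpp
#include <string>

// Assumed flag values (hypothetical mirrors of the CLucene Field flags).
// They are chosen to be consistent with the output above and should be
// checked against the Field header of the CLucene version in use.
enum AssumedFieldFlags {
    kStoreNo           = 2,
    kStoreCompress     = 4,
    kIndexTokenized    = 32,
    kTermVectorPosOffs = 3584   // assumed: 512 | 1024 | 2048
};

// Appends name to out, separated from earlier names by a single space.
static void appendFlag(std::string &out, const char *name)
{
    if (!out.empty()) {
        out += ' ';
    }
    out += name;
}

// Returns the names of the assumed flags that are fully set in config.
std::string describeFieldConfig(int config)
{
    std::string out;
    if ((config & kIndexTokenized) == kIndexTokenized) {
        appendFlag(out, "INDEX_TOKENIZED");
    }
    if ((config & kStoreNo) == kStoreNo) {
        appendFlag(out, "STORE_NO");
    }
    if ((config & kStoreCompress) == kStoreCompress) {
        appendFlag(out, "STORE_COMPRESS");
    }
    if ((config & kTermVectorPosOffs) == kTermVectorPosOffs) {
        appendFlag(out, "TERMVECTOR_WITH_POSITIONS_OFFSETS");
    }
    return out;
}
```

Under these assumptions, describeFieldConfig(34) names the tokenized and not-stored flags, and describeFieldConfig(3620) names the tokenized, store-compressed, and term-vector flags, which is what I intended for each field.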
The field on which I am doing a query is docContent.
Please let me know if I have missed anything.
Thanks,
Shailesh