From: Abhay R. <abh...@ho...> - 2015-11-10 10:37:37
|
Hello, Currently Java Lucene has functionality called "More Like This", which is used to find representative terms of a document; those terms can then be used to search for similar documents. I looked in the latest CLucene code but could not find this functionality. Is it in CLucene? If not, are there any plans to include it? If someone has done work in this or a similar area, it would be great to hear from them. Thanks, Abhay |
From: Akash <akb...@gm...> - 2015-10-13 21:15:01
|
Hi, I am using Dovecot with its CLucene plugin for indexing. I am hitting an error while trying to index a large folder of emails. Sometimes it throws this error after 30000 emails, sometimes 40000; the latest run gave up after 111000. But it just never completes. On the Dovecot list, I was told that it's probably a CLucene library bug which they can't do much about, and I was advised to switch to Solr (which I don't want to do). Can there be a fix for this: 111000/322080 doveadm: /home/stephan/packages/wheezy/i386/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:210: std::string lucene::index::DocumentsWriter::closeDocStore(): Assertion `numDocsInStore*8 == directory->fileLength( (docStoreSegment + "." + IndexFileNames::FIELDS_INDEX_EXTENSION).c_str() )' failed. Aborted I am using dovecot 2:2.2.19-1~auto+7 and libclucene-core1:i386 2.3.3.4-4 from Debian wheezy backports. Please advise. -Akash |
From: cel t. <cel...@gm...> - 2015-03-31 21:38:21
|
Norbert, I guess you need to check the analyzer you're using to create your indexes, as well as the analyzer you use for searches. You probably need to use an analyzer (both for indexing and searching) that uses LowerCaseFilter. Off the top of my head ... check if StandardAnalyzer (both for indexing and searching) does what you want. To get a better explanation, google for: lucene case insensitive search From what you'll find for Java Lucene -- you'll get an idea of the way to go. To inspect the contents of your index, you can use Luke (google for: luke lucene) -- you'll see straight away if your index has case-sensitive terms. Regards Celto On Wed, Mar 25, 2015 at 11:51 PM, norbert barichard < nor...@di...> wrote: > Hello, > > Is there a way to tell CLucene to be Case Insensitive when performing a > search ? It's a bit annoying that when I do a search, I don't get any > results if I don't get all the upper case letters right. > > Thanks in advance ! > > > > ------------------------------------------------------------------------------ > Dive into the World of Parallel Programming The Go Parallel Website, > sponsored > by Intel and developed in partnership with Slashdot Media, is your hub for > all > things parallel software development, from weekly thought leadership blogs > to > news, videos, case studies, tutorials and more. Take a look and join the > conversation now. http://goparallel.sourceforge.net/ > _______________________________________________ > CLucene-developers mailing list > CLu...@li... > https://lists.sourceforge.net/lists/listinfo/clucene-developers > |
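The lowercase normalization described above happens inside the analyzer chain: the same case folding must be applied to tokens at index time and at query time, or terms silently stop matching. A minimal standalone sketch of that folding step (plain standard C++ for illustration; `fold_token` and `terms_match` are made-up names, not CLucene API):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Fold one token to lower case, as a LowerCaseFilter-style analysis
// step would before the token reaches the index or the query.
std::string fold_token(std::string token) {
    std::transform(token.begin(), token.end(), token.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return token;
}

// Indexed and queried terms only match when BOTH sides went through
// the same folding; folding just one side loses matches.
bool terms_match(const std::string& indexed, const std::string& queried) {
    return fold_token(indexed) == fold_token(queried);
}
```

This is why the advice stresses using the same analyzer for indexing and searching: the match happens on the folded form, not the original text.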
From: norbert b. <nor...@di...> - 2015-03-25 12:51:28
|
Hello, Is there a way to tell CLucene to be case-insensitive when performing a search? It's a bit annoying that when I do a search, I don't get any results if I don't get all the upper case letters right. Thanks in advance! |
From: Shailesh B. <sbi...@gm...> - 2015-03-23 22:26:23
|
Hello, I am observing strange behavior of CLucene with large data (though it's not that large). I have 40,000 HTML documents (around 5GB of data). I added these documents to a Lucene index. When I try to search for a word with this index it gives me zero results. If I take a subset of these documents (only 170 documents) and create an index, then the same search works. Note, to create both of the above indexes I used the same code. Here is what I am doing to add a string to the index (note I am passing the document contents as a string):

void LuceneLib::AddStringToDoc(Document *doc, const char *fieldName, const char *str)
{
    wchar_t *wstr = charToWChar(fieldName);
    wchar_t *wstr2 = charToWChar(str);
    bool isHighlighted = false;
    bool isStoreCompressed = false;
    for (int i = 0; i < highlightedFields.size(); i++) {
        if (highlightedFields.at(i).compare(fieldName) == 0) {
            isHighlighted = true;
            break;
        }
    }
    for (int i = 0; i < compressedFields.size(); i++) {
        if (compressedFields.at(i).compare(fieldName) == 0) {
            isStoreCompressed = true;
            break;
        }
    }
    cout << "Field : " << fieldName << " ";
    int fieldConfig = Field::INDEX_TOKENIZED;
    if (isHighlighted == true) {
        fieldConfig = fieldConfig | Field::TERMVECTOR_WITH_POSITIONS_OFFSETS;
        cout << " Highlighted";
    }
    if (isStoreCompressed == true) {
        fieldConfig = fieldConfig | Field::STORE_COMPRESS;
        cout << " Store Compressed";
    } else {
        fieldConfig = fieldConfig | Field::STORE_NO;
        cout << " Do not store";
    }
    cout << " : " << fieldConfig << endl;
    Field *field = _CLNEW Field((const TCHAR *) wstr, (const TCHAR *) wstr2, fieldConfig);
    doc->add(*field);
    delete[] wstr;
    delete[] wstr2;
}

I checked the field config values and they are as below:

Field : docName      Do not store : 34
Field : docPath      Do not store : 34
Field : docContent   Highlighted Store Compressed : 3620
Field : All          Do not store : 34

The field on which I am doing a query is docContent. Please let me know if I have missed anything. Thanks, Shailesh |
From: Shailesh B. <sbi...@gm...> - 2015-03-23 02:20:22
|
Hello, I am trying a somewhat different scenario. I have 2 CLucene setups, one with one machine and the other with 3 machines. I have added the same set of HTML documents (around 40K documents) on both setups. On the distributed setup (i.e. 3 nodes) the documents are divided between nodes (40K/3 documents on each node). When I search for any word I get the following results. Note, on the distributed setup I execute the query on each node and merge all results.

Query predicate   Single Node   Distributed Node
zoo               575           584
india             0             1624
Germany           8054          8082
Mobile            0             5104
Canada            0             5792

You can see in the table above that many queries return 0 results on the single-node search. Note the same insert program is used for both single and distributed indexing. Also, the same code is used to query each index. I am using the latest CLucene distribution. Can you please help me here? Thanks, Shailesh |
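The merge step mentioned above ("execute the query on each node and merge all results") is itself worth isolating when debugging result counts, since a faulty merge can drop or duplicate hits. A hedged sketch of the usual coordinator-side merge (plain standard C++; `Hit` and `mergeResults` are illustrative names, not CLucene API):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Hit {
    std::string docId;  // globally unique id; a node prefix avoids collisions
    float score;
};

// Merge the per-node result lists into one list ordered by descending
// score. Every hit from every node is kept, so the merged count is the
// sum of the per-node counts.
std::vector<Hit> mergeResults(const std::vector<std::vector<Hit>>& perNode) {
    std::vector<Hit> all;
    for (const auto& node : perNode)
        all.insert(all.end(), node.begin(), node.end());
    std::stable_sort(all.begin(), all.end(),
                     [](const Hit& a, const Hit& b) { return a.score > b.score; });
    return all;
}
```

If the distributed totals look right but the single-node totals are zero, the merge is probably fine and the problem sits in the single-node index or query, not in this step.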
From: Mark W. <mwi...@gm...> - 2015-03-22 21:49:37
|
Ah, thanks a lot! I knew it had to be a dumb mistake, and it was! Thanks for the hint about Luke.... On Sun, Mar 22, 2015 at 5:24 PM, cel tix44 <cel...@gm...> wrote: > Mark > > Looks like your "field config" is not correct. > > In your code, you used the logical AND ( && ) -- rather than the bitwise > OR ( | ): > int config = Field::STORE_YES | Field::INDEX_TOKENIZED; > > >>> I can see my fields in the index file > To browse your index, you can use a wonderful tool called "Luke". To find > it, just google for: luke lucene > > With Luke, I was able to see that: > -- your original code produced an 'empty' index (it had no terms); > -- with the change suggested above, the index was populated with 3 terms > (nixon, obama, clinton); and your query returned 1 hit in Document ID 0. > > Hope this helps. > > Regards > Celto |
From: cel t. <cel...@gm...> - 2015-03-22 21:25:01
|
Mark Looks like your "field config" is not correct. In your code, you used the logical AND ( && ) -- rather than the bitwise OR ( | ): int config = Field::STORE_YES | Field::INDEX_TOKENIZED; >>> I can see my fields in the index file To browse your index, you can use a wonderful tool called "Luke". To find it, just google for: luke lucene With Luke, I was able to see that: -- your original code produced an 'empty' index (it had no terms); -- with the change suggested above, the index was populated with 3 terms (nixon, obama, clinton); and your query returned 1 hit in Document ID 0. Hope this helps. Regards Celto On Mon, Mar 23, 2015 at 1:24 AM, Mark Wilson <mwi...@gm...> wrote: > I'm not sure what I'm doing wrong. I can see my fields in the index file, > but my query/search returns no hits. > > Here is my index creation code: > > lucene::analysis::SimpleAnalyzer* analyzer; > > int main(int argc, char** argv) > { > analyzer = new lucene::analysis::SimpleAnalyzer(); > Directory* indexDir = FSDirectory::getDirectory("../Index"); > > IndexWriter* w = new IndexWriter(indexDir, analyzer, true, true); > > int config = Field::STORE_YES && Field::INDEX_TOKENIZED; > > Field* field; > Document* doc; > > doc = new Document(); > > field = new Field(L"president", L"Nixon", config); > doc->clear(); > doc->add(*field); > w->addDocument(doc); > > field = new Field(L"president", L"Obama", config); > doc->clear(); > doc->add(*field); > w->addDocument(doc); > > field = new Field(L"president", L"Clinton", config); > doc->clear(); > doc->add(*field); > w->addDocument(doc); > > w->close(); > > indexDir->close(); > } > > Here is my query code: > > int main(int argc, char** argv) > { > IndexReader* reader = IndexReader::open("../Index"); > > lucene::analysis::SimpleAnalyzer* analyzer = > new lucene::analysis::SimpleAnalyzer(); > > IndexReader* newreader = reader->reopen(); > if ( newreader != reader ) > { > _CLLDELETE(reader); > reader = newreader; > } > IndexSearcher searcher(reader); > > Query* query = QueryParser::parse(L"Nixon*", > L"president", analyzer); > Hits* hits = searcher.search(query); > cout << "Total hits: " << hits->length() << endl; > } > > I get no hits on this. I've tried to make my experiment as simple as > possible, but I get nothing. Is the query formed wrong? > > Thanks for any help. > > Regards, > Mark |
From: Mark W. <mwi...@gm...> - 2015-03-22 14:24:23
|
I'm not sure what I'm doing wrong. I can see my fields in the index file, but my query/search returns no hits. Here is my index creation code:

lucene::analysis::SimpleAnalyzer* analyzer;

int main(int argc, char** argv)
{
    analyzer = new lucene::analysis::SimpleAnalyzer();
    Directory* indexDir = FSDirectory::getDirectory("../Index");

    IndexWriter* w = new IndexWriter(indexDir, analyzer, true, true);

    int config = Field::STORE_YES && Field::INDEX_TOKENIZED;

    Field* field;
    Document* doc;

    doc = new Document();

    field = new Field(L"president", L"Nixon", config);
    doc->clear();
    doc->add(*field);
    w->addDocument(doc);

    field = new Field(L"president", L"Obama", config);
    doc->clear();
    doc->add(*field);
    w->addDocument(doc);

    field = new Field(L"president", L"Clinton", config);
    doc->clear();
    doc->add(*field);
    w->addDocument(doc);

    w->close();

    indexDir->close();
}

Here is my query code:

int main(int argc, char** argv)
{
    IndexReader* reader = IndexReader::open("../Index");

    lucene::analysis::SimpleAnalyzer* analyzer =
        new lucene::analysis::SimpleAnalyzer();

    IndexReader* newreader = reader->reopen();
    if ( newreader != reader )
    {
        _CLLDELETE(reader);
        reader = newreader;
    }
    IndexSearcher searcher(reader);

    Query* query = QueryParser::parse(L"Nixon*",
        L"president", analyzer);
    Hits* hits = searcher.search(query);
    cout << "Total hits: " << hits->length() << endl;
}

I get no hits on this. I've tried to make my experiment as simple as possible, but I get nothing. Is the query formed wrong? Thanks for any help. Regards, Mark |
From: norbert b. <nor...@di...> - 2015-02-24 08:08:14
|
Hello, Yeah, replacing " " with "_" is an idea. I went for something a little different to get around my issue, and just added an escaping backslash in front of each " " in my string: hello world ---> hello\ world*. Solved my problem! Thank you. On 23/02/2015 19:31, Ahmed Saidi wrote: > Hi, > > You can't use an exact search query ("" operand) with * or ? > > One thing you can do to solve this problem is to use a different > Analyzer for that field; what it should do is convert whitespaces to _, > for example: > hello world -> hello_world > > Yours, > > On 23/02/2015 13:55, norbert barichard wrote: >> Hello, >> >> There's something I'm having trouble understanding with the Keyword >> Analyzer. >> >> I'm indexing elements with a field named type, in which I put the >> value hello world, using the INDEX_TOKENIZED flag. With a >> WhitespaceAnalyzer, the field becomes split into 2 terms in the index: >> type=hello and type=world. Fine. With a KeywordAnalyzer, there's >> only 1 term, type=hello world. Perfect. >> >> But my problem is when I build my search queries, using the QueryParser: >> >> QueryParser lParser( _T( "type" ), lAnalyzer ); >> Query* lQuery = lParser.parse( _T( "hello world*" ) ); >> >> (That * at the end is important for reasons I don't need to explain here) >> This results in my search query being (type:hello type:world*), no >> matter which analyzer I use (Whitespace or Keyword). I'm guessing >> this is normal, because the Lucene syntax rules take whitespaces as >> separators between different terms. The analyzer doesn't have any >> influence on that (correct me if I'm wrong). >> >> To prevent that, I should put hello world between " ", so the >> whitespace isn't taken into account. But if I do that, where can I >> put my * at the end? >> If I give the parser "hello world*", the * isn't processed as a >> wildcard. >> If I give the parser "hello world"*, the query becomes (type:hello >> world type:*), which isn't ok. >> >> Any help? I'm probably missing something. >> >> As a side question, what's the influence of an Analyzer in the >> QueryParser? >> >> Thanks! |
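The fix above generalizes: the query parser treats unquoted whitespace as a term separator, so a literal space has to be backslash-escaped before the string reaches the parser. A small standalone helper in that spirit (plain standard C++; `escapeSpaces` is an illustrative name, not part of the CLucene API):

```cpp
#include <string>

// Backslash-escape spaces so the query parser keeps a multi-word
// value as a single term, e.g. "hello world" -> "hello\ world".
// The trailing wildcard can then be appended afterwards.
std::string escapeSpaces(const std::string& value) {
    std::string out;
    for (char c : value) {
        if (c == ' ') out += '\\';
        out += c;
    }
    return out;
}
```

For example, `escapeSpaces("hello world") + "*"` produces the query text `hello\ world*`, which the parser reads as one term with a wildcard rather than two separate clauses.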
From: Ahmed S. <ci7...@gm...> - 2015-02-23 18:31:49
|
Hi, You can't use an exact search query ("" operand) with * or ? One thing you can do to solve this problem is to use a different Analyzer for that field; what it should do is convert whitespaces to _, for example: hello world -> hello_world Yours, On 23/02/2015 13:55, norbert barichard wrote: > Hello, > > There's something I'm having trouble understanding with the Keyword Analyzer. > > I'm indexing elements with a field named type, in which I put the value hello world, using the INDEX_TOKENIZED flag. With a WhitespaceAnalyzer, the field becomes split into 2 terms in the index: type=hello and type=world. Fine. With a KeywordAnalyzer, there's only 1 term, type=hello world. Perfect. > > But my problem is when I build my search queries, using the QueryParser: > > QueryParser lParser( _T( "type" ), lAnalyzer ); > Query* lQuery = lParser.parse( _T( "hello world*" ) ); > > (That * at the end is important for reasons I don't need to explain here) > This results in my search query being (type:hello type:world*), no matter which analyzer I use (Whitespace or Keyword). I'm guessing this is normal, because the Lucene syntax rules take whitespaces as separators between different terms. The analyzer doesn't have any influence on that (correct me if I'm wrong). > > To prevent that, I should put hello world between " ", so the whitespace isn't taken into account. But if I do that, where can I put my * at the end? > If I give the parser "hello world*", the * isn't processed as a wildcard. > If I give the parser "hello world"*, the query becomes (type:hello world type:*), which isn't ok. > > Any help? I'm probably missing something. > > As a side question, what's the influence of an Analyzer in the QueryParser? > > Thanks! |
<br> <br> To prevent that, I should put <i>hello world</i> between " ", so the whitespace isn't taken into account. But if I do that, where can I put my * at the end ? <br> If I give the parser<i> "hello world*"</i>, the * isn't processed as a wildcard.<br> If I give the parser <i>"hello world"*</i>, the query becomes (type:hello world type:*), which isn't ok.<br> <br> Any help ? I'm probably missing something.<br> <br> As a side question, what's the influence of an Analyzer in the QueryParser ?<br> <br> Thanks !<br> <br> <br> <br> <br> <br> <br> <fieldset class="mimeAttachmentHeader"></fieldset> <br> <pre wrap="">------------------------------------------------------------------------------ Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server from Actuate! Instantly Supercharge Your Business Reports and Dashboards with Interactivity, Sharing, Native Excel Exports, App Integration & more Get technology previously reserved for billion-dollar corporations, FREE <a class="moz-txt-link-freetext" href="http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk">http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk</a></pre> <br> <fieldset class="mimeAttachmentHeader"></fieldset> <br> <pre wrap="">_______________________________________________ CLucene-developers mailing list <a class="moz-txt-link-abbreviated" href="mailto:CLu...@li...">CLu...@li...</a> <a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/clucene-developers">https://lists.sourceforge.net/lists/listinfo/clucene-developers</a> </pre> </blockquote> <br> </body> </html> |
From: norbert b. <nor...@di...> - 2015-02-23 13:55:20
|
Hello, There's something I'm having trouble understanding with the KeywordAnalyzer. I'm indexing elements with a field named type, in which I put the value hello world, using the INDEX_TOKENIZED flag. With a WhitespaceAnalyzer, the field becomes split into 2 terms in the index: type=hello and type=world. Fine. With a KeywordAnalyzer, there's only 1 term, type=hello world. Perfect. But my problem is when I build my search queries, using the QueryParser: QueryParser lParser( _T( "type" ), lAnalyzer ); Query* lQuery = lParser.parse( _T( "hello world*" ) ); (That * at the end is important for reasons I don't need to explain here.) This results in my search query being (type:hello type:world*), no matter which analyzer I use (Whitespace or Keyword). I'm guessing this is normal, because the Lucene syntax rules take whitespaces as separators between different terms. The analyzer doesn't have any influence on that (correct me if I'm wrong). To prevent that, I should put hello world between " ", so the whitespace isn't taken into account. But if I do that, where can I put my * at the end? If I give the parser "hello world*", the * isn't processed as a wildcard. If I give the parser "hello world"*, the query becomes (type:hello world type:*), which isn't ok. Any help? I'm probably missing something. As a side question, what's the influence of an Analyzer in the QueryParser? Thanks! |
From: Shailesh B. <sbi...@gm...> - 2015-01-28 02:54:35
|
Hello, I am new to CLucene. I am looking for a geospatial index example. Can you please point me to one or share it? Thanks, Shailesh |
From: Kostka B. <ko...@to...> - 2015-01-27 09:30:49
|
Hello, Create a Term from the field name and an empty value. Call reader->terms( Term * ) to get a term enum. Use TermDocs together with TermEnum to get the related document ids. RangeQuery::rewrite(IndexReader* reader) could be a good example, I think. This way is very efficient, but you should not read other document field values inside this cycle. Regards Borek From: norbert barichard [mailto:nor...@di...] Sent: Monday, January 26, 2015 4:37 PM To: clu...@li... Subject: [CLucene-dev] Retrieve all values for a given field ? Hello, me again, with yet another CLucene question! I would like to retrieve (in a relatively fast way) all the values for a given field. For example, if my index looks like: Document1 ( fieldName=value1 ) Document2 ( fieldName=value1 ) Document3 ( fieldName=value2 ) Document4 ( fieldName=value2 ) I'd like to be able to call something like getFieldValues( "fieldName" ) and get a list ( value1, value2 ) in return. How can I do that? Thanks in advance |
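Conceptually, the term dictionary being iterated above is a sorted map from (field, term) to postings, and collecting all values of one field is a range scan starting at Term(field, ""). A plain-C++ model of that scan, with standard containers standing in for TermEnum/TermDocs (an illustrative sketch, not the CLucene API):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using TermKey = std::pair<std::string, std::string>;  // (field, term)

// Scan the sorted term dictionary for every distinct term of one field,
// the way a TermEnum positioned at Term(field, "") would be advanced
// until the field name changes.
std::vector<std::string> fieldValues(
        const std::map<TermKey, std::vector<int>>& dict,
        const std::string& field) {
    std::vector<std::string> values;
    for (auto it = dict.lower_bound({field, ""});
         it != dict.end() && it->first.first == field; ++it)
        values.push_back(it->first.second);  // the postings (it->second) hold doc ids
    return values;
}
```

Because the dictionary is sorted, the scan touches only the terms of the requested field, which is why this approach is efficient compared with reading every document's stored fields.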
From: norbert b. <nor...@di...> - 2015-01-26 15:36:57
|
Hello, me again, with yet another CLucene question! I would like to retrieve (in a relatively fast way) all the values for a given field. For example, if my index looks like: Document1 ( fieldName=value1 ) Document2 ( fieldName=value1 ) Document3 ( fieldName=value2 ) Document4 ( fieldName=value2 ) I'd like to be able to call something like getFieldValues( "fieldName" ) and get a list ( value1, value2 ) in return. How can I do that? Thanks in advance |
From: norbert b. <nor...@di...> - 2015-01-23 15:27:45
|
Hello, Yes, I found the non-static parse method just a while after writing my first e-mail, thank you! It does sound like a better way to use the QueryParser indeed. Cheers, Norbert On 23/01/2015 16:21, Kostka Bořivoj wrote: > > Hello, > > I never used the QueryParser::parse static method so I'm not sure, but if > you pass a complex query as the first parameter it should return a correct > Query object. > > Please note this static method is marked /** For backward > compatibility */ in the source code, so a better way is to create a > QueryParser object > > and call the member parse(const TCHAR* _query) method. This avoids creating > a new parser object for each query you parse, so it is a bit more > efficient. > > Borek |
From: Kostka B. <ko...@to...> - 2015-01-23 15:22:10
|
Hello, I never used the QueryParser::parse static method so I'm not sure, but if you pass a complex query as the first parameter it should return a correct Query object. Please note this static method is marked /** For backward compatibility */ in the source code, so a better way is to create a QueryParser object and call the member parse(const TCHAR* _query) method. This avoids creating a new parser object for each query you parse, so it is a bit more efficient. Borek From: norbert barichard [mailto:nor...@di...] Sent: Friday, January 23, 2015 2:31 PM To: clu...@li... Subject: [CLucene-dev] Parsing queries ? Hello, I would like to know if there is a way (in CLucene 2.3.3.4) to build a Query* object from a complete Lucene query string, taking into account all of its operators (AND, OR, NOT, +, -, :, (), etc.). Until now, I've been able to use the QueryParser to build fairly simple queries, like this for example: /*******/ BooleanQuery* lBoolQuery = new BooleanQuery(); Query* lQuery1 = QueryParser::parse( _T( "sampleValue1" ), _T( "sampleField1" ), lAnalyzer ); Query* lQuery2 = QueryParser::parse( _T( "sampleValue2" ), _T( "sampleField2" ), lAnalyzer ); lBoolQuery->add( lQuery1, true, BooleanClause::MUST ); lBoolQuery->add( lQuery2, true, BooleanClause::MUST ); /*******/ This works ok. But if I've got a complex Lucene query string, like: title:(test -stuff) AND text:"hello wor*" OR "something nice"^4 ... How can I use this to get a Query* object representing this request? I'm assuming there is a method to do that, I just can't find it. And I don't want to write a parser myself! Thanks in advance, |
From: norbert b. <nor...@di...> - 2015-01-23 13:31:33
|
Hello, I would like to know if there is a way (in CLucene 2.3.3.4) to build a Query* object from a complete Lucene query string, taking into account all of its operators (AND, OR, NOT, +, -, :, (), etc.). Until now, I've been able to use the QueryParser to build fairly simple queries, like this for example: /*******/ BooleanQuery* lBoolQuery = new BooleanQuery(); Query* lQuery1 = QueryParser::parse( _T( "sampleValue1" ), _T( "sampleField1" ), lAnalyzer ); Query* lQuery2 = QueryParser::parse( _T( "sampleValue2" ), _T( "sampleField2" ), lAnalyzer ); lBoolQuery->add( lQuery1, true, BooleanClause::MUST ); lBoolQuery->add( lQuery2, true, BooleanClause::MUST ); /*******/ This works ok. But if I've got a complex Lucene query string, like: title:(test -stuff) AND text:"hello wor*" OR "something nice"^4 ... How can I use this to get a Query* object representing this request? I'm assuming there is a method to do that, I just can't find it. And I don't want to write a parser myself! Thanks in advance, |
From: Kostka B. <ko...@to...> - 2015-01-20 14:52:53
|
Unfortunately this method is not implemented. Use getFields() and filter out the related fields manually. Something like:

std::vector<Field*> fields;
const CL_NS(document)::Document::FieldsType* pFields = m_pDoc->getFields();
if ( pFields )
    for ( CL_NS(document)::Document::FieldsType::const_iterator iFld = pFields->begin(); iFld != pFields->end(); iFld++ )
        if ( _tcscmp( (*iFld)->name(), name ) == 0 )
            fields.push_back( *iFld );

You can also use Document::getValues(const TCHAR* name) if you need only the values.

Hope this helps,
Borek

From: norbert barichard [mailto:nor...@di...]
Sent: Tuesday, January 20, 2015 2:04 PM
To: clu...@li...
Subject: [CLucene-dev] Document::getFields( const TCHAR* name, std::vector<Field*> &ret ) missing ?
From: norbert b. <nor...@di...> - 2015-01-20 13:20:09
|
Hello,

I'm using CLucene 2.3.3.4 in my project, and so far everything was going fine, until I wanted to use the Document::getFields method with arguments. It's listed in the doxygen documentation and is declared in Document.h, but as I was getting a link error, I checked Document.cpp, and the method isn't implemented there. It's a bit annoying; did I miss something, or has this method never been implemented?

Thanks in advance
--
From: Itamar Syn-H. <it...@co...> - 2014-12-16 21:06:02
|
Don't even try.. just go with Elasticsearch or Solr. My $0.02

--
Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>
From: Shailesh B. <sbi...@gm...> - 2014-12-16 20:36:23
|
Hello,

I am new to CLucene. I am looking for an example or document explaining how to implement a distributed CLucene index. The index should be created on multiple machines, and a search should run in parallel on all machines and return consolidated results.

Currently I am going through the CLucene API documentation but couldn't find any related reference. Can you please point me to a proper document/sample?

Thanks,
Shailesh
From: Vikash J. <vik...@ya...> - 2014-12-11 10:01:18
|
Hi CLucene Developers,

Greetings!!! I am new to CLucene. I built the CLucene static library from the clucene-core-2.3.3.4 code using the command below, as per the instructions in clucene-core-2.3.3.4/INSTALL:

cmake -DBUILD_STATIC_LIBRARIES=ON ../

After that I was able to generate the libraries below:

libclucene-core.so
libclucene-core.so.2.3.3.4
libclucene-core.so.1
libclucene-core-static.a
libclucene-shared.so
libclucene-shared.so.2.3.3.4
libclucene-shared.so.1
libclucene-shared-static.a

I was then able to run a sample app that creates indexes using libclucene-shared, but I get the errors below when I use libclucene-core-static.a:

undefined reference to `lucene::document::Document::Document()'
undefined reference to `lucene::document::Field::Field(wchar_t const*, wchar_t const*, int, bool)'
undefined reference to `lucene::document::Document::add(lucene::document::Field&)'
undefined reference to `lucene::document::Field::Field(wchar_t const*, wchar_t const*, int, bool)'
undefined reference to `lucene::document::Document::add(lucene::document::Field&)'
undefined reference to `lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*)'
undefined reference to `lucene::document::Document::clear()'
undefined reference to `lucene::document::Document::~Document()'
undefined reference to `lucene::document::Document::~Document()'
undefined reference to `lucene::analysis::standard::StandardAnalyzer::StandardAnalyzer()'
undefined reference to `lucene::index::IndexReader::indexExists(char const*)'
undefined reference to `lucene::index::IndexReader::isLocked(char const*)'
undefined reference to `lucene::index::IndexReader::unlock(char const*)'
undefined reference to `lucene::index::IndexWriter::IndexWriter(char const*, lucene::analysis::Analyzer*, bool)'
undefined reference to `lucene::index::IndexWriter::IndexWriter(char const*, lucene::analysis::Analyzer*, bool)'
undefined reference to `lucene::index::IndexWriter::setMaxFieldLength(int)'
undefined reference to `lucene::index::IndexWriter::setUseCompoundFile(bool)'
undefined reference to `lucene::index::IndexWriter::setUseCompoundFile(bool)'
undefined reference to `lucene::index::IndexWriter::optimize(bool)'
undefined reference to `lucene::index::IndexWriter::close(bool)'
undefined reference to `lucene::analysis::standard::StandardAnalyzer::~StandardAnalyzer()'
undefined reference to `lucene::analysis::standard::StandardAnalyzer::~StandardAnalyzer()'

Please help me with how to use the static library.

Thanks & regards,
Vikash Jindal
From: Timo S. <ts...@ik...> - 2014-09-17 09:10:09
|
This iteration loops forever:

void IndexWriter::addMergeException(MergePolicy::OneMerge* _merge) {
  SCOPED_LOCK_MUTEX(THIS_LOCK)
  if ( mergeGen == _merge->mergeGen ) {
    MergeExceptionsType::iterator itr = mergeExceptions->begin();
    while ( itr != mergeExceptions->end() ) {
      MergePolicy::OneMerge* x = *itr;
      if ( x == _merge ) {
        return;
      }
    }
  }
  mergeExceptions->push_back(_merge);
}

Apparently it would require itr++ at the end of the while-loop?
From: Stephan B. <sbe...@re...> - 2014-05-22 07:13:11
|
FYI:

-------- Original Message --------
Subject: [Libreoffice-commits] core.git: external/clucene
Date: Wed, 21 May 2014 12:32:58 -0700 (PDT)
From: Stephan Bergmann <sbe...@re...>
Reply-To: lib...@li...
To: lib...@li...

 external/clucene/UnpackedTarball_clucene.mk |  1 +
 external/clucene/patches/clucene-asan.patch | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

New commits:
commit c01904f91f76b771362ab9fb71289feba1e342f6
Author: Stephan Bergmann <sbe...@re...>
Date:   Wed May 21 21:26:36 2014 +0200

    external/clucene: Avoid InitOrderFiasco

    ...as reported by AddressSanitizer, where
    src/core/CLucene/index/IndexWriter.cpp initializes
    IndexWriter::MAX_TERM_LENGTH with the value of
    DocumentsWriter::MAX_TERM_LENGTH before the latter is initialized in
    src/core/CLucene/index/DocumentsWriter.cpp. But it turns out that
    IndexWriter::MAX_TERM_LENGTH is completely unused.

    Change-Id: Ica01186584ec05a989a13dc58823f4751e8724e2

diff --git a/external/clucene/UnpackedTarball_clucene.mk b/external/clucene/UnpackedTarball_clucene.mk
index efa7747..d059241 100644
--- a/external/clucene/UnpackedTarball_clucene.mk
+++ b/external/clucene/UnpackedTarball_clucene.mk
@@ -34,6 +34,7 @@ $(eval $(call gb_UnpackedTarball_add_patches,clucene,\
 	external/clucene/patches/clucene-git1-win64.patch \
 	external/clucene/patches/clucene-ub.patch \
 	external/clucene/patches/clucene-mutex.patch \
+	external/clucene/patches/clucene-asan.patch \
 ))

 ifneq ($(OS),WNT)
diff --git a/external/clucene/patches/clucene-asan.patch b/external/clucene/patches/clucene-asan.patch
new file mode 100644
index 0000000..51adfad
--- /dev/null
+++ b/external/clucene/patches/clucene-asan.patch
@@ -0,0 +1,26 @@
+--- src/core/CLucene/index/IndexWriter.cpp
++++ src/core/CLucene/index/IndexWriter.cpp
+@@ -53,7 +53,6 @@
+
+ DEFINE_MUTEX(IndexWriter::MESSAGE_ID_LOCK)
+ int32_t IndexWriter::MESSAGE_ID = 0;
+-const int32_t IndexWriter::MAX_TERM_LENGTH = DocumentsWriter::MAX_TERM_LENGTH;
+
+ class IndexWriter::Internal{
+ public:
+--- src/core/CLucene/index/IndexWriter.h
++++ src/core/CLucene/index/IndexWriter.h
+@@ -384,13 +384,6 @@
+ 	*/
+ 	static const int32_t DEFAULT_MAX_MERGE_DOCS;
+
+-	/**
+-	* Absolute hard maximum length for a term. If a term
+-	* arrives from the analyzer longer than this length, it
+-	* is skipped and a message is printed to infoStream, if
+-	* set (see {@link #setInfoStream}).
+-	*/
+-	static const int32_t MAX_TERM_LENGTH;
+
+
+ /* Determines how often segment indices are merged by addDocument(). With