From: yunwen ye <yun...@gm...> - 2011-03-13 06:21:46
|
Hi there,

I am trying to upgrade from 0.9.21 to the current git snapshot I downloaded. When I tried to build with Visual Studio 10 on Windows 7, I got this error:

LuceneThreads.h(143): A valid thread library was not found

While I was running CMake, it complained that pthread was not found (which is right), and then it said:

Found Threads: true

Any clues on what might have caused this problem?

Thanks a lot.
--yunwen |
From: Veit J. <nun...@go...> - 2011-03-10 16:30:11
|
Hi Shivaji!

2011/3/8 shivaji badade <shi...@gm...>:
> Does anybody have a diff between these two versions in terms of performance
> data, which would help me in moving to version 2.3.2?

I have no diff. But I once changed from 0.9.21b to 2.3.2 (about 2 years ago). Most of the changes I had to make were in classes extending existing features of CLucene (own queries, scorers, weights, filters, etc.) and in accounting for some renamed methods, e.g., Term::termBuffer() instead of Term::termText().

So, if you do not have extensions of CLucene classes, the migration should be rather easy. And if you run into problems, I can offer to check whether I had the same problem and how I solved it.

Kind regards,
Veit |
From: Veit J. <nun...@go...> - 2011-03-10 16:18:28
|
2011/3/8 Rustem Alimov <ar...@gm...>:
> Hi,
>
> src/core/CLucene/util/BitSet.cpp : line 93
>
> [code]
> _count = -1; <-- FIRST SET
>
> if (val)
>     bits[bit >> 3] |= 1 << (bit & 7);
> else
>     bits[bit >> 3] &= ~(1 << (bit & 7));
> _count =-1; <-- SECOND SET (unnecessary???)
> [/code]

You are right. The second one isn't necessary. I will remove it.

Veit |
From: shivaji b. <shi...@gm...> - 2011-03-08 08:53:55
|
Hi,

I have been using 0.9.21 for quite a long time in my search project. Recently I checked the indexing performance of version 2.3.2; it shows a five-fold improvement in indexing time.

I would like to know more about the improvements in 2.3.2. Does anybody have a diff between these two versions in terms of performance data, which would help me in moving to version 2.3.2? Please share.

BR,
Shivaji. |
From: Rustem A. <ar...@gm...> - 2011-03-08 08:19:17
|
Hi,

src/core/CLucene/util/BitSet.cpp : line 93

[code]
_count = -1; <-- FIRST SET

if (val)
    bits[bit >> 3] |= 1 << (bit & 7);
else
    bits[bit >> 3] &= ~(1 << (bit & 7));
_count =-1; <-- SECOND SET (unnecessary???)
[/code] |
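For readers following this thread, here is a minimal sketch of the pattern under discussion: a bit set that caches its population count and invalidates the cache once per write. This is not CLucene's BitSet; the class name `SimpleBitSet`, its members, and the choice to ignore out-of-range bits are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit set with a lazily computed bit count.
// It only mirrors the bit arithmetic quoted above (bit >> 3, bit & 7).
class SimpleBitSet {
public:
    explicit SimpleBitSet(std::size_t size)
        : bits_((size >> 3) + 1, 0), size_(size), count_(-1) {}

    void set(std::size_t bit, bool val = true) {
        if (bit >= size_) return;  // assumption: silently ignore out-of-range bits
        if (val)
            bits_[bit >> 3] |= static_cast<std::uint8_t>(1u << (bit & 7));
        else
            bits_[bit >> 3] &= static_cast<std::uint8_t>(~(1u << (bit & 7)));
        count_ = -1;  // a single invalidation of the cached count is enough
    }

    bool get(std::size_t bit) const {
        return bit < size_ && (bits_[bit >> 3] & (1u << (bit & 7))) != 0;
    }

    // Recomputes the number of set bits only when the cache was invalidated.
    int count() {
        if (count_ == -1) {
            int c = 0;
            for (std::uint8_t b : bits_)
                for (; b != 0; b = static_cast<std::uint8_t>(b & (b - 1))) ++c;
            count_ = c;
        }
        return count_;
    }

private:
    std::vector<std::uint8_t> bits_;
    std::size_t size_;
    int count_;  // -1 means "not computed yet"
};
```

Setting the cache marker before or after the bit operation makes no difference in single-threaded code, which is why one of the two assignments can go.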
From: muhammad i. <m.i...@gm...> - 2011-03-08 00:32:05
|
Hi Jiri,

Thank you for your concern. How can I use the highlighter to get the list of the terms?

Thank you

--
Sincerely
--------------------
Mohammad Ismael
Software Developer |
From: Šplíchal J. <spl...@to...> - 2011-03-07 10:48:15
|
Hello,

As far as I know, this is not possible just by using a highlighter, because it highlights terms that match your query.

But you could use the highlighter to retrieve a list of matching terms and highlight the characters yourself.

Jiri

From: muhammad ismael [mailto:m.i...@gm...]
Sent: Wednesday, March 02, 2011 7:34 PM
To: CLu...@li...
Subject: [CLucene-dev] Is it possible to highlight single characters using highlighter?

Hi all,

Can I use Highlighter to highlight single characters? I am using it now, but it highlights the whole word that contains my characters.

Mohammad Ismael |
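The "highlight the characters yourself" step could look roughly like the sketch below. It does not use any CLucene API: it assumes you already have the text of a matching term (or field) as a plain `std::string` and know the characters you searched for. The function name `highlightChars` and the tag parameters are hypothetical, and matching is case-sensitive.

```cpp
#include <cstddef>
#include <string>

// Wraps every occurrence of `needle` inside `text` with `pre` and `post`.
// Hypothetical helper for illustration; not part of CLucene's highlighter.
std::string highlightChars(const std::string& text, const std::string& needle,
                           const std::string& pre, const std::string& post) {
    if (needle.empty()) return text;  // nothing to highlight
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = text.find(needle, pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the remaining tail
            break;
        }
        out.append(text, pos, hit - pos);  // text before the match
        out += pre;
        out += needle;
        out += post;
        pos = hit + needle.size();  // continue after the match
    }
    return out;
}
```

For example, `highlightChars("searching research", "ear", "<b>", "</b>")` marks only the three matching characters in each word instead of the whole word.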
From: muhammad i. <m.i...@gm...> - 2011-03-02 18:34:32
|
Hi all,

Can I use Highlighter to highlight single characters? I am using it now, but it highlights the whole word that contains my characters.

Mohammad Ismael |
From: Veit J. <nun...@go...> - 2011-02-16 16:32:53
|
2011/2/9 Veit Jahns <nun...@go...>:
> it seems that addIndexesNoOptimize() cause the trouble here. I can't
> open the index with Luke. I got a "read past EOF" error. But only from
> Luke. The CLucene-IndexReader has no problem with the index, besides
> that sorting doesn't work correctly. I will keep at it.

I added a ticket [1] for this (can be assigned to me) and pushed the test case in the branch tracker_3183890_fix [2].

Veit

[1] https://sourceforge.net/tracker/?func=detail&aid=3183890&group_id=80013&atid=558446
[2] http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=shortlog;h=refs/heads/tracker_3183890_fix |
From: Veit J. <nun...@go...> - 2011-02-09 17:25:20
|
Hi Alexander,

It seems that addIndexesNoOptimize() causes the trouble here. I can't open the index with Luke; I get a "read past EOF" error, but only from Luke. The CLucene IndexReader has no problem with the index, except that sorting doesn't work correctly. I will keep at it.

Kind regards,
Veit |
From: Lien, J. <jen...@ca...> - 2011-02-09 08:08:05
|
Hi,

As far as I know (and I would like to be corrected...) there is no general macro on solaris that aids in ANSI/UNICODE duality. UTF-16 string literals are prefixed with U (like U"ASCII_String") - but I've not found anything like the _T or TEXT macros defined in <tchar.h> on Win32

The reason for _T breaking the Solaris build now is the upgrade to SunStudio 12.1. This version includes a default STL library that uses _T internally for template parameters.

I don't think the CLucene build will break on other platforms due to the _T macro replacement - but depending on the use of this macro in client code there might be a need for at least some rewrites.

/jens

Capgemini is a trading name used by the Capgemini Group of companies which includes Capgemini Norge AS, a company registered in Norway (number 943574537) whose registered office is at Hoffsveien 1 D - Pb. 475, Skøyen – 0214 Oslo. This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. |
From: Itamar Syn-H. <it...@co...> - 2011-02-08 22:11:06
|
Jens,

Thanks for your work on this.

Re the _T macro - what is the general practice or macro used in Solaris to achieve ANSI / Unicode duality in strings? I'm curious because I think CLucene did compile on Solaris, and the _T macro has been there forever. Perhaps there is a way of macro-hacking it to work?

If CLucene doesn't break compilation on all other platforms we'll be more than happy to include it...

Itamar. |
From: Itamar Syn-H. <it...@co...> - 2011-02-08 22:01:29
|
I understand perfectly well. I was merely pointing out two facts:

1. If malloc/realloc fails, it is not enough to ignore the fact that buffer == NULL. The indexing process has to be aborted in some way. Meaning, your added if statement should have an else clause, probably with an exception thrown. Or a CND instead to verify buffer != NULL.

2. Code redundancy in growBuffer and friends.

I'm interested in hearing Ben's opinion on what actions to take, or yours if you have any. I will tackle this later when I have more time.

Itamar. |
From: Rustem A. <ar...@gm...> - 2011-02-08 17:19:26
|
Hi,

you don't understand... If buffer == NULL, then

_tcsncpy(buffer,_term->text(), bufferLength);

- at this point we have potential bug

Need

[code]
//Instantiate the new buffer + 1 is needed for terminator '\0'
if ( buffer == NULL )
    buffer = (TCHAR*)malloc(sizeof(TCHAR) * (bufferLength+1));
else
    buffer = (TCHAR*)realloc(buffer, sizeof(TCHAR) * (bufferLength+1));

if (buffer != NULL && (copy || force_copy) ){
    //Copy the text of term into buffer
    _tcsncpy(buffer,_term->text(), bufferLength);
}
[/code]

2011/1/27 Itamar Syn-Hershko <it...@co...>
> Hi,
>
> If malloc / realloc returns NULL the indexing process has to be aborted
> anyway, and the only way I can think of doing this is throwing an exception.
> Did you have other idea in mind?
>
> Also, I'm not sure why growBuffer is used there at all. This is a simple
> TCHAR array being used as a buffer, why can't we generalize this piece of
> code or use some STL alternatives?
>
> Looking in files_list.txt it seems to be Ben's code, so perhaps he can
> give us some answers...
>
> Itamar.
>
> On 22/10/2010 12:12 PM, Rustem Alimov wrote:
>
> Hi,
>
> src/core/CLucene/index/SegmentTermEnum.cpp : line 377
>
> [code]
> //Instantiate the new buffer + 1 is needed for terminator '\0'
> if ( buffer == NULL )
>     buffer = (TCHAR*)malloc(sizeof(TCHAR) * (bufferLength+1));
> else
>     buffer = (TCHAR*)realloc(buffer, sizeof(TCHAR) * (bufferLength+1));
>
> if ( copy || force_copy){
>     //Copy the text of term into buffer
>     _tcsncpy(buffer,_term->text(),bufferLength);
> }
> [/code]
>
> If malloc / realloc return NULL? |
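To make the two positions in this thread concrete, here is a minimal sketch of a grow-and-copy helper that aborts with an exception when allocation fails instead of skipping the copy. It is not CLucene's growBuffer: the function names `growTextBuffer` and `copyIntoBuffer` are hypothetical, `wchar_t` stands in for TCHAR, and throwing `std::bad_alloc` is only one possible way to signal the failure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <new>

// Hypothetical helper: grows `buffer` so it can hold `length` characters
// plus the terminator. Throws std::bad_alloc if the allocation fails.
inline wchar_t* growTextBuffer(wchar_t* buffer, std::size_t length) {
    // realloc(NULL, n) behaves like malloc(n), so one call covers both cases.
    void* grown = std::realloc(buffer, sizeof(wchar_t) * (length + 1));
    if (grown == NULL)
        throw std::bad_alloc();  // the old buffer is untouched and still owned by the caller
    return static_cast<wchar_t*>(grown);
}

// Copies at most `length` characters of `text` into the (re)grown buffer
// and always terminates the result.
inline wchar_t* copyIntoBuffer(wchar_t* buffer, const wchar_t* text, std::size_t length) {
    wchar_t* grown = growTextBuffer(buffer, length);
    std::wcsncpy(grown, text, length);
    grown[length] = L'\0';
    return grown;
}
```

Assigning the result of realloc to a temporary first also avoids losing the old pointer when the call fails. An STL container such as `std::vector` or `std::basic_string`, as suggested above, would report allocation failure by exception without any of this manual handling.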
From: Lien, J. <jen...@ca...> - 2011-02-08 13:25:49
|
Hi Veit,

See below for comments. I'll split the patch (and create two issues in the tracker): one for the Solaris build fixes and one for the ArrayBase issue.

/jens

> -----Original Message-----
> From: Veit Jahns [mailto:nun...@go...]
> Sent: 7 February 2011 17:02
> To: clu...@li...
> Subject: Re: [CLucene-dev] Solaris 10 build issues
>
> Hi Jens!
>
> 2011/2/3 Lien, Jens <jen...@ca...>:
> > All,
> >
> > I’ve recently worked on getting CLucene building and running on Solaris 10 using Sun Studio 12.1 compilers. To get this (almost) done, I’ve had to make a few fixes. Before I submit a patch I would like to discuss the proposed changes:
> >
> > 1) Usage of the _T macro:
> > The STL version used by default by the 12.1 compiler uses _T heavily for internal template types and gets confused by the macro expansion SYMBOL__T defined in src/shared/CMakeLists.txt.
> >
> > Replacing _T with e.g. clT makes the compiler compile almost all the code. I'm aware that client code might be using this macro already, but to be compatible with 12.1 (both the default STL version and the --stlport4 version) I think this needs to be fixed.
>
> Did you compile it in both ASCII mode and UNICODE mode? I suppose that can cause problems on Windows platforms, because the _T macro has a special meaning there. At least it should be checked there.

Compiled in both ASCII and UNICODE mode on multiple platforms (Win32, Solaris and Linux). The _T macro is used on the Windows platform to simplify usage of the wide string literal prefix (L). But in CLucene _T is defined for all platforms, and expands to L"" on Windows when compiled with Unicode.

> > 2) Updating use of the _CLFINALLY(...) macro. Removed space
> > (_CLFINALLY (...) to _CLFINALLY(...))
>
> What is the purpose of this change?

The Solaris compiler complains about the space between the macro name and the argument list.

> > 3) Change the type used for insertion in the fieldSelections map
> > (FieldSelector.cpp, line 60), now using FieldSelectionType::value_type
>
> Makes sense to me.
>
> > 4) Added macro for return value in searchDocs (TestIndexSearcher.cpp). The compiler complains.
>
> What are the compiler complaints?

Missing return value.

> > 5) Added copy constructor and assignment operator in ArrayBase
> > (Array.h). The lack of these made both cl_demo and cl_test fail on
> > Solaris. Quite obvious actually, and scary since the Win32 and Linux
> > builds work perfectly without this fix. (Evaluate usage in
> > DocumentsWriterThreadState.cpp)
>
> Ok.
>
> > After all of these changes, I'm able to compile and run cl_demo and cl_test on Solaris 10, Win32 and Linux using the same sources. However, the sort tests fail on Solaris; I'll need to look into this further.
>
> That would be great!
>
> Kind regards,
>
> Veit
>
> _______________________________________________
> CLucene-developers mailing list
> CLu...@li...
> https://lists.sourceforge.net/lists/listinfo/clucene-developers

Capgemini is a trading name used by the Capgemini Group of companies which includes Capgemini Norge AS, a company registered in Norway (number 943574537) whose registered office is at Hoffsveien 1 D - Pb. 475, Skøyen – 0214 Oslo. This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. |
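Point 5 in the message above is about copy semantics for a class that owns a raw buffer. The thread does not show the actual ArrayBase change, so what follows is only a minimal sketch of the general technique, using a hypothetical `OwnedArray` class that is not CLucene's own type. Without a user-defined copy constructor and assignment operator, the compiler-generated versions copy only the pointer, so two objects end up sharing (and later deleting) the same buffer. Whether that fails visibly can depend on the platform's allocator, which may be why only one platform showed the problem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a buffer-owning array class; NOT CLucene's ArrayBase.
template <typename T>
class OwnedArray {
public:
    // Allocates n value-initialized elements.
    explicit OwnedArray(std::size_t n) : values_(new T[n]()), length_(n) {}

    // Copy constructor: allocate a fresh buffer and copy the elements,
    // so the copy never shares storage with the original.
    OwnedArray(const OwnedArray& other)
        : values_(new T[other.length_]), length_(other.length_) {
        std::copy(other.values_, other.values_ + length_, values_);
    }

    // Assignment via copy-and-swap: the old buffer is released when tmp
    // goes out of scope, and self-assignment is handled naturally.
    OwnedArray& operator=(const OwnedArray& other) {
        OwnedArray tmp(other);
        swap(tmp);
        return *this;
    }

    ~OwnedArray() { delete[] values_; }

    void swap(OwnedArray& other) {
        std::swap(values_, other.values_);
        std::swap(length_, other.length_);
    }

    T& operator[](std::size_t i) { assert(i < length_); return values_[i]; }
    const T& operator[](std::size_t i) const { assert(i < length_); return values_[i]; }
    std::size_t length() const { return length_; }

private:
    T* values_;
    std::size_t length_;
};
```

With these two members in place, copying an `OwnedArray<int>` and modifying the copy leaves the original untouched, and each object frees only its own buffer.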
From: Pankaj J. <pan...@gm...> - 2011-02-08 13:22:53
|
Wow. That would be a great help. Please do send me the patch. I will test it on Windows, Linux, Solaris x86, Solaris SPARC and AIX.

Thanks
Pankaj

-----Original Message-----
From: "Lien, Jens" <jen...@ca...>
Date: Tue, 8 Feb 2011 13:42:45
To: clu...@li...<clu...@li...>
Reply-To: clu...@li...
Subject: Re: [CLucene-dev] Solaris 10 build issues |
From: Lien, J. <jen...@ca...> - 2011-02-08 13:14:59
|
Hi Pankaj,

If you like, I can send you a patch to try out.

/jens

From: Pankaj Jangid [mailto:pan...@gm...]
Sent: 7 February 2011 17:15
To: clu...@li...
Subject: Re: [CLucene-dev] Solaris 10 build issues

I am eagerly waiting for this patch to be included. Thanks Jens, for suggesting these changes. I had reported this in Nov, 2010.

http://sourceforge.net/mailarchive/forum.php?thread_name=AANLkTimYj2pJy%2B8o2O8Kdnz9hafW0RUnt8R7j--NGoDQ%40mail.gmail.com&forum_name=clucene-developers

--
Thanks & Regards
Pankaj |
From: Pankaj J. <pan...@gm...> - 2011-02-07 16:15:35
|
I am eagerly waiting for this patch to be included. Thanks Jens, for suggesting these changes. I had reported this in Nov, 2010.

http://sourceforge.net/mailarchive/forum.php?thread_name=AANLkTimYj2pJy%2B8o2O8Kdnz9hafW0RUnt8R7j--NGoDQ%40mail.gmail.com&forum_name=clucene-developers

--
Thanks & Regards
Pankaj |
From: Veit J. <nun...@go...> - 2011-02-07 16:07:49
|
2011/2/7 Itamar Syn-Hershko <it...@co...>:
> Hi Veit, good catch!
>
> That is quite a straight-forward fix, can you please merge it to master?

Done.

Veit |
From: Veit J. <nun...@go...> - 2011-02-07 16:01:52
|
Hi Jens!

2011/2/3 Lien, Jens <jen...@ca...>:
> All,
>
> I’ve recently worked on getting CLucene building and running on Solaris 10 using Sun Studio 12.1 compilers. To get this (almost) done, I’ve had to make a few fixes. Before I submit a patch I would like to discuss the proposed changes:
>
> 1) Usage of the _T macro:
> The STL version used by default by the 12.1 compiler uses _T heavily for internal template types and gets confused by the macro expansion SYMBOL__T defined in src/shared/CMakeLists.txt.
>
> Replacing _T with e.g. clT makes the compiler compile almost all the code. I'm aware that client code might be using this macro already, but to be compatible with 12.1 (both the default STL version and the --stlport4 version) I think this needs to be fixed.

Did you compile it in both ASCII mode and UNICODE mode? I suppose that can cause problems on Windows platforms, because the _T macro has a special meaning there. At least it should be checked there.

> 2) Updating use of the _CLFINALLY(...) macro. Removed space (_CLFINALLY (...) to _CLFINALLY(...))

What is the purpose of this change?

> 3) Change the type used for insertion in the fieldSelections map (FieldSelector.cpp, line 60), now using FieldSelectionType::value_type

Makes sense to me.

> 4) Added macro for return value in searchDocs (TestIndexSearcher.cpp). The compiler complains.

What are the compiler complaints?

> 5) Added copy constructor and assignment operator in ArrayBase (Array.h). The lack of these made both cl_demo and cl_test fail on Solaris. Quite obvious actually, and scary since the Win32 and Linux builds work perfectly without this fix. (Evaluate usage in DocumentsWriterThreadState.cpp)

Ok.

> After all of these changes, I'm able to compile and run cl_demo and cl_test on Solaris 10, Win32 and Linux using the same sources. However, the sort tests fail on Solaris; I'll need to look into this further.

That would be great!

Kind regards,

Veit |
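Point 3 above (inserting through `FieldSelectionType::value_type`) can be sketched as follows. The thread does not show the original line or say why it failed, so this is an assumption: a map's element type is `std::pair<const Key, T>`, and inserting a plain `std::pair<Key, T>` relies on a converting constructor that some older standard library implementations do not provide. The enum values and the `addSelection` helper below are hypothetical names for illustration, not CLucene's actual declarations.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical names for illustration; the real map in FieldSelector.cpp may differ.
enum FieldSelectorResult { LOAD = 0, LAZY_LOAD = 1, NO_LOAD = 2 };
typedef std::map<std::string, FieldSelectorResult> FieldSelectionType;

// Returns true if the field was newly inserted, false if it was already present.
bool addSelection(FieldSelectionType& selections,
                  const std::string& field,
                  FieldSelectorResult action) {
    assert(!field.empty());  // precondition chosen for this sketch only
    // value_type is std::pair<const std::string, FieldSelectorResult>,
    // i.e. exactly the element type the map stores, so insert() needs
    // no pair-to-pair conversion.
    return selections.insert(FieldSelectionType::value_type(field, action)).second;
}
```

Spelling the element type through the map's own typedef also keeps the call correct if the key or mapped type is changed later.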
From: Itamar Syn-H. <it...@co...> - 2011-02-07 15:54:30
|
Hi Veit, good catch!

That is quite a straight-forward fix, can you please merge it to master?

Itamar.

On 7/2/2011 5:36 PM, Veit Jahns wrote:
> Hi,
>
> one of my colleagues found a bug in BooleanScorer2. Due to an
> erroneous port of the Java code, prohibited scorers were never used in
> the score() method. Test case and fix are pushed to the branch
> BooleanScorer2_fix [1]. I added a MockScorer and MockHitCollector to
> create this test case. Maybe they are useful for other test cases too.
>
> Kind regards,
>
> Veit
>
> [1] http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=shortlog;h=refs/heads/BooleanScorer2_fix |
From: Veit J. <nun...@go...> - 2011-02-07 15:36:12
|
Hi,

one of my colleagues found a bug in BooleanScorer2. Due to an erroneous port of the Java code, prohibited scorers were never used in the score() method. Test case and fix are pushed to the branch BooleanScorer2_fix [1]. I added a MockScorer and MockHitCollector to create this test case. Maybe they are useful for other test cases too.

Kind regards,

Veit

[1] http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=shortlog;h=refs/heads/BooleanScorer2_fix |
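To make the effect of the bug concrete: a prohibited clause is meant to remove documents from the result set, so if the prohibited scorers are never consulted, documents that should be excluded are still reported as hits. The sketch below only illustrates that intended behaviour on plain sorted lists of document IDs. It is not the BooleanScorer2 code or the fix from the branch, and the function name `excludeProhibited` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Illustration only: NOT the BooleanScorer2 implementation.
// Both inputs must hold document IDs sorted in ascending order.
std::vector<int> excludeProhibited(const std::vector<int>& matching,
                                   const std::vector<int>& prohibited) {
    assert(std::is_sorted(matching.begin(), matching.end()));
    assert(std::is_sorted(prohibited.begin(), prohibited.end()));

    std::vector<int> result;
    // Keep every matching document that no prohibited clause also matches.
    std::set_difference(matching.begin(), matching.end(),
                        prohibited.begin(), prohibited.end(),
                        std::back_inserter(result));
    return result;
}
```

A test in the spirit of the one described above would feed in a known set of matching and prohibited documents and check that none of the prohibited ones reach the collector.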
From: Ben v. K. <bva...@gm...> - 2011-02-06 23:22:05
|
I thought Isidor's MPI branch (isidor_mpi) had support for that? Or maybe that was the remote searcher... I forget.

ben

On Fri, Jan 28, 2011 at 3:52 AM, Itamar Syn-Hershko <it...@co...> wrote:
> Hi,
>
> ParallelMultiSearcher wasn't ported yet. You are welcome to port it yourself
> - have a look at search/ParallelMultiSearcher.java and
> search/MultiSearcher.java.
>
> Itamar.
>
> On 8/11/2010 12:23 PM, Rajendra Prasad Murakonda wrote:
>
> I can't seem to find ParallelMultiSearcher. I couldn't locate the class in
> the latest source code snapshot either. Is it not supported in CLucene? What
> do I need to do to use it? I used MultiSearcher successfully, though. Any
> pointers will be really helpful.
>
> Raj

--
-------------------------------------
Ben van Klinken

Mob: 0401 921847
Em: be...@vi... |
From: Ahmed S. <ci7...@gm...> - 2011-02-05 13:19:57
|
The problem is solved. It was my mistake: by accident I had stored the file text without tokenization in the categorie field!

Thanks for your help.

Ahmed

--
Sent from my mobile |
From: Ben v. K. <bva...@gm...> - 2011-02-03 22:16:11
|
Stored fields are kept as plain text. It is possible to compress the fields if it is a lot of data, but you could also look into not storing certain fields (of course, you then won't be able to retrieve that data from the document after a search). Depending on your requirements this may be interesting.

Another thing I suggest is looking at the index using a tool called 'luke' (http://www.getopt.org/luke/). You can analyse what's going on, see how much data there is, perhaps run the check index tool, check whether there are any extra segments that aren't used, etc.

Hope that helps

ben

On Fri, Feb 4, 2011 at 7:00 AM, Ahmed Saidi <ci7...@gm...> wrote:
> I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see
> the attached file.
> There is no duplicate document, and no IndexReader is open.
>
> Ahmed
>
> 2011/2/3 Ahmed Saidi <ci7...@gm...>
>>
>> I'm using an Arabic analyzer; it analyzes only Arabic characters. Please
>> see the attached file.
>> There is no duplicate document, and no IndexReader is open.
>>
>> Ahmed
>> 2011/2/3 Veit Jahns <nun...@go...>
>>>
>>> 2011/2/2 Ahmed Saidi <ci7...@gm...>:
>>> > Even after optimizing the index, the size is 20 GB. The size of the
>>> > data which I want to index is about 8 GB.
>>>
>>> Strange indeed. Just some further questions which came to my mind:
>>>
>>> - What kind of analyzer do you use for tokenizing?
>>> - Is the correct number of documents in the index, and is no document
>>> indexed twice?
>>>
>>> And this discussion [1] may be useful to you.
>>>
>>> > If I add a set of fields that have the same values to the index, will
>>> > CLucene do any kind of compression?
>>>
>>> Not directly. But as far as I understand the index format [2], the
>>> terms are stored only in the term dictionary and are referenced
>>> implicitly in the frequency files.
>>>
>>> Veit
>>>
>>> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
>>> [2] http://lucene.apache.org/java/2_3_2/fileformats.html

--
-------------------------------------
Ben van Klinken

Mob: 0401 921847
Em: be...@vi... |