|
From: Ben v. K. <s40...@st...> - 2003-10-21 05:43:14
|
I believe the release candidate for Java Lucene 1.3 does this. I want to upgrade the CLucene code to 1.3, but only once it's a stable release.

cheers,
ben

On Mon, 20 Oct 2003 23:47:53 -0400, <am...@ma...> wrote:
> Hi Ben
>
> Well, I was just thinking ahead, and while browsing the Lucene list someone
> mentioned that they extended the ability of Lucene to actually pick Analyzers
> on a per-field basis (there was some Java code supplied as a sample),
> which was intriguing in itself because it would open up a lot of new
> possibilities for indexing various types of data.
>
> But for now I'll just settle for what we've got here. I'm not complaining,
> StandardAnalyzer is good enough for me :0
>
> thanks
>
> -pedja |
|
From: Ben v. K. <s40...@st...> - 2003-10-21 05:36:37
|
Hi All,

Still haven't got CVS access... so here are some more changes.

I followed on from Rob's work and found a few more instances of using delete where delete[] should have been used. I've replaced the stringDuplicate function with a function that allocates memory using new[] and then does a stringCopy. So I think there shouldn't be any code that uses malloc anymore. However, I ran a quick comparison using the tests program, comparing the old code that uses malloc against the new code that uses new[]. The old code seems to be the slightest bit faster. Does anyone know whether there are any performance issues with malloc vs new[]?

I also looked through all the places where "delete " was being used and found a few arrays that needed to be changed to "delete[]". Hopefully that gets them all, but Rob, if you could check with valgrind when you get time.

Also Rob, if you have time, could you look at the tests program? I know there's some terrible test code in there (almost no resources are freed ;~} but when I run the tests in cygwin I get a segfault during the === Test Query Parser === test.

I've posted 0.8.3 on sourceforge. Sorry about CVS. Does anyone have any ideas how to get through a firewall onto sourceforge (no socks server either)?

cheers,
ben |
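The new[]/delete[] pairing described in this message can be sketched as follows. This is only an illustration: the stringDuplicate body below is a hypothetical stand-in written for this note, not the actual CLucene helper, and it uses std::memcpy where the message mentions a stringCopy call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the helper described above (not CLucene's code).
// The buffer comes from new[], so the caller must release it with delete[].
char* stringDuplicate(const char* src) {
    const std::size_t len = std::strlen(src);
    char* copy = new char[len + 1];    // +1 for the terminating '\0'
    std::memcpy(copy, src, len + 1);   // copies the terminator as well
    return copy;
}

// Correct release for memory obtained from stringDuplicate.
// Using plain `delete` or free() here would be undefined behaviour,
// which is the kind of mismatch valgrind reports.
void stringRelease(char* s) {
    delete[] s;
}
```

Keeping allocation and release behind one pair of functions is one way to avoid mixing malloc/free, new/delete and new[]/delete[] across the code base; whether it costs anything measurable compared with malloc depends on the runtime's allocator.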
|
From: <rg...@sd...> - 2003-10-21 05:35:12
|
>>>>> "amigo" == amigo <am...@ma...> writes:

amigo> So what is the consensus on the use of Analyzers in CLucene?
amigo> I wonder because I see in the DLL/PHP wrapper we are still
amigo> creating or opening an index using a SimpleAnalyzer and
amigo> optimizing with the same yet using StandardAnalyzer to insert,
amigo> search or delete a document.
amigo>
amigo> I know I don't want to sound as if I'm nagging about this, I
amigo> just want to get it out of the way and move on with other
amigo> things :)
amigo>
amigo> Hopefully I can get this newest PHP extension to work fully and
amigo> then post it up for testing and c&c.

Is it possible that you could just try it both ways, compare the results to grep, and see if your missing hits re-appear?

Did you look over the FAQ from the original Java Lucene, and in particular this question:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15

--Rob |
|
From: Ben v. K. <s40...@st...> - 2003-10-21 05:30:49
|
I mean after you add your documents and before you try to retrieve the documents you just added. I had the same problem before, but after I close and reopen... no problem.

Let me know how it goes.

ben

On Mon, 20 Oct 2003 23:51:47 -0400, <am...@ma...> wrote:
> Hmm, I'll have to check that out. You mean I need to close and reopen
> during/in between requests?
>
> I haven't had a chance to look at the code today. I spent the day worrying
> about my wisdom teeth and reading on DOM and CSS programming,
> but I'll give it a try tomorrow unless unusual things happen
> (something always does, I cannot explain it...)
>
> -pedja |
|
From: <am...@ma...> - 2003-10-21 04:01:44
|
Hmm, I'll have to check that out. You mean I need to close and reopen during/in between requests?

I haven't had a chance to look at the code today. I spent the day worrying about my wisdom teeth and reading on DOM and CSS programming, but I'll give it a try tomorrow unless unusual things happen (something always does, I cannot explain it...)

-pedja

----- Original Message -----
From: "Ben van Klinken" <s40...@st...>
To: <clu...@li...>
Sent: Monday, October 20, 2003 7:03 PM
Subject: Re: [CLucene-dev] file release

> Hi,
>
> Small problem with the DLL wrapper. CL_Open returns 1 instead of "return
> resource;" - in 2 different places.
> I couldn't find any problem with the CL_HitCount though. Maybe you need to
> close and reopen the resource. Sometimes changes are not flushed until
> clucene is closed.
>
> ben |
|
From: <am...@ma...> - 2003-10-21 03:51:26
|
Hi Ben

Well, I was just thinking ahead, and while browsing the Lucene list someone mentioned that they extended the ability of Lucene to actually pick Analyzers on a per-field basis (there was some Java code supplied as a sample), which was intriguing in itself because it would open up a lot of new possibilities for indexing various types of data.

But for now I'll just settle for what we've got here. I'm not complaining, StandardAnalyzer is good enough for me :0

thanks

-pedja

----- Original Message -----
From: "Ben van Klinken" <s40...@st...>
To: <clu...@li...>
Sent: Monday, October 20, 2003 7:33 PM
Subject: Re: [CLucene-dev] file release

> Pedja,
>
> I don't think it matters. The StandardAnalyzer has slightly more overhead
> than the SimpleAnalyzer, so it's better to use the SimpleAnalyzer where possible.
> The analyzers are only used when dealing with queries or when inserting
> documents (to split the text up into words). The optimize and open functions
> don't do that.
>
> However, it might be a better idea to have a global, or a resource-based
> analyzer, with the future possibility of choosing which analyzer to use.
> Then analyzers don't need to be loaded for each operation since the same
> analyzer would be used. I don't think it's a priority, but if you want to
> work on it you're more than welcome to.
>
> hope that's ok,
>
> cheers,
> ben |
|
From: Ben v. K. <s40...@st...> - 2003-10-21 03:40:10
|
Hi,

Small problem with the DLL wrapper. CL_Open returns 1 instead of "return resource;" - in 2 different places.
I couldn't find any problem with the CL_HitCount though. Maybe you need to close and reopen the resource. Sometimes changes are not flushed until clucene is closed.

ben

On Sun, 19 Oct 2003 23:47:20 -0400, <am...@ma...> wrote:
> Thanks Ben, I got it now!
>
> The library compiled fine here and the demo works good (it felt faster
> indexing the Reuters sample texts, though it could be just me)
>
> This new "resource" stuff threw me off a bit. I wasn't sure what that was
> all about, and at first I thought I needed to create a true zend resource in
> the php extension, but it turns out (luckily) to be just a plain int parameter.
> So, I got the extension compiled after messing a bit with php_clucene.c
> and clucene.cpp, but I don't get the proper hitcount back... I get 0 back when
> there's more than 0 hits. Well, that's what you get when a "weekend warrior"
> is trying to get some cpp code going :)
>
> I'm going to stop now. I'm tired and one of my teeth is giving me tingling
> sensations (uggh, have to go to the dentist tomorrow) so I'll pick up on this
> tomorrow hopefully (depending on the workload and mood I'm in...)
>
> -pedja
>
> ----- Original Message -----
> From: "Ben van Klinken" <s40...@st...>
> To: <clu...@li...>
> Sent: Sunday, October 19, 2003 7:16 PM
> Subject: Re: [CLucene-dev] file release
>
>> Sorry,
>> I've put up the proper one now :)
>> Note that it has no help files, so it's a bit smaller.
>> After this release is deemed "stable", I'll start making a stable release
>> and working version.
>>
>> ben
>>
>> On Sun, 19 Oct 2003 12:26:40 -0400, <am...@ma...> wrote:
>>
>> > Hi Ben
>> >
>> > I tried downloading the 0.8.2 PRE version and I get a 1.1KB PHP file?!
>> > Could you please re-upload the 0.8.2PRE source again?
>> >
>> > This PRE does not contain any of Rob's changes yet, correct, so we have
>> > to use his patch over the CVS version or over this PRE version?
>> >
>> > Once I get this PRE, I'll try my PHP extension and if it works well I'll
>> > send you the code via email or post a link here for everyone, whichever way...
>> >
>> > thanks
>> >
>> > -pedja
>> >
>> > ----- Original Message -----
>> > From: "Ben van Klinken" <s40...@st...>
>> > To: <clu...@li...>
>> > Sent: Sunday, October 19, 2003 8:45 AM
>> > Subject: [CLucene-dev] file release
>> >
>> >> Hi Everyone,
>> >>
>> >> Sorry, I couldn't send my changes to the CVS. I'm behind a firewall and
>> >> can't get through. Will have to wait till I get onto a better connection.
>> >> But I have put up a release version 0.8.2 PRE. Hopefully it has included
>> >> all the changes (except PHP, which I haven't done yet).
>> >>
>> >> ben |
|
From: Ben v. K. <s40...@st...> - 2003-10-21 02:53:45
|
Pedja,

I don't think it matters. The StandardAnalyzer has slightly more overhead than the SimpleAnalyzer, so it's better to use the SimpleAnalyzer where possible. The analyzers are only used when dealing with queries or when inserting documents (to split the text up into words). The optimize and open functions don't do that.

However, it might be a better idea to have a global, or a resource-based analyzer, with the future possibility of choosing which analyzer to use. Then analyzers don't need to be loaded for each operation since the same analyzer would be used. I don't think it's a priority, but if you want to work on it you're more than welcome to.

hope that's ok,

cheers,
ben

On Mon, 20 Oct 2003 15:42:33 -0400, <am...@ma...> wrote:
> So what is the consensus on the use of Analyzers in CLucene?
>
> I wonder because I see in the DLL/PHP wrapper we are still creating
> or opening an index using a SimpleAnalyzer and optimizing with the same
> yet using StandardAnalyzer to insert, search or delete a document.
>
> I know I don't want to sound as if I'm nagging about this, I just want to
> get it out of the way and move on with other things :)
>
> Hopefully I can get this newest PHP extension to work fully and then post
> it up for testing and c&c.
>
> thanks
>
> -pedja |
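The resource-based analyzer idea in this message could look roughly like the sketch below. Everything here is an assumption made for illustration: Analyzer, SimpleLikeAnalyzer and IndexResource are hypothetical names, not the CLucene classes or the DLL wrapper's API, and the code uses C++11 features for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal analyzer interface (CLucene's real Analyzer differs).
struct Analyzer {
    virtual ~Analyzer() = default;
    virtual std::vector<std::string> tokenize(const std::string& text) const = 0;
};

// Splits on non-letters and lower-cases each word.
struct SimpleLikeAnalyzer : Analyzer {
    std::vector<std::string> tokenize(const std::string& text) const override {
        std::vector<std::string> tokens;
        std::string current;
        for (unsigned char c : text) {
            if (std::isalpha(c)) {
                current += static_cast<char>(std::tolower(c));
            } else if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) tokens.push_back(current);
        return tokens;
    }
};

// One analyzer per open index resource: chosen once when the resource is
// created, then reused by every insert and search on that resource.
struct IndexResource {
    std::unique_ptr<Analyzer> analyzer;
    explicit IndexResource(std::unique_ptr<Analyzer> a) : analyzer(std::move(a)) {}

    // Insert and search would both route their text through this call,
    // so documents and queries are always split the same way.
    std::vector<std::string> analyze(const std::string& text) const {
        return analyzer->tokenize(text);
    }
};
```

Because the analyzer is fixed per resource, indexing and querying cannot silently end up with different tokenization, and nothing needs to be constructed per operation.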
|
From: <am...@ma...> - 2003-10-20 19:44:36
|
So what is the consensus on the use of Analyzers in CLucene ?

I wonder because I see in the DLL/PHP wrapper we are still creating or
opening an index using a SimpleAnalyzer and optimizing with the same,
yet using StandardAnalyzer to insert, search or delete a document.

I know I don't want to sound as if I'm nagging about this, I just want
to get it out of the way and move on with other things :)

Hopefully I can get this newest PHP extension to work fully and then
post it up for testing and c&c.

thanks

-pedja
|
|
From: <rg...@sd...> - 2003-10-20 19:22:48
|
>>>>> "amigo" == amigo <am...@ma...> writes:
amigo>
amigo> Thanks Ben, I got it now !
amigo> The library compiled fine here and the demo works good (it felt
amigo> faster indexing the Reuters sample texts, though it could be
amigo> just me)
It felt faster to me too. Faster than with just the changes I did. I
have not looked to see what Ben might have beyond or instead of some of
my patches, but it's better.

I just ran it all through valgrind once again. All of the errors I had
previously noted are now gone.

Because of the greater speed of the indexing, I went ahead and indexed
some bigger stuff, and did some searches. There are a few of the same
type of memory error (mismatched delete[]/delete) that occur in code
that runs when you search; I did not uncover these before because it
was slower before, so I didn't try as many things.

You can make more of these errors occur, and even crash the demo
program, by entering very long search strings. Pounding in half a page
of random keyboard hits, with no spaces, seems to do it. This is
something I can find and patch also.

I can make up another patch for those over the next few days, but it
seems CLucene is definitely improving and getting a shake out.

My efforts on CLucene over the next two weeks or so will be something
like this:

1) Finish tracking down and submitting patches for any remaining memory
errors, and see those ingested by Ben and everyone else.

2) Turn attention to memory leaks, not reported in my current runs, and
do the same to them.

3) Track down any instances of inaccuracy in CLucene, that is,
references people have made to not getting the right number of hits.

For the last I may look into something automated, perhaps a
modification of the demo program that could index a body of text, and
then churn through a dictionary, searching on each term with both grep
and CLucene and making sure the results match.

I left it indexing overnight and it seems to have handled more than a
gigabyte of data OK.

--Rob
|
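The automated check in item 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: `buildIndex` is a toy whole-word inverted index standing in for CLucene, `scanCount` is a brute-force scan standing in for grep, and none of these names come from CLucene itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using Docs = std::vector<std::string>;
using Index = std::map<std::string, std::set<std::size_t>>;

// Stand-in for the search engine: word -> ids of documents containing it.
Index buildIndex(const Docs& docs) {
    Index index;
    for (std::size_t id = 0; id < docs.size(); ++id) {
        std::istringstream words(docs[id]);
        std::string w;
        while (words >> w) index[w].insert(id);
    }
    return index;
}

// Stand-in for grep: scan every document and count those with the word.
std::size_t scanCount(const Docs& docs, const std::string& term) {
    std::size_t hits = 0;
    for (const std::string& d : docs) {
        std::istringstream words(d);
        std::string w;
        while (words >> w) {
            if (w == term) { ++hits; break; }  // count each document once
        }
    }
    return hits;
}

// Returns the dictionary terms for which the two methods disagree.
std::vector<std::string> mismatches(const Docs& docs,
                                    const std::vector<std::string>& dictionary) {
    const Index index = buildIndex(docs);
    std::vector<std::string> bad;
    for (const std::string& term : dictionary) {
        auto it = index.find(term);
        std::size_t indexed = (it == index.end()) ? 0 : it->second.size();
        if (indexed != scanCount(docs, term)) bad.push_back(term);
    }
    return bad;
}
```

With a real index the two counts would come from a search call and from grep output, and any term returned by `mismatches` would be a candidate for the "wrong number of hits" reports.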
|
From: <am...@ma...> - 2003-10-20 04:58:58
|
Thanks Ben, I got it now !

The library compiled fine here and the demo works well (it felt faster
indexing the Reuters sample texts, though it could be just me).

This new "resource" stuff threw me off a bit. I wasn't sure what that
was all about, and at first I thought I needed to create a true Zend
resource in the PHP extension, but it turns out (luckily) to be just a
plain int parameter.

So, I got the extension compiled after messing a bit with
php_clucene.c and clucene.cpp, but I don't get a proper hit count
back... I get 0 back when there are more than 0 hits. Well, that's what
you get when a "weekend warrior" is trying to get some cpp code going :)

I'm going to stop now. I'm tired and one of my teeth is giving me
tingling sensations (uggh, have to go to the dentist tomorrow), so I'll
pick up on this tomorrow hopefully (depending on the workload and mood
I'm in...)

-pedja

----- Original Message -----
From: "Ben van Klinken" <s40...@st...>
To: <clu...@li...>
Sent: Sunday, October 19, 2003 7:16 PM
Subject: Re: [CLucene-dev] file release

> Sorry,
> I've put up the proper one now :)
> Note that it has no help files, so it's a bit smaller.
> After this release is deemed "stable", i'll start making a stable release
> and working version.
>
> ben
>
> On Sun, 19 Oct 2003 12:26:40 -0400, <am...@ma...> wrote:
>
> > Hi Ben
> >
> > I tried downloading the 0.8.2 PRE version and I get a 1.1KB PHP file ?!
> > Could you please re-upload the 0.8.2PRE source again?
> >
> > This PRE does not contain any of Rob's changes yet, correct, so we have
> > to use his patch over the CVS version or over this PRE version?
> >
> > Once I get this PRE, I'll try my PHP extension and if it works well I'll
> > send you the code via email or post a link here for everyone, whichever
> > way...
> >
> > thanks
> >
> > -pedja
> >
> > ----- Original Message -----
> > From: "Ben van Klinken" <s40...@st...>
> > To: <clu...@li...>
> > Sent: Sunday, October 19, 2003 8:45 AM
> > Subject: [CLucene-dev] file release
> >
> >> Hi Everyone,
> >>
> >> Sorry, I couldn't send my changes to the CVS. I'm behind a firewall and
> >> can't get through. Will have to wait till i get onto a better
> >> connection.
> >> But I have put up a release version 0.8.2 PRE. Hopefully it has included
> >> all the changes. (except PHP which i haven't done yet).
> >>
> >> ben
|
|
From: Ben v. K. <s40...@st...> - 2003-10-20 00:04:07
|
Pedja,

No, it hasn't been implemented, but it should be. The MultiSearcher,
for searching multiple indexes, hasn't been implemented either. I'd
like to go through and make sure everything from Java Lucene is
implemented in CLucene. There's no reason anything shouldn't be
implemented... just time (or being lazy :)

cheers,
ben

On Sun, 19 Oct 2003 13:59:30 -0400, <am...@ma...> wrote:
> Is MFQP implemented in CLucene yet, or how do we query multiple fields
> (besides making a very long javascript query).
> I need to search inside summary and fulltext fields for example, and the
> only way I can think of doing it now is with two queries, which is
> neither a really elegant solution nor easy to join and display relevant
> results from.
>
> How hard would it be to get the MFQP from the Java version into CLucene,
> or is any work being done on this at all?
>
> -pedja
|
|
From: Ben v. K. <s40...@st...> - 2003-10-19 23:53:02
|
Sorry,
I've put up the proper one now :)
Note that it has no help files, so it's a bit smaller.
After this release is deemed "stable", I'll start making a stable
release and working version.

ben

On Sun, 19 Oct 2003 12:26:40 -0400, <am...@ma...> wrote:
> Hi Ben
>
> I tried downloading the 0.8.2 PRE version and I get a 1.1KB PHP file ?!
> Could you please re-upload the 0.8.2PRE source again?
>
> This PRE does not contain any of Rob's changes yet, correct, so we have
> to use his patch over the CVS version or over this PRE version?
>
> Once I get this PRE, I'll try my PHP extension and if it works well I'll
> send you the code via email or post a link here for everyone, whichever
> way...
>
> thanks
>
> -pedja
>
> ----- Original Message -----
> From: "Ben van Klinken" <s40...@st...>
> To: <clu...@li...>
> Sent: Sunday, October 19, 2003 8:45 AM
> Subject: [CLucene-dev] file release
>
>> Hi Everyone,
>>
>> Sorry, I couldn't send my changes to the CVS. I'm behind a firewall and
>> can't get through. Will have to wait till i get onto a better
>> connection.
>> But I have put up a release version 0.8.2 PRE. Hopefully it has included
>> all the changes. (except PHP which i haven't done yet).
>>
>> ben
|
|
From: <am...@ma...> - 2003-10-19 20:34:50
|
Is MFQP implemented in CLucene yet, or how do we query multiple fields
(besides making a very long javascript query)?

I need to search inside summary and fulltext fields for example, and
the only way I can think of doing it now is with two queries, which is
neither a really elegant solution nor easy to join and display relevant
results from.

How hard would it be to get the MFQP from the Java version into
CLucene, or is any work being done on this at all?

-pedja
|
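Until something like MFQP exists, one workaround is to rewrite the user's terms into a single fielded query string before parsing. A minimal sketch, assuming the query parser accepts Lucene-style `field:term` clauses joined with `OR` (this is not confirmed for CLucene anywhere in this thread). `expandToFields` is a hypothetical helper, not part of CLucene, and it does no escaping of special characters.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns "a b" plus fields {f1, f2} into
// "(f1:a OR f2:a) (f1:b OR f2:b)".
std::string expandToFields(const std::string& terms,
                           const std::vector<std::string>& fields) {
    assert(!fields.empty());
    std::istringstream in(terms);
    std::string term;
    std::string out;
    while (in >> term) {                     // split on whitespace
        if (!out.empty()) out += " ";
        out += "(";
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i > 0) out += " OR ";
            out += fields[i] + ":" + term;   // one clause per field
        }
        out += ")";
    }
    return out;
}
```

For the summary/fulltext case above, `expandToFields("clucene index", {"summary", "fulltext"})` builds one query string that can be handed to the parser, so the hits come back from a single search instead of two result sets that have to be merged by hand.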
|
From: <am...@ma...> - 2003-10-19 16:40:16
|
Hi Ben

I tried downloading the 0.8.2 PRE version and I get a 1.1KB PHP file ?!
Could you please re-upload the 0.8.2PRE source again?

This PRE does not contain any of Rob's changes yet, correct, so we have
to use his patch over the CVS version or over this PRE version?

Once I get this PRE, I'll try my PHP extension and if it works well
I'll send you the code via email or post a link here for everyone,
whichever way...

thanks

-pedja

----- Original Message -----
From: "Ben van Klinken" <s40...@st...>
To: <clu...@li...>
Sent: Sunday, October 19, 2003 8:45 AM
Subject: [CLucene-dev] file release

> Hi Everyone,
>
> Sorry, I couldn't send my changes to the CVS. I'm behind a firewall and
> can't get through. Will have to wait till i get onto a better connection.
> But I have put up a release version 0.8.2 PRE. Hopefully it has included
> all the changes. (except PHP which i haven't done yet).
>
> ben
|
|
From: Ben v. K. <s40...@st...> - 2003-10-19 14:07:17
|
Hi Everyone,

Sorry, I couldn't send my changes to the CVS. I'm behind a firewall and
can't get through. It will have to wait till I get onto a better
connection. But I have put up a release version 0.8.2 PRE. Hopefully it
includes all the changes (except PHP, which I haven't done yet).

ben
|
|
From: Ben v. K. <s40...@st...> - 2003-10-19 13:52:27
|
Hi all,

Thanks for all that Rob, great work!

I've added some code that should fix the voidlist.h and voidmap.h
deallocation problems. I made them so you can choose how the objects
should be released: deleted, freed, or array deleted. There are a few
other things I changed while looking through the code. Nothing major.

The dll wrapper has been slightly updated to include a reference
number, so you can work on more than one directory at once. I haven't
tried it with PHP yet though. Pedja, maybe you could let me know how it
goes?

I also changed the analyzer type in the wrapper. Where it uses the
SimpleAnalyzer, the analyzer is not actually being used; the
QueryParser was using the wrong analyzer, though.

There was also a stupid bug in the prepend function of StringBuffer
(oops :), and fixing it now makes things like emails in queries work.

The analyzer runs slightly faster now, maybe 20% or so. The token was
stringDuping the token type, which actually comes straight from the
tokenImage array, so a pointer is sufficient. The analyzer speed could
be improved even more if the token class could take a pointer to the
text buffer instead of having to copy the string... but then there may
be other problems... hmm, something to think about anyway.

I won't make a release just yet. I'll just put it on CVS, then in a few
days I'll do a proper release.

Cheers everyone, and thanks for the great work!

ben
|
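The "choose how the objects should be deleted" idea can be pictured with a container that is told at construction time how its elements were allocated. This is only a sketch of the general technique: the real voidlist.h/voidmap.h interface is not shown in this thread, and `Release` and `OwningCharList` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical policy: how stored pointers must be released.
enum class Release { Delete, DeleteArray, Free, None };

class OwningCharList {
public:
    explicit OwningCharList(Release how) : how_(how) {}
    ~OwningCharList() { clear(); }
    OwningCharList(const OwningCharList&) = delete;
    OwningCharList& operator=(const OwningCharList&) = delete;

    void push_back(char* p) { items_.push_back(p); }
    std::size_t size() const { return items_.size(); }

    // Releases every element with the mechanism chosen at construction.
    void clear() {
        for (char* p : items_) {
            switch (how_) {
                case Release::Delete:      delete p;     break;  // from new
                case Release::DeleteArray: delete[] p;   break;  // from new[]
                case Release::Free:        std::free(p); break;  // from malloc/strdup
                case Release::None:        break;                // not owned
            }
        }
        items_.clear();
    }

private:
    Release how_;
    std::vector<char*> items_;
};

// Fills one list from new[] and one from malloc; returns the total size.
std::size_t policyDemo() {
    OwningCharList fromNew(Release::DeleteArray);
    char* a = new char[4];
    std::strcpy(a, "abc");
    fromNew.push_back(a);

    OwningCharList fromMalloc(Release::Free);
    char* b = static_cast<char*>(std::malloc(4));
    assert(b != nullptr);
    std::strcpy(b, "xyz");
    fromMalloc.push_back(b);

    return fromNew.size() + fromMalloc.size();
}
```

The policy only helps if every element put into one list really was allocated the same way; mixing new[] and malloc'd pointers in a single list brings the original problem back.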
|
From: <rg...@sd...> - 2003-10-18 02:59:33
|
>>>>> "amigo" == amigo <am...@ma...> writes:
amigo>
amigo> Hi all,
amigo> While playing with the Java Lucene, getting it connected to PHP
amigo> I was reading some of its docs and it says there that documents
amigo> should be indexed and queried with the same Analyzer.
amigo>
amigo> Now I went back to the dll/php wrapper code supplied with
amigo> CLucene and going through the dll wrapper there's a lot of
amigo> mixing and matching, so I wonder if that's an error or not in
amigo> it, because it uses different analyzers for different things?
amigo>
amigo> For example: API method CL_Open goes to open/create the CLucene
amigo> index with a SimpleAnalyzer, and so does the CL_Optimize and
amigo> CL_Search, yet on the other hand CL_Insert_Document and
amigo> CL_Delete use StandardAnalyzer.
amigo>
amigo> Can't say I'm too worried about optimize or delete right now,
amigo> though they are also important functions, but shouldn't insert
amigo> and search use the same analyzer?
amigo>
amigo> Would that be the reason that some of the queries do not come
amigo> out the way they should, ie return all/proper hits?
I don't know. I have to check to see if the demo program uses
different analyzers.
amigo> P.S. Rob excellent job on the patch, I still haven't crashed
amigo> your code :-)
Thanks -- I'm sure that some of the fixes will be re-done differently
and better in the end.
--Rob
|
|
From: <am...@ma...> - 2003-10-17 00:02:15
|
Hi all,

While playing with the Java Lucene, getting it connected to PHP, I was
reading some of its docs and it says there that documents should be
indexed and queried with the same Analyzer.

Now I went back to the dll/php wrapper code supplied with CLucene, and
going through the dll wrapper there's a lot of mixing and matching, so
I wonder if that's an error or not, because it uses different analyzers
for different things?

For example: API method CL_Open goes to open/create the CLucene index
with a SimpleAnalyzer, and so do CL_Optimize and CL_Search, yet on the
other hand CL_Insert_Document and CL_Delete use StandardAnalyzer.

Can't say I'm too worried about optimize or delete right now, though
they are also important functions, but shouldn't insert and search use
the same analyzer?

Would that be the reason that some of the queries do not come out the
way they should, i.e. return all/proper hits?

I know that choosing the analyzer hasn't been implemented in the
wrapper, and I'm not even sure what the current status of analyzers in
CLucene is, so I'm seeking some guidance here, pretty please and thank
you :)

thanks

-pedja

P.S. Rob, excellent job on the patch, I still haven't crashed your code :-)
|
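Why mixing analyzers can lose hits is easy to show with two toy tokenizers. The sketch below is an assumption-laden illustration, not CLucene code: `simpleTokens` (split on non-letters, lowercase) and `whitespaceTokens` (split on whitespace only) are simplified stand-ins for analyzers, and `matches` stands in for a search.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Toy "simple" analyzer: split on non-letters and lowercase each token.
std::set<std::string> simpleTokens(const std::string& text) {
    std::set<std::string> out;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            out.insert(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.insert(cur);
    return out;
}

// Toy whitespace analyzer: split on whitespace, keep case and punctuation.
std::set<std::string> whitespaceTokens(const std::string& text) {
    std::set<std::string> out;
    std::string cur;
    for (unsigned char c : text) {
        if (!std::isspace(c)) {
            cur += static_cast<char>(c);
        } else if (!cur.empty()) {
            out.insert(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.insert(cur);
    return out;
}

// A document matches when every query token is among its indexed tokens.
bool matches(const std::set<std::string>& indexed,
             const std::set<std::string>& query) {
    for (const std::string& q : query) {
        if (indexed.count(q) == 0) return false;
    }
    return !query.empty();
}

void analyzerDemo() {
    const std::string doc = "Hello, World";
    // Same analyzer on both sides: the term is found.
    assert(matches(simpleTokens(doc), simpleTokens("hello")));
    // Indexed as "Hello," but queried as "hello": the hit is lost.
    assert(!matches(whitespaceTokens(doc), simpleTokens("hello")));
}
```

Whether this is what actually happens in the wrapper depends on which analyzer each CL_* call really applies to the text, which is the open question in the message above.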
|
From: <rg...@sd...> - 2003-10-16 19:57:31
|
I made one giant patch file after all:

http://rgr.freeshell.org/clucene/

Here is how I would suggest someone (like Ben when he gets a chance, or
any of the other people trying this out) merge this stuff:

1) Get a fresh working copy out of the CVS -- there is no module name,
so do "co ." for the checkout command.

2) Download my patch to somewhere (right in the working directory is
fine) and run this command:

    patch -p0 < rgr_memfix_14Oct2003.patch

As you will then see by doing "cvs up", a lot of files are marked as
modified: the patch has been applied.

3) Go through all the files to find which of my changes you want, don't
want, or want to do over differently. In each directory, "cvs up" will
list the files that are different with an M next to them. To instantly
see the changes, do "cvs diff filename" and it will list them. To
instantly erase my patch changes and go back to the CVS, just do "rm
filename" and then "cvs up", and that will get you back to the version
that is in CVS.

4) Compile and test and repeat, and if you are Ben you can commit when
you are happy, or send any further or different changes you did to the
list.

If you use emacs and a tool such as ediff, you can download my whole
tar file and do an M-x ediff-directories on the two source trees. Some
people prefer to work that way. I like ediff and can give a summary of
it if you guys want. Since it works inside of emacs you can use it on
windows too.

I'll probably be out of the loop for a few days while I get some other
stuff done. However I will make time to help out any of the people
having problems using CLucene if they send email to this list -- I
think it's important to get more people using it.

--Rob
|
|
From: <rg...@sd...> - 2003-10-16 18:59:23
|
This is a list of all files that differ between the cvs head and what
is posted at http://rgr.freeshell.org/clucene/ in the
clucene-8.1-withmemfixes-15Oct2003.tar.gz file:

examples/demo/IndexFiles.cpp
  (on line 44 "delete buf" was changed to "delete[] buf")

src/CLucene/analysis/AnalysisHeader.h
  (termText and type must be deallocated with free() because they are
  allocated with stringDuplicate, which is the same as strdup() which
  uses malloc, on construction)

src/CLucene/analysis/Analyzers.h
  The differences here may be unnecessary -- they consist of removing
  the const from the types of variables going into the VoidMap
  template, which is necessary if VoidMap is to call free() on those
  pointers, unless they are explicitly cast to void*. Probably a
  general avoidance of putting malloc'd memory into the VoidMap would
  be better. It's fine not to apply these changes if you can get things
  to compile without them, by using the casts or not sending malloc'd
  memory in so you can avoid the free()'s.

src/CLucene/analysis/standard/StandardAnalyzer.cpp
src/CLucene/analysis/standard/StandardAnalyzer.h
  Same as above.

src/CLucene/document/Document.cpp
src/CLucene/document/Document.h
  A simple type change in one function, just to make de-allocating
  something easier in the DocumentWriter.cpp and FieldInfos.cpp files
  below.

src/CLucene/document/Field.cpp
src/CLucene/document/Field.h
  Made a variable non-const, so that it could be free()'d, because it
  was allocated with stringDuplicate. Could have used a cast in the
  free call instead of changing the type.

src/CLucene/index/DocumentWriter.cpp
  The variables "buf" and "rv" have to be delete[]'d not delete'd.
  Also, the elements of the list fields must be delete[]'d, see the
  change I did to Document.[h,cpp] above.

src/CLucene/index/FieldInfo.h
  changed "delete name" to "free(name)" because it is allocated using
  stringDuplicate().

src/CLucene/index/FieldInfos.cpp
  See the Document.[h,cpp] changes -- this allowed me to avoid doing a
  "delete &var" and do a normal "delete var" instead.

src/CLucene/index/FieldsReader.cpp
  buf must be delete[]'d, fvalue free()'d

src/CLucene/index/FieldsWriter.cpp
  buf and rv must be delete[]'d, "delete &fields" changed to "delete
  fields" as per the Document.[h,cpp] changes.

src/CLucene/index/IndexWriter.cpp
  mergedName must be delete[]'d. In newSegmentName() buf has to be
  allocated with new[], it can't just be on the stack, because it is
  allocated in other areas which means it will need to be delete[]'d.
  Instead of
    DocumentWriter& dw = *new DocumentWriter();
  and then later
    delete &dw;
  I did
    DocumentWriter* dw = new DocumentWriter()
  and then later
    delete dw;
  and also changed a few dw.method() to dw->method accordingly.

src/CLucene/index/SegmentInfo.h
  name was allocated with stringDuplicate, but is also declared const,
  so I changed "delete name" to "free( (void*) name);"

src/CLucene/index/SegmentMerger.cpp
  buf and match must be delete[]'d, not delete'd.

src/CLucene/index/SegmentReader.cpp
  segment must be free'd not deleted, and tmp must be delete[]'d not
  delete'd. The variable buf is sometimes static, sometimes new[]'d,
  and sometimes stringDuplicated -- I got rid of the static one and
  handled the others appropriately, but it probably should be allocated
  via the same method everywhere.

src/CLucene/index/SegmentTermDocs.cpp
  whitespace only

src/CLucene/index/SegmentTermEnum.cpp
  buffer must be delete[]'d

src/CLucene/index/Term.cpp
  field and text must be free'd not deleted

src/CLucene/index/TermInfosReader.cpp
src/CLucene/index/TermInfosWriter.cpp
  The variables n, indexTerms, indexPointers, indexInfos, and buf must
  all be delete[]'d.

src/CLucene/search/FuzzyQuery.cpp
  whitespace only -- added newline at end of file (gcc complains)

src/CLucene/store/FSDirectory.cpp
src/CLucene/store/FSDirectory.h
  Changed type of fname to non-const so it could be free'd instead of
  deleted; could just as well cast it to void* before freeing.

src/CLucene/store/InputStream.cpp
src/CLucene/store/OutputStream.cpp
  Variables buffer and chars must be delete'd.

src/CLucene/util/PriorityQueue.h
  variable heap must be delete[]'d

src/CLucene/util/VoidMap.h
  Changed a delete to a free of itr->first. However, I think the
  correct fix here may be to make sure you always put ONLY new'd memory
  in this structure, and then to delete it.

I think that VoidList.h has the same problem as VoidMap.h, in that
sometimes the wrong kind of memory is put in there. To fix that we need
to look carefully at how the code that uses those templates works.

I plan to follow this up with a real patch file, and how to work
through these changes line by line. I'll be sending another email in a
few minutes.

--Rob
|
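Nearly every entry in the list above comes down to the same rule: release memory with the counterpart of whatever allocated it. A minimal, self-contained sketch of the three pairings follows; `duplicateWithMalloc` is a stand-in for a strdup()-style function such as stringDuplicate is described to be, and `Term` here is a dummy struct, not the CLucene class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for a strdup()-style duplicate: storage comes from malloc(),
// so the caller must release it with free().
char* duplicateWithMalloc(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(n));
    assert(copy != nullptr);
    std::memcpy(copy, s, n);
    return copy;
}

// Storage from new[] must be released with delete[].
char* duplicateWithNewArray(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* copy = new char[n];
    std::memcpy(copy, s, n);
    return copy;
}

struct Term { int id; };  // dummy type for the single-object case

// Exercises each pairing once; returns true if the copies were intact.
bool pairingDemo() {
    char* a = duplicateWithMalloc("field");
    char* b = duplicateWithNewArray("text");
    Term* t = new Term{42};

    bool ok = std::strcmp(a, "field") == 0 &&
              std::strcmp(b, "text") == 0 &&
              t->id == 42;

    std::free(a);   // malloc / strdup -> free
    delete[] b;     // new[]           -> delete[]
    delete t;       // new             -> delete
    return ok;
}
```

Swapping any of the three release calls is undefined behaviour, and it is the kind of mismatch valgrind reports as a mismatched free/delete/delete[].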
|
From: <rg...@sd...> - 2003-10-16 17:34:28
|
>>>>> "Albert" == Albert Vila Puig <av...@im...> writes:
Albert>
Albert> Hi all,
Albert> I downloaded the patch, compiled and installed. I add another patch, the
Albert> StringBuffer.cpp class has a few problems.
Albert>
Albert> 1- The prepend method
Albert> //if ( len+sl+1 > bufferLength )
Albert> // growBuffer ( );
Albert> while(len+sl+1>bufferLength)
Albert> growBuffer();
It is correct that this should be a while instead of an if.
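To see why the loop matters, assume each growBuffer() call doubles the capacity (an assumption; the real growBuffer() is not shown in this thread). A single call is then not enough when the prepended string is much longer than the buffer. `nextCapacity` and `growOnce` below are hypothetical helpers that model only the size arithmetic:

```cpp
#include <cassert>
#include <cstddef>

// Models the "while" version: keep doubling until `needed` fits.
std::size_t nextCapacity(std::size_t capacity, std::size_t needed) {
    assert(capacity > 0);          // doubling zero would never terminate
    while (needed > capacity) {
        capacity *= 2;
    }
    return capacity;
}

// Models the original "if" version: grows at most once.
std::size_t growOnce(std::size_t capacity, std::size_t needed) {
    assert(capacity > 0);
    if (needed > capacity) {
        capacity *= 2;
    }
    return capacity;
}
```

With a 16-character buffer and 100 characters needed, `growOnce` stops at 32 and the following copy would run past the end of the buffer, while `nextCapacity` keeps going until the data fits.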
Albert> 2- The clear method
Albert> delete buffer;
Albert> buffer = new char_t[LUCENE_STREAM_BUFFER_SIZE];
Albert> bufferLength = LUCENE_STREAM_BUFFER_SIZE;
Albert> len = 0;
I have not had to modify this particular file in my patches, I am
working from exactly what is in the CVS as far as this file goes.
However, I think I will have to carefully work through it, and I would
like to point out the problems so you guys can fix it up if you get to
it before me:
1) Buffer is allocated with the syntax "buffer = new
char_t[LUCENE_STREAM_BUFFER_SIZE]". This means that buffer MUST be
deallocated with "delete[] buffer" not "delete buffer".
2) In this constructor
StringBuffer::StringBuffer(const char_t* value):
buffer (stringDuplicate(value)),
bufferLength(stringLength(value)),
len(stringLength(value))
{
}
buffer is allocated by doing a stringDuplicate() of the value, which
is the same as standard C library strdup() (at least on linux), which
uses malloc() to allocate, not new / new[]. So the buffer in that
case MUST be de-allocated with free() not delete or delete[]. But of
course once you get to the functions were you are deallocating, you
don't have anyway of telling if you got the buffer pointer from one
constructor or another, so you can't figure out which way to
deallocate.
For a class like this it is a good idea to make sure the variable
buffer always is allocated and deallocated by the same machanism,
instead of keeping flags to indicate it, or something like that.
Another thing that can get you in trouble is loosing track of whether
the memory has been allocated by the function that called you, and
should be de-allocated by that function, or you allocated it and
should de-allocate it.
Sometimes people working on a largish thing like CLucene make a global
decision that "all strings will be C-style malloc/free char*'s" or
"all strings will be allocated with new[]" or use the C++ basic string
everywhere. As we get some of this sorted out we may find ourselves
settling into some similar policy.
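As a sketch of the "one mechanism everywhere" idea, here is a tiny buffer class whose storage always comes from new[] and always goes back through delete[], including in the constructor that takes an existing string. `SimpleBuffer` is illustrative only and is not CLucene's StringBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class SimpleBuffer {
public:
    explicit SimpleBuffer(std::size_t capacity = 16)
        : buf_(new char[capacity]), cap_(capacity), len_(0) {
        assert(capacity > 0);
        buf_[0] = '\0';
    }

    // Copies the text into new[] storage rather than keeping a
    // strdup()-style result, so the destructor never has to guess.
    explicit SimpleBuffer(const char* value)
        : buf_(nullptr), cap_(std::strlen(value) + 1), len_(cap_ - 1) {
        buf_ = new char[cap_];
        std::memcpy(buf_, value, cap_);
    }

    ~SimpleBuffer() { delete[] buf_; }   // the only release path
    SimpleBuffer(const SimpleBuffer&) = delete;
    SimpleBuffer& operator=(const SimpleBuffer&) = delete;

    void append(const char* s) {
        std::size_t n = std::strlen(s);
        reserve(len_ + n + 1);
        std::memcpy(buf_ + len_, s, n + 1);  // includes the terminator
        len_ += n;
    }

    // Reuses the existing storage instead of delete + new.
    void clear() { len_ = 0; buf_[0] = '\0'; }

    const char* c_str() const { return buf_; }
    std::size_t length() const { return len_; }

private:
    void reserve(std::size_t needed) {
        if (needed <= cap_) return;
        std::size_t newCap = cap_;
        while (newCap < needed) newCap *= 2;  // loop, not a single check
        char* bigger = new char[newCap];
        std::memcpy(bigger, buf_, len_ + 1);
        delete[] buf_;
        buf_ = bigger;
        cap_ = newCap;
    }

    char* buf_;
    std::size_t cap_;
    std::size_t len_;
};
```

The same shape would work with malloc/realloc/free throughout; what matters is that no code path mixes the two families.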
Albert> Now, when I execute my demo application, it just crash by
Albert> segmentation fault after the _cout << "start indexing" <<
Albert> endln; line.
Albert> If I add the _cout << "test" line after the error line, the
Albert> program terminates by segmentation fault as well.
Albert>
Albert>
Albert> Any suggestions?
Albert>
Albert> Thanks.
Are you set up so that you can run this program in gdb or valgrind ?
If you can compile CLucene with debugging flags, then you should be
able to use gdb to see what specific pointer it is that is pointing
outside the program's memory (that's what a segmentation fault usually
is). If you run it in valgrind, it should give you some errors to
work on, and the last one listed is probably causing the immediate
problem.
I can assist you in using gdb and valgrind if you are working in a
unix environment.
--Rob
|
|
From: <am...@ma...> - 2003-10-16 13:41:30
|
Albert,

I have no idea why your demo application would crash. Perhaps there's
something in the data you are trying to index that it doesn't like?

I compiled Rob's patched version last night and it worked "out of the
box". It went through that sample Reuters textbase without any core
dumps, while the original 0.8.1 core dumps a few files into it, so this
is a great improvement!

Then I got my PHP extension compiled with this modified source, and
after a few small changes that works great too. Now I can index,
optimize, delete and search documents with the included sample php
without any problems. My own proof of concept code also works (you
upload any M$ Word file, it gets converted to plain text and indexed,
and the file is stored in some directory. Afterwards you can search
through the contents with title and author fields and find whatever...)

Just one word: wooohooo :)

-pedja

> Hi all,
>
> I downloaded the patch, compiled and installed. I add another patch,
> the StringBuffer.cpp class has a few problems.
> 1- The prepend method
> //if ( len+sl+1 > bufferLength )
> // growBuffer ( );
> while(len+sl+1>bufferLength)
> growBuffer();
> 2- The clear method
> delete buffer;
> buffer = new char_t[LUCENE_STREAM_BUFFER_SIZE];
> bufferLength = LUCENE_STREAM_BUFFER_SIZE;
> len = 0;
>
> Now, when I execute my demo application, it just crash by
> segmentation fault after the _cout << "start indexing" << endln; line.
|
|
From: Albert V. P. <av...@im...> - 2003-10-16 09:19:25
|
Hi all,
I downloaded the patch, compiled and installed it. I am adding another
patch, since the StringBuffer.cpp class has a few problems.
1- The prepend method
//if ( len+sl+1 > bufferLength )
// growBuffer ( );
while(len+sl+1>bufferLength)
growBuffer();
2- The clear method
delete buffer;
buffer = new char_t[LUCENE_STREAM_BUFFER_SIZE];
bufferLength = LUCENE_STREAM_BUFFER_SIZE;
len = 0;
Now, when I execute my demo application, it just crashes with a
segmentation fault after the _cout << "start indexing" << endln; line.
My Indexing file looks like:
#include "stdafx.h"
#ifndef _lucene_demo_IndexFiles_
#define _lucene_demo_IndexFiles_
#include "CLucene/CLucene.h"
#include "CLucene/util/Reader.h"
#include <iostream>
using namespace std;
namespace lucene{ namespace demo {
using namespace lucene::index;
using namespace lucene::analysis;
using namespace lucene::util;
using namespace lucene::store;
using namespace lucene::document;
static Document& createDocument(const char_t* f){
// make a new, empty document
Document& doc = *new Document();
string documentTmp(f);
tmp = documentTmp.substr(documentTmp.find("<tit>")+5,(documentTmp.find("</tit>")-documentTmp.find("<tit>")-5));
doc.add(Field::Text(_T("tit"), _T(tmp.c_str())));
tmp = documentTmp.substr(documentTmp.find("<con>")+5,(documentTmp.find("</con>")-documentTmp.find("<con>")-5));
doc.add(Field::Text(_T("con"), _T(tmp.c_str())));
// return the document
return doc;
}
static char_t* readLine(Reader& r){
StringBuffer line;
char_t c;
bool end = false;
while(!end && r.available()){
c = r.readChar();
line.append(c);
if(c == '\n'){
end = true;
}
}
if (!strcmp(line.ToString(),""))
return NULL;
return line.ToString();
}
static int stringFind(char_t* text, char_t* pattern){
string f1(text);
return f1.find(pattern,0);
}
static void indexDocument(IndexWriter* writer, char_t* file){
try {
FileReader reader(file);
StringBuffer document;
char_t* line;
while ((line = readLine(reader))!=NULL) {
if (stringFind(line, "</xml")!=string::npos){
document.append(line);
writer->addDocument(createDocument(document.ToString()));
document.clear();
StringBuffer document;
}
else
document.append(line);
}
reader.close();
}catch (exception& e){
cout << "Exception: " << e.what() << endl;
}
}
static void indexDocs(IndexWriter* writer, char_t* directory) {
DIR* dir = opendir(directory);
struct dirent* fl;
struct Struct_Stat buf;
char_t path[MAX_PATH];
stringCopy(path,directory);
stringCat(path,PATH_DELIMITER);
char_t* pathP = path + stringLength(path);
fl = readdir(dir);
while ( fl != NULL ){
if ( (stringCompare(fl->d_name, _T("."))) &&
(stringCompare(fl->d_name, _T(".."))) ) {
pathP[0]=0;
stringCat(pathP,fl->d_name);
int_t ret = Cmd_Stat(path,&buf);
if ( buf.st_mode & S_IFDIR ) {
indexDocs(writer, path);
}else{
printFormatted( _T("adding: %s\n"), fl->d_name );
indexDocument(writer,path);
}
}
fl = readdir(dir);
}
closedir(dir);
}
static void IndexFiles(char_t* path, char_t* target, const bool clearIndex){
long_t str = lucene::util::Misc::currentTimeMillis();
IndexWriter* writer = NULL;
Directory* d = NULL;
//lucene::analysis::standard::StandardAnalyzer& an = *new lucene::analysis::standard::StandardAnalyzer();
lucene::analysis::SimpleAnalyzer& an = *new lucene::analysis::SimpleAnalyzer();
//lucene::analysis::WhitespaceAnalyzer& an = *new lucene::analysis::WhitespaceAnalyzer();
if ( !clearIndex && IndexReader::indexExists(target) ){
d = &FSDirectory::getDirectory( target,false );
if ( IndexReader::isLocked(*d) ){
_cout << _T("Index was locked... unlocking it.")<<endl;
IndexReader::unlock(*d);
}
writer = new IndexWriter( *d, an, false);
}else{
d = &FSDirectory::getDirectory(target,true);
writer = new IndexWriter( *d ,an, true);
}
_cout << "start indexDocs" << endl;
//writer->infoStream = &cout; //TODO: infoStream - unicode
indexDocs(writer, path);
_cout << "end indexDocs" << endl;
writer->optimize();
_cout << "end optimize"<< endl;
writer->close();
_cout << "end close"<< endl;
delete writer;
delete &an;
_cout << _T("Indexing took: ") << (lucene::util::Misc::currentTimeMillis() - str) << _T("ms.") << endl << endl;
}
static void IndexFiles(char_t* index){
cout << "Files to index location: ";
char_t files[250];
_cin.getline(files,250);
IndexFiles(files, index, true);
}
}
}
#endif
If I add a _cout << "test" line after the error line, the program
also terminates with a segmentation fault.
Any suggestions?
Thanks.
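
One thing that may be worth ruling out in indexDocs above: opendir() returns NULL when a directory cannot be opened, and stat() leaves its buffer unspecified when it fails, so an unchecked readdir(dir) or a read of buf.st_mode may crash or misbehave. The following is only a sketch of a checked traversal written against plain POSIX calls and std::string. collectFiles is a hypothetical helper, not part of CLucene, and nothing in this thread confirms that missing checks are the cause of the segfault reported here.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>
#include <dirent.h>
#include <sys/stat.h>

// Hypothetical helper (not CLucene API): recursively collects the paths of
// regular files under `dir` into `out`.
// Returns false if `dir` itself cannot be opened; entries that cannot be
// stat'ed are skipped rather than inspected.
bool collectFiles(const std::string& dir, std::vector<std::string>& out) {
    assert(!dir.empty());

    DIR* handle = opendir(dir.c_str());
    if (handle == NULL)
        return false;                 // never pass a NULL DIR* to readdir()

    struct dirent* entry;
    while ((entry = readdir(handle)) != NULL) {
        // Skip the self and parent entries to avoid infinite recursion.
        if (std::strcmp(entry->d_name, ".") == 0 ||
            std::strcmp(entry->d_name, "..") == 0)
            continue;

        const std::string path = dir + "/" + entry->d_name;

        struct stat info;
        if (stat(path.c_str(), &info) != 0)
            continue;                 // `info` is not usable on failure

        if (S_ISDIR(info.st_mode))
            collectFiles(path, out);  // a failed subdirectory is simply skipped
        else if (S_ISREG(info.st_mode))
            out.push_back(path);
    }
    closedir(handle);
    return true;
}
```

Building paths with std::string also sidesteps the fixed MAX_PATH buffer, and S_ISDIR tests the file type bits as a whole instead of a single flag.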
am...@ma... wrote:
>Cheers Rob !
>
>I'll give this a go tonight, I'm presently messing with Lucene and calling
>its methods from PHP with a few Java classes (I know...desperate :-) )
>Good thing is that I finally got Java working from PHP so besides the slow
>response times it actually can index a given record from MySQL.
>Now I'll try to get the search going and see if that can be hacked fast.
>
>Afterwards I'll play with your CLucene source and see if that makes
>the difference when used with a php extension (I hope it does), so some
>tests can be run/tried Lucene vs CLucene.
>
>
>-pedja
>
>----- Original Message -----
>From: "Rob Ristroph" <rg...@sd...>
>To: <clu...@li...>
>Sent: Wednesday, October 15, 2003 3:38 PM
>Subject: [CLucene-dev] Memory fixes to clucene
>
>
>
>
>>Hi everyone,
>>
>> I put up a complete tar file of my current source tree here:
>>
>>http://rgr.freeshell.org/clucene/
>>
>> I would like to submit more organized patches, as Ben
>> will be able to figure them out more easily. However, that's
>> going to take some time, and I know a couple of people were
>> waiting to try something out now.
>>
>> I found all the errors I fixed using valgrind. There is still
>> one I didn't fix, which is caused when a key value is inserted
>> into the VoidList.h template, where that key value is
>> allocated with new[] but when you remove something from that
>> structure it is de-allocated with delete not delete[]. Most
>> things inserted into it are allocated with new so you can't
>> just change delete to delete[]. We can track that down later.
>>
>> I did not search for or fix memory leaks, just things which
>> caused crashes.
>>
>> I can now run on the Reuters test set, or a directory that
>> has two copies of it, without crashing. However, if I use the
>> demo example program, index it and then search on
>> "Microsoft", it misses some occurrences of the word "Microsoft"
>> from the results (you can check with grep). That is a bug I
>> can help work on later after the current changes have been
>> ingested and tested by a few people.
>>
>> Note that it won't help Ben much to send him a patch that is a
>> difference between something you did and this tar file. You
>> want to make patches against what is in the CVS.
>>
>> The way I am doing it ( now that sourceforge's CVS servers
>> seem to be back ! ) is to use the command "cvs diff". For
>> example, I can cd into src/CLucene/utils and type "cvs diff"
>> and it will print out a diff suitable to be used as a patch
>> file that Ben (or anyone working from the CVS version) can
>> apply.
>>
>> My plan is to do exactly that, but instead of one big patch
>> for all I did, submit little ones, all changes per directory
>> or all changes that fix a particular type of problem.
>> Otherwise any mistakes I made will get lost in the noise and
>> slip past Ben.
>>
>> I'm busy in the near term, so those will start to trickle in
>> over the next couple of days, and I can have them all
>> organized and submitted by Saturday afternoon, I think.
>>
>>--Rob
>>
>>
>>-------------------------------------------------------
>>This SF.net email is sponsored by: SF.net Giveback Program.
>>SourceForge.net hosts over 70,000 Open Source Projects.
>>See the people who have HELPED US provide better services:
>>Click here: http://sourceforge.net/supporters.php
>>_______________________________________________
>>CLucene-developers mailing list
>>CLu...@li...
>>https://lists.sourceforge.net/lists/listinfo/clucene-developers
>>
>>
>>
>
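
The new[] / delete mismatch described in the quoted message can be illustrated with a small sketch. This is not the VoidList.h code: OwningList, ScalarDeleter, ArrayDeleter and duplicateString are hypothetical names made up for the example. The idea, under those assumptions, is to let the container record how its elements were allocated, so that each one is released with the matching form of delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical deleter policies (not CLucene's API): each releases a pointer
// with the form of delete that matches how it was allocated.
template <typename T>
struct ScalarDeleter {
    static void destroy(T* p) { delete p; }     // pairs with: new T
};

template <typename T>
struct ArrayDeleter {
    static void destroy(T* p) { delete[] p; }   // pairs with: new T[n]
};

// Minimal owning list. The Deleter parameter fixes, at the type level,
// how every stored pointer will be freed.
template <typename T, typename Deleter>
class OwningList {
public:
    OwningList() {}
    ~OwningList() { clear(); }

    void add(T* p) {
        assert(p != NULL);          // the list only stores owned pointers
        items_.push_back(p);
    }

    std::size_t size() const { return items_.size(); }

    // Frees every element with the matching delete form, then empties the list.
    void clear() {
        for (std::size_t i = 0; i < items_.size(); ++i)
            Deleter::destroy(items_[i]);
        items_.clear();
    }

private:
    OwningList(const OwningList&);              // non-copyable
    OwningList& operator=(const OwningList&);
    std::vector<T*> items_;
};

// Helper for the usage below: copies a C string into new[] storage.
inline char* duplicateString(const char* s) {
    char* copy = new char[std::strlen(s) + 1];
    std::strcpy(copy, s);
    return copy;
}
```

With this arrangement a list of string keys would be declared as OwningList<char, ArrayDeleter<char> >, while a list of single objects would use ScalarDeleter, so the two allocation styles are never mixed inside one container.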
|
|
From: <am...@ma...> - 2003-10-15 20:44:26
|
Cheers Rob !

I'll give this a go tonight, I'm presently messing with Lucene and calling
its methods from PHP with a few Java classes (I know...desperate :-) )
Good thing is that I finally got Java working from PHP so besides the slow
response times it actually can index a given record from MySQL.
Now I'll try to get the search going and see if that can be hacked fast.

Afterwards I'll play with your CLucene source and see if that makes
the difference when used with a php extension (I hope it does), so some
tests can be run/tried Lucene vs CLucene.

-pedja

----- Original Message -----
From: "Rob Ristroph" <rg...@sd...>
To: <clu...@li...>
Sent: Wednesday, October 15, 2003 3:38 PM
Subject: [CLucene-dev] Memory fixes to clucene

> Hi everyone,
>
> I put up a complete tar file of my current source tree here:
>
> http://rgr.freeshell.org/clucene/
>
> I would like to submit more organized patches, as Ben
> will be able to figure them out more easily. However, that's
> going to take some time, and I know a couple of people were
> waiting to try something out now.
>
> I found all the errors I fixed using valgrind. There is still
> one I didn't fix, which is caused when a key value is inserted
> into the VoidList.h template, where that key value is
> allocated with new[] but when you remove something from that
> structure it is de-allocated with delete not delete[]. Most
> things inserted into it are allocated with new so you can't
> just change delete to delete[]. We can track that down later.
>
> I did not search for or fix memory leaks, just things which
> caused crashes.
>
> I can now run on the Reuters test set, or a directory that
> has two copies of it, without crashing. However, if I use the
> demo example program, index it and then search on
> "Microsoft", it misses some occurrences of the word "Microsoft"
> from the results (you can check with grep). That is a bug I
> can help work on later after the current changes have been
> ingested and tested by a few people.
>
> Note that it won't help Ben much to send him a patch that is a
> difference between something you did and this tar file. You
> want to make patches against what is in the CVS.
>
> The way I am doing it ( now that sourceforge's CVS servers
> seem to be back ! ) is to use the command "cvs diff". For
> example, I can cd into src/CLucene/utils and type "cvs diff"
> and it will print out a diff suitable to be used as a patch
> file that Ben (or anyone working from the CVS version) can
> apply.
>
> My plan is to do exactly that, but instead of one big patch
> for all I did, submit little ones, all changes per directory
> or all changes that fix a particular type of problem.
> Otherwise any mistakes I made will get lost in the noise and
> slip past Ben.
>
> I'm busy in the near term, so those will start to trickle in
> over the next couple of days, and I can have them all
> organized and submitted by Saturday afternoon, I think.
>
> --Rob
|