From: Rui O. <rui...@ho...> - 2010-04-26 12:55:31
How can I check this?

I just read the text from the files into a CString and then pass it to CLucene.
As far as I can tell, the text I read from the file into the CString is correct: I checked it in debug mode and it looks good.

Rui

> Date: Mon, 26 Apr 2010 14:44:56 +0200
> From: nun...@go...
> To: clu...@li...
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Rui,
>
> which encoding do you use internally before you give it to CLucene?
> Maybe you use an encoding different to the encoding expected by
> CLucene.
>
> Kind regards,
>
> Veit
>
> 2010/4/26 Rui Oliveira <rui...@ho...>:
> > Hi,
> >
> > I have been using luke to analyze the index.
> >
> > Well, all Portuguese characters appear replaced by a strange character.
> >
> > What can I do to avoid this?
> > Is it not possible to make CLucene work with Portuguese characters?
> >
> > Thanks & Regards,
> > Rui
> >
> >> Date: Fri, 23 Apr 2010 20:43:49 +0200
> >> From: bva...@gm...
> >> To: clu...@li...
> >> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >>
> >> I suggest using a program called luke (google it). You can then look
> >> into the index and see what is indexed. Let us know if u see all the
> >> words you would expect to see. And see if u can find the document if u
> >> search from luke
> >>
> >> handy program :)
> >>
> >> cheers
> >> ben
> >>
> >> On Friday, April 23, 2010, Rui Oliveira <rui...@ho...> wrote:
> >> >
> >> > Itamar,
> >> >
> >> > The tests were all made on the same file. The same file has
> >> > "orçamento" and "administração"; "administração" is found and
> >> > "orçamento" is not.
> >> >
> >> > The results are the same whether the file is ANSI, Unicode or UTF8 encoded.
> >> > The problem is not in loading the files, because I debugged the text
> >> > loaded from the file and it is OK.
> >> >
> >> > Rui
> >> >
> >> > From: it...@di...
> >> > To: clu...@li...
> >> > Date: Fri, 23 Apr 2010 17:59:27 +0300
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > Rui,
> >> >
> >> > This file is ANSI encoded. Are the other files, the ones you do succeed
> >> > in finding, Unicode / UTF8 encoded perhaps? If that's the case your
> >> > routine for loading the files is buggy. You should either have them all
> >> > encoded using the same encoding, or have more intelligent code to
> >> > convert incompatible encodings.
> >> >
> >> > HTH
> >> >
> >> > Itamar.
> >> >
> >> > From: Rui Oliveira [mailto:rui...@ho...]
> >> > Sent: Friday, April 23, 2010 4:32 PM
> >> > To: clucene-developers; oni...@gm...
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > I have just attached the file.
> >> >
> >> > Tks, Rui
> >> >
> >> > From: oni...@gm...
> >> > Date: Fri, 23 Apr 2010 09:22:05 -0400
> >> > To: clu...@li...
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > Can you send me this file that has both "orçamento" and "administração"?
> >> >
> >> > Or you can do a test: open the file and delete the ç from orçamento and
> >> > administração.
> >> > And then type ç again.
> >> >
> >> > Index again and try to search for both words again.
> >> >
> >> > On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <rui...@ho...>
> >> > wrote:
> >> >
> >> > They are text files (*.txt) and both words are in the same document.
> >> > When I search for "orçamento" nothing is found, and when I search for
> >> > "administração" the document is found.
> >> >
> >> > Rui
> >> >
> >> > From: oni...@gm...
> >> > Date: Fri, 23 Apr 2010 09:09:30 -0400
> >> > To: clu...@li...
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > Seems like an encoding problem with these documents. Are they html
> >> > pages?
> >> > Are the words "orçamento" and "administração" in the same page, for
> >> > example?
> >> >
> >> > Can you dump one of these files here? (One that has the problem and one
> >> > that has not)
> >> >
> >> > On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <rui...@ho...>
> >> > wrote:
> >> >
> >> > I am indexing several separate documents.
> >> >
> >> > The document that has these words is a small text document. This
> >> > document is indexed without any visible error. This same document is
> >> > found when I search for other words in it.
> >> >
> >> > Rui
> >> >
> >> > From: oni...@gm...
> >> > Date: Fri, 23 Apr 2010 08:58:05 -0400
> >> > To: clu...@li...
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > What are you indexing?
> >> >
> >> > Just a big document?
> >> > Or a lot of separate documents? (html documents?)
> >> >
> >> > On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <rui...@ho...>
> >> > wrote:
> >> >
> >> > Hi Onilton,
> >> >
> >> > I have tested with "orcamento" instead of "orçamento" and didn't get
> >> > anything.
> >> >
> >> > I do not know if lucene indexes "orçamento" in a wrong way, because it
> >> > indexes without any error, but when I search for it I do not get anything.
> >> >
> >> > Thanks & Regards,
> >> > Rui
> >> >
> >> > From:
> >> >
> >>
> >> ------------------------------------------------------------------------------
> >> _______________________________________________
> >> CLucene-developers mailing list
> >> CLu...@li...
> >> https://lists.sourceforge.net/lists/listinfo/clucene-developers
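One way to check the encoding question raised in this thread is to look at the numeric code points of the text rather than at how it is displayed in the debugger. The following is only a minimal sketch in portable C++: the helpers `decodeLatin1`, `decodeUtf8`, `dumpCodePoints` and `checkExample` are hypothetical names made up for illustration and are not part of CLucene or MFC, and the sketch says nothing about what CLucene itself does internally. It shows that the same bytes give different characters depending on which encoding the loader assumes: "ç" should come out as U+00E7, and anything else at that position (for example U+00C3 U+00A7, or the replacement character U+FFFD) suggests the bytes were decoded with the wrong encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: interpret every byte as one ISO-8859-1 (Latin-1)
// character. Windows "ANSI" code page 1252 agrees with this for 'ç' (0xE7).
std::vector<std::uint32_t> decodeLatin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Hypothetical helper: minimal UTF-8 decoder. Malformed sequences become
// U+FFFD; overlong forms and surrogates are not rejected in this sketch.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        int extra = 0;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}

// Format code points as "U+006F U+0072 ..." so they can be logged and compared.
std::string dumpCodePoints(const std::vector<std::uint32_t>& cps) {
    std::string s;
    char buf[16];
    for (std::size_t i = 0; i < cps.size(); ++i) {
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cps[i]));
        if (i != 0) s += ' ';
        s += buf;
    }
    return s;
}

// Usage example: "orçamento" stored as UTF-8 bytes and as Latin-1 bytes.
void checkExample() {
    const std::string utf8Bytes   = std::string("or\xC3\xA7") + "amento";
    const std::string latin1Bytes = std::string("or\xE7") + "amento";

    // Right decoder for each file: 'ç' is U+00E7 in both cases.
    assert(decodeUtf8(utf8Bytes)[2] == 0xE7);
    assert(decodeLatin1(latin1Bytes)[2] == 0xE7);

    // Wrong decoder: UTF-8 bytes read as Latin-1 give two characters
    // (U+00C3 U+00A7), and Latin-1 bytes read as UTF-8 give U+FFFD.
    assert(decodeLatin1(utf8Bytes)[2] == 0xC3);
    assert(decodeUtf8(latin1Bytes)[2] == 0xFFFD);
}
```

Logging `dumpCodePoints(...)` for the text just before it is handed to the indexer, and again for the query string, makes it possible to compare the two directly. If both show U+00E7 for "ç", the loading code is probably not where the mismatch is.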