From: Onilton M. <oni...@gm...> - 2010-04-26 13:37:12
Can you send the code where you index?

On Mon, Apr 26, 2010 at 9:55 AM, Rui Oliveira <rui...@ho...> wrote:
> How can I check this?
>
> I just read the text from the files into a CString, and then put it into
> CLucene.
>
> Apparently, the text I read from the file into the CString is correct; I
> checked it in debug mode and it looks good.
>
> Rui
>
> > Date: Mon, 26 Apr 2010 14:44:56 +0200
> > From: nun...@go...
> > To: clu...@li...
> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >
> > Rui,
> >
> > which encoding do you use internally before you give it to CLucene?
> > Maybe you use an encoding different to the encoding expected by
> > CLucene.
> >
> > Kind regards,
> >
> > Veit
> >
> > 2010/4/26 Rui Oliveira <rui...@ho...>:
> > > Hi,
> > >
> > > I have been using Luke to analyze the index.
> > >
> > > Well, all Portuguese characters appear to be replaced by a strange
> > > character.
> > >
> > > What can I do to avoid this?
> > > Is it not possible to make CLucene work with Portuguese characters?
> > >
> > > Thanks & Regards,
> > > Rui
> > >
> > >> Date: Fri, 23 Apr 2010 20:43:49 +0200
> > >> From: bva...@gm...
> > >> To: clu...@li...
> > >> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >>
> > >> I suggest using a program called luke (google it). You can then look
> > >> into the index and see what is indexed. Let us know if u see all the
> > >> words you would expect to see. And see if u can find the document if u
> > >> search from luke
> > >>
> > >> handy program :)
> > >>
> > >> cheers
> > >> ben
> > >>
> > >> On Friday, April 23, 2010, Rui Oliveira <rui...@ho...> wrote:
> > >> >
> > >> > Itamar,
> > >> >
> > >> > The tests were all made with the same file. That file has both
> > >> > "orçamento" and "administração"; "administração" is found and
> > >> > "orçamento" is not.
> > >> >
> > >> > The results are the same whether the file is ANSI, Unicode or UTF-8
> > >> > encoded.
> > >> > The problem is not in loading the files, because I debugged the text
> > >> > loaded from the file and it is OK.
> > >> >
> > >> > Rui
> > >> >
> > >> > From: it...@di...
> > >> > To: clu...@li...
> > >> > Date: Fri, 23 Apr 2010 17:59:27 +0300
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > Rui,
> > >> >
> > >> > This file is ANSI encoded. Are the other files that you do succeed in
> > >> > finding Unicode / UTF-8 encoded, perhaps? If that's the case, your
> > >> > routine for loading the files is buggy. You should either have them all
> > >> > encoded using the same encoding, or have more intelligent code to
> > >> > convert incompatible encodings.
> > >> >
> > >> > HTH
> > >> >
> > >> > Itamar.
> > >> >
> > >> > From: Rui Oliveira [mailto:rui...@ho...]
> > >> > Sent: Friday, April 23, 2010 4:32 PM
> > >> > To: clucene-developers; oni...@gm...
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > I have attached the file.
> > >> >
> > >> > Tks, Rui
> > >> >
> > >> > From: oni...@gm...
> > >> > Date: Fri, 23 Apr 2010 09:22:05 -0400
> > >> > To: clu...@li...
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > Can you send me the file that has both "orçamento" and "administração"?
> > >> >
> > >> > Or you can do a test: open the file and delete the ç from "orçamento"
> > >> > and "administração", and then type the ç again.
> > >> >
> > >> > Index again and try to search for both words again.
> > >> >
> > >> > On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <rui...@ho...>
> > >> > wrote:
> > >> >
> > >> > They are text files (*.txt) and both words are in the same document.
> > >> > When I search for "orçamento" nothing is found, and when I search for
> > >> > "administração" the document is found.
> > >> >
> > >> > Rui
> > >> >
> > >> > From: oni...@gm...
> > >> > Date: Fri, 23 Apr 2010 09:09:30 -0400
> > >> > To: clu...@li...
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > Seems like an encoding problem with these documents. Are they HTML
> > >> > pages? Are the words "orçamento" and "administração" on the same page,
> > >> > for example?
> > >> >
> > >> > Can you dump one of these files here? (One that has the problem and one
> > >> > that does not.)
> > >> >
> > >> > On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <rui...@ho...>
> > >> > wrote:
> > >> >
> > >> > I am indexing a set of separate documents.
> > >> >
> > >> > The document that has these words is a small text document. It is
> > >> > indexed without any visible error, and the same document is found
> > >> > when I search for other words in it.
> > >> >
> > >> > Rui
> > >> >
> > >> > From: oni...@gm...
> > >> > Date: Fri, 23 Apr 2010 08:58:05 -0400
> > >> > To: clu...@li...
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > What are you indexing?
> > >> >
> > >> > Just a big document?
> > >> > Or a lot of separate documents? (HTML documents?)
> > >> >
> > >> > On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <rui...@ho...>
> > >> > wrote:
> > >> >
> > >> > Hi Onilton,
> > >> >
> > >> > I have tested with "orcamento" instead of "orçamento" and didn't get
> > >> > anything.
> > >> >
> > >> > I do not know if Lucene indexes "orçamento" in a wrong way, because it
> > >> > indexes without any error, but when I search for it I do not get
> > >> > anything.
> > >> >
> > >> > Thanks & Regards,
> > >> > Rui
> > >> >
> > >> > From:
> > >>
> > >> ------------------------------------------------------------------------------
> > >> _______________________________________________
> > >> CLucene-developers mailing list
> > >> CLu...@li...
> > >> https://lists.sourceforge.net/lists/listinfo/clucene-developers