From: Veit J. <nun...@go...> - 2010-04-27 20:13:35
Hi!

2010/4/27 Rui Oliveira <rui...@ho...>:
> What you mean with "Index UTF8 / Unicode encoded files instead of your ANSI
> ones" ? There are some index configuration for doing this? Or you are
> talking about files we are indexing? If is about files we are indexing,
> anything changes if file we are indexing is UTF8 / Unicode instead of ANSI.

I think he meant the encoding of the files. But it depends on the function m_GetFileContents. If this function only reads the text from the files and puts it into the CString without modifying it (i.e., without transcoding the text to a particular encoding), this may cause problems if the file's encoding is not the one CLucene expects. If you configured CLucene to use Unicode, it expects all text in UTF-16/UCS-2. So if your file is in a different encoding, e.g., UTF-8, you have to transcode the text to UTF-16/UCS-2 before CLucene can index it correctly.

By the way, I think "ANSI" isn't really ANSI, but rather the system-specific locale (code page), isn't it?

Kind regards,
Veit
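The transcoding step described above could look roughly like the sketch below. It assumes the raw file bytes have already been read into a `std::string`; `Utf8ToUtf16` is a hypothetical helper, not part of CLucene, and the thread's `m_GetFileContents` is not shown. It decodes UTF-8 and emits UTF-16 code units, writing characters outside the Basic Multilingual Plane as surrogate pairs (which a strictly UCS-2 consumer would treat as two separate units). On Windows, where `wchar_t` is 16 bits, the Win32 call `MultiByteToWideChar` with `CP_UTF8` does the same job.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a CLucene API): decode UTF-8 bytes into UTF-16
// code units. Throws std::invalid_argument on malformed input.
std::u16string Utf8ToUtf16(const std::string& in) {
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    out.reserve(in.size());
    std::size_t i = 0;
    const std::size_t n = in.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > n) throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogate values and out-of-range values.
        if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode supplementary characters as a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

Rejecting malformed input, rather than silently substituting characters, makes encoding mistakes visible before the text reaches the index. A caller that prefers to index damaged files anyway could push U+FFFD instead of throwing.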