Just replying to my own email to record the answer.
DSpace 1.8 included an upgrade to Lucene 3.3.0. With that upgrade the
default Analyzer no longer seems to be the English-oriented one it used
to be; judging by the javadocs, the change was made in Lucene 3.1:
"ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to
3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation,
as specified by UAX#29."
I think the old behaviour can be reinstated by changing dspace.cfg to
have...
search.analyzer = org.apache.lucene.analysis.standard.ClassicAnalyzer
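In case it helps when explaining the difference to an administrator, here
is a rough, self-contained sketch of the two tokenising behaviours
described in the original question. It is only an illustration: the
ApostropheTokens class and its tokenize helper are made up for this email
and are not Lucene or DSpace code, and the real analyzers apply many more
rules than this.

```java
import java.util.ArrayList;
import java.util.List;

public class ApostropheTokens {

    // Hypothetical helper (not Lucene code). Splits text on every character
    // that is not a letter or digit. When keepApostrophe is true, an
    // apostrophe that sits inside a token and is followed by a letter is
    // kept as part of that token instead of ending it.
    static List<String> tokenize(String text, boolean keepApostrophe) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean innerApostrophe = keepApostrophe
                    && c == '\''
                    && current.length() > 0
                    && i + 1 < text.length()
                    && Character.isLetter(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || innerApostrophe) {
                current.append(c);
            } else if (current.length() > 0) {
                // Delimiter reached: close the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Behaviour reported at DSpace 1.6: one token.
        System.out.println(tokenize("O'Connor", true));
        // Behaviour reported at DSpace 1.8: apostrophe acts as a delimiter.
        System.out.println(tokenize("O'Connor", false));
    }
}
```

The first call keeps "O'Connor" as a single token and the second yields
"O" and "Connor", which matches what was seen at 1.6 and 1.8 respectively.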
Cheers.
On 01/11/12 16:11, TAYLOR Robin wrote:
> Hi all,
>
> I've just been comparing a search at DSpace version 1.6 with a search at
> 1.8 and notice that at 1.8 an apostrophe is treated as a token
> delimiter, so a search term of "O'Connor" is split into "O" and
> "Connor", whereas at 1.6 it was treated as one token. I presume it was a
> conscious change made at some point and I was just wondering when and
> where (in terms of the source code). It's not a problem for me; I just
> need to be able to provide an explanation to the repository administrator.
>
> Thanks, Robin.
>