Hi,
At present, there will mostly be only bugfixes due to lack of time. So, no plans to implement this.
Best regards
q:-) <= Quang
Hi Quang,
I was looking at this feature request and I wanted to discuss my ideas here:
Currently, DocFetcher uses StandardAnalyzer both for indexing and for parsing the search query that is compared against the index. StandardAnalyzer applies a lowercase filter that converts all characters to lower case before they are indexed or searched. Thus, we cannot do a case-sensitive search.
However, I tried using WhitespaceAnalyzer instead of StandardAnalyzer, and now I am able to do a case-sensitive search (as the strings are indexed exactly as they appear). But to support both case-sensitive and case-insensitive search in a single application, I think we need to create two sets of indexes, one using StandardAnalyzer and one using WhitespaceAnalyzer, and then search the respective index based on how the user supplies the search query.
Of course, keeping two versions of the index in RAM while searching will consume quite a bit of memory, but this is the only method I can think of right now.
Please let me know if you have any points or concerns about such a strategy. I will keep this post updated if I find any other method that can handle this case.
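To make the two-index idea concrete, here is a minimal sketch in plain Java. It is a toy stand-in and not DocFetcher or Lucene code: the class DualIndexSketch, its methods, and the matchCase flag are all hypothetical names, and the lowercasing tokenizer only approximates what StandardAnalyzer does (the real analyzer also handles punctuation and more).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: hypothetical names, no Lucene classes involved.
class DualIndexSketch {
    // Each map goes from a token to the ids of the documents containing it.
    private final Map<String, Set<Integer>> caseSensitive = new HashMap<>();
    private final Map<String, Set<Integer>> caseInsensitive = new HashMap<>();

    // Splits on whitespace and keeps tokens as written (WhitespaceAnalyzer-like).
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Splits on whitespace and lowercases (a rough simplification of StandardAnalyzer).
    static List<String> lowercaseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Every document is written to both indexes.
    void add(int docId, String text) {
        for (String token : whitespaceTokens(text)) {
            caseSensitive.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        for (String token : lowercaseTokens(text)) {
            caseInsensitive.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Single-term lookup: the flag picks the index, and the term is
    // normalized the same way that index was built.
    Set<Integer> search(String term, boolean matchCase) {
        Map<String, Set<Integer>> index = matchCase ? caseSensitive : caseInsensitive;
        String key = matchCase ? term : term.toLowerCase(Locale.ROOT);
        return index.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        DualIndexSketch idx = new DualIndexSketch();
        idx.add(1, "Lucene powers DocFetcher");
        idx.add(2, "lucene is a search library");
        System.out.println(idx.search("Lucene", true));  // prints [1]
        System.out.println(idx.search("Lucene", false)); // prints [1, 2]
    }
}
```

The sketch also shows where the extra cost comes from: every token is stored twice, once per index.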
Thanks,
Ankush H Prasad
I don't know much about case-sensitive search with Lucene either. Your best bet would be to look around on StackOverflow. For instance, this post seems helpful: http://stackoverflow.com/questions/2487736/lucene-case-sensitive-insensitive-search
Also, keep the following in mind: (1) An analyzer is used at two points: during the parsing of the user-entered query, and during indexing. (2) At these two points, the same analyzer must be used. So if you use WhitespaceAnalyzer for indexing, you should also use WhitespaceAnalyzer to parse the query.
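To illustrate point (2), here is a small toy sketch in plain Java of what goes wrong when the two sides disagree. It does not use Lucene; AnalyzerMismatchSketch and its methods are made-up names, and the two tokenizers are only rough stand-ins for a lowercasing analyzer and WhitespaceAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy illustration only: hypothetical names, no Lucene classes involved.
class AnalyzerMismatchSketch {
    // Keeps tokens exactly as written.
    static List<String> whitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Same split, but lowercases every token.
    static List<String> lowercasing(String text) {
        List<String> out = new ArrayList<>();
        for (String token : whitespace(text)) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Builds token -> document positions (0-based) using the given analyzer.
    static Map<String, Set<Integer>> buildIndex(List<String> docs,
                                                Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String token : analyzer.apply(docs.get(i))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(i);
            }
        }
        return index;
    }

    // Returns the documents that contain every token of the analyzed query.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> result = null;
        for (String token : analyzer.apply(query)) {
            Set<Integer> hits = index.getOrDefault(token, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Lucene powers DocFetcher", "a search library");
        Map<String, Set<Integer>> index = buildIndex(docs, AnalyzerMismatchSketch::lowercasing);

        // Mismatch: the index holds "lucene", but the query token stays "Lucene".
        System.out.println(search(index, "Lucene", AnalyzerMismatchSketch::whitespace));  // prints []
        // Same analyzer on both sides: the document is found.
        System.out.println(search(index, "Lucene", AnalyzerMismatchSketch::lowercasing)); // prints [0]
    }
}
```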
Hi Quang,
Agreed. I am using WhitespaceAnalyzer in both places. In fact, I tested the feature after implementing it, and it works fine for case-sensitive searches. What I am working on now is how to combine both features (one with WhitespaceAnalyzer and one with StandardAnalyzer), so that the application supports both.
Thanks,
Ankush H Prasad
Last edit: Ankush Prasad 2016-02-29