From: Bryan T. <br...@sy...> - 2015-12-11 21:45:52
Great! Igor, can you please file and accept a ticket to document this?

Jeremy put together the ConfigurableAnalyzerFactory. Please check both the developer list and git commits if you do not find enough info in the javadoc.

Thanks,
Bryan

On Dec 11, 2015 4:08 PM, "Jim Balhoff" <ba...@gm...> wrote:

> I figured it out. I needed to add these 4 lines to the properties file:
>
> com.bigdata.search.FullTextIndex.analyzerFactoryClass=com.bigdata.search.ConfigurableAnalyzerFactory
> com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.analyzerClass=org.apache.lucene.analysis.standard.StandardAnalyzer
> com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.stopwords=none
> com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.like=eng
>
> This gives me the desired result: no stopwords.
>
> Thanks,
> Jim
>
> On Dec 10, 2015, at 5:12 PM, Bryan Thompson <br...@sy...> wrote:
>
> I suggest either applying a debugger or creating a unit test that replicates the issue, then filing a ticket and attaching the test. You should be able to configure any aspect of the tokenization behavior.
>
> I would also try a namespace-specific override. E.g.,
>
> com.bigdata.*foo.lex.*search.ConfigurableAnalyzerFactory.stopwords=none
>
> Thanks,
> Bryan
>
> On Thu, Dec 10, 2015 at 4:55 PM, Jim Balhoff <ba...@gm...> wrote:
>
>> Yes, I am deleting the journal file and reloading in between attempts.
>>
>> Thanks,
>> Jim
>>
>> On Dec 10, 2015, at 4:50 PM, Brad Bebee <be...@sy...> wrote:
>>
>> Jim,
>>
>> Did you reload the namespace after the configuration change?
>>
>> Thanks, --Brad
>>
>> On Thu, Dec 10, 2015 at 4:38 PM, Jim Balhoff <ba...@gm...> wrote:
>>
>>> I had tried some similar syntax, but wasn’t sure how it should look. I just tried the form you suggested and it did not have an effect on stopwords.
>>> They seem to still be active, because if my search input is simply “of”, I get this message:
>>>
>>> WARN : FullTextIndex.java:1052: No terms after stopword extraction: query=com.bigdata.rdf.lexicon.ITextIndexer$FullTextQuery@7aef6039
>>>
>>> Thanks,
>>> Jim
>>>
>>> > On Dec 9, 2015, at 11:34 PM, Brad Bebee <be...@sy...> wrote:
>>> >
>>> > Jim,
>>> >
>>> > Thank you. Have you tried configuring your journal with the property below?
>>> >
>>> > com.bigdata.search.ConfigurableAnalyzerFactory.stopwords=none
>>> >
>>> > Thanks, --Brad
>>> >
>>> > On Wed, Dec 9, 2015 at 9:36 AM, Jim Balhoff <ba...@gm...> wrote:
>>> > Hi Brad,
>>> >
>>> > I see, I can look into providing my own implementation. I got the impression from the JavaDoc that I could provide config options to modify the behavior of some of the analyzers. I have been looking at these pages:
>>> >
>>> > https://www.blazegraph.com/docs/api/com/bigdata/search/ConfigurableAnalyzerFactory.AnalyzerOptions.html#STOPWORDS_VALUE_NONE
>>> > https://www.blazegraph.com/docs/api/constant-values.html
>>> >
>>> > I also tried to switch from the default analyzer to the TermCompletionAnalyzer, but haven’t been able to get the property value set correctly for “wordBoundary” in the config file.
>>> >
>>> > Understanding how to translate option constants from the JavaDoc into correctly written config file properties has been a challenge. It would be really helpful to have more of those spelled out on the wiki.
>>> >
>>> > Thanks!
>>> > Jim
>>> >
>>> > > On Dec 8, 2015, at 9:54 PM, Brad Bebee <be...@sy...> wrote:
>>> > >
>>> > > Jim,
>>> > >
>>> > > I believe you could do this by overriding the Analyzer Factory Class [1] with your own implementation that does not filter stopwords [2]. Others may have more specific suggestions.
>>> > >
>>> > > [1] https://www.blazegraph.com/docs/api/com/bigdata/search/FullTextIndex.Options.html#ANALYZER_FACTORY_CLASS
>>> > >
>>> > > [2] https://www.blazegraph.com/docs/api/com/bigdata/search/IAnalyzerFactory.html
>>> > >
>>> > > Thanks, --Brad
>>> > >
>>> > > On Tue, Dec 8, 2015 at 9:12 PM, Jim Balhoff <ba...@gm...> wrote:
>>> > > Hi,
>>> > >
>>> > > I was wondering if anyone could provide examples for how to set up the Blazegraph properties file to configure options for the full text search. I have looked through the various options in the JavaDoc, but I can’t quite figure out the right properties file syntax for ‘stopwords=none’. Here is what I am trying to do:
>>> > >
>>> > > I have a term in the database with rdfs:label “skeletal element of eye region”. When users search for terms, I append a “*” to their input text by default. However this is failing when the label contains a stopword like “of”. So, searching with “skeletal element of” and “skeletal element*” do find the term as a match, but “skeletal element of*” does not. Can I disable stopwords entirely?
>>> > >
>>> > > Thanks,
>>> > > Jim
>>> > >
>>> > > ------------------------------------------------------------------------------
>>> > > _______________________________________________
>>> > > Bigdata-developers mailing list
>>> > > Big...@li...
>>> > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>>> > >
>>> > > --
>>> > > _______________
>>> > > Brad Bebee
>>> > > CEO, Managing Partner
>>> > > SYSTAP, LLC
>>> > > e: be...@sy...
>>> > > m: 202.642.7961
>>> > > f: 571.367.5000
>>> > > w: www.blazegraph.com
>>> > >
>>> > > Blazegraph™ is our ultra high-performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Mapgraph™ is our disruptive new technology to use GPUs to accelerate data-parallel graph analytics.
>>> > >
>>> > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
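
For anyone who builds the configuration in code rather than editing the properties file, the four settings Jim reports in the thread could be assembled roughly as follows. This is only a minimal sketch: the property keys and values are copied from his message, while the class name `NoStopwordsConfigSketch` and the method `noStopwordsProperties()` are hypothetical names chosen for illustration, and how the resulting `Properties` object is handed to Blazegraph is not shown and not confirmed by the thread.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper class; not part of Blazegraph.
public class NoStopwordsConfigSketch {

    // Prefix shared by the ConfigurableAnalyzerFactory keys quoted in the thread.
    private static final String CAF = "com.bigdata.search.ConfigurableAnalyzerFactory";

    /**
     * Builds the four properties reported in the thread.
     * Keys and values are taken verbatim from that message.
     */
    public static Properties noStopwordsProperties() {
        Properties p = new Properties();
        // Select the configurable analyzer factory for the full text index.
        p.setProperty("com.bigdata.search.FullTextIndex.analyzerFactoryClass", CAF);
        // Use Lucene's StandardAnalyzer for the "eng" language range.
        p.setProperty(CAF + ".analyzer.eng.analyzerClass",
                "org.apache.lucene.analysis.standard.StandardAnalyzer");
        // Disable stopword filtering for "eng".
        p.setProperty(CAF + ".analyzer.eng.stopwords", "none");
        // Make the "_" entry behave like "eng".
        p.setProperty(CAF + ".analyzer._.like", "eng");
        return p;
    }

    public static void main(String[] args) {
        // Print the settings in key=value form, sorted by key.
        Properties p = noStopwordsProperties();
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

Running `main` prints the same four `key=value` lines that can be pasted into the properties file.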