From: Igor K. <igo...@ms...> - 2015-12-14 04:39:58
Bryan, I've created a ticket: https://jira.blazegraph.com/browse/BLZG-1687

On Sat, Dec 12, 2015 at 2:19 AM, Bryan Thompson <br...@sy...> wrote:
> Great!
>
> Igor, can you please file and accept a ticket to document this? Jeremy
> put together the ConfigurableAnalyzerFactory. Please check both the
> developer list and git commits if you do not find enough info in the
> javadoc.
>
> Thanks,
> Bryan
>
> On Dec 11, 2015 4:08 PM, "Jim Balhoff" <ba...@gm...> wrote:
>
>> I figured it out. I needed to add these 4 lines to the properties file:
>>
>> com.bigdata.search.FullTextIndex.analyzerFactoryClass=com.bigdata.search.ConfigurableAnalyzerFactory
>> com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.analyzerClass=org.apache.lucene.analysis.standard.StandardAnalyzer
>> com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.stopwords=none
>> com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.like=eng
>>
>> This gives me the desired results: no stopwords.
>>
>> Thanks,
>> Jim
>>
>> On Dec 10, 2015, at 5:12 PM, Bryan Thompson <br...@sy...> wrote:
>>
>> I suggest either attaching a debugger or creating a unit test that
>> replicates the issue, filing a ticket, and attaching the test. You
>> should be able to configure any aspect of the tokenization behavior.
>>
>> I would also try a namespace-specific override. E.g.,
>>
>> com.bigdata.*foo.lex.*search.ConfigurableAnalyzerFactory.stopwords=none
>>
>> Thanks,
>> Bryan
>>
>> On Thu, Dec 10, 2015 at 4:55 PM, Jim Balhoff <ba...@gm...> wrote:
>>
>>> Yes, I am deleting the journal file and reloading in between attempts.
>>>
>>> Thanks,
>>> Jim
>>>
>>> On Dec 10, 2015, at 4:50 PM, Brad Bebee <be...@sy...> wrote:
>>>
>>> Jim,
>>>
>>> Did you reload the namespace after the configuration change?
>>>
>>> Thanks, --Brad
>>>
>>> On Thu, Dec 10, 2015 at 4:38 PM, Jim Balhoff <ba...@gm...> wrote:
>>>
>>>> I had tried some similar syntax, but wasn’t sure how it should look.
>>>> I just tried the form you suggested and it did not have an effect on
>>>> stopwords. They seem to still be active, because if my search input is
>>>> simply “of”, I get this message:
>>>>
>>>> WARN : FullTextIndex.java:1052: No terms after stopword extraction:
>>>> query=com.bigdata.rdf.lexicon.ITextIndexer$FullTextQuery@7aef6039
>>>>
>>>> Thanks,
>>>> Jim
>>>>
>>>> > On Dec 9, 2015, at 11:34 PM, Brad Bebee <be...@sy...> wrote:
>>>> >
>>>> > Jim,
>>>> >
>>>> > Thank you. Have you tried configuring your journal with the property
>>>> > below?
>>>> >
>>>> > com.bigdata.search.ConfigurableAnalyzerFactory.stopwords=none
>>>> >
>>>> > Thanks, --Brad
>>>> >
>>>> > On Wed, Dec 9, 2015 at 9:36 AM, Jim Balhoff <ba...@gm...> wrote:
>>>> > Hi Brad,
>>>> >
>>>> > I see. I can look into providing my own implementation. I got the
>>>> > impression from the JavaDoc that I could provide config options to
>>>> > modify the behavior of some of the analyzers. I have been looking at
>>>> > these pages:
>>>> >
>>>> > https://www.blazegraph.com/docs/api/com/bigdata/search/ConfigurableAnalyzerFactory.AnalyzerOptions.html#STOPWORDS_VALUE_NONE
>>>> > https://www.blazegraph.com/docs/api/constant-values.html
>>>> >
>>>> > I also tried to switch from the default analyzer to the
>>>> > TermCompletionAnalyzer, but haven’t been able to set the property value
>>>> > for “wordBoundary” correctly in the config file.
>>>> >
>>>> > Understanding how to translate option constants from the JavaDoc into
>>>> > correctly written config file properties has been a challenge. It would
>>>> > be really helpful to have more of those spelled out on the wiki.
>>>> >
>>>> > Thanks!
>>>> > Jim
>>>> >
>>>> > > On Dec 8, 2015, at 9:54 PM, Brad Bebee <be...@sy...> wrote:
>>>> > >
>>>> > > Jim,
>>>> > >
>>>> > > I believe you could do this by overriding the Analyzer Factory
>>>> > > Class [1] with your own implementation that does not filter
>>>> > > stopwords [2].
>>>> > > Others may have more specific suggestions.
>>>> > >
>>>> > > [1] https://www.blazegraph.com/docs/api/com/bigdata/search/FullTextIndex.Options.html#ANALYZER_FACTORY_CLASS
>>>> > >
>>>> > > [2] https://www.blazegraph.com/docs/api/com/bigdata/search/IAnalyzerFactory.html
>>>> > >
>>>> > > Thanks, --Brad
>>>> > >
>>>> > > On Tue, Dec 8, 2015 at 9:12 PM, Jim Balhoff <ba...@gm...> wrote:
>>>> > > Hi,
>>>> > >
>>>> > > I was wondering if anyone could provide examples of how to set up
>>>> > > the Blazegraph properties file to configure options for the full-text
>>>> > > search. I have looked through the various options in the JavaDoc, but
>>>> > > I can’t quite figure out the right properties file syntax for
>>>> > > ‘stopwords=none’. Here is what I am trying to do:
>>>> > >
>>>> > > I have a term in the database with rdfs:label “skeletal element of
>>>> > > eye region”. When users search for terms, I append a “*” to their
>>>> > > input text by default. However, this fails when the label contains a
>>>> > > stopword like “of”. Searches for “skeletal element of” and “skeletal
>>>> > > element*” both find the term as a match, but “skeletal element of*”
>>>> > > does not. Can I disable stopwords entirely?
>>>> > >
>>>> > > Thanks,
>>>> > > Jim
>>>> > >
>>>> > > ------------------------------------------------------------------------------
>>>> > > _______________________________________________
>>>> > > Bigdata-developers mailing list
>>>> > > Big...@li...
>>>> > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>>>> > >
>>>> > > --
>>>> > > _______________
>>>> > > Brad Bebee
>>>> > > CEO, Managing Partner
>>>> > > SYSTAP, LLC
>>>> > > e: be...@sy...
>>>> > > m: 202.642.7961
>>>> > > f: 571.367.5000
>>>> > > w: www.blazegraph.com
>>>> > >
>>>> > > Blazegraph™ is our ultra high-performance graph database that
>>>> > > supports both RDF/SPARQL and Tinkerpop/Blueprints APIs.
>>>> > > Mapgraph™ is our disruptive new technology to use GPUs to accelerate
>>>> > > data-parallel graph analytics.
>>>> > >
>>>> > > CONFIDENTIALITY NOTICE: This email and its contents and attachments
>>>> > > are for the sole use of the intended recipient(s) and are confidential
>>>> > > or proprietary to SYSTAP, LLC. Any unauthorized review, use,
>>>> > > disclosure, dissemination or copying of this email or its contents or
>>>> > > attachments is prohibited. If you have received this communication in
>>>> > > error, please notify the sender by reply email and permanently delete
>>>> > > all copies of the email and its contents and attachments.

--
*Igor Kim* | Team Leader / Backend | Maginfo, Ltd
Mobile: +7-912-402-4622
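Jim's point about translating JavaDoc option constants into property keys can be illustrated with a minimal sketch, assuming the key pattern visible in his working configuration quoted above: a fixed prefix com.bigdata.search.ConfigurableAnalyzerFactory.analyzer., then a language range (eng or _), then an option name (analyzerClass, stopwords, like). The Java sketch below assembles those four lines from their parts and round-trips them through java.util.Properties to confirm they parse back unchanged. The class SearchConfigSketch and the helper analyzerKey are hypothetical; only the key strings and values come from the thread. It does not call any Blazegraph API, and whether other analyzer options (for example wordBoundary) follow the same prefix.range.option pattern is an assumption to check against the ConfigurableAnalyzerFactory javadoc.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical helper class: builds the "no stopwords" full-text search
 * settings from the thread out of prefix + language range + option name.
 * Only the key strings and values are taken from Jim's working config.
 */
public class SearchConfigSketch {

    // Key strings copied verbatim from the properties quoted in the thread.
    public static final String FACTORY_CLASS_KEY =
            "com.bigdata.search.FullTextIndex.analyzerFactoryClass";
    public static final String ANALYZER_PREFIX =
            "com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.";

    /** Hypothetical helper: prefix + language range + "." + option name. */
    public static String analyzerKey(String languageRange, String option) {
        return ANALYZER_PREFIX + languageRange + "." + option;
    }

    /** The four settings, kept in the order Jim listed them. */
    public static Map<String, String> noStopwordsConfig() {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put(FACTORY_CLASS_KEY, "com.bigdata.search.ConfigurableAnalyzerFactory");
        cfg.put(analyzerKey("eng", "analyzerClass"),
                "org.apache.lucene.analysis.standard.StandardAnalyzer");
        cfg.put(analyzerKey("eng", "stopwords"), "none");
        // The "_" range reuses the "eng" configuration via "like".
        cfg.put(analyzerKey("_", "like"), "eng");
        return cfg;
    }

    /** Renders the settings as key=value lines for a .properties file. */
    public static String toPropertiesText(Map<String, String> cfg) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cfg.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> cfg = noStopwordsConfig();
        String text = toPropertiesText(cfg);
        System.out.print(text);

        // Round-trip through java.util.Properties to confirm every line
        // parses back to the same key and value.
        Properties parsed = new Properties();
        parsed.load(new StringReader(text));
        for (Map.Entry<String, String> e : cfg.entrySet()) {
            if (!e.getValue().equals(parsed.getProperty(e.getKey()))) {
                throw new IllegalStateException("Mismatch for " + e.getKey());
            }
        }
    }
}
```

Running main prints the four key=value lines in the order Jim listed them. The lines are written by hand rather than with Properties.store, because store does not preserve insertion order and always prepends a date comment line.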