|
From: Bryan T. <br...@sy...> - 2015-12-10 22:41:58
|
I suggest either applying a debugger or creating a unit test that
replicates the issue, then filing a ticket and attaching the test. You
should be able to configure any aspect of the tokenization behavior.

I would also try a namespace-specific override. E.g.,

com.bigdata.*foo.lex.*search.ConfigurableAnalyzerFactory.stopwords=none

Thanks,
Bryan

On Thu, Dec 10, 2015 at 4:55 PM, Jim Balhoff <ba...@gm...> wrote:

> Yes, I am deleting the journal file and reloading in between attempts.
>
> Thanks,
> Jim
>
> On Dec 10, 2015, at 4:50 PM, Brad Bebee <be...@sy...> wrote:
>
> Jim,
>
> Did you reload the namespace after the configuration change?
>
> Thanks, --Brad
>
> On Thu, Dec 10, 2015 at 4:38 PM, Jim Balhoff <ba...@gm...> wrote:
>
>> I had tried some similar syntax, but wasn’t sure how it should look. I
>> just tried the form you suggested and it did not have an effect on
>> stopwords. They seem to still be active, because if my search input is
>> simply “of”, I get this message:
>>
>> WARN : FullTextIndex.java:1052: No terms after stopword extraction:
>> query=com.bigdata.rdf.lexicon.ITextIndexer$FullTextQuery@7aef6039
>>
>> Thanks,
>> Jim
>>
>> > On Dec 9, 2015, at 11:34 PM, Brad Bebee <be...@sy...> wrote:
>> >
>> > Jim,
>> >
>> > Thank you. Have you tried configuring your journal with the property
>> > below?
>> >
>> > com.bigdata.search.ConfigurableAnalyzerFactory.stopwords=none
>> >
>> > Thanks, --Brad
>> >
>> > On Wed, Dec 9, 2015 at 9:36 AM, Jim Balhoff <ba...@gm...> wrote:
>> > Hi Brad,
>> >
>> > I see, I can look into providing my own implementation. I got the
>> > impression from the JavaDoc that I could provide config options to
>> > modify the behavior of some of the analyzers.
>> > I have been looking at these pages:
>> >
>> > https://www.blazegraph.com/docs/api/com/bigdata/search/ConfigurableAnalyzerFactory.AnalyzerOptions.html#STOPWORDS_VALUE_NONE
>> > https://www.blazegraph.com/docs/api/constant-values.html
>> >
>> > I also tried to switch from the default analyzer to the
>> > TermCompletionAnalyzer, but haven’t been able to get the property
>> > value set correctly for “wordBoundary” in the config file.
>> >
>> > Understanding how to translate option constants from the JavaDoc into
>> > correctly written config file properties has been a challenge. It
>> > would be really helpful to have more of those spelled out on the wiki.
>> >
>> > Thanks!
>> > Jim
>> >
>> > > On Dec 8, 2015, at 9:54 PM, Brad Bebee <be...@sy...> wrote:
>> > >
>> > > Jim,
>> > >
>> > > I believe you could do this by overriding the Analyzer Factory Class
>> > > [1] with your own implementation that does not filter stopwords [2].
>> > > Others may have more specific suggestions.
>> > >
>> > > [1] https://www.blazegraph.com/docs/api/com/bigdata/search/FullTextIndex.Options.html#ANALYZER_FACTORY_CLASS
>> > >
>> > > [2] https://www.blazegraph.com/docs/api/com/bigdata/search/IAnalyzerFactory.html
>> > >
>> > > Thanks, --Brad
>> > >
>> > > On Tue, Dec 8, 2015 at 9:12 PM, Jim Balhoff <ba...@gm...> wrote:
>> > > Hi,
>> > >
>> > > I was wondering if anyone could provide examples for how to set up
>> > > the Blazegraph properties file to configure options for the full
>> > > text search. I have looked through the various options in the
>> > > JavaDoc, but I can’t quite figure out the right properties file
>> > > syntax for ‘stopwords=none’. Here is what I am trying to do:
>> > >
>> > > I have a term in the database with rdfs:label “skeletal element of
>> > > eye region”. When users search for terms, I append a “*” to their
>> > > input text by default. However, this is failing when the label
>> > > contains a stopword like “of”.
>> > > So, searching with “skeletal element of” and “skeletal element*”
>> > > do find the term as a match, but “skeletal element of*” does not.
>> > > Can I disable stopwords entirely?
>> > >
>> > > Thanks,
>> > > Jim
>> > >
>> > > ------------------------------------------------------------------------------
>> > > _______________________________________________
>> > > Bigdata-developers mailing list
>> > > Big...@li...
>> > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>> > >
>> > > --
>> > > _______________
>> > > Brad Bebee
>> > > CEO, Managing Partner
>> > > SYSTAP, LLC
>> > > e: be...@sy...
>> > > m: 202.642.7961
>> > > f: 571.367.5000
>> > > w: www.blazegraph.com
>> > >
>> > > Blazegraph™ is our ultra high-performance graph database that
>> > > supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Mapgraph™ is
>> > > our disruptive new technology to use GPUs to accelerate
>> > > data-parallel graph analytics.
>> > >
>> > > CONFIDENTIALITY NOTICE: This email and its contents and attachments
>> > > are for the sole use of the intended recipient(s) and are
>> > > confidential or proprietary to SYSTAP, LLC. Any unauthorized review,
>> > > use, disclosure, dissemination or copying of this email or its
>> > > contents or attachments is prohibited. If you have received this
>> > > communication in error, please notify the sender by reply email and
>> > > permanently delete all copies of the email and its contents and
>> > > attachments.
|
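For reference, the property forms discussed in this thread can be collected into one journal properties fragment. This is only a sketch of what was proposed, not a confirmed fix: the first key is the one Brad suggested (Jim reported it had no effect in his test), and the second is Bryan's namespace-specific form copied verbatim, asterisks included, so it is left commented out. The `analyzerFactoryClass` line is an assumption that does not appear in the thread. It is inferred from the `ANALYZER_FACTORY_CLASS` option Brad linked, on the guess that `ConfigurableAnalyzerFactory` settings are only read when that factory is selected. Check the key name and value against the JavaDoc before relying on it.

```properties
# Form suggested by Brad (reported in this thread as not disabling stopwords).
com.bigdata.search.ConfigurableAnalyzerFactory.stopwords=none

# Namespace-specific override suggested by Bryan, quoted verbatim from his
# message; the namespace portion is as it appeared there and is not verified.
# com.bigdata.*foo.lex.*search.ConfigurableAnalyzerFactory.stopwords=none

# ASSUMPTION (not from the thread): select the configurable factory explicitly.
# Key name inferred from FullTextIndex.Options.ANALYZER_FACTORY_CLASS.
com.bigdata.search.FullTextIndex.analyzerFactoryClass=com.bigdata.search.ConfigurableAnalyzerFactory
```

As noted earlier in the thread, the journal has to be recreated (or the namespace reloaded) after any such change for it to take effect.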