Blazegraph (powered by bigdata) / Discussion / Help: com.bigdata.rdf.lexicon.LexiconRelation.rebuildTextIndex() on running instance of blazegraph?

Paul Callahan - 2016-03-24

We need to build some text search indexes on an existing journal (Blazegraph 1.5.3 now, eventually 2.x and above) after updating some Lucene properties in the journal to make more of the text searchable. I have figured out how to do this with a Java utility that calls LexiconRelation.rebuildTextIndex() on a journal that has been taken offline.

The offline solution will probably suffice, but I'm curious whether it is possible to rebuild the index while Blazegraph is running. The standalone utility won't run when the journal is in use, but maybe it could be run in a separate thread inside the Blazegraph instance. Assume we won't be using bds:search until the rebuild is complete. Would this cause problems for any other Blazegraph operations?

- Bryan Thompson - 2016-03-24
  
  Paul,
  
  Support is being introduced in the next release for reindexing based on the
  same code that you are using. This is being done to support a data
  migration required for updated lucene dependencies. Brad can provide you
  with some more information about this procedure, but the documentation for
  it should be up on the wiki soon (if it is not already).
  
  Thanks,
  Bryan
  
  Bryan Thompson
  Chief Scientist & Founder
  Blazegraph
  e: bryan@blazegraph.com
  w: http://blazegraph.com
  

Paul Callahan - 2016-03-24

Thanks! That sounds very promising.

- Brad Bebee - 2016-03-24
  
  Paul,
  
  Definitely. It will also update Blazegraph to Lucene 5.3.0:
  https://jira.blazegraph.com/browse/BLZG-1328.
  
  Thanks, --Brad
  

Paul Callahan - 2016-04-04

The description in https://wiki.blazegraph.com/wiki/index.php/Rebuild_Text_Index_Procedure#Rebuild_Text_Index_Utility looks exactly like what I want, particularly the link in the workbench namespaces page. I just looked in v2.0.1, and it does not seem to be there. When will it be released?

- Brad Bebee - 2016-04-04
  
  Paul,
  
  Thank you. 2.1.0 has cleared release testing and will be out this week.
  
  Thanks, Brad

Paul Callahan - 2016-04-11

I had a chance to try out 2.1, but I get this exception a lot (see below). Lucene StandardAnalyzer is the value I assign to com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.analyzerClass. It happens sometimes (not always) when I use bds:search, but always happens when I try to rebuild the index from the namespace page.
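
For readers following along, the analyzer setting described above can be sketched as a set of journal properties. This is a minimal illustration using only java.util.Properties, not code from the thread: the analyzerClass key is the one quoted in the post, while the factory key and the class name AnalyzerConfigSketch are assumptions for illustration and should be checked against the Blazegraph version in use.

```java
import java.util.Properties;

public class AnalyzerConfigSketch {

    // Key quoted in the post above; "_" stands for the catch-all language range.
    static final String ANALYZER_CLASS_KEY =
            "com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.analyzerClass";

    // ASSUMPTION: key used to select the analyzer factory. Verify against your release.
    static final String FACTORY_KEY =
            "com.bigdata.search.FullTextIndex.analyzerFactoryClass";

    /** Builds the properties that would select Lucene's StandardAnalyzer for all languages. */
    static Properties analyzerProperties() {
        Properties p = new Properties();
        p.setProperty(FACTORY_KEY, "com.bigdata.search.ConfigurableAnalyzerFactory");
        p.setProperty(ANALYZER_CLASS_KEY,
                "org.apache.lucene.analysis.standard.StandardAnalyzer");
        return p;
    }

    public static void main(String[] args) {
        // Print in key=value form, as the entries would appear in a .properties file.
        analyzerProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```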

Caused by: java.lang.IllegalStateException: TokenStream contract violation: close() call missing
    at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:315)
    at org.apache.lucene.analysis.standard.StandardAnalyzer$1.setReader(StandardAnalyzer.java:110)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:143)
    at com.bigdata.search.FullTextIndex.getTokenStream(FullTextIndex.java:883)
    at com.bigdata.search.FullTextIndex.index(FullTextIndex.java:825)
    at com.bigdata.search.FullTextIndex.tokenize(FullTextIndex.java:1041)
    at com.bigdata.search.FullTextIndex._search(FullTextIndex.java:1143)
    at com.bigdata.search.FullTextIndex.search(FullTextIndex.java:955)
    at com.bigdata.rdf.sparql.ast.eval.SearchServiceFactory$SearchCall.getHiterator(SearchServiceFactory.java:531)
    at com.bigdata.rdf.sparql.ast.eval.SearchServiceFactory$SearchCall.call(SearchServiceFactory.java:661)
    at com.bigdata.rdf.sparql.ast.eval.SearchServiceFactory$SearchCall.call(SearchServiceFactory.java:362)
    at com.bigdata.bop.controller.ServiceCallJoin$ChunkTask$ServiceCallTask.doBigdataServiceCall(ServiceCallJoin.java:770)
    at com.bigdata.bop.controller.ServiceCallJoin$ChunkTask$ServiceCallTask.doServiceCall(ServiceCallJoin.java:707)
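
The message in the trace comes from Lucene's reuse contract: a tokenizer that is handed a new reader before the previous consumer called close() refuses to continue. Because Lucene analyzers typically cache and reuse their tokenizer components per thread, a consumer that skips close() only fails when the same cached tokenizer is used again, which might explain why the error shows up only some of the time with bds:search. The sketch below is a stdlib-only model of that contract, not Lucene or Blazegraph code; ReusableTokenizer and the helper methods are hypothetical names introduced for illustration.

```java
import java.io.Reader;
import java.io.StringReader;

public class TokenizerContractSketch {

    /** Hypothetical stand-in for a cached, reusable tokenizer. NOT a Lucene class. */
    static final class ReusableTokenizer implements AutoCloseable {
        private Reader input; // null means "closed and ready for reuse"

        void setReader(Reader reader) {
            if (input != null) {
                // Mirrors the check behind the exception message in the trace above.
                throw new IllegalStateException(
                        "TokenStream contract violation: close() call missing");
            }
            input = reader;
        }

        @Override
        public void close() {
            input = null;
        }
    }

    /** Buggy consumer: reads from the tokenizer but never closes it. */
    static void consumeWithoutClose(ReusableTokenizer t, String text) {
        t.setReader(new StringReader(text));
        // ... token iteration would happen here ...
    }

    /** Correct consumer: always closes, even if iteration throws. */
    static void consumeAndClose(ReusableTokenizer t, String text) {
        t.setReader(new StringReader(text));
        try {
            // ... token iteration would happen here ...
        } finally {
            t.close();
        }
    }

    /** Returns true if reusing the same tokenizer a second time fails. */
    static boolean secondUseFails(boolean closeAfterFirstUse) {
        ReusableTokenizer cached = new ReusableTokenizer();
        if (closeAfterFirstUse) {
            consumeAndClose(cached, "first literal");
        } else {
            consumeWithoutClose(cached, "first literal");
        }
        try {
            consumeAndClose(cached, "second literal");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails without close: " + secondUseFails(false));
        System.out.println("fails with close:    " + secondUseFails(true));
    }
}
```

In real Lucene code the equivalent of consumeAndClose is usually a try-with-resources block around the token stream, with reset() before the iteration loop and end() after it.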

- Bryan Thompson - 2016-04-11
  
  Yes. See BLZG-1876. A fix just went through CI (thanks Jeremy!)
  
  Bryan

Paul Callahan - 2016-04-13

After adding Jeremy's patch, the reindexer appears to run, but I don't get any results back for bds:search. That's true whether I use the URL (from either the workbench page or curl) or if I run the standalone utility com.bigdata.rdf.store.RebuildTextIndex.

But it does work (I get results from bds:search) if I run my own code with these lines:

    // Locate the unisolated view of the namespace's triple store.
    final LocalTripleStore tripleStore = (LocalTripleStore) journal.getResourceLocator().locate(namespace, ITx.UNISOLATED);
    LexiconRelation lexiconRelation = tripleStore.getLexiconRelation();
    // Rebuild the full-text index, then commit.
    lexiconRelation.rebuildTextIndex();
    tripleStore.commit();

What is the Blazegraph 2.1 code doing that's different?

Last edit: Paul Callahan 2016-04-13

Paul Callahan - 2016-04-13

I also noticed that lexiconRelation.rebuildTextIndex() has changed in the 2.1 code: it now takes a new argument to force creation of a new index. If I use this code snippet, built against the 2.1 jar, I also get results with bds:search (but not with the documented index rebuilding tools).

    // Same as before, but with the new 2.1 argument set to false.
    final LocalTripleStore tripleStore = (LocalTripleStore) journal.getResourceLocator().locate(namespace, ITx.UNISOLATED);
    LexiconRelation lexiconRelation = tripleStore.getLexiconRelation();
    lexiconRelation.rebuildTextIndex(false);
    tripleStore.commit();

Last edit: Paul Callahan 2016-04-13