
com.bigdata.rdf.lexicon.LexiconRelation.rebuildTextIndex() on running instance of blazegraph?

Help
2016-03-24
2016-04-13
  • Paul Callahan

    Paul Callahan - 2016-03-24

    We need to rebuild the text search indexes on an existing journal (Blazegraph 1.5.3 for now, eventually 2.x and above) after updating some Lucene properties in the journal to make more of the text searchable. I have figured out how to do this with a Java utility that calls LexiconRelation.rebuildTextIndex() on a journal that has been taken offline.

    The offline solution will probably suffice, but I'm curious whether it is possible to rebuild the index while Blazegraph is running. The standalone utility won't run while the journal is in use, but maybe the rebuild could run in a separate thread inside the Blazegraph instance. Assume we won't be using bds:search until the rebuild is complete. Would this cause problems for any other Blazegraph operations?
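
To make the "separate thread, no bds:search until the rebuild is complete" idea concrete, here is a minimal sketch using only java.util.concurrent. RebuildGate and its method names are hypothetical and not part of Blazegraph. The Runnable passed in stands in for whatever actually performs the rebuild, for example a call to LexiconRelation.rebuildTextIndex() followed by a commit. The sketch only shows the threading and gating pattern. It says nothing about whether running the rebuild against a live journal is safe.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not a Blazegraph class): runs a rebuild task on a
 * background thread and reports whether text search should be allowed yet.
 */
public class RebuildGate {

    // Single worker thread so at most one rebuild runs at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // True from the moment a rebuild is accepted until it finishes or fails.
    private final AtomicBoolean rebuilding = new AtomicBoolean(false);

    /** Starts the rebuild in the background; rejects a second concurrent request. */
    public Future<?> startRebuild(final Runnable rebuildTask) {
        if (!rebuilding.compareAndSet(false, true)) {
            throw new IllegalStateException("rebuild already in progress");
        }
        try {
            return executor.submit(() -> {
                try {
                    rebuildTask.run();
                } finally {
                    // Reopen the gate whether the rebuild succeeded or threw.
                    rebuilding.set(false);
                }
            });
        } catch (RuntimeException e) {
            // Submission was rejected (e.g. executor shut down): reopen the gate.
            rebuilding.set(false);
            throw e;
        }
    }

    /** Callers would check this before issuing a bds:search query. */
    public boolean isSearchAllowed() {
        return !rebuilding.get();
    }

    /** Stops accepting new rebuilds; an in-flight rebuild is allowed to finish. */
    public void shutdown() {
        executor.shutdown();
    }
}
```

The application would call isSearchAllowed() before sending any query that uses bds:search, and Future.get() on the returned future surfaces any exception thrown by the rebuild.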

     
    • Bryan Thompson

      Bryan Thompson - 2016-03-24

      Paul,

      Support for reindexing is being introduced in the next release, based on the
      same code that you are using. This is being done to support a data
      migration required by the updated Lucene dependencies. Brad can provide you
      with some more information about this procedure, but the documentation for
      it should be up on the wiki soon (if it is not already).

      Thanks,
      Bryan



       
  • Paul Callahan

    Paul Callahan - 2016-04-11

    I had a chance to try out 2.1, but I get this exception a lot (see below). Lucene StandardAnalyzer is the value I assign to com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.analyzerClass. It happens sometimes (not always) when I use bds:search, but always happens when I try to rebuild the index from the namespace page.

    Caused by: java.lang.IllegalStateException: TokenStream contract violation: close() call missing
    at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:315)
    at org.apache.lucene.analysis.standard.StandardAnalyzer$1.setReader(StandardAnalyzer.java:110)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:143)
    at com.bigdata.search.FullTextIndex.getTokenStream(FullTextIndex.java:883)
    at com.bigdata.search.FullTextIndex.index(FullTextIndex.java:825)
    at com.bigdata.search.FullTextIndex.tokenize(FullTextIndex.java:1041)
    at com.bigdata.search.FullTextIndex._search(FullTextIndex.java:1143)
    at com.bigdata.search.FullTextIndex.search(FullTextIndex.java:955)
    at com.bigdata.rdf.sparql.ast.eval.SearchServiceFactory$SearchCall.getHiterator(SearchServiceFactory.java:531)
    at com.bigdata.rdf.sparql.ast.eval.SearchServiceFactory$SearchCall.call(SearchServiceFactory.java:661)
    at com.bigdata.rdf.sparql.ast.eval.SearchServiceFactory$SearchCall.call(SearchServiceFactory.java:362)
    at com.bigdata.bop.controller.ServiceCallJoin$ChunkTask$ServiceCallTask.doBigdataServiceCall(ServiceCallJoin.java:770)
    at com.bigdata.bop.controller.ServiceCallJoin$ChunkTask$ServiceCallTask.doServiceCall(ServiceCallJoin.java:707)

     
    • Bryan Thompson

      Bryan Thompson - 2016-04-11

      Yes. See BLZG-1876. A fix just went through CI (thanks Jeremy!)

      Bryan

       
  • Paul Callahan

    Paul Callahan - 2016-04-13

    After adding Jeremy's patch, the reindexer appears to run, but I don't get any results back from bds:search. That's true whether I trigger the rebuild through the URL (from either the workbench page or curl) or run the standalone utility com.bigdata.rdf.store.RebuildTextIndex.

    But it does work (I get results from bds:search) if I run my own code with these lines:

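        // Locate the triple store for the namespace using the unisolated view.
        final LocalTripleStore tripleStore = (LocalTripleStore) journal.getResourceLocator()
                .locate(namespace, ITx.UNISOLATED);
        final LexiconRelation lexiconRelation = tripleStore.getLexiconRelation();
        // Rebuild the full-text index, then commit the change.
        lexiconRelation.rebuildTextIndex();
        tripleStore.commit();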
    

    What is the blazegraph 2.1 code doing that's different?

     

    Last edit: Paul Callahan 2016-04-13
  • Paul Callahan

    Paul Callahan - 2016-04-13

    I also noticed that lexiconRelation.rebuildTextIndex() has changed in the 2.1 code, but only in that it takes a new argument to force creation of a new index. If I use this code snippet, built against the 2.1 jar, I also get results with bds:search. (But not with the documented index rebuilding tools.)

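        // Locate the triple store for the namespace using the unisolated view.
        final LocalTripleStore tripleStore = (LocalTripleStore) journal.getResourceLocator()
                .locate(namespace, ITx.UNISOLATED);
        final LexiconRelation lexiconRelation = tripleStore.getLexiconRelation();
        // false: do not force creation of a new index (argument added in 2.1).
        lexiconRelation.rebuildTextIndex(false);
        tripleStore.commit();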
    
     

    Last edit: Paul Callahan 2016-04-13
