Blazegraph 1.5.2 to Support Hybrid Search using External Solr Indices

While graph databases are a perfect fit for storing and querying structured data, they are not primarily designed to deal efficiently with unstructured data and keyword queries. Such unstructured data is therefore often kept in dedicated systems designed to evaluate keyword queries efficiently, using techniques such as stemming, TF-IDF indexing, complex keyword search requests, and scoring.

Graph databases, on the other hand, are about connecting things, so in many scenarios we want to combine the capabilities of structured queries with those of queries against a fulltext index. To give just one simple example, assume we have a structured graph database with data about historical characters and a complementary keyword index over a corpus of historical texts (which may or may not be under our control). Now assume we want to combine structured queries (asking, e.g., for persons that fall into certain categories, such as the epochs or countries they lived in) with historical texts from the index that prominently feature these persons.

The upcoming Blazegraph 1.5.2 release will support such hybrid queries against external Solr fulltext search indices. The fulltext search feature has been implemented as a Blazegraph custom service: using a standard-compliant SPARQL SERVICE call with a reserved service URI, you can now easily combine structured search capabilities over the graph database with information held in an external Solr index.

Blazegraph’s hybrid search capabilities are currently used by the British Museum in the ResearchSpace project, which aims at building a collaborative environment for humanities and cultural heritage research using knowledge representation and Semantic Web technologies. In this context, Blazegraph’s hybrid search feature supports users in expressing complex search requests for cultural heritage objects. Hybrid SPARQL queries utilizing a Solr index are used to support semantic autocompletion: as the user types a keyword, hybrid queries are issued in real time to match keywords against entities in a cultural heritage knowledge graph. Depending on the current context of the search, persons, objects or places are suggested, providing a user-friendly means to disambiguate terms as the user types. If you’re going to be in San Jose for the Smart Data conference, we’re giving a tutorial on the approach.

To illustrate the new hybrid search feature by example, a single SPARQL query like

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://www.example.com/>
PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?person ?kwDoc ?snippet WHERE {
        ?person rdf:type ex:Artist .
        ?person ex:associatedWith ex:TheBlueRider .
        ?person ex:bornIn ex:Germany .
        ?person rdfs:label ?label .
  SERVICE <http://www.bigdata.com/rdf/fts#search> {
        ?kwDoc fts:search ?label .
        ?kwDoc fts:endpoint "http://my.solr.server/solr/select" .
        ?kwDoc fts:params "fl=id,score,snippet" .
        ?kwDoc fts:scoreField "score" .
        ?kwDoc fts:score ?score .
        ?kwDoc fts:snippetField "snippet" .
        ?kwDoc fts:snippet ?snippet .
  }
} ORDER BY ?person DESC(?score)

would

  • first extract all persons associated with the group “The Blue Rider” that were born in Germany, then
  • take the label of these persons as search string and send a request against a Solr server, in order to extract a ranked list of articles for the respective persons (including text snippets where these persons are mentioned), next
  • order the results by person and relevance as requested by the ORDER BY, and finally
  • return the identified person URIs (variable ?person, from the graph database), the ID of the keyword index document (variable ?kwDoc, from the fulltext index), and the associated text snippet provided by the keyword index (variable ?snippet).

As the example illustrates, the keyword index is parameterized via a reserved “magic vocabulary”: for instance, within the SERVICE clause, the object linked through fts:search identifies the search string to be submitted against the keyword index, while fts:endpoint refers to the address of the Solr server.
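
To make the role of each magic predicate concrete, here is a minimal annotated sketch that uses a literal search string instead of a variable. The search term and the endpoint URL are placeholders, and the comments on fts:params, fts:scoreField and fts:snippetField reflect our reading of the example above rather than a full specification of the vocabulary.

```sparql
PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?kwDoc ?score ?snippet WHERE {
  SERVICE <http://www.bigdata.com/rdf/fts#search> {
    # Search string submitted to the keyword index (placeholder term)
    ?kwDoc fts:search "Kandinsky" .
    # Address of the Solr server (placeholder URL)
    ?kwDoc fts:endpoint "http://my.solr.server/solr/select" .
    # Extra request parameters; "fl" is Solr's list of fields to return
    ?kwDoc fts:params "fl=id,score,snippet" .
    # Solr field assumed to hold the score, and the variable it is bound to
    ?kwDoc fts:scoreField "score" .
    ?kwDoc fts:score ?score .
    # Solr field assumed to hold the snippet, and the variable it is bound to
    ?kwDoc fts:snippetField "snippet" .
    ?kwDoc fts:snippet ?snippet .
  }
} ORDER BY DESC(?score)
```

SPARQL's ORDER BY sorts in ascending order by default, so DESC(?score) is used here to list the highest-scoring documents first.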

Of course, the hybrid search feature is not domain dependent: no matter what data has been loaded into your database and no matter what the keyword index looks like, you can now post hybrid search queries against your data and the external index. The implementation even allows you to query multiple keyword indices within one query and, by the use of SPARQL 1.1 federation, combine this with requests against multiple SPARQL endpoints at a time. The search string can be dynamically extracted from the database (as in the example above, where we bind variable ?label through a structured query and use it as a search string) or can be a static search string. Moreover, nothing prevents you from using more complex Solr keyword search strings with boolean connectives such as AND, OR, or negation: in SPARQL, these complex search strings can be easily composed by the use of BIND in combination with string concatenation. For instance, we may modify the first part of our example as

...
        ?person ex:associatedWith ex:TheBlueRider .
        ?person ex:bornIn ex:Germany .
        ?person rdfs:label ?label .
        BIND(CONCAT("\"", ?label, "\" AND -\"expressionism\"") AS ?search)
        SERVICE <http://www.bigdata.com/rdf/fts#search> {
                ?kwDoc fts:search ?search .
                ...
        }
...

in order to search for keyword index documents mentioning these persons without explicitly mentioning “expressionism” (the “-” in Solr is used to express negation).
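
Querying more than one keyword index within a single query, as mentioned above, might look like the following sketch. Both endpoint URLs and the variable names ?article and ?letter are hypothetical; we simply assume that each index gets its own SERVICE block with its own result variables, using the same magic vocabulary as in the first example.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://www.example.com/>
PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?person ?article ?letter WHERE {
  ?person ex:associatedWith ex:TheBlueRider .
  ?person rdfs:label ?label .
  # First keyword index (hypothetical endpoint): articles mentioning the person
  SERVICE <http://www.bigdata.com/rdf/fts#search> {
    ?article fts:search ?label .
    ?article fts:endpoint "http://articles.example.org/solr/select" .
    ?article fts:params "fl=id,score,snippet" .
    ?article fts:scoreField "score" .
    ?article fts:score ?articleScore .
  }
  # Second keyword index (hypothetical endpoint): letters mentioning the person
  SERVICE <http://www.bigdata.com/rdf/fts#search> {
    ?letter fts:search ?label .
    ?letter fts:endpoint "http://letters.example.org/solr/select" .
    ?letter fts:params "fl=id,score,snippet" .
    ?letter fts:scoreField "score" .
    ?letter fts:score ?letterScore .
  }
} ORDER BY ?person DESC(?articleScore) DESC(?letterScore)
```

Because the two SERVICE blocks are joined like any other graph patterns, each person is returned once per combination of matching article and letter.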

If you want to learn more about Blazegraph’s upcoming Solr index support, please check out the documentation in our Wiki.

We’d love to hear from you.

Do you have a cool new application using Blazegraph or are interested in understanding how to make Blazegraph work best for your application? Get in touch or send us an email at blazegraph at blazegraph.com.

Posted by SourceForge Robot 2015-07-14
