OpenLink Virtuoso (Open-Source Edition)

Email Archive: virtuoso-users (read-only)

From: Kjetil Kjernsmo <Kjetil.Kjernsmo@co...> - 2008-09-02 11:23
Dear all,

I'm finally getting around to trying out Virtuoso. I'm sure many of you know it
already, but for those who don't, I'm now working for a consultancy in
Norway called Computas, 100% on Semantic Web technologies. Previously, I was
at Opera, and my interest in RDF goes back more than 10 years.

So, to the problem: We are developing a site where SPARQL is used as the
backend query language. One of the features for end users is to search by a
simple freetext string (Google-style searches). Say that we have the graph

<http://example.org/resource/foobar> dct:title "Foobar" ;
dct:subject <http://example.org/topic/dahut> .
<http://example.org/topic/dahut> skos:prefLabel "Dahut" .

Now, I'd like to DESCRIBE the <http://example.org/resource/foobar> resource,
by freetext matching. The simplest case is straightforward

DESCRIBE ?resource WHERE {
?resource dct:title ?free .
?free bif:contains "Foo*" .
}

The next case is more complex: since there is a single search field, we don't
know which predicate holds the term we are searching for, and so we
need to search all literals, e.g.:

DESCRIBE ?resource WHERE {
?resource dct:title ?free ;
dct:subject ?var .
?var skos:prefLabel ?free .
?free bif:contains "Da*" .
}

Note that I'm now searching for the "Da"-prefix, and I suppose that this would
also return a description of <http://example.org/resource/foobar> ?

Now to the real problem:

How about if the end-user searches for "Foo* and Da*"?

In this case, the string will not match any of the literals, but it will match
the combination of the literals, and it is the latter the end user is
interested in.

Up to now, we've been running Jena, and the way we have solved this problem is
to concatenate all literals, create a "sub:literals" property, and then
freetext search this property. It works, but queries that return a few
hundred entries usually take around 40 seconds.

Has Virtuoso any solution to this problem?

I think I would prefer a solution that made the SPARQL very simple, like

DESCRIBE ?resource WHERE {
?resource bi:any-contains "Foo* and Da*" .
}

Note that the index needs to cover not only the literals linked to the subject
I'm interested in here, but also the literals linked to subjects that are
objects of this resource... Therefore, it would be nice to have the
possibility of defining a pattern for this index.

Kind regards

Kjetil Kjernsmo
--
Senior Knowledge Engineer
Direct: +47 6783 1136 | Mobile: +47 986 48 234
Email: kjetil.kjernsmo@co...
Web: http://www.computas.com/

|  SHARE YOUR KNOWLEDGE  |

Computas AS Vollsveien 9, PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 |
Fax:+47 6783 1001


    From: Ivan Mikhailov <imikhailov@op...> - 2008-09-02 12:22
    Hello Kjetil,

    For some characters, placing a quoted phrase instead of a word is enough.
    Say, when "Da~!@" is in quotes inside an FT expression, like

    ?free bif:contains "'Da~!@'"
    or
    ?free bif:contains "'Da~!@' and 'Foo*' and not 'Foo Bar' "

    the free-text expression parser will extract and normalize words from
    the phrase and get a single DA word; "~!@" will be treated as garbage,
    somewhat like whitespace. At the same time, asterisks and quotes and
    backslashes are supposed to be handled by an application. One can not
    expect that text from user's input can be placed into query verbatim.
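A minimal sketch of the kind of pre-processing described above, written in Python. It assumes the application wants every search term ANDed together, keeps a trailing asterisk as a prefix wildcard, and wraps each word in single quotes as in the examples above. The function name `to_contains_expression` and the decision to drop the bare words "and", "or" and "not" are illustrative assumptions, not part of Virtuoso.

```python
import re

# Words the user may type as connectives; this sketch drops them and
# always joins the remaining terms with "and" (an assumption).
OPERATORS = {"and", "or", "not"}

def to_contains_expression(user_input: str) -> str:
    """Turn raw user input into a conservative free-text expression."""
    terms = []
    for token in user_input.split():
        # Remember whether the user asked for a prefix match, e.g. "Foo*".
        wildcard = token.endswith("*")
        # Strip quotes, backslashes, asterisks and all other punctuation.
        word = re.sub(r"\W", "", token)
        if not word or word.lower() in OPERATORS:
            continue
        terms.append(f"'{word}*'" if wildcard else f"'{word}'")
    return " and ".join(terms)
```

Because every non-word character is removed, the result contains no double quotes or backslashes and can be placed inside a double-quoted SPARQL literal. An empty result means there is nothing to search for, and the caller should skip the query.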

    Re.
    DESCRIBE ?resource WHERE {
    ?resource dct:title ?free ;
    dct:subject ?var .
    ?var skos:prefLabel ?free .
    ?free bif:contains "Da*" .
    },

    there are two errors. First, the query will ignore all resources whose
    dct:title is not equal to skos:prefLabel, which is probably not what
    you want. So you need two variables.

    DESCRIBE ?resource WHERE {
    ?resource dct:title ?free1 ;
    dct:subject ?var .
    ?var skos:prefLabel ?free2 .
    ?free2 bif:contains "Da*" .
    },

    that is also not what you want, since it searches only skos:prefLabel
    and still requires every resource to have a dct:title.

    DESCRIBE ?resource WHERE {
    { ?resource dct:title ?free .
    ?free bif:contains "Da*" }
    UNION
    { ?resource dct:subject ?var .
    ?var skos:prefLabel ?free .
    ?free bif:contains "Da*" .
    }
    }

    is probably somewhat better.
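When queries are generated rather than hand-written, the UNION form above can be produced mechanically. The following Python sketch assumes each place a literal may live is given as a list of predicates leading from ?resource to that literal. The name `build_describe` and the generated variable names are hypothetical, and the expression is assumed to be already sanitized (no double quotes or backslashes).

```python
def build_describe(expression, paths):
    """Build a DESCRIBE query with one UNION branch per predicate path.

    Each branch binds its own literal variable and applies bif:contains
    to it immediately after the triple pattern that binds it.
    """
    if not paths:
        raise ValueError("at least one predicate path is required")
    branches = []
    for i, path in enumerate(paths):
        lines = []
        subject = "?resource"
        # Walk the intermediate hops, e.g. ?resource dct:subject ?n1_0 .
        for j, predicate in enumerate(path[:-1]):
            node = f"?n{i}_{j}"
            lines.append(f"{subject} {predicate} {node} .")
            subject = node
        # The last predicate in the path points at the literal to search.
        literal = f"?free{i}"
        lines.append(f"{subject} {path[-1]} {literal} .")
        lines.append(f'{literal} bif:contains "{expression}" .')
        branches.append("{ " + " ".join(lines) + " }")
    return "DESCRIBE ?resource WHERE { " + " UNION ".join(branches) + " }"
```

Calling `build_describe("'Da*'", [["dct:title"], ["dct:subject", "skos:prefLabel"]])` yields a query with the same shape as the one above, with a separate variable per branch.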


    The extension like

    DESCRIBE ?resource WHERE {
    ?resource bi:any-contains "Foo* and Da*" .
    }

    is not very convenient for mixed storage, so I'm not sure it will ever
    appear in the schedule. It may be more practical to materialize some
    view and create either separate graph for the materialization or a
    separate predicate like sub:literals. Depending on needs of an
    application, the view may store concatenations of all literals of a
    subject or a list of short items or a list of concatenated literals that
    are grouped by some criteria.
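As a rough illustration of the first option (one concatenated literal per subject), here is a Python sketch that works on an in-memory list of triples. It assumes the index should cover a subject's own literals plus the literals of resources it links to directly, which is the one-hop pattern from the original question. The `Literal` wrapper, the function name and the tuple representation are assumptions for the example and are not tied to any particular RDF library.

```python
from collections import defaultdict
from typing import NamedTuple

class Literal(NamedTuple):
    """Marks an object as a literal; plain strings are treated as IRIs."""
    value: str

def materialize_literals(triples, predicate="sub:literals"):
    """Return one (subject, predicate, Literal) triple per subject,
    concatenating its own literals and those one hop away."""
    direct = defaultdict(list)   # subject -> its own literal values
    links = defaultdict(list)    # subject -> resources it points to
    for s, _p, o in triples:
        if isinstance(o, Literal):
            direct[s].append(o.value)
        else:
            links[s].append(o)
    result = []
    for s in sorted(set(direct) | set(links)):
        texts = list(direct.get(s, []))
        for target in links.get(s, []):
            texts.extend(direct.get(target, []))
        if texts:
            result.append((s, predicate, Literal(" ".join(texts))))
    return result
```

With the example graph from the start of the thread, the foobar resource gets the value "Foobar Dahut", so an expression such as "Foo* and Da*" can match a single literal. Storing these triples in a separate graph keeps them apart from the source data.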


    Best Regards,

    Ivan Mikhailov,
    OpenLink Software.

    From: Kjetil Kjernsmo <Kjetil.Kjernsmo@co...> - 2008-09-02 14:22
    Hi Ivan,

    Thanks for the response!

    On Tuesday 02 September 2008 14:23:05 you wrote:
    > One can not
    > expect that text from user's input can be placed into query verbatim.

    Yeah, we have a pre-processing level that takes care of that.

    > Re.
    > DESCRIBE ?resource WHERE {
    >         ?resource dct:title ?free ;
    >                 dct:subject ?var .
    >         ?var skos:prefLabel ?free .     
    >         ?free bif:contains "Da*" .
    > },
    >
    > there are two errors.

    Ah, yes, it was just an example. In practice, we have some OPTIONALs and a
    FILTER bound(...) thing that takes care of all these things.

    > The extension like
    >
    > DESCRIBE ?resource WHERE {
    > ?resource bi:any-contains "Foo* and Da*" .
    > }
    >
    > is not very convenient for mixed storage,

    Yes, I know, this is a very search-engine-y thing, not at all a database-y
    thing, but nonetheless a requirement...


    > so I'm not sure it will ever
    > appear in the schedule. It may be more practical to materialize some
    > view and create either separate graph for the materialization or a
    > separate predicate like sub:literals. Depending on needs of an
    > application, the view may store concatenations of all literals of a
    > subject or a list of short items or a list of concatenated literals that
    > are grouped by some criteria.

    Right! It requires more work, but OK.

    I suspect that part of the performance problem we now have, which may not be
    solved by simply using Virtuoso, is that the cost of serializing and
    transferring the result set is too high. In that case, I was wondering if I
    could somehow exclude the sub:literals predicate from the DESCRIBE result?

    Kind regards

    Kjetil Kjernsmo

    From: Ivan Mikhailov <imikhailov@op...> - 2008-09-02 17:12
    Kjetil,

    > > Re.
    > > DESCRIBE ?resource WHERE {
    > > ?resource dct:title ?free ;
    > > dct:subject ?var .
    > > ?var skos:prefLabel ?free .
    > > ?free bif:contains "Da*" .
    > > },
    > >
    > > there are two errors.
    >
    > Ah, yes, it was just an example. In practise, we have some OPTIONALs and a
    > FILTER bound( thing that takes care of all these things.

    OK. Just remember that bif:contains works only with a triple pattern in the
    same group, at the same level, whose object is the "subject" of
    bif:contains. So it is impossible to bind a variable in one place and
    run the free-text search in another. Say, the following will report an error:

    DESCRIBE ?resource WHERE {
    { ?resource dct:title ?free .
    }
    UNION
    { ?resource dct:subject ?var .
    ?var skos:prefLabel ?free .
    }
    ?free bif:contains "Da*" .
    }

    That is because the free-text index is a real index that should operate on
    a real table alias, not on something "derived" like a union or subquery.

    > I was wondering if I
    > could somehow exclude the sub:literals predicate from the DESCRIBE result?

    It's a problem of SPARQL spec. There's no syntax for configuration
    options; I've failed to push that idea into initial version of the W3C
    spec. We shall see if the second attempt will be better. If not, then we
    will violate the spec (and create one more interop issue) -- "the best
    critique is sabotage".


    Best Regards,

    Ivan Mikhailov
    OpenLink Software
    http://virtuoso.openlinksw.com

    From: Drew Perttula <drewp@bi...> - 2008-09-03 06:00
    Kjetil Kjernsmo wrote:
    > Up to now, we've been running Jena, and the way we have solved this problem is
    > to concatenate all literals, create a "sub:literals" property, and then
    > freetext search this property. It works, but queries that returns a few
    > hundred entries usually takes around 40 seconds.

    Hi, you're obviously ahead of me in testing the RDF stores out there, so
    would you mind describing why you're looking at virtuoso instead of jena
    now? I gather it's mostly for speed-- are you seeing good speedups for
    the queries that you -have- figured out how to run?

    thanks-
    drew

    From: Kingsley Idehen <kidehen@op...> - 2008-09-03 11:50
    Drew Perttula wrote:
    > Kjetil Kjernsmo wrote:
    >
    >> Up to now, we've been running Jena, and the way we have solved this problem is
    >> to concatenate all literals, create a "sub:literals" property, and then
    >> freetext search this property. It works, but queries that returns a few
    >> hundred entries usually takes around 40 seconds.
    >>
    >
    > Hi, you're obviously ahead of me in testing the RDF stores out there, so
    > would you mind describing why you're looking at virtuoso instead of jena
    > now? I gather it's mostly for speed-- are you seeing good speedups for
    > the queries that you -have- figured out how to run?
    >
    > thanks-
    > drew
    >
    Drew,


    Please understand that Virtuoso and Jena aren't mutually exclusive
    things. Jena is a Framework and Virtuoso is a Quad Store. Unfortunately,
    lines of demarcation between Framework components haven't always been
    clear re. Jena, Sesame, and Redland, which has created a fair degree of
    confusion.

    Virtuoso fundamentally provides Jena developers with a high-performance
    and scalable native graph model storage engine via the Virtuoso Storage
    Provider for Jena [1].

    Links:

    1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtJenaProvider
    2. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFDataProviders

    --


    Regards,

    Kingsley Idehen
    President & CEO, OpenLink Software
    Weblog: http://www.openlinksw.com/blog/~kidehen
    Web: http://www.openlinksw.com

    From: Kjetil Kjernsmo <Kjetil.Kjernsmo@co...> - 2008-09-03 15:45
    On Tuesday 02 September 2008 19:12:45 Ivan Mikhailov wrote:
    > > Ah, yes, it was just an example. In practise, we have some OPTIONALs and
    > > a FILTER bound( thing that takes care of all these things.
    >
    > OK. Just remember that bif:contains will work only with triple in same
    > group at same level such that "subject" of bif:contains is object of
    > that triple. So it is impossible to bind a variable in one place and
    > make free-text search in other. Say, the following will report an error:
    >
    > DESCRIBE ?resource WHERE {
    >   {     ?resource dct:title ?free .
    >   }
    >   UNION
    >   {     ?resource dct:subject ?var .
    >         ?var skos:prefLabel ?free .    
    >   }
    >   ?free bif:contains "Da*" .
    > }
    >
    > That is because free-text index is a real index that should operate on
    > real table alias, not on something "derived" like union or subquery.

    Oh, OK, but

    DESCRIBE ?resource WHERE {
    { ?resource dct:title ?free1 . ?free1 bif:contains "Da*" .
    }
    UNION
    { ?resource dct:subject ?var .
    ?var skos:prefLabel ?free2 .
    ?free2 bif:contains "Da*" .
    }
    }

    will work?

    > >  I was wondering if I
    > > could somehow exclude the sub:literals predicate from the DESCRIBE
    > > result?  
    >
    > It's a problem of SPARQL spec. There's no syntax for configuration
    > options; I've failed to push that idea into initial version of the W3C
    > spec. We shall see if the second attempt will be better. If not, then we
    > will violate the spec (and create one more interop issue) -- "the best
    > critique is sabotage".

    Yeah, I know, but I certainly prefer to keep the conflict level low and to
    stick to rough consensus in the standards process.

    It seems to me that a solution to this problem would be to allow matching
    graphs that are not part of the result set, e.g.

    DESCRIBE ?foo FROM </foo> WHERE { GRAPH </bar> { ?foo dct:title "Foo" . } }
    or something like that...



    Kind regards

    Kjetil Kjernsmo

    From: Kjetil Kjernsmo <Kjetil.Kjernsmo@co...> - 2008-09-03 15:45
    On Wednesday 03 September 2008 08:01:03 you wrote:
    > Hi, you're obviously ahead of me in testing the RDF stores out there, so
    > would you mind describing why you're looking at virtuoso instead of jena
    > now? I gather it's mostly for speed--

    Indeed!

    > are you seeing good speedups for
    > the queries that you -have- figured out how to run?

    I haven't actually run any queries on Virtuoso yet, and basically, I have been
    just a victim of massive advertising from Kingsley and Ted ;-)

    Most queries are not problematic in Jena. We have found, however, that the
    order of the statements of the pattern matters a lot in Jena: one must
    always manually see to it that the most restrictive terms go first, and that
    is a hard problem to solve when the queries are generated rather than
    hand-written. I don't know if OpenLink has done anything to solve this
    problem, or if there are other things that could help in that respect, but
    I'm here to find out.

    The other problem, which is now urgent to us, is that "simple search engine"
    problem. Since we are stuffing all data into the model several times to have
    it indexed, the length of the literals seems to cause us trouble. We are
    working on several fronts with this problem; one solution involves Virtuoso,
    some do not.

    We are using Jena as a SPARQL Endpoint, but we are also using SPARQL Update.
    With the current architecture, it should be straightforward to switch back
    and forth, and that's what we intend to do.

    Kind regards

    Kjetil Kjernsmo

    From: Ivan Mikhailov <imikhailov@op...> - 2008-09-03 16:49
    Hello Kjetil,


    > DESCRIBE ?resource WHERE {
    > { ?resource dct:title ?free1 . ?free1 bif:contains "Da*" .
    > }
    > UNION
    > { ?resource dct:subject ?var .
    > ?var skos:prefLabel ?free2 .
    > ?free2 bif:contains "Da*" .
    > }
    > }
    >
    > will work?

    Yes, of course, because each bif:contains has an appropriate triple
    pattern in front of it.



    >
    > > > I was wondering if I
    > > > could somehow exclude the sub:literals predicate from the DESCRIBE
    > > > result?
    > >
    > > It's a problem of SPARQL spec. There's no syntax for configuration
    > > options
    >
    > It seems to me that a solution to this problem would be to allow matching
    > graphs that are not part of the result set, e.g.
    >
    > DESCRIBE ?foo FROM </foo> WHERE GRAPH </bar> { ?foo dct:title "Foo" . } or
    > something like that...

    Unfortunately, this trick will not work when graphs are not known in
    advance, because it is impossible to specify "from everything except
    </bar>". Otherwise it's OK.

    Best Regards,

    Ivan Mikhailov
    OpenLink Software
    http://virtuoso.openlinksw.com