Re: [DBpedia-discussion] Purpose of the DBpedia Ontology was Re: Call for Ontology Editor demos for DBpedia

On 07/11/2017 03:10 PM, Sebastian Hellmann wrote:
> Hi all,
> 
> DBpedia has always had a firm founding in engineering and we should see the
> whole thing as an "information machine". From this perspective the purpose of
> the DBpedia Ontology becomes quite clear:
> 
> - Usability for queries - driven by the question, what do our users want to
> retrieve?

As a user I would like to not retrieve false information.  (I realize that if
there is false information in DBpedia that comes from Wikipedia there is
little that can be done.)  However, I don't want the DBpedia ontology to cause
extra falsities.   This means, for example, that monastery should not be a
subclass of building.

> - Data transformation and hubbing: subClassOf/subProperty linking to other
> ontologies allows exporting information into other schemas without much
> ontological commitment

Relying only on links between co-extensional classes is inadequate, so using
inclusion relationships is better.  However, it would be useful to be able to
say that a DBpedia class is more general than a class from another ontology,
not just that it is more specific.
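
As a rough illustration of why both directions matter, here is a minimal
sketch in Python.  The external class names (ext:Agent, ext:Astronaut) and the
helpers exportable_types and importable_types are hypothetical, not part of
DBpedia or of any real ontology.  It assumes inclusion links are stored as
plain (subclass, superclass) pairs.

```python
# Hypothetical inclusion links, written as (subclass, superclass) pairs.
SUBCLASS_OF = {
    ("dbo:Person", "ext:Agent"),      # DBpedia class is the more specific one
    ("ext:Astronaut", "dbo:Person"),  # DBpedia class is the more general one
}


def superclasses(cls, links=SUBCLASS_OF):
    """Return every class reachable from cls by following links upward."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in links:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def exportable_types(dbpedia_class, links=SUBCLASS_OF):
    """External classes that an instance of dbpedia_class also belongs to."""
    return {c for c in superclasses(dbpedia_class, links) if c.startswith("ext:")}


def importable_types(dbpedia_class, links=SUBCLASS_OF):
    """External classes whose instances are also instances of dbpedia_class."""
    return {
        sub
        for sub, _ in links
        if sub.startswith("ext:") and dbpedia_class in superclasses(sub, links)
    }
```

With these sample links, an instance of dbo:Person can be exported as an
ext:Agent, while the link in the other direction lets data typed as
ext:Astronaut be brought in as dbo:Person.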

> - Data Quality: in my opinion data quality increases a lot if every person was
> to have a birthDate, so this can be defined as a goal. We are at around 60%
> right now. Another goal would be to validate the correctness by
> cross-referencing data, e.g. to German National Library. So the ontology
> should directly produce a system that can help us track what data is missing.
> SHACL provides us with a new modeling paradigm and we will make a call for
> SHACL specs of DBpedia applications to get concrete input for DBpedia's
> ontology. We are still discussing internally where to keep them. GitHub seems
> like a good option.

I don't see this as a viable goal.  Sure, getting more data (out of Wikipedia)
is useful, but Wikipedia isn't designed to be complete in this way.   Further,
some data, including birthdates, is always going to be incomplete.  The
birthdates of some historical figures are currently unknown and some of these
are likely never to be known.

Given that this is the case, what should be done instead?  My view is that
systems like DBpedia should be accepting of incomplete data.  Further, they
should provide facilities to get as much as possible out of incomplete data.
For example, it should be possible to state that some person belongs to the
class of people born in the seventeenth century without knowing their exact
birthdate.
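
One way to picture this is a minimal Python sketch, under the assumption that
an uncertain birth year is stored as a closed interval rather than a single
value.  The names BirthInfo, certainly_born_between and possibly_born_between
are invented for illustration and do not correspond to anything in DBpedia.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BirthInfo:
    """Birth year known only as a closed interval; None means no known bound."""
    earliest_year: Optional[int] = None
    latest_year: Optional[int] = None


def certainly_born_between(info, start, end):
    """True only if every year consistent with info lies in [start, end]."""
    if info.earliest_year is None or info.latest_year is None:
        return False
    return start <= info.earliest_year and info.latest_year <= end


def possibly_born_between(info, start, end):
    """True if at least one year consistent with info lies in [start, end]."""
    lo = info.earliest_year if info.earliest_year is not None else float("-inf")
    hi = info.latest_year if info.latest_year is not None else float("inf")
    return lo <= end and start <= hi


# The seventeenth century, counted as the years 1601 through 1700.
SEVENTEENTH_CENTURY = (1601, 1700)

# A person known only to have been born at some point in that century.
century_only = BirthInfo(earliest_year=1601, latest_year=1700)
```

Here certainly_born_between(century_only, *SEVENTEENTH_CENTURY) holds even
though no exact birthdate is recorded, whereas a record with no bounds at all
is only possibly, not certainly, in the class.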

> - minimal foundation: Time, Place, maybe Events and a few top classes should
> provide enough structure to build a consistent system. 

This is verging on mandating a very simple upper ontology.  I'm not against
upper ontologies, but it is possible to go too far along this route.

On the other hand, the top level of the current ontology is a mess.  I've
toyed with trying to fix this, but there need to be some guidelines before
starting on radical changes to the DBpedia ontology.  (Yes, I've also thought
about what could be in these guidelines.)

> There are many extra
> ontologies like LHD, DBTax, Yago, etc. that can all be sorted under these
> then. We might even resort (please don't see it as a threat) to SKOS maybe for
> roles or build separate vocabularies that can be used as mixins, i.e.
> <Peter> a dbo:Person ; dbo:occupation dbo:Actor . # dbo:Actor as instance.
> This might also be dbr:Actor , i.e. referencing the Wikipedia Article. Given
> that Wikipedia is so heterogeneous in its article types, we can help with
> building a structure that distinguishes between Individual articles, i.e.
> Barack Obama and Categorial / Role-type articles, i.e.
> https://en.wikipedia.org/wiki/Actor . Such information can be taken from text,
> which is why we are building up http://wiki.dbpedia.org/textext
> 
> By the way, is there actually any sensible class that can be subclass of
> Person? As far as I see, the only essential distinction that lasts lifelong is
> fictitious/non-fictitious . We are thinking about disallowing subclasses of
> Person, unless there is a valid concern brought up.

How about person born in the seventeenth century, mentioned above?

Why disallow subclasses of Person in particular?  Is there anything special
about Person?   Instead there should be a rationale for designating a class as
terminal in this way.

> Also it seems like we will not be able to handle all exceptions like spouse
> being non-functional for a dozen persons. From a practical perspective, if
> functionality is consistent for 95% of the data, this might be something we
> can live with. Proper documentation of these pitfalls can be given.

This is a thorny problem.  However, spouse is probably not a good case.  There
are very many more than a dozen people in the world (and probably quite a few
more than a dozen people in Wikipedia) with more than one spouse.  The
functionality of spouse is an artifact of a particular society (unless you are
limiting spouse to that particular society).

Perhaps a better example is (biological) father for people.  Nearly all people
have only one recorded father.   Exceptions come from at least two sources -
recording both biological father and legal father and mythological beings who
have different fathers in different accounts.  Nevertheless it is useful to
state that father is functional.
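
To make the trade-off concrete, here is a minimal Python sketch that measures
how consistently a property behaves as functional and lists the exceptions so
they can be documented.  It assumes the data is available as (subject,
property, value) triples; the function name functionality_report and the
sample identifiers (ex:p1, ex:father and so on) are made up for illustration.

```python
from collections import defaultdict


def functionality_report(triples, prop):
    """Return (ratio, violators) for prop over (subject, property, value) triples.

    ratio is the share of subjects with exactly one distinct value, among the
    subjects that have at least one value for prop.  violators maps each
    subject with more than one distinct value to the set of those values.
    """
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    if not values:
        return 1.0, {}
    violators = {s: vs for s, vs in values.items() if len(vs) > 1}
    ratio = 1 - len(violators) / len(values)
    return ratio, violators


# Invented sample data: four subjects, one of which has two recorded fathers.
SAMPLE = [
    ("ex:p1", "ex:father", "ex:f1"),
    ("ex:p2", "ex:father", "ex:f2"),
    ("ex:p3", "ex:father", "ex:f3"),
    ("ex:p4", "ex:father", "ex:f4a"),
    ("ex:p4", "ex:father", "ex:f4b"),
]

ratio, violators = functionality_report(SAMPLE, "ex:father")
```

The ratio can then be compared against whatever threshold is considered
acceptable, and the violators give the list of exceptions to document.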

What then to do about spouse?   I'm not sure.

> All the best,
> Sebastian

Peter F. Patel-Schneider
Nuance Communications
