From: Peter F. Patel-S. <pfp...@gm...> - 2017-07-14 16:31:49
On 07/11/2017 03:10 PM, Sebastian Hellmann wrote:
> Hi all,
>
> DBpedia has always had a firm founding in engineering and we should see the
> whole thing as an "information machine". From this perspective the purpose of
> the DBpedia Ontology becomes quite clear:
>
> - Usability for queries - driven by the question, what do our users want to
> retrieve?

As a user I would like to not retrieve false information. (I realize that if
there is false information in DBpedia that comes from Wikipedia there is
little that can be done.) However, I don't want the DBpedia ontology to cause
extra falsities. This means, for example, that monastery should not be a
subclass of building.

> - Data transformation and hubbing: subClassOf/subProperty linking to other
> ontologies allows to export information into other schemas without much
> ontological commitment

Relying only on links between co-extensional classes is inadequate, so using
inclusion relationships is better. However, it would be useful to be able to
say that a DBpedia class is more general than a class from another ontology,
not just that it is more specific.

> - Data Quality: in my opinion data quality increases a lot if every person was
> to have a birthDate, so this can be defined as a goal. We are at around 60%
> right now. Another goal would be to validate the correctness by
> cross-referencing data, e.g. to German National Library. So the ontology
> should directly produce a system that can help us track what data is missing.
> SHACL provides us with a new modeling paradigm and we will make a call for
> SHACL specs of DBpedia applications to get concrete input for DBpedia's
> ontology. We are still discussing internally where to keep them. GitHub seems
> like a good option.

I don't see this as a viable goal. Sure, getting more data (out of Wikipedia)
is useful, but Wikipedia isn't designed to be complete in this way. Further,
some data, including birthdates, is always going to be incomplete.
The birthdates of some historical figures are currently unknown and some of
these are likely never to be known.

Given that this is the case, what should be done instead? My view is that
systems like DBpedia should be accepting of incomplete data. Further, they
should provide facilities to get as much as possible out of incomplete data.
For example, it should be possible to state that some person belongs to the
class of people born in the seventeenth century without knowing their exact
birthdate.

> - minimal foundation: Time, Place, maybe Events and a few top classes should
> provide enough structure to build a consistent system.

This is verging close to mandating a very simple upper ontology. I'm not
against upper ontologies, but it is possible to go too far along this route.

On the other hand, the top level of the current ontology is a mess. I've toyed
with trying to fix this, but there need to be some guidelines before starting
on radical changes to the DBpedia ontology. (Yes, I've also thought about what
could be in these guidelines.)

> There are many extra
> ontologies like LHD, DBTax, Yago, etc. that can all be sorted under these
> then. We might even resort (please don't see it as a threat) to SKOS maybe for
> roles or build separate vocabularies, that can be used as mixins, i.e.
> <Peter> a dbo:Person ; dbo:occupation dbo:Actor . # dbo:Actor as instance.
> This might also be dbr:Actor , i.e. referencing the Wikipedia Article. Given
> that Wikipedia is so heterogeneous in its article types, we can help with
> building a structure that distinguishes between Individual articles, i.e.
> Barack Obama and Categorial / Role-type articles, i.e.
> https://en.wikipedia.org/wiki/Actor . Such information can be taken from text,
> which is why we are building up http://wiki.dbpedia.org/textext
>
> By the way, is there actually any sensible class that can be subclass of
> Person?
> As far as I see, the only essential distinction that lasts lifelong is
> fictitious/non-fictitious . We are thinking about disallowing subclasses of
> Person, unless there is a valid concern brought up.

How about person born in the seventeenth century, mentioned above?

Why disallow subclasses of Person in particular? Is there anything special
about Person? Instead there should be a rationale for designating a class as
terminal in this way.

> Also it seems like we will not be able to handle all exceptions like spouse
> being non-functional for a dozen persons. From a practical perspective, if
> functionality is consistent for 95% of the data, this might be something we
> can live with. Proper documentation of these pitfalls can be given.

This is a thorny problem. However, spouse is probably not a good case. There
are very many more than a dozen people in the world (and probably quite a few
more than a dozen people in Wikipedia) with more than one spouse. The
functionality of spouse is an artifact of a particular society (unless you are
limiting spouse to that particular society).

Perhaps a better example is (biological) father for people. Nearly all people
have only one recorded father. Exceptions come from at least two sources:
records that give both a biological father and a legal father, and
mythological beings who have different fathers in different accounts.
Nevertheless it is useful to state that father is functional.

What then to do about spouse? I'm not sure.

> All the best,
> Sebastian

Peter F. Patel-Schneider
Nuance Communications