Copy-pasting a thoughtful email posted to dbpedia-discussion by Lee Humphreys on 22 Oct 2007:
I would like to follow up on a point raised by Chris Richard and discussed by Richard Cyganiak on 2007-08-05; namely - how should Dbpedia deal with disambiguation pages?
Recall that Wikipedia uses disambiguation pages to allow users to navigate from ambiguous query terms (such as "bank" in English) to the appropriate article, e.g. concerning financial institutions, river parts, sea beds or whatever. Usually the first link on the disambiguation page is to what is thought to be the most common sense of the term.
In Dbpedia disambiguation pages are treated like any other article. For the "bank" disambiguation page, the article-label triple is:
<http://dbpedia.org/resource/Bank_%28disambiguation%29> <http://www.w3.org/2000/01/rdf-schema#label> "Bank (disambiguation)"@en .
The links to the appropriate articles (senses of "bank") in the Wikipedia disambiguation page are available as Dbpedia pagelinks e.g.
# financial institution
<http://dbpedia.org/resource/Bank_%28disambiguation%29> <http://dbpedia.org/property/wikilink> <http://dbpedia.org/resource/Bank> .
# sea floor
<http://dbpedia.org/resource/Bank_%28disambiguation%29> <http://dbpedia.org/property/wikilink> <http://dbpedia.org/resource/Bank_%28sea_floor%29> .
I am interested in using DBpedia in applications which analyse text and try to identify which concepts are referenced. Suppose I have a sentence which contains the word "bank" and that I have identified it as a noun. I want to be able to look up "bank" in Dbpedia and find all the things this might mean; later on in the analysis I'll try to eliminate senses of "bank" that are inappropriate in the context. This is a classic scenario in language engineering and I am sure I am not alone in considering the use of Dbpedia for this sort of thing.
However, the only resource in DBpedia that has the label "bank" is the financial institution <http://dbpedia.org/resource/Bank>. To find other resources that can be referred to as "bank" I would have to do special processing (e.g. somehow transforming "bank" into "Bank (disambiguation)" and then following the pagelinks through to the different articles).
Rather than sorting this out in my own corner, I would much prefer that something be done about this in the DBpedia deliverables.
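To make the "special processing" concrete, here is a minimal Python sketch of that workaround. It assumes the pagelinks are already loaded as (subject, predicate, object) tuples; the helper names `disambiguation_uri` and `candidate_senses` are hypothetical, and the title rule (capitalise the first letter, append " (disambiguation)") is only the naive guess described above, not something DBpedia provides.

```python
from urllib.parse import quote

RES = "http://dbpedia.org/resource/"
WIKILINK = "http://dbpedia.org/property/wikilink"

# The two sample pagelink triples quoted earlier in this email.
TRIPLES = [
    (RES + "Bank_%28disambiguation%29", WIKILINK, RES + "Bank"),
    (RES + "Bank_%28disambiguation%29", WIKILINK, RES + "Bank_%28sea_floor%29"),
]

def disambiguation_uri(term):
    """Guess the disambiguation page URI for a term (naive title rule)."""
    title = term[:1].upper() + term[1:] + " (disambiguation)"
    # Spaces become underscores; parentheses are percent-encoded, as in the samples.
    return RES + quote(title.replace(" ", "_"), safe="_")

def candidate_senses(term, triples):
    """Follow the pagelinks from the guessed disambiguation page."""
    page = disambiguation_uri(term)
    return [o for s, p, o in triples if s == page and p == WIKILINK]

print(candidate_senses("bank", TRIPLES))
```

Every application would have to repeat this guesswork, which is the motivation for handling it once in the extraction.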
One option might be to follow the disamb page links and add the ambiguous label to each of the referenced articles. So, for example, we would add a label "Bank" to the resource already labelled "Bank (sea floor)", and so on. This added label should be distinguished in some way e.g. perhaps with skos:altLabel.
<http://dbpedia.org/resource/Bank_%28sea_floor%29> <http://www.w3.org/2000/01/rdf-schema#label> "Bank (sea floor)"@en .
<http://dbpedia.org/resource/Bank_%28sea_floor%29> <http://www.w3.org/2004/02/skos/core#altLabel> "Bank"@en .
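As a rough illustration of how such triples could be derived, the following Python sketch walks the links of each disambiguation page and emits a skos:altLabel line for every target that does not already carry the ambiguous term as its rdfs:label. The function name `alt_label_triples` and the input shapes (a dict of labels, a list of link pairs) are assumptions made for the example, and it recognises disambiguation pages only by the " (disambiguation)" label suffix.

```python
SKOS_ALT = "http://www.w3.org/2004/02/skos/core#altLabel"
SUFFIX = " (disambiguation)"
RES = "http://dbpedia.org/resource/"

# rdfs:label values, keyed by resource URI (sample data from this email).
LABELS = {
    RES + "Bank_%28disambiguation%29": "Bank (disambiguation)",
    RES + "Bank": "Bank",
    RES + "Bank_%28sea_floor%29": "Bank (sea floor)",
}

# (disambiguation page, linked article) pairs taken from the pagelinks.
LINKS = [
    (RES + "Bank_%28disambiguation%29", RES + "Bank"),
    (RES + "Bank_%28disambiguation%29", RES + "Bank_%28sea_floor%29"),
]

def alt_label_triples(labels, links):
    """Return N-Triples lines adding the ambiguous term as skos:altLabel."""
    out = []
    for page, target in links:
        page_label = labels.get(page, "")
        if not page_label.endswith(SUFFIX):
            continue  # not recognised as a disambiguation page
        term = page_label[: -len(SUFFIX)]
        if labels.get(target) == term:
            continue  # the target already has the term as its rdfs:label
        out.append(f'<{target}> <{SKOS_ALT}> "{term}"@en .')
    return out

for line in alt_label_triples(LABELS, LINKS):
    print(line)
```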
This approach does not tell us which was the first resource mentioned on the disambiguation page. But I believe that in general it is this item which has the ambiguous term as its label, e.g. the first item on the bank disambiguation page is the financial institution sense, which is labelled "Bank". An application using DBpedia would just follow rdfs:label (rather than skos:altLabel) to find what Wikipedians consider to be the Ur-sense of the term.
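A client-side lookup under this proposal might look like the sketch below: resources matched through rdfs:label are treated as the primary sense, and those matched through skos:altLabel as the other candidates. The `lookup` function is hypothetical, and the case-insensitive comparison is an assumption added so that "bank" in running text matches the label "Bank".

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT = "http://www.w3.org/2004/02/skos/core#altLabel"
RES = "http://dbpedia.org/resource/"

# Label triples as they would look with the proposed altLabel added.
TRIPLES = [
    (RES + "Bank", RDFS_LABEL, "Bank"),
    (RES + "Bank_%28sea_floor%29", RDFS_LABEL, "Bank (sea floor)"),
    (RES + "Bank_%28sea_floor%29", SKOS_ALT, "Bank"),
]

def lookup(term, triples):
    """Return (primary senses, alternative senses) for a term."""
    wanted = term.lower()
    primary = [s for s, p, o in triples
               if p == RDFS_LABEL and o.lower() == wanted]
    alternatives = [s for s, p, o in triples
                    if p == SKOS_ALT and o.lower() == wanted]
    return primary, alternatives

primary, alternatives = lookup("bank", TRIPLES)
print("primary:", primary)
print("alternatives:", alternatives)
```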
For existing users of Dbpedia there would be no change: the current labels would continue to work exactly as before.
Any solution to disambiguation also needs to be compatible with a solution to redirects (which I believe is on the Dbpedia task list).
In the pagelinks, it would be nice to distinguish the links from disambiguation pages from other, more innocent wikilinks. Or perhaps explicitly flag resources representing disambiguation pages in some way.
As you may suspect, I would be pretty happy if Wiktionary was also available in RDF and it had systematic links from senses to Dbpedia resources. But I don't suppose that is going to happen this week ...
Lee Humphreys - SPSS - Paris
Logged In: YES
user_id=584620
Originator: YES
Note that finding the links from <Bank> or <Bank_(sea_floor)> to
<Bank_(disambiguation)> is easy: a special template is used
throughout Wikipedia to create these links.
Comment from Judson (http://en.wikipedia.org/wiki/User:Cohesion):
I don't know if this is obvious or not, but not all disambiguation
pages are in the style: "Foo (disambiguation)". They do all have a
template included though {{disambig}}. Sometimes the disambiguation is
"Bank" if the term is particularly ambiguous :)
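A small Python sketch of what checking for the template, rather than the title, could look like. It assumes the raw wikitext of the page is available as a string; the regular expression accepts any template whose name starts with "disambig", which is an assumption for this example and not a complete list of the templates Wikipedia uses. The sample wikitext strings are made up.

```python
import re

# Matches {{disambig}} and names starting with "disambig",
# optionally followed by template parameters.
DISAMBIG_TEMPLATE = re.compile(
    r"\{\{\s*disambig\w*\s*(\|[^{}]*)?\}\}", re.IGNORECASE
)

def is_disambiguation_page(wikitext):
    """True if the page source includes a disambiguation template."""
    return DISAMBIG_TEMPLATE.search(wikitext) is not None

# Hypothetical page sources, for illustration only.
ambiguous = "'''Bank''' may refer to:\n* [[Bank (sea floor)]]\n{{disambig}}"
ordinary = "A '''bank''' is a financial institution. See [[Money]]."

print(is_disambiguation_page(ambiguous))
print(is_disambiguation_page(ordinary))
```

This catches a disambiguation page titled plainly "Bank" as well as one titled "Bank (disambiguation)", since the title is never inspected.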
Disambiguation links are extracted using the property URI http://dbpedia.org/property/disambiguates
No, it's http://dbpedia.org/ontology/wikiPageDisambiguates
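With that property, listing the senses of a disambiguation page becomes a single triple pattern. The Python sketch below only builds the SPARQL query text; the function name `disambiguation_query` is hypothetical, the caller supplies the page URI in whatever form the dataset uses, and no particular endpoint or client library is assumed.

```python
DISAMBIGUATES = "http://dbpedia.org/ontology/wikiPageDisambiguates"

def disambiguation_query(page_uri):
    """Build a SPARQL query listing the resources a page disambiguates."""
    # Reject characters that are not allowed inside an IRI reference.
    if any(c in page_uri for c in '<>" \n'):
        raise ValueError("not a usable IRI: %r" % page_uri)
    return (
        "SELECT ?sense WHERE {\n"
        f"  <{page_uri}> <{DISAMBIGUATES}> ?sense .\n"
        "}"
    )

print(disambiguation_query(
    "http://dbpedia.org/resource/Bank_%28disambiguation%29"
))
```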