On Friday 10 February 2006 20:45, Rob Lanphier wrote:
> Hi Markus,
> Thanks for the detailed reply!
Well, I apologize in advance for another, even more detailed reply ;-)
> My short-term interest in Semantic MediaWiki is for souped-up
> categories. I'm really not planning on exploiting the full power of
> semantic markup in the first deployment, but using it as an extensible
> mechanism to describe an article hierarchy, where an article can both be
> an entity onto itself, /and/ have child nodes. It's not all that
> different from categories in the base MediaWiki, except that categories
> have the annoying characteristic that they have to be designated as a
> category (by namespace) in order to become a parent article.
Well, this can certainly be realized without problems. For such a special
purpose, one could even "hide" the added syntax in templates, so that users
hardly notice a difference.
Note, however, that the "annoying characteristic" of MediaWiki's categories is
typical for many knowledge representation formalisms, and for many systems
that implement them. So there actually are some arguments in favor of this.
For example, MediaWiki categories can simply be viewed as sets of articles,
while this is not admissible if you have a less restricted setting. Indeed,
if you waive all restrictions, then you can have category articles that
contain themselves -- this might be confusing to users and requires specific
tool support to be handled (it basically corresponds to the RDFS model of
categories, which is more complex than the "categories are sets of
categorized things" viewpoint).
Anyway, it is possible to describe generalized categorical schemes -- whether
the processing works well or not, and whether the idea causes some confusion
for uninitiated users, probably depends on other aspects as well.
> I realize this is a tiny subset of what is possible with Semantic
> MediaWiki, but I think it's good that we create a system that can grow
> into something much cooler than a simple hierarchy.
Sure. We do not plan to enforce all-purpose (uppercase ;-) Semantics on every
usage scenario. We could even include options to switch off some features
that some wikis do not need/like. Choosing the right measure of technical
power and added complexity is an important issue.
> The work that I'm doing is on behalf of ManyOne. You can see more of
> what they are up to here:
Ah, interesting. I will need to have a second look, though, to see how wikis
come into the picture.
> More below....
> On Fri, 2006-02-10 at 12:49 +0100, Markus Krötzsch wrote:
> > There will be a content-level export (statements about the subject of
> > articles -- basically the new annotations that we allow in wikitexts),
> > but also meta-level export (information about the articles themselves:
> > authors, license, namespace, permanent link, previous version, ...). So
> > the exports provide some more information than is currently found in our
> > triple table. RDF will, of course, also use fully qualified URIs to
> > describe resources unambiguously. On the other hand, there are very
> > efficient systems for working with RDF-data around, so we hope to
> > actually gain efficiency by providing this (somewhat verbose) output.
> Interesting. I think this may address my initial concerns, but let me
> spell them out a bit more.
> The nice part of a fully-qualified URL is that you don't have to
> distinguish between an external page and an internal page. The bad part
> about that is that for some purposes, you /can't/ distinguish between an
> external page and an internal page.
We chose this because we actually want to export the data from wikis to other
sites, which then actually would see the data as external pages. Inside the
wiki, we are currently only annotating internal pages (though this is not a
restriction in principle -- describing relations to external resources is just
not implemented at the moment). So the distinction is not problematic for us.
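As a rough illustration of the split between internal names and exported
identifiers, here is a minimal sketch of how page titles might be expanded
into fully qualified URIs at export time. The base URI and the helper names
(to_uri, export_triple) are made up for this example and do not describe the
actual export code.

```python
from urllib.parse import quote

# Hypothetical base URI of the exporting wiki (an assumption for this sketch).
BASE_URI = "http://wiki.example.org/wiki/"


def to_uri(title: str) -> str:
    """Expand an internal page title into a fully qualified URI."""
    # Spaces become underscores, as in wiki page URLs; the rest is
    # percent-encoded where necessary.
    return BASE_URI + quote(title.replace(" ", "_"))


def export_triple(subject: str, relation: str, obj: str) -> tuple:
    """Turn an internal (subject, relation, object) triple into URIs."""
    return tuple(to_uri(part) for part in (subject, relation, obj))


# Internally the wiki only needs the short names; the export qualifies them.
exported = export_triple("San Diego", "is located in", "USA")
```

Inside the wiki, the short titles are enough to tell pages apart; only a
consumer on another site needs the qualified form.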
Adding information about external sources creates some problems. Especially,
it is often not clear what we actually mean when talking about some external
URL. Do we refer to the html-document at this location, or rather to the main
subject of its content? Inside the wiki, we allow people to add data about
the subject of an article (e.g. "San Diego" "is located in" "USA"). This is
quite unambiguous, since the title of each article says what the article is
about, and ambiguity is typically resolved by creating multiple articles.
Outside the wiki, there is no such mechanism for clarifying the meaning of [...]
Strictly speaking, one can thus only regard external URLs as html-documents
and make statements about these documents (author, page rank, subject, ...).
This can be helpful to create a kind of "annotated bookmark list," but has
not been our initial motivation of trying to make Wikipedia's content [...]
> Where this manifests itself is when you attempt to use the triples in a
> SQL join. I'd like to be able to have a query like this:
> SELECT page_id, subject, relation, object
> FROM `semantic_relations`, `page`
> WHERE subject = page_title
> ...which gets me a table of all subject pages on my wiki, with their
> I imagine that if I tried hard enough, I'd be able to figure out some
> syntax with concat to pull this off. But it would likely be convoluted
> and inefficient to do that.
OK, I see your point. Actually, we did not at all integrate our storage format
with MediaWiki's database layout. This is because we do not view the data to
be about articles or documents in a database, but about the concepts that
articles talk about. And there seemed to be little need to find out the
wiki's internal article ID of the city of San Diego. Originally, we intended
to replace the current internal database table with some independent storage
system right away. So the data would not even have resided in the same [...]
Do you think that it is important to join the conceptual data we collect with
the internal article management tables of MediaWiki? It is quite possible
that there are use cases for this, but we did not yet encounter any. Maybe
one then should integrate the relevant data into some other part of
MediaWiki's database right away, to obtain a clean solution. Another way to
combine article data with concept data is to combine the exports that we will
soon provide for both. Of course, this is not giving you optimal efficiency
if you just wanted some SQL joins over two tables.
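To make the trade-off concrete, here is a minimal sketch of the two join
variants, run against an in-memory SQLite database rather than MediaWiki's
MySQL. The table and column names follow the query quoted above; the second
table (semantic_relations_uri), the base URI, and the sample row are
assumptions made up for this example.

```python
import sqlite3

# Hypothetical base URI; the real prefix is not given in this thread.
BASE_URI = "http://wiki.example.org/wiki/"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE semantic_relations (subject TEXT, relation TEXT, object TEXT);
    CREATE TABLE semantic_relations_uri (subject TEXT, relation TEXT, object TEXT);
""")
conn.execute("INSERT INTO page VALUES (1, 'San_Diego')")
conn.execute(
    "INSERT INTO semantic_relations VALUES ('San_Diego', 'is located in', 'USA')"
)
conn.execute(
    "INSERT INTO semantic_relations_uri VALUES (?, ?, ?)",
    (BASE_URI + "San_Diego", "is located in", BASE_URI + "USA"),
)

# Variant 1: subjects stored as plain titles, joined directly on page_title.
direct = conn.execute("""
    SELECT page_id, subject, relation, object
    FROM semantic_relations JOIN page ON subject = page_title
""").fetchall()

# Variant 2: subjects stored as full URIs. The join has to rebuild the URI
# from the title for each page row before it can compare the two columns.
via_uri = conn.execute("""
    SELECT page_id, subject, relation, object
    FROM semantic_relations_uri JOIN page ON subject = ? || page_title
""", (BASE_URI,)).fetchall()
```

Both variants find the same page, but the second compares against a computed
expression instead of a stored column, which is the kind of convoluted join
that plain titles in the triple table would avoid.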
> MediaWiki handles links by linking the article number of the subject to
> the article name of the target (in page_links). For MediaWiki 1.6,
> they're planning on adding a new mw_externallinks. While it may seem a
> bit inelegant to have two different tables for what is essentially the
> same info, I think it's the right decision. It allows for efficient
> queries of the type I described above.
Yes, it seems to be a nice step to have external links accessible via some
search. Semantic extensions of this search should then follow too, I guess.
I should also say that we are not that much concerned about efficiency at the
moment, since no one really knows how many pieces of which semantic data we
have to expect in a wiki of Wikipedia's size. We intend to adjust to higher
requirements as they arise.
> So, are there plans to move toward something a little closer to the base
> MediaWiki way of storing links, or is there a strong desire not to
> distinguish between internal and external relationships in the db?
We clearly have no strong desires in this matter -- we recompile information
for export anyway. Internally we can use whatever is convenient and elegant.
We just don't have external relationships, mostly because we did not have a
real use case for it yet. Do you have a concrete scenario where annotating
external URLs is strongly required? In Wikipedia, most external links are
just references for further reading, and it did not seem very pressing to
annotate them (maybe some special annotations such as "has homepage" could be
helpful, but what else?).
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
mak@... phone +49 (0)721 608 7362
http://www.aifb.uni-karlsruhe.de/WBS/ fax +49 (0)721 693 717