Lukasz Bolikowski wrote:
> I have just started my PhD research on synchronizing knowledge
> between language versions of semantically-annotated Wikipedia
> (suggesting updates and corrections in individual versions based
> on the consensus knowledge). One of the problems is to obtain
> (or create) semantic annotation in a couple of languages, ideally
> the four most popular ones on Wikipedia: English, German, French,
> and Polish.
I don't see any unsolvable problems in the context of SMW here. You just create the same property or class again in the other language and use, e.g., owl:equivalentClass to match the meaning. This leads to an implicit translation. Also, because properties and relations are in separate namespaces, the names won't collide.
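To illustrate how such equivalence statements behave, here is a minimal sketch in Python. It treats each owl:equivalentClass statement as a plain pair of names and groups everything that ends up equivalent, since equivalence is symmetric and transitive. The function name and the page titles are made up for illustration; this is not how SMW stores or evaluates these statements.

```python
from collections import defaultdict


def equivalence_groups(pairs):
    """Group names linked by equivalence statements.

    Each pair (a, b) stands for "a owl:equivalentClass b". Because the
    relation is symmetric and transitive, all names that are connected
    through any chain of pairs end up in the same group.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return list(groups.values())


# Hypothetical class pages in three language editions.
statements = [
    ("en:Category:City", "de:Kategorie:Stadt"),
    ("de:Kategorie:Stadt", "pl:Kategoria:Miasto"),
]
groups = equivalence_groups(statements)
```

Note that the English and Polish classes land in the same group although nobody linked them directly.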
The bigger problem here is that owl:equivalentClass works in both directions. So when it is used within the English Wikipedia, its logic also spans the translated articles. I think it is a design fault of MediaWiki that there is no centralized translation store like in OmegaWiki. In the ideal case there would be such a store that handles each relation or attribute as an entity with a unique ID and maps this ID to a plain name in every language. The fallback would then not be English but, more generally, an ID that translates implicitly into a language the user speaks.
SMW already uses IDs internally, so the only problem is how to map different names to the same ID. Further, I think there should be some kind of merge tool that merges two expressions so they link to the same ID, rather than leaving users to do tedious manual merge work.
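A rough sketch of what such a store and merge tool could look like, in Python. Everything here (the class LabelStore, its methods, the in-memory dicts) is hypothetical and only meant to make the idea concrete; it says nothing about how SMW actually keeps its IDs.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LabelStore:
    """Hypothetical central store: one numeric ID per property, many names."""
    labels: dict = field(default_factory=dict)  # id -> {lang: display name}
    index: dict = field(default_factory=dict)   # (lang, name) -> id
    _next_id: count = field(default_factory=lambda: count(1))

    def add(self, lang, name):
        """Return the ID for (lang, name), creating a fresh one if needed."""
        key = (lang, name)
        if key not in self.index:
            new_id = next(self._next_id)
            self.labels[new_id] = {lang: name}
            self.index[key] = new_id
        return self.index[key]

    def label(self, prop_id, lang):
        """Name in the user's language, falling back to the bare ID."""
        return self.labels[prop_id].get(lang, str(prop_id))

    def merge(self, keep_id, drop_id):
        """Make every name of drop_id resolve to keep_id.

        If both IDs have a name in the same language, keep_id's name stays
        the display name and the other one survives only as an alias.
        """
        if keep_id == drop_id:
            return
        for lang, name in self.labels.pop(drop_id).items():
            self.index[(lang, name)] = keep_id
            self.labels[keep_id].setdefault(lang, name)


store = LabelStore()
color_en = store.add("en", "Color")
color_de = store.add("de", "Farbe")   # created independently at first
store.merge(color_en, color_de)       # now both names share one ID
```

After the merge, looking up either name gives the same ID, and a user without a translation in their language simply sees the ID.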
In MediaWiki, versioning of semantics might be handled like versioning of templates: just edit all pages that use the attribute/relation that got renamed; a bot is helpful for this. Redirects are also a possible solution, but a bit dirty.
A cleaner solution would be to have a parser that is also able to rewrite the article, so that the name of the relation/attribute used gets changed automatically. An entry like:
gets changed to
where 548315498 is the ID of the attribute that has a translation (among others; separated by namespace) called "Color":
(548315498, has_Label, "en:Color")
This solution makes it possible to leave the article untouched when the attribute is renamed, but it comes with the drawback that a number is not intuitive for users, so the parser changes it back from the ID to the current name when the article is edited. This technique would therefore be transparent to the user, except when the attribute gets deleted, because then the user will see the ID.
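The round trip described above could be sketched like this in Python. Note the assumptions: the annotation syntax [[Name:=value]] is only assumed here for illustration, and the two lookup tables as well as the function names to_stored and to_editable are hypothetical, not part of MediaWiki or SMW.

```python
import re

# Assumed annotation syntax for this sketch: [[Name:=value]]
ANNOTATION = re.compile(r"\[\[([^\[\]:=|]+):=([^\[\]]*)\]\]")


def to_stored(text, name_to_id):
    """On save: replace known attribute names with their numeric IDs."""
    def repl(match):
        prop_id = name_to_id.get(match.group(1).strip())
        if prop_id is None:
            return match.group(0)  # unknown name: leave the text as typed
        return f"[[{prop_id}:={match.group(2)}]]"
    return ANNOTATION.sub(repl, text)


def to_editable(text, id_to_name):
    """On edit: replace numeric IDs with the attribute's current name."""
    def repl(match):
        key = match.group(1).strip()
        name = id_to_name.get(int(key)) if key.isdigit() else None
        if name is None:
            return match.group(0)  # e.g. deleted attribute: the ID stays visible
        return f"[[{name}:={match.group(2)}]]"
    return ANNOTATION.sub(repl, text)


stored = to_stored("The car is [[Color:=red]].", {"Color": 548315498})
after_rename = to_editable(stored, {548315498: "Colour"})
after_delete = to_editable(stored, {})
```

The stored text never changes when the attribute is renamed; only the lookup table does. One open point in this sketch is an attribute whose name is itself a number, which would be ambiguous and would need some escaping rule.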
> Note that I don't want to agree on a common ontology, just on
> the technical things. I'd like to reproduce the chaos that
> is likely to emerge when each Wikipedia starts the semantic
> annotation on its own. "Aligning" knowledge from multiple
> not-entirely-compatible ontologies is a much more interesting
> research problem.
Therefore I suggest letting Wikipedia use only one big ontology, even
when Wikipedia itself is separated into several atomic wikis. Further,
it is a current design goal not to let users link to external
ontologies because of these unsolved versioning and trust problems.
Also, a centralized store solves problems with translations and
relations that affect other articles. I guess solving versioning
and trust will be really hard, so I don't want to touch these things. If
you have ideas, let us hear them.
Finally, I hope this rough sketch of my kind-of-chaotic thoughts is helpful to