From: Olivier D. <oli...@un...> - 2006-05-02 16:38:22
|
Dear Semantic MediaWiki users,

I am confused about what the proper use of categories should be in the context of a semantically-enabled wiki such as Semantic MediaWiki. Questions are at the end of the message.

From what I read in [1], categories are mainly for indexing purposes and to avoid overwhelming the user with a huge number of pages. However, these features are also what a semantic search should provide.

Let's consider the Gustave Eiffel example [2].

On one hand, it seems sensible to say that he is/was an instance of the Engineer and of the Architect classes. The pages of these two resources indicate that both are rdfs:subClassOf Person. Therefore, any search about engineers, architects, or engineers and architects will return a reference to Eiffel.

On the other hand, an approach compatible with non-semantically-enabled wikis would consist of stating that the Gustave Eiffel resource belongs to the categories Category:Engineer and Category:Architect, both of which are subcategories of Category:Person.

Clearly, there is an overlap between the two approaches. The second one is not compatible with complex queries (e.g. who are the French people who were both architects and engineers in the 19th century), so it is not likely to scale up very well (e.g. why not mention that Eiffel's page also belongs to Category:Century19, and so on and so forth). Yet I am not sure that we should use the first approach exclusively and use a dummy category for all the semantically-described pages.

Q1: is there a widely accepted policy on the use of categories vs. classes that I am not aware of?
Q2: is it desirable to maintain both approaches in parallel (with the subsequent risk of inconsistencies)?
Q3: would it make sense to have the categories automatically generated from the semantic descriptions?
Q4: if the answer to Q3 is yes, how do we draw the line between relevant categories and not-so-relevant categories?

Thank you,
Olivier

[1] http://meta.wikimedia.org/wiki/Category
[2] http://wiki.ontoworld.org/wiki/Gustave_Eiffel
|
From: Denny V. <dv...@ai...> - 2006-05-02 17:18:59
|
Hi Olivier,

I don't think there is any widely accepted policy yet. So the following is just my two cents, and also basically what we described in the papers:

http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=1055
http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation?publ_id=963

I wouldn't want to use a relation called is-a. Is-a is very problematic, as there are like two thousand possible meanings to it. Instead, if he is an instance of engineer, just keep the Category:Engineer. And the Category:Man, and Category:Architect as well.

For the French -- not sure. Either make a relationship "nationality::France" or make it a category as well. I'd go for the first one.

There is stuff that shouldn't be described in a category, like people born in Edinburgh. And there is stuff that shouldn't be described in a relation, like whether someone is of Male or Female sex. But in between there's a wide field, and often both alternatives are possible. We don't know yet how it will turn out.

Regarding Q3: the automatic generation of categories is basically description logics. A language like OWL DL offers very powerful features to do exactly this. But we thought that it would be overkill to add this to the system right now. Then again, you can export your RDF from the wiki, put it into a DL reasoner, and make complex descriptions of your categories. This is an idea we were exploring lately (see our upcoming SemWiki paper).

I hope this helps,
cheers,
denny
|
From: Jakob V. <jak...@ni...> - 2006-05-03 17:42:54
|
Olivier Dameron wrote:

> From what I read in [1], categories are mainly for indexing purpose
> and for avoiding to overwhelm the user with a huge number of pages.

Yes. Categories are for organising and finding. [1]

> However, these features are also what a semantic search should do.

Not exactly. Semantic Wikipedia is for predicating and answering logic questions (as far as I understand it).

> Q1: is there a widely accepted policy on the use of categories vs.
> classes that I am not aware of?

I can only speak for Wikipedia, and there it was a long process, still going on, to find out what categories are.

> Q2: is it desirable to maintain both approaches in parallel (with the
> subsequent risk of inconsistencies)?

No, but it is unavoidable.

> Q3: would it make sense to have the categories automatically
> generated from the semantic descriptions?
> Q4: if answer to Q3 is yes, how do we draw the line between
> relevant categories and not-so-relevant categories?

I think you can prove that if there is a simple answer to Q3 and Q4, then you can derive a method to combine collaborative tagging (mostly fuzzy statistics) and the semantic web (mostly description logics).

Greetings,
Jakob

[1] <advertising> It's what I call a collaborative thesaurus. See http://arxiv.org/abs/cs.IR/0604036 for a first analysis of Wikipedia's category system. </advertising>
|
From: Denny V. <dv...@ai...> - 2006-05-10 15:40:59
|
Hi Jakob,

finally I got around to reading your paper. Very interesting indeed. I've seen you submitted it to the WWW -- will you be there?

> Not exactly. Semantic Wikipedia is for predicating and answering logic
> questions (as far as I understand it).

I think there are plenty of categories in the Wikipedia, if not all of them, that can be useful for such querying. Calling it predicating and answering logic questions sounds a bit too narrow (although in fact it is not more than that, but it's all just zeros and ones anyway). But you are right that the semantics of Categories in Wikipedia are usually not that of an instantiation relation, i.e. a page in the Category:Italy is not an instance of Italies. But in this case we have to take the semantics of the class into account: a page in the Category:Italy is a page with the topic Italy. And now we are fine.

>> Q1: is there a widely accepted policy on the use of categories vs.
>> classes that I am not aware of?
>
> I can only speak for Wikipedia and there it was a long process that is
> still going on to find out what categories are.

The same goes for a semantic wiki. You have to define it within your project, or rather, the users have to define it collaboratively. Being able to query these categories presupposes a certain definition of the queried category, which should be documented so as to achieve higher consistency.

Cheers,
denny
|
From: Olivier D. <oli...@un...> - 2006-05-10 15:57:05
|
On Wed, 10 May 2006 17:40:39 +0200, Denny Vrandecic <dv...@ai...> wrote:

> But you are right that the semantics of Categories in Wikipedia are
> usually not that of an instantiation relation, i.e. a page in the
> Category:Italy is not an instance of Italies. But in this case we
> have to take the semantics of the class into account: a page in the
> Category:Italy is a page with the topic Italy. And now we are fine.

Exactly, so my point is that it is redundant (i.e. prone to errors) to maintain categories while they can be computed dynamically by following the foaf:topic relation (for instance).

Olivier
|
From: Denny V. <dv...@ai...> - 2006-05-10 16:35:19
|
>> But you are right that the semantics of Categories in Wikipedia are
>> usually not that of an instantiation relation, i.e. a page in the
>> Category:Italy is not an instance of Italies. But in this case we
>> have to take the semantics of the class into account: a page in the
>> Category:Italy is a page with the topic Italy. And now we are fine.
>
> Exactly, so my point is that it is redundant (i.e. prone to errors) to
> maintain categories while they can be computed dynamically by following
> the foaf:topic relation (for instance)

Yes, in that case it is.

One thing that categories allow that is not possible with relations (i.e. without using a reasoner that can do something like SWRL or datalog) is the sub-category relationship, i.e. if something is in the category Italy, it is somehow related to the category Europe, which is a supercategory of Italy. With foaf:topic you would need to make this explicit (or, as said, use a rule like foaf:topic(a,t) & skos:broader(t,s) -> foaf:topic(a,s) ). Treating categories/articles as instantiations, you get this for free from an RDFS or OWL reasoner (or from Semantic MediaWiki version 0.4 ;)

Cheers,
denny
|
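The rule in the message above (a page about a topic is also about every broader topic) can be sketched in plain Python. This is only an illustration of the inference, not how Semantic MediaWiki or any reasoner implements it; the function name, the dictionaries, and the sample pages are all made up for the example.

```python
# Hypothetical sketch of the rule topic(a, t) & broader(t, s) -> topic(a, s).
# All names and data below are illustrative assumptions.

def propagate_topics(page_topics, broader):
    """Return a copy of page_topics with every broader topic added."""
    result = {page: set(topics) for page, topics in page_topics.items()}
    for topics in result.values():
        frontier = list(topics)
        while frontier:
            topic = frontier.pop()
            for parent in broader.get(topic, ()):
                if parent not in topics:  # also stops on cyclic hierarchies
                    topics.add(parent)
                    frontier.append(parent)
    return result


page_topics = {"Rome": {"Italy"}, "Paris": {"France"}}
broader = {"Italy": {"Europe"}, "France": {"Europe"}, "Europe": {"Earth"}}

inferred = propagate_topics(page_topics, broader)
print(sorted(inferred["Rome"]))  # ['Earth', 'Europe', 'Italy']
```

Without such a rule, each page would have to state every broader topic explicitly, which is the redundancy the category hierarchy avoids.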
From: Jakob <jak...@s1...> - 2006-05-11 09:13:28
|
Denny Vrandecic wrote:

> finally I got around to read your paper. Very interesting indeed.
> I've seen you submitted it to the WWW -- will you be there?

I will be in Edinburgh but I won't be part of the official conference - entrance fees are just too high for independent researchers like me.

> But you are right that the semantics of Categories in Wikipedia are
> usually not that of an instantiation relation, i.e. a page in the
> Category:Italy is not an instance of Italies. But in this case we
> have to take the semantics of the class into account: a page in the
> Category:Italy is a page with the topic Italy. And now we are fine.

Yes, and categories are also very helpful for detecting semantics.

Olivier wrote:

> Exactly, so my point is that it is redundant (i.e. prone to errors)
> to maintain categories while they can be computed dynamically by
> following the foaf:topic relation (for instance)

In Wikipedia, categories *are* maintained by a large number of people because they are quite simple and have relatively fuzzy semantics. I don't understand where you want to get all the foaf or other relationships from. If you start a new semantic wiki you could encourage users to use more typed links instead of categories, but I bet collaborative tagging with categories is easier because you don't have to specify the exact relationship.

If you don't know it yet, you should have a look at W3C's SKOS working draft - the Simple Knowledge Organisation System RDF vocabulary is designed for thesauri and similar knowledge organisation systems - and that's exactly what Wikipedia categories are, as I have pointed out before. Thesauri have been used successfully in information retrieval since the 1950s - but it's a lot of (intellectual) work to maintain them, so fulltext search and other automatic methods are better known, especially on the visible web.

>>> Q1: is there a widely accepted policy on the use of categories vs.
>>> classes that I am not aware of?
>>
>> I can only speak for Wikipedia and there it was a long process that
>> is still going on to find out what categories are.
>
> The same for a semantic wiki. You have to define it within your
> project, or rather, the users have to define it collaboratively.
> Having the possibility to query these categories presupposes a
> certain definition into the queried category, which should be
> documented, as to have a higher consistency.

I hope the possibility to query will also increase consistency, because you get more feedback on the consequences of how you classify articles.

Greetings,
Jakob
|
From: Olivier D. <oli...@un...> - 2006-05-11 09:58:44
|
On Thu, 11 May 2006 11:13:19 +0200, Jakob <jak...@s1...> wrote:

> > Exactly, so my point is that it is redundant (i.e. prone to errors)
> > to maintain categories while they can be computed dynamically by
> > following the foaf:topic relation (for instance)
>
> In Wikipedia categories *are* maintained by a large amount of people
> because they are quite simple and have relatively fuzzy semantics. I
> don't understand where you want to get all the foaf or other
> relationships from. If you start a new semantic wiki you could
> encourage users to use more typed links instead of categories but I
> bet collaborative tagging with categories is easier because you don't
> have to specify the exact relationship.

But the point of semantic wikis (at least in my opinion) is exactly to go beyond the fuzzy semantics so that all this material and knowledge can be used by software.

Categories are for ease of use by humans. I simply assumed that some of them can be automatically generated (and should be), while retaining the possibility to manually add others (just like people do now). Practically, a bot could wander through a semantic wiki and add new categories according to the values of relationships such as foaf:topic or skos:subject.

Olivier
|
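The bot idea in the message above could look roughly like the following Python sketch. It is a guess at the shape of such a tool, not an existing bot: the relation list, the function name, and the sample page annotations are all hypothetical, and a real bot would read and write pages through the wiki rather than through dictionaries.

```python
# Hypothetical sketch: derive category names from selected relation values
# and merge them with categories that editors added by hand.
# AUTO_RELATIONS, derive_categories and the sample data are assumptions.

AUTO_RELATIONS = ("foaf:topic", "skos:subject")  # relations the bot follows


def derive_categories(annotations, manual_categories=()):
    """Return the sorted union of derived and manually assigned categories."""
    derived = {
        "Category:" + value
        for relation, values in annotations.items()
        if relation in AUTO_RELATIONS
        for value in values
    }
    return sorted(derived | set(manual_categories))


# Annotations of one imaginary page; "located in" is ignored by the bot.
page = {
    "foaf:topic": ["Italy"],
    "skos:subject": ["Architecture"],
    "located in": ["Rome"],
}
categories = derive_categories(page, manual_categories=["Category:Landmarks"])
print(categories)
```

Which relations belong in the list is exactly the open question Q4 from the start of the thread; the sketch only shows that manual and generated categories can coexist.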
From: Jakob V. <jak...@ni...> - 2006-05-11 20:22:19
|
Olivier Dameron wrote:

> <jak...@s1...> wrote:
>
>> In Wikipedia categories *are* maintained by a large amount of
>> people because they are quite simple and have relatively fuzzy
>> semantics. I don't understand where you want to get all the foaf or
>> other relationships from. If you start a new semantic wiki you
>> could encourage users to use more typed links instead of categories
>> but I bet collaborative tagging with categories is easier because
>> you don't have to specify the exact relationship.
>
> But the point of semantic wikis (at least in my opinion) is exactly
> to go beyond the fuzzy semantics so that all this material and
> knowledge could be used by software.

Semantic wikis won't go anywhere if the people don't go first - that's why I still prefer to call the Semantic Web a "vision" or "plan", but nothing that will actually happen in the way TBL promised us ;-) As long as people provide the data, it will always be fuzzy to some degree. [1]

> Categories are for the ease of use by humans. I simply assumed that
> some of them can be automatically generated (and should be), while
> retaining the possibility to manually add others (just like people do
> now). Practically, a bot could wander into a semantic wiki and
> add new categories according to the values of relationships such as
> foaf:topic or skos:subject

Automatic generation of categories, tags, semantic relationships etc. is a very interesting and promising approach, but it sounds more like data mining. I am very eager to see some results comparing automatic semantic indexing with manually added relationships. In information retrieval, indexing consistency is usually higher with automatic methods, but at a lower level of indexing precision.

Greetings,
Jakob

[1] See this article on experiences collecting metadata from different data providers. I anticipate similar problems when semantic services begin to grow: http://www.cdlib.org/inside/projects/harvesting/bitter_harvest.html
|
From: Denny V. <dv...@ai...> - 2006-05-11 10:00:23
|
> I will be in Edinburgh but I won't be part of the official conference -
> entrance fees are just too high for independent researchers like me.

The fees are terrible this year! Danny Ayers and Jeremy Carroll pulled some magic to somehow manage lower entrance fees. Students and PhD students also get lower entrance fees. Anyway, if you like, we should meet up there.

Boringly, I just agree with the rest :)

Best,
denny
|
From: S P. <ski...@ea...> - 2006-05-11 11:50:32
|
Markus wrote:

> * categories can be interpreted by default as RDFS or OWL classes and we have
> software support for these.

A page's Category information isn't in the RDF export currently. SMW_SpecialExportRDF has code to export a page's categories as rdf:type and a category's categories as rdfs:subClassOf, but it's triggered by the deprecated (?) SMW_SP_HAS_CATEGORY special property rather than by standard category assignments. Is this a bug?

I think rdf:type and rdfs:subClassOf are enough to make an RDF query for cities with a population greater than 1 million still find Budapest, which is in Category:Cities_on_the_Danube but not in Category:Cities.

Is reasoning using rdfs:subClassOf messed up by cycles? Note there can be, and in fact there are, cycles in Wikipedia categories, e.g. Education: Social sciences: Academic disciplines: Academia: Education: ...

I notice the inline query feature <ask>[[Category:Cities]] [[population:=>1,000,000]]</ask> finds Budapest, because it has explicit code to walk several levels down into subcategories. Very nice, this is a feature MediaWiki users have asked for ( http://meta.wikimedia.org/wiki/Category_flatten ).

-- =S
|
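To make the subcategory walk described above concrete, here is a small Python sketch of a depth-limited category query with a population filter. It is not the Semantic MediaWiki code, only an illustration of the idea; the function name, the data layout, the two placeholder towns, and the population figures are assumptions made up for the example (only the fact that Budapest is over one million comes from the message).

```python
# Hypothetical sketch of a category query that walks a limited number of
# levels down into subcategories. Names and figures are illustrative only.

def pages_in_category(category, subcategories, members, max_depth):
    """Collect pages in `category` and in subcategories up to `max_depth` levels down."""
    seen = {category}
    frontier = [category]
    pages = set(members.get(category, ()))
    for _ in range(max_depth):
        next_frontier = []
        for cat in frontier:
            for sub in subcategories.get(cat, ()):
                if sub not in seen:  # avoid revisiting categories in a cycle
                    seen.add(sub)
                    next_frontier.append(sub)
                    pages.update(members.get(sub, ()))
        frontier = next_frontier
    return pages


subcategories = {"Cities": ["Cities_on_the_Danube"]}
members = {"Cities": ["Midtown"], "Cities_on_the_Danube": ["Budapest", "Smalltown"]}
population = {"Midtown": 300_000, "Budapest": 1_700_000, "Smalltown": 50_000}


def big_cities(max_depth):
    found = pages_in_category("Cities", subcategories, members, max_depth)
    return sorted(p for p in found if population[p] > 1_000_000)


print(big_cities(max_depth=1))  # ['Budapest']
print(big_cities(max_depth=0))  # []
```

With the walk switched off (depth 0), Budapest is missed because it is only a member of the subcategory, which is the situation described in the message.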
From: Denny V. <dv...@ai...> - 2006-05-11 12:02:28
|
> A page's Category information isn't in the RDF export currently.
> SMW_SpecialExportRDF has code to export a page's categories as rdf:type
> and a category's categories as rdfs:subClassOf, but it's triggered by the
> deprecated (?) SMW_SP_HAS_CATEGORY special property rather than standard
> category assignments. Is this a bug?

It was temporarily disabled. It is in again (since yesterday, thanks to Markus), but not in the CVS yet. As soon as CVS is up again we have to do some careful committing.

> I think rdf:type and rdfs:subClassOf are enough to make an RDF query for
> cities with a population greater than 1 million still find Budapest,
> which is in Category:Cities_on_the_Danube but not in Category:Cities.

Yes. If you have an RDFS- or OWL-enabled inferencer, this is enough.

> Is reasoning using rdfs:subClassOf messed up by cycles? Note there can
> be, and in fact there are, cycles in Wikipedia categories, e.g.
> Education: Social sciences: Academic disciplines: Academia:
> Education: ...

No, it is not. Because the subClassOf relation is defined as "is a subset of", a cycle means that all parts of the cycle will be interpreted with the same extension. Or, put differently: any query for Education, Social sciences, Academic disciplines or Academia will return the same set of answers.

> I notice the inline query feature <ask>[[Category:Cities]]
> [[population:=>1,000,000]]</ask> finds Budapest, because it has
> explicit code to walk several levels down into subcategories. Very
> nice, this is a feature MediaWiki users have asked for (
> http://meta.wikimedia.org/wiki/Category_flatten ).

Cool, thanks for the link. And the feature can be switched on or off as desired (even gradually, I mean, you can set the number of levels to walk). This is for two reasons: first, in some wikis subcategories may just not be used as subsets, because the community uses them differently; second, I suspect it to be rather performance hungry. But only experience will show.

all the best, denny |
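The point about cycles can be checked with a few lines of Python. Under subset semantics, the extension of a class is everything asserted in it or in any of its transitive subclasses, so all classes on a cycle end up with the same extension. The sketch below is an illustration under that assumption, not a reasoner; the function name and the page names are invented, and the direction of the subclass edges in the Wikipedia example is assumed.

```python
# Hypothetical sketch: extensions of classes on a subClassOf cycle coincide.
# The edge direction and the page names are assumptions for illustration.

def extension(category, subclass_of, members):
    """Pages asserted in `category` or in any of its transitive subclasses."""
    # Invert the sub -> super edges so we can walk from a class to its subclasses.
    children = {}
    for sub, supers in subclass_of.items():
        for sup in supers:
            children.setdefault(sup, set()).add(sub)
    seen = {category}
    stack = [category]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:  # a cycle is simply visited once
                seen.add(child)
                stack.append(child)
    return {page for cat in seen for page in members.get(cat, ())}


subclass_of = {
    "Education": ["Social sciences"],
    "Social sciences": ["Academic disciplines"],
    "Academic disciplines": ["Academia"],
    "Academia": ["Education"],
}
members = {"Education": ["School"], "Academia": ["University"]}

extensions = {cat: extension(cat, subclass_of, members) for cat in subclass_of}
print(all(ext == {"School", "University"} for ext in extensions.values()))  # True
```

The traversal terminates because each class is visited at most once, so a cycle costs nothing beyond making its members equivalent.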