'''Wiki Category Matrix Visualization''' is a tool that generates a visual 
representation of data sizes across topics of a multi-level category hierarchy 
in matrix form. It provides a "big picture" overview of topics in terms of 
categorization.

It analyzes the Wikipedia category data to determine the number of articles 
assigned to each category, and to determine the most similar parent category 
for each category. The resulting visualization takes the first two levels of 
categories from the given Wikipedia and plots them on both the x and y axes. 
At each intersection it draws a disc whose size represents the number of 
articles co-assigned to that pair of categories.

The figure below shows an example. About 100 articles are assigned to both 
the category "Transportation" (highlighted in the figure with number 1) and 
the category "Engineering" (number 2). The visualization shows a 
proportionally sized disc (number 3) at the intersection of these two 
categories. Moreover, because "Transportation" belongs to the parent category 
"Everyday life" (number 4) and "Engineering" belongs to the parent category 
"Science" (number 5), these 100 co-assigned articles also contribute to the 
count of all articles co-assigned to "Everyday life" and "Science", which is 
shown as a larger disc of first-level category co-assignments (number 6).

(See matrix-visualization-example.png for the figure)
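
The counting described above can be sketched in a few lines of Python. This 
is only an illustration, not the tool's actual Java code: the sample articles, 
the names article_categories and parent_of, and the choice to count each 
parent pair once per article are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: article -> second-level categories it is assigned to.
article_categories = {
    "Bridge": {"Transportation", "Engineering"},
    "Railway": {"Transportation", "Engineering"},
    "Bicycle": {"Transportation"},
    "Telescope": {"Engineering", "Astronomy"},
}

# Hypothetical first-level parent of each second-level category.
parent_of = {
    "Transportation": "Everyday life",
    "Engineering": "Science",
    "Astronomy": "Science",
}


def count_co_assignments(article_categories, parent_of):
    """Count articles shared by each category pair at both hierarchy levels."""
    second_level = Counter()
    first_level = Counter()
    for categories in article_categories.values():
        # Second level: every unordered pair of categories on the same article.
        for pair in combinations(sorted(categories), 2):
            second_level[pair] += 1
        # First level: roll the categories up to their parents, then count
        # each distinct parent pair once per article (an assumption).
        parents = sorted({parent_of[c] for c in categories})
        for pair in combinations(parents, 2):
            first_level[pair] += 1
    return second_level, first_level


second_level, first_level = count_co_assignments(article_categories, parent_of)
print(second_level[("Engineering", "Transportation")])
print(first_level[("Everyday life", "Science")])
```

Pairs are stored in sorted order, so each unordered pair has a single key. In 
the matrix the same count would appear at both (x, y) and (y, x).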

This tool was developed by Cheong-Iao Pang as part of his master's degree 
studies at the University of Macau, supervised by Dr. Robert P. Biuk-Aghai. 
Further improvements were made by Peter Kin-Fong Fong.

== License ==

This tool is released under the Educational Community License, Version 2.0.

== Requirements ==

* Read access to the MediaWiki database (only the 'page' and 'categorylinks' 
  tables are needed)
* Java SE 1.6 or above
* Libraries (to be placed in lib directory)
** MySQL Connector/J 5.1.22 or above
** jopt-simple 4.3 or above

== Usage instructions ==
'''Step 1:''' Download the libraries listed under Requirements and place them 
	in the lib directory. The files are available from the following web pages:
	
	* MySQL Connector/J
	http://www.mysql.com/downloads/connector/j/
	
	* jopt-simple
	http://pholser.github.com/jopt-simple/download.html

'''Step 2:''' Edit run.sh (Linux or Mac OS X users) or run.bat (Windows users). 
	Change the following parameters to fit your setup:

	dbconn: JDBC connection string, in the following format
		jdbc:mysql://<host>:<port>/<database>
	dbuser: Username of a database user that has read access to the required 
		tables
	dbpass: Password of the user above
	
	root_title: Title of the "root category", i.e. the category that 
		contains all the other content categories. Different wikis usually 
		have different root category titles, so look up the one used by 
		your wiki.
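
As a quick sanity check of the dbconn format described above, the following 
Python sketch assembles a connection string and verifies that it matches 
jdbc:mysql://<host>:<port>/<database>. It is not part of the tool. The helper 
names build_dbconn and parse_dbconn and the sample values (localhost, 3306, 
wikidb) are assumptions for illustration, and the pattern is deliberately 
stricter than what MySQL Connector/J itself accepts.

```python
import re

# Pattern for the format given above: jdbc:mysql://<host>:<port>/<database>
JDBC_MYSQL_PATTERN = re.compile(r"^jdbc:mysql://([^:/\s]+):(\d+)/([^/?\s]+)$")


def build_dbconn(host, port, database):
    """Assemble a dbconn value from its three parts."""
    return "jdbc:mysql://{}:{}/{}".format(host, port, database)


def parse_dbconn(dbconn):
    """Return (host, port, database), or raise ValueError if malformed."""
    match = JDBC_MYSQL_PATTERN.match(dbconn)
    if match is None:
        raise ValueError(
            "not in jdbc:mysql://<host>:<port>/<database> form: " + dbconn
        )
    host, port, database = match.groups()
    return host, int(port), database


# Hypothetical values: a MySQL server on the local machine, the default
# MySQL port 3306, and a database named "wikidb".
dbconn = build_dbconn("localhost", 3306, "wikidb")
print(dbconn)
print(parse_dbconn(dbconn))
```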
		
'''Step 3:''' Run run.sh or run.bat to generate the matrix visualization 
	graph. A few text files containing the category tree and similarity data 
	are created in the process. The visualization image is written to 
	output.png.
	
	If an out-of-memory error occurs, increase the maximum heap size: 
	replace -Xmx256M on the last line of the script with a larger value, 
	such as -Xmx512M.

== Related papers ==

Cheong-Iao Pang and Robert P. Biuk-Aghai. 2010. 
A method for category similarity calculation in Wikis. 
In Proceedings of the 6th International Symposium on Wikis and Open Collaboration (WikiSym '10). 
ACM, New York, NY, USA, Article 19, 2 pages. 
DOI=10.1145/1832772.1832798 
http://doi.acm.org/10.1145/1832772.1832798

Robert P. Biuk-Aghai, Cheong-Iao Pang, and Felix Hon Hou Cheang. 2011. 
Visualization of large category hierarchies. 
In Proceedings of the 2011 Visual Information Communication - International Symposium (VINCI '11). 
ACM, New York, NY, USA, Article 2, 10 pages. 
DOI=10.1145/2016656.2016658 
http://doi.acm.org/10.1145/2016656.2016658
Source: README, updated 2012-12-17