Wiki Category Matrix Visualization is a tool that generates a visual representation of data sizes across topics of a multi-level category hierarchy in matrix form. It provides a "big picture" overview of topics in terms of categorization.
It analyzes Wikipedia category data to determine the number of articles assigned to each category and the most similar parent category for each category. The resulting visualization takes the first two levels of categories from the given Wikipedia and plots them along both the x and y axes. At the intersection of each pair of categories, it plots a disc representing the number of articles co-assigned to that pair.
To illustrate this, the figure below shows an example (only an extract of the whole visualization is shown). A certain number of articles (about 100) are assigned to both category "Transportation" (highlighted in the figure below with number 1) and category "Engineering" (number 2). The visualization shows a proportionally sized disc (number 3) at the intersection of these two categories. Moreover, as the category "Transportation" belongs to parent category "Everyday life" (number 4), and category "Engineering" belongs to category "Science" (number 5), these 100 co-assigned articles would also contribute to the count of all articles co-assigned to categories "Everyday life" and "Science", shown as a larger disc of first level category co-assignments (number 6).
![]()
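The roll-up from second-level to first-level categories described above can be sketched in Java. This is only an illustration and is not taken from the tool's source code: the class `CoAssignmentSketch`, its methods, and the in-memory data layout are all hypothetical. It also assumes that an article is counted once per pair of parent categories, which the description does not state explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; class and method names are not from the tool's source.
class CoAssignmentSketch {

    // Order-independent key for a pair of category titles.
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    // For each pair of distinct categories, count the articles assigned to both.
    static Map<String, Integer> countPairs(List<Set<String>> articleCategories) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> categories : articleCategories) {
            List<String> list = new ArrayList<>(categories);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    counts.merge(pairKey(list.get(i), list.get(j)), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Replace each article's second-level categories with their first-level parents.
    // Categories without a known parent are skipped (an assumption of this sketch).
    static List<Set<String>> liftToParents(List<Set<String>> articleCategories,
                                           Map<String, String> parentOf) {
        List<Set<String>> lifted = new ArrayList<>();
        for (Set<String> categories : articleCategories) {
            Set<String> parents = new HashSet<>();
            for (String category : categories) {
                String parent = parentOf.get(category);
                if (parent != null) {
                    parents.add(parent);
                }
            }
            lifted.add(parents);
        }
        return lifted;
    }

    public static void main(String[] args) {
        // 100 articles assigned to both "Transportation" and "Engineering".
        List<Set<String>> articles = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            articles.add(Set.of("Transportation", "Engineering"));
        }
        Map<String, String> parentOf = Map.of(
                "Transportation", "Everyday life",
                "Engineering", "Science");

        Map<String, Integer> secondLevel = countPairs(articles);
        Map<String, Integer> firstLevel = countPairs(liftToParents(articles, parentOf));

        System.out.println(secondLevel.get(pairKey("Transportation", "Engineering")));
        System.out.println(firstLevel.get(pairKey("Everyday life", "Science")));
    }
}
```

Each count would then determine the size of the disc drawn at the corresponding intersection of the matrix.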
This tool was developed by Cheong-Iao Pang as part of his master's degree studies at the University of Macau, supervised by Dr. Robert P. Biuk-Aghai. Further improvements were made by Peter Kin-Fong Fong.
This tool is released under the Educational Community License, Version 2.0.
Step 1: Download the libraries mentioned above and place them into the lib directory. Please go to the following web pages to get the files.
MySQL Connector/J
http://www.mysql.com/downloads/connector/j/
jopt-simple
http://pholser.github.com/jopt-simple/download.html
Step 2: Edit run.sh (Linux or Mac OS X users) or run.bat (Windows users). Change the following parameters to fit your setup:
dbconn: JDBC connection string, in the following format
jdbc:mysql://<host>:<port>/<database>
dbuser: Username of a database user that has read access to the required database
dbpass: Password of the user above
root_title: Title of the "root category", i.e. the category that contains all the other content categories. Different wikis usually have different root category titles, so look up the one used by your wiki.
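Before editing the script, it can help to check that the dbconn value matches the documented format and that the credentials work. The following is a minimal sketch, not part of the tool: the class `DbConnSketch` and its methods are hypothetical, the host, port, and database name are placeholder values, and `canConnect` only succeeds when MySQL Connector/J is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; not part of the tool.
class DbConnSketch {

    // Assemble a connection string in the form jdbc:mysql://<host>:<port>/<database>.
    static String buildDbConn(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Strict check against the documented format only (no URL parameters).
    static boolean looksLikeDbConn(String dbconn) {
        return dbconn.matches("jdbc:mysql://[^/:]+:\\d+/[^/?]+");
    }

    // Try to open a connection; requires MySQL Connector/J on the classpath.
    static boolean canConnect(String dbconn, String dbuser, String dbpass) {
        try (Connection connection = DriverManager.getConnection(dbconn, dbuser, dbpass)) {
            return connection.isValid(5); // wait up to 5 seconds for a reply
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values: 3306 is the MySQL default port, "wikidb" is made up.
        String dbconn = buildDbConn("localhost", 3306, "wikidb");
        System.out.println(dbconn + " well-formed: " + looksLikeDbConn(dbconn));
    }
}
```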
Step 3: Run run.sh or run.bat to generate the matrix visualization graph. A few text files containing the category tree and similarity data will be created in the process. The output visualization image is named output.png.
If an out-of-memory error occurs, try increasing the maximum heap size: replace -Xmx256M on the last line of run.sh or run.bat with a larger value, such as -Xmx512M.
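To confirm that a changed -Xmx value is actually picked up by the JVM, a small check like the one below can be run with the same option. It is a hypothetical helper, not part of the tool, and the class name `HeapCheckSketch` is made up for this example.

```java
// Hypothetical helper; not part of the tool.
class HeapCheckSketch {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx512M HeapCheckSketch.java` should report a value close to 512 MiB; the exact figure can be somewhat lower depending on the JVM and garbage collector.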
Cheong-Iao Pang and Robert P. Biuk-Aghai. 2010. A method for category similarity calculation in Wikis. In Proceedings of the 6th International Symposium on Wikis and Open Collaboration (WikiSym '10). ACM, New York, NY, USA, Article 19, 2 pages. DOI=10.1145/1832772.1832798
http://doi.acm.org/10.1145/1832772.1832798
Robert P. Biuk-Aghai, Cheong-Iao Pang, and Felix Hon Hou Cheang. 2011. Visualization of large category hierarchies. In Proceedings of the 2011 Visual Information Communication - International Symposium (VINCI '11). ACM, New York, NY, USA, Article 2, 10 pages. DOI=10.1145/2016656.2016658
http://doi.acm.org/10.1145/2016656.2016658