Welcome to the wiki page!
On this page, I give a tutorial on using the Chordalysis software.
Here is the step-by-step procedure that I will follow.
This tutorial was written for the main method of the 'Run.java' file. Chordalysis now uses a GUI via the class 'RunGUIProof', so Step 4 has to be replaced by:
Open a Windows/Linux/MacOS terminal and navigate (cd command) to the directory in which you extracted Chordalysis.jar. Then simply type:
java -Xmx1g -jar Chordalysis.jar
The download part is pretty easy; just download the .zip here:
Not the hardest step either: just extract the zip wherever you want (:
Chordalysis analyses data, so... you need a dataset.
The requirements for the file containing the dataset are:
You can have a look at the mushroom dataset for an example of a correctly formatted CSV file.
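Before handing a file to Chordalysis, a quick sanity check can catch obviously malformed rows. The sketch below is only a generic check, not the list of Chordalysis requirements: it assumes a comma-separated file whose first line is a header and whose fields contain no quoted commas. The class name CsvShapeCheck and the inline sample values are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Chordalysis.
class CsvShapeCheck {
    /**
     * Returns the 1-based number of the first line whose field count differs
     * from the header's, or -1 if every line matches (or the input is empty).
     * Assumption: plain comma separation, no quoted fields.
     */
    static int firstBadLine(List<String> lines) {
        if (lines.isEmpty()) {
            return -1;
        }
        // The -1 limit keeps trailing empty fields, so "a,b," counts as 3 fields.
        int expected = lines.get(0).split(",", -1).length;
        for (int i = 1; i < lines.size(); i++) {
            if (lines.get(i).split(",", -1).length != expected) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, check that file; otherwise use a tiny made-up sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("class,odor,habitat", "p,a,u", "e,n,g");
        int bad = firstBadLine(lines);
        System.out.println(bad == -1
                ? "All lines have the same number of fields as the header."
                : "Line " + bad + " has a different number of fields than the header.");
    }
}
```

After compiling with javac, it could be run as `java CsvShapeCheck mushroom.csv`.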
Open a Windows/Linux/MacOS terminal and navigate (cd command) to the directory in which you extracted Chordalysis.jar. Then, if you simply type:
java -jar Chordalysis.jar
the terminal will tell you how to use the program:
petitjean@xx:~$ java -jar Chordalysis.jar
Usage: java -Xmx1g -jar Chordalysis.jar dataFile pvalue imageOutputFile useGUI?
Example: java -Xmx1g -jar Chordalysis.jar dataset.csv 0.05 graph.png false
Note: '1g' means that you authorize 1GB of memory.
Note: It should be adjusted depending upon the size of your data set (mostly required to load the data set).
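If you are unsure whether the -Xmx setting took effect, a small standalone Java class can print the maximum heap size that the JVM reports. This is a minimal sketch; HeapCheck is a hypothetical helper and not part of Chordalysis.

```java
// Hypothetical helper, not part of Chordalysis.
class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounded down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the upper bound the JVM will try to use for the heap.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

Running it with `java -Xmx1g HeapCheck` should report a value close to 1024 MiB; depending on the JVM and garbage collector, the figure can be somewhat lower.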
For example, if you saved the file mushroom.csv in the same folder as Chordalysis.jar, you can call the program with:
java -Xmx1g -jar Chordalysis.jar mushroom.csv 0.05 mush-graph.png false
After running the previous command, you should see something like this in the terminal:
The model selected is: (selected in 1701ms)
[cap-shape cap-surface bruises? ring-type class]
[cap-shape bruises? gill-spacing ring-type class]
[cap-shape bruises? gill-spacing stalk-root class]
[bruises? gill-spacing veil-color ring-type class]
[bruises? gill-spacing gill-color veil-color class]
[cap-shape gill-spacing ring-type spore-print-color class]
[bruises? gill-spacing veil-color habitat class]
[bruises? gill-size veil-color habitat class]
[bruises? odor gill-size veil-color class]
[bruises? gill-size stalk-surface-below-ring veil-color class]
[bruises? stalk-surface-above-ring stalk-surface-below-ring veil-color class]
[bruises? gill-size stalk-surface-below-ring ring-number class]
[cap-color bruises? gill-size stalk-surface-below-ring ring-number]
[cap-color bruises? gill-size stalk-shape ring-number]
[bruises? gill-spacing veil-color population habitat]
[bruises? gill-attachment stalk-shape ring-number]
[cap-shape bruises? stalk-color-above-ring class]
[cap-shape bruises? stalk-color-below-ring class]
[veil-type]
This is the selected model, written in the classical log-linear model notation.
Let's take an example. [cap-shape cap-surface bruises? ring-type class] means that there is a 5-way correlation (an interaction term, in log-linear terminology) among the variables cap-shape, cap-surface, bruises?, ring-type and class.
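To make the notation concrete, here is a minimal sketch that splits one bracketed line of the output into its variable names. It assumes that names are separated by whitespace and contain no spaces themselves, as in the output shown here; CliqueLine is a hypothetical helper and not part of Chordalysis.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Chordalysis.
class CliqueLine {
    /**
     * Splits a line such as "[a b c]" into the list ["a", "b", "c"].
     * Assumption: variable names are whitespace-separated and contain no spaces.
     */
    static List<String> parse(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Not a bracketed term: " + line);
        }
        String inner = trimmed.substring(1, trimmed.length() - 1).trim();
        if (inner.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(inner.split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> vars = parse("[cap-shape cap-surface bruises? ring-type class]");
        // The number of variables is the order of the term (5-way here).
        System.out.println(vars.size() + "-way term: " + vars);
    }
}
```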
It turns out that the models Chordalysis discovers are all graphical. This means that the previous model, for instance, can be drawn as a graph: the nodes are the variables, and every bracketed term of the model corresponds to a clique in the graph (a set of fully connected variables).
In the previous command, we asked for the graphical representation of the model to be saved as a file named mush-graph.png. That file should now be available in the same folder:
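The link between the bracketed terms and the graph can be sketched in a few lines: every pair of variables inside the same term becomes an edge. This is an illustration of the idea only, not how Chordalysis builds its image; CliqueGraph is a hypothetical helper, and the three terms used in main are copied from the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Chordalysis.
class CliqueGraph {
    /** Returns the undirected edges "a -- b" implied by the cliques, each listed once. */
    static Set<String> edges(List<List<String>> cliques) {
        Set<String> edges = new TreeSet<>();
        for (List<String> clique : cliques) {
            // Every pair of variables in a clique is connected.
            for (int i = 0; i < clique.size(); i++) {
                for (int j = i + 1; j < clique.size(); j++) {
                    String a = clique.get(i);
                    String b = clique.get(j);
                    // Order the endpoints so "a -- b" and "b -- a" collapse to one edge.
                    edges.add(a.compareTo(b) <= 0 ? a + " -- " + b : b + " -- " + a);
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<List<String>> cliques = new ArrayList<>();
        cliques.add(Arrays.asList("cap-shape", "bruises?", "stalk-color-above-ring", "class"));
        cliques.add(Arrays.asList("cap-shape", "bruises?", "stalk-color-below-ring", "class"));
        cliques.add(Arrays.asList("veil-type"));
        for (String edge : edges(cliques)) {
            System.out.println(edge);
        }
    }
}
```

The two four-variable terms share three edges (among cap-shape, bruises? and class), so nine distinct edges are printed. The single-variable term [veil-type] contributes no edge: it is an isolated node in the graph.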
For any questions/problems/feedback, do not hesitate to contact me at petitjean(at)tiny-clues(dot)eu