The Vikamine Kernel by Examples

Florian Lemmerich

The Vikamine Kernel by Examples: A Busy Developers' Guide to the VIKAMINE Kernel

**How can I perform subgroup discovery?**
The basic process is:
1. Load a dataset
2. Specify your task in a MiningTask object
3. Run the task

Each step is described in more detail in the questions below.
A complete example is implemented in the class `org.vikamine.kernel.examples.SimpleTaskRunnerExample`.

**How do I load a dataset?**
Create an `Ontology` from the dataset file:
```
Ontology onto = DataFactory.createOntology(new File(
    "../org.vikamine.eval/resources/datasets/adults.arff"));
```
The `onto` object then stores your dataset and related knowledge.

**How do I specify a mining task?**
1) Create a `MiningTask` object for the data stored in the `Ontology` object `onto`:
```MiningTask task = new MiningTask(onto);```

2) Set your target concept (property of interest)
2a) For a boolean target concept, wrap a selector in a `SelectorTarget`. Let us assume the target concept is that the attribute *class* takes the value *>50K*:
```task.setTarget(new SelectorTarget(new DefaultSGSelector(onto, "class", ">50K")));```

2b) Alternatively, if your target concept is a numeric attribute, e.g. named *age*:
```task.setTarget(new NumericTarget(onto, "age"));```

3) Set the search space. This determines which attributes are used, which selection expressions are formed over these attributes (the standard generator creates attribute-value pairs for nominal attributes and discretized intervals for numeric attributes), and which data instances are considered (`onto.getDataView()` uses all of them by default):
```
List<SGSelector> selectors = SelectorGeneratorUtils.generateSelectors(
    SGSelectorGeneratorFactory.createStandardGenerator(),
    onto.getAttributes(), onto.getDataView());
task.setSearchSpace(selectors);
```

4) Select your interestingness measure (= quality function), here for example the chi² function:
`task.setQualityFunction(new ChiSquareQF());`

5) Select your favorite algorithm:
`task.setMethodType(SDMap.class);`

6) Set constraints for the search. In this example, we want to compute the best 10 subgroups according to our interestingness measure, and search only for conjunctions of at most 3 selection expressions:
```
task.setMaxSGCount(10);
task.setMaxSGDSize(3);
```
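
To make the configuration above more concrete, here is a minimal, self-contained sketch of what such a task conceptually computes. It builds attribute-value selectors for nominal attributes, scores every conjunction with the standard chi-square statistic of the 2x2 table (subgroup vs. rest, target vs. non-target), and keeps the best `k` descriptions with at most `maxSize` selectors. It does not use VIKAMINE: the class `SubgroupSearchSketch`, its methods, and the toy data are hypothetical, the search is a naive exhaustive enumeration rather than SD-Map, and whether `ChiSquareQF` uses exactly this form of the statistic is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; none of these names belong to the VIKAMINE API. */
class SubgroupSearchSketch {

    /** A conjunction of "attribute=value" selectors together with its quality. */
    static final class Subgroup {
        final List<String> description;
        final double quality;

        Subgroup(List<String> description, double quality) {
            this.description = description;
            this.quality = quality;
        }
    }

    /** Chi-square of a 2x2 table: a/b = target/non-target inside the subgroup, c/d = outside. */
    static double chiSquare(int a, int b, int c, int d) {
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0) { return 0.0; } // degenerate table, e.g. empty subgroup
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    /** Naive exhaustive search: best k conjunctions with at most maxSize selectors. */
    static List<Subgroup> topK(String[] names, String[][] rows, int targetCol,
            String targetValue, int k, int maxSize) {
        // Search space: one selector per distinct value of every non-target column.
        List<Integer> cols = new ArrayList<>();
        List<String> vals = new ArrayList<>();
        for (int col = 0; col < names.length; col++) {
            if (col == targetCol) { continue; }
            Set<String> distinct = new LinkedHashSet<>();
            for (String[] row : rows) { distinct.add(row[col]); }
            for (String v : distinct) {
                cols.add(col);
                vals.add(v);
            }
        }
        List<Subgroup> found = new ArrayList<>();
        extend(names, rows, targetCol, targetValue, cols, vals, 0,
                new ArrayList<>(), maxSize, found);
        // Stable sort, best quality first; ties keep their enumeration order.
        found.sort(Comparator.comparingDouble((Subgroup s) -> s.quality).reversed());
        return new ArrayList<>(found.subList(0, Math.min(k, found.size())));
    }

    /** Scores every extension of the current conjunction by one more selector. */
    private static void extend(String[] names, String[][] rows, int targetCol,
            String targetValue, List<Integer> cols, List<String> vals, int start,
            List<Integer> chosen, int maxSize, List<Subgroup> found) {
        if (chosen.size() == maxSize) { return; } // description length limit reached
        for (int i = start; i < cols.size(); i++) {
            boolean clash = false; // true if this attribute is already constrained
            for (int j : chosen) { clash |= cols.get(j).equals(cols.get(i)); }
            if (clash) { continue; }
            chosen.add(i);
            List<String> description = new ArrayList<>();
            for (int j : chosen) { description.add(names[cols.get(j)] + "=" + vals.get(j)); }
            int a = 0, b = 0, c = 0, d = 0;
            for (String[] row : rows) {
                boolean covered = true;
                for (int j : chosen) { covered &= row[cols.get(j)].equals(vals.get(j)); }
                boolean positive = row[targetCol].equals(targetValue);
                if (covered && positive) a++;
                else if (covered) b++;
                else if (positive) c++;
                else d++;
            }
            found.add(new Subgroup(description, chiSquare(a, b, c, d)));
            extend(names, rows, targetCol, targetValue, cols, vals, i + 1,
                    chosen, maxSize, found);
            chosen.remove(chosen.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        String[] names = {"sex", "edu", "class"};
        String[][] rows = {
            {"Male", "High", ">50K"}, {"Male", "High", ">50K"}, {"Male", "Low", "<=50K"},
            {"Female", "High", "<=50K"}, {"Female", "Low", "<=50K"}, {"Female", "Low", "<=50K"}
        };
        for (Subgroup sg : topK(names, rows, 2, ">50K", 10, 3)) {
            System.out.println(sg.description + " -> " + sg.quality);
        }
    }
}
```

On the six toy rows in `main`, the top-ranked description is `[sex=Male, edu=High]` with a chi-square value of 6.0. Real algorithms such as SD-Map avoid enumerating every conjunction, which is why the limits on result count and description size matter in practice.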

**I have created and configured a `MiningTask` object. How do I get my subgroups?**
Run the subgroup discovery task:
```SGSet result = task.performSubgroupDiscovery();```

**How do I get base statistics for a dataset?**
```
System.out.println("Attributes in dataset: " + onto.getNumAttributes());
System.out.println("Instances in dataset: " + onto.getNumInstances());
```
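
As a sanity check for these two numbers, the following sketch counts attribute declarations and data rows directly in ARFF text. It is independent of VIKAMINE: the class `ArffStatsSketch` and the toy data are hypothetical, only dense (non-sparse) ARFF data with one instance per line is handled, and whether VIKAMINE's counts match these exactly (for example, if it adds or hides attributes) is an assumption.

```java
/** Illustrative sketch only; not part of the VIKAMINE API. */
class ArffStatsSketch {

    /** Returns {number of @attribute lines, number of data rows} for ARFF text. */
    static int[] countAttributesAndInstances(String arff) {
        int attributes = 0;
        int instances = 0;
        boolean inData = false;
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(); // ARFF keywords are case-insensitive
            if (inData) {
                instances++; // every remaining line is one instance
            } else if (lower.startsWith("@attribute")) {
                attributes++;
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return new int[] {attributes, instances};
    }

    public static void main(String[] args) {
        // Hypothetical toy data, not an excerpt of adults.arff.
        String arff = String.join("\n",
            "@relation toy",
            "@attribute age numeric",
            "@attribute sex {Male,Female}",
            "@attribute class {>50K,<=50K}",
            "@data",
            "39,Male,<=50K",
            "52,Female,>50K");
        int[] stats = countAttributesAndInstances(arff);
        System.out.println("Attributes in dataset: " + stats[0]); // 3
        System.out.println("Instances in dataset: " + stats[1]);  // 2
    }
}
```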

**Are attribute and value names case-sensitive?**
Yes, they are.
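
Because of this, a name such as `Class` or a value such as `>50k` will not match `class` or `>50K`. The sketch below shows one way to catch such slips before building a selector. The helper `caseOnlyMismatch` and the class `NameCheckSketch` are hypothetical and not part of VIKAMINE; the list of known names is assumed to come from your own dataset.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Illustrative sketch only; not part of the VIKAMINE API. */
class NameCheckSketch {

    /** Returns the known name that differs from the requested one only by case, if any. */
    static Optional<String> caseOnlyMismatch(List<String> knownNames, String requested) {
        if (knownNames.contains(requested)) {
            return Optional.empty(); // exact match, nothing to report
        }
        return knownNames.stream()
                .filter(name -> name.equalsIgnoreCase(requested))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(">50K", "<=50K");
        System.out.println(caseOnlyMismatch(values, ">50k")); // Optional[>50K]
        System.out.println(caseOnlyMismatch(values, ">50K")); // Optional.empty
    }
}
```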

