VIKAMINE Wiki
Brought to you by: lemperator, mna
The Vikamine Kernel by Examples: A Busy Developers' Guide to the VIKAMINE Kernel
**How can I perform subgroup discovery?**
The basic process is:
1. Load a dataset
2. Specify your task in a `MiningTask` object
3. Run the task
These steps are described in the next questions in more detail.
A complete example is implemented in the class `org.vikamine.kernel.examples.SimpleTaskRunnerExample`.
**How do I load a dataset?**

```
Ontology onto = DataFactory.createOntology(new File(
        "../org.vikamine.eval/resources/datasets/adults.arff"));
```

The `onto` object then stores your dataset and related knowledge.
**How do I specify a mining task?**

1) Create a `MiningTask` object for the data stored in the Ontology `onto`:

```
MiningTask task = new MiningTask(onto);
```

2) Set your target concept (the property of interest).

2a) For a boolean target concept: let us assume the target concept is that the attribute *class* takes the value *>50K*:

```
task.setTarget(new SelectorTarget(new DefaultSGSelector(onto, "class", ">50K")));
```

2b) Alternatively, if your target concept is a numeric attribute, e.g. one named *age*:

```
task.setTarget(new NumericTarget(onto, "age"));
```

3) Set the search space. This determines which attributes are used, which selection expressions are formed over these attributes (the standard generator creates attribute-value pairs for nominal attributes and discretized intervals for numeric attributes), and which part of the data instances is considered (`onto.getDataView()` uses all instances by default):

```
List<SGSelector> selectors = SelectorGeneratorUtils.generateSelectors(
        SGSelectorGeneratorFactory.createStandardGenerator(),
        onto.getAttributes(), onto.getDataView());
task.setSearchSpace(selectors);
```

4) Select your interestingness measure (= quality function), here for example the chi² function:

```
task.setQualityFunction(new ChiSquareQF());
```

5) Select your favorite algorithm:

```
task.setMethodType(SDMap.class);
```

6) Set constraints on the search. In this example, we want to compute the 10 best subgroups according to our interestingness measure, and search only for conjunctions of at most 3 selection expressions:

```
task.setMaxSGCount(10);
task.setMaxSGDSize(3);
```
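To make the role of the quality function and the two search constraints more concrete, here is a plain-Java sketch that does not use VIKAMINE at all. It scores each candidate subgroup with the standard chi² statistic of the 2x2 contingency table (inside/outside the subgroup versus target/non-target), drops candidates with too many selectors, and keeps the best k. The class `ChiSquareSketch`, its nested `Candidate` type, and all counts are hypothetical. The details of `ChiSquareQF` may differ, and `SDMap` does not work by sorting a precomputed candidate list like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch; all names here are hypothetical and not part of VIKAMINE. */
final class ChiSquareSketch {

    /** A candidate subgroup: its selectors plus the counts needed for scoring. */
    static final class Candidate {
        final List<String> selectors; // conjunction of selection expressions
        final int size;               // instances covered by the subgroup
        final int positives;          // covered instances that satisfy the target

        Candidate(List<String> selectors, int size, int positives) {
            this.selectors = selectors;
            this.size = size;
            this.positives = positives;
        }
    }

    /** Chi-square statistic of the 2x2 table (subgroup / rest) x (target / non-target). */
    static double chiSquare(int size, int positives, int populationSize, int populationPositives) {
        double a = positives;                       // in subgroup, target
        double b = size - positives;                // in subgroup, non-target
        double c = populationPositives - positives; // outside subgroup, target
        double d = (populationSize - size) - c;     // outside subgroup, non-target
        double denominator = (a + b) * (c + d) * (a + c) * (b + d);
        if (denominator == 0) {
            return 0.0; // degenerate table, e.g. empty subgroup or constant target
        }
        double diff = a * d - b * c;
        return populationSize * diff * diff / denominator;
    }

    /** Keeps the k best candidates that use at most maxSelectors selectors. */
    static List<Candidate> topK(List<Candidate> candidates, int k, int maxSelectors,
            int populationSize, int populationPositives) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.selectors.size() <= maxSelectors) {
                kept.add(c);
            }
        }
        // Highest quality first.
        kept.sort(Comparator.comparingDouble((Candidate c) ->
                chiSquare(c.size, c.positives, populationSize, populationPositives)).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(k, kept.size())));
    }

    public static void main(String[] args) {
        // Made-up counts: a population of 100 instances, 40 of which satisfy the target.
        List<Candidate> candidates = Arrays.asList(
                new Candidate(Arrays.asList("a=1"), 50, 20),
                new Candidate(Arrays.asList("b=1"), 20, 20),
                new Candidate(Arrays.asList("a=1", "b=1", "c=1", "d=1"), 10, 10));
        for (Candidate c : topK(candidates, 10, 3, 100, 40)) {
            System.out.println(c.selectors + " -> " + chiSquare(c.size, c.positives, 100, 40));
        }
    }
}
```

In this toy data the candidate with four selectors is excluded by the size limit of 3, in the same spirit as `setMaxSGDSize(3)`, and the limit of 10 plays the role of `setMaxSGCount(10)`.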
**I have created and configured a `MiningTask` object. How do I get my subgroups?**

Run the subgroup discovery task:

```
SGSet result = task.performSubgroupDiscovery();
```

**How do I get base statistics for a dataset?**

```
System.out.println("Attributes in dataset: " + onto.getNumAttributes());
System.out.println("Instances in dataset: " + onto.getNumInstances());
```
**Are attribute and value names case-sensitive?**
Yes, they are.
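Because names are matched exactly, a selector on `"Class"` will not find an attribute called `"class"`. The following plain-Java sketch shows one way to catch such slips early. The class `NameCheck` and its method `caseMismatch` are hypothetical helpers, not part of VIKAMINE, and how you obtain the list of known names from your dataset is left open here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of VIKAMINE) for spotting case slips in names. */
final class NameCheck {

    private NameCheck() {
    }

    /**
     * Returns a known name that differs from the requested one only by case.
     * Returns an empty Optional if the requested name matches exactly
     * or if no known name matches even when case is ignored.
     */
    static Optional<String> caseMismatch(List<String> known, String requested) {
        if (known.contains(requested)) {
            return Optional.empty(); // exact match, nothing to correct
        }
        return known.stream()
                .filter(name -> name.equalsIgnoreCase(requested))
                .findFirst();
    }

    public static void main(String[] args) {
        // Attribute names taken from the examples on this page.
        List<String> names = Arrays.asList("age", "class");
        caseMismatch(names, "Class")
                .ifPresent(hit -> System.out.println("Did you mean \"" + hit + "\"?"));
    }
}
```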