decodIRT_BRACIS20 - Browse /Output at SourceForge.net


Name	Modified	Downloads / Week
wilt	2020-05-13	0
wdbc	2020-05-13	0
wall-robot-navigation	2020-05-13	0
vowel	2020-05-13	0
vehicle	2020-05-13	0
tic-tac-toe	2020-05-13	0
texture	2020-05-13	0
steel-plates-fault	2020-05-13	0
splice	2020-05-13	0
spambase	2020-05-13	0
sick	2020-05-13	0
semeion	2020-05-13	0
segment	2020-05-13	0
satimage	2020-05-13	0
qsar-biodeg	2020-05-13	0
phoneme	2020-05-13	0
PhishingWebsites	2020-05-13	0
pendigits	2020-05-13	0
pc3	2020-05-13	0
pc1	2020-05-13	0
ozone-level-8hr	2020-05-13	0
optdigits	2020-05-13	1
MiceProtein	2020-05-13	0
mfeat-zernike	2020-05-13	0
mfeat-pixel	2020-05-13	0
mfeat-morphological	2020-05-13	0
mfeat-karhunen	2020-05-13	0
mfeat-fourier	2020-05-13	0
mfeat-factors	2020-05-13	0
madelon	2020-05-13	0
letter	2020-05-13	0
kr-vs-kp	2020-05-13	0
kc2	2020-05-13	0
kc1	2020-05-13	0
jm1	2020-05-13	0
isolet	2020-05-13	0
Internet-Advertisements	2020-05-13	0
ilpd	2020-05-13	0
har	2020-05-13	0
GesturePhaseSegmentationProcessed	2020-05-13	0
first-order-theorem-proving	2020-05-13	0
eucalyptus	2020-05-13	0
dresses-sales	2020-05-13	0
dna	2020-05-13	0
diabetes	2020-05-13	0
cylinder-bands	2020-05-13	0
credit-g	2020-05-13	0
credit-approval	2020-05-13	0
cnae-9	2020-05-13	0
cmc	2020-05-13	0
climate-model-simulation-crashes	2020-05-13	0
churn	2020-05-13	0
car	2020-05-13	0
breast-w	2020-05-13	0
blood-transfusion-service-center	2020-05-13	0
Bioresponse	2020-05-13	0
banknote-authentication	2020-05-13	0
balance-scale	2020-05-13	0
analcatdata_dmft	2020-05-13	0
analcatdata_authorship	2020-05-13	0
Totals: 60 Items		1

Here you can find all the files and supplementary materials for the paper "Decoding machine learning benchmarks", published at BRACIS20. The files are organized as follows:

Results BRACIS

This folder contains all the results used in the paper.
datasets.csv: List of the OpenML IDs of the datasets used in the paper. This list is the input to the first script (decodIRT_OtML.py).
clf_rating.csv: Classifier rating ranking shown in Table 1 of the paper.
Real_clf_nemenyi.csv: Matrix of p-values from the Nemenyi test for the real classifiers.
IRT_param_freq.txt: Percentages of difficult, discriminating and easy-to-guess instances for all datasets.
modelosML.txt: Hyperparameters of all ML models analyzed in the paper.
Fluxograma.png: Flowchart of the decodIRT execution, shown in Figure 1 of the paper.
graph_percIRT.png: Image of the graph that compares the percentages of difficult and discriminating instances of the datasets, shown in Figure 2 of the paper.
jm1_score.png: Chart comparing the True-Scores obtained by the classifiers on the "jm1" dataset, shown in Figure 3 of the paper.
heatmap_realclf.png: Heatmap used to analyze the results of the Nemenyi test, shown in Figure 4 of the paper.
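Real_clf_nemenyi.csv and heatmap_realclf.png come from a Friedman test followed by the Nemenyi post-hoc test. The first half of that procedure (ranking the classifiers on each dataset and computing the Friedman statistic) can be sketched in plain Python. This is a minimal illustration, not the code of clf_rating_nemenyi.py: the table layout (one row per dataset, one column per classifier, higher score is better) and the function names are assumptions, ties receive average ranks, and no tie correction is applied.

```python
def rank_row(scores):
    """Rank the classifiers on one dataset: rank 1 = highest score.
    Tied scores share the average of the positions they occupy."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    return ranks


def friedman(score_table):
    """score_table[d][c] = score of classifier c on dataset d (hypothetical layout).
    Returns (average rank of each classifier, Friedman chi-square statistic)."""
    n = len(score_table)      # number of datasets (blocks)
    k = len(score_table[0])   # number of classifiers
    rank_rows = [rank_row(row) for row in score_table]
    avg = [sum(r[c] for r in rank_rows) / n for c in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2


if __name__ == "__main__":
    table = [[0.90, 0.80, 0.70], [0.85, 0.80, 0.75], [0.95, 0.70, 0.90]]
    print(friedman(table))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom. The pairwise Nemenyi p-values stored in Real_clf_nemenyi.csv additionally require the studentized range distribution, which this sketch does not reimplement.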

Output

This folder contains all the results generated for each dataset by the three scripts. The results are split into one folder per dataset, named after the dataset. Each folder contains the following files:
- Results of the decodIRT_OtML script:
  dataset_name.csv: The file without a suffix contains the responses of each ML model, covering both the real and the artificial classifiers.
  dataset_name_acuracia.csv: The file with the suffix “_acuracia” contains a table with the average accuracy of each real classifier during cross-validation.
  dataset_name_final.csv: The file with the suffix “_final” contains a table with the accuracy of the real classifiers on the instances held out for testing.
  dataset_name_irt.csv: Like dataset_name.csv, this file contains a response matrix; however, it includes the response vectors of all real, artificial and MLP classifiers. This matrix is used to generate the IRT item parameters in the second script.
  dataset_name_mlp.csv: Contains the final accuracy obtained by the first set of MLP classifiers.
  dataset_name_test.csv: List of all data instances that make up the test set.
- Results of the decodIRT_MLtIRT script:
  irt_item_param.csv: Table containing all item parameters (Difficulty, Discrimination and Guessing) generated for the test set instances.
- Results of the decodIRT_analysis script:
  score_disPositivo.csv: Table with the True-Score obtained by each real classifier, considering only the instances with positive discrimination.
  score_total.csv: Table with the True-Score obtained by each real classifier, considering all instances.
  theta_list.csv: Table with the final Theta (ability) value obtained by each real classifier.
  dataset_name_score.png: Chart comparing the True-Scores obtained by the real and artificial classifiers on the dataset.
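The IRT quantities behind irt_item_param.csv, score_total.csv, score_disPositivo.csv and theta_list.csv can be illustrated with a short sketch. It assumes the standard three-parameter logistic (3PL) model built from the three item parameters listed above (discrimination a, difficulty b, guessing c), written without the optional 1.7 scaling constant. The True-Score is taken as the usual IRT expected score, the sum of the item probabilities, and the positive-discrimination variant simply skips items with a <= 0. Theta is estimated here by a grid-search maximum likelihood, a deliberately simple stand-in; decodIRT_analysis.py may estimate it differently. All function names are hypothetical.

```python
import math


def p_correct(theta, a, b, c):
    """3PL probability that a respondent of ability theta gets an item right.
    a = discrimination, b = difficulty, c = guessing."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def true_score(theta, items, positive_only=False):
    """Expected number of correct answers: sum of 3PL probabilities.
    With positive_only=True, items with discrimination <= 0 are skipped."""
    return sum(
        p_correct(theta, a, b, c)
        for a, b, c in items
        if a > 0 or not positive_only
    )


def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta from a 0/1 response vector."""
    best_theta, best_ll = lo, -math.inf
    for s in range(steps):
        theta = lo + (hi - lo) * s / (steps - 1)
        ll = 0.0
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            ll += u * math.log(p) + (1 - u) * math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


if __name__ == "__main__":
    # Hypothetical item parameters (a, b, c); the last item has negative discrimination.
    items = [(1.2, -1.0, 0.2), (0.8, 0.0, 0.1), (-0.5, 1.0, 0.0)]
    print(true_score(0.0, items), true_score(0.0, items, positive_only=True))
    print(estimate_theta([1, 1, 0], items))
```

The grid search keeps the estimator transparent; Newton-Raphson or Bayesian (EAP) estimators are the more common choices in IRT software.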

Scripts

This folder contains all the scripts used to generate the results presented in the paper.
- decodIRT:
  decodIRT_OtML.py
  decodIRT_MLtIRT.py
  decodIRT_analysis.py
- Other Scripts:
  clf_rating_nemenyi.py: Script that computes the classifier ratings with the Glicko-2 system and performs the Friedman and Nemenyi tests.
  *Note: computing the ratings requires the Python script of the Glicko-2 system, which can be downloaded from http://www.glicko.net/glicko.html
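For readers who want to see what the Glicko-2 step does, here is an independent sketch of one rating-period update following Glickman's published description of the algorithm. It is not the downloadable script the note refers to, and how clf_rating_nemenyi.py turns classifier results into "matches" (for example, treating each dataset as a rating period with pairwise wins decided by accuracy) is an assumption not described here. The function name and the default system constant tau = 0.5 are our choices.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """One Glicko-2 rating-period update (hypothetical helper).
    results: list of (opponent_rating, opponent_rd, score), score in {1, 0.5, 0}.
    Returns (new_rating, new_rd, new_vol) on the Glicko scale."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games in this period: only the deviation grows.
        return rating, SCALE * math.sqrt(phi ** 2 + vol ** 2), vol

    # Estimated variance v and improvement delta from this period's games.
    v_inv = delta_sum = 0.0
    for r_j, rd_j, s in results:
        mu_j, phi_j = (r_j - 1500.0) / SCALE, rd_j / SCALE
        e, g = _expected(mu, mu_j, phi_j), _g(phi_j)
        v_inv += g * g * e * (1.0 - e)
        delta_sum += g * (s - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # New volatility: root of f via the Illinois (regula falsi) iteration.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # New deviation and rating, converted back to the Glicko scale.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * delta_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol


if __name__ == "__main__":
    # Glickman's worked example: one win and two losses against three opponents.
    print(glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

Glickman's worked example for these inputs reports a new rating of about 1464.06, RD of about 151.52 and volatility of about 0.05999, which is a convenient check for any implementation.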

Source: README.md, updated 2020-05-28
