| File | Date | Author | Commit |
|---|---|---|---|
| ml-codesmell | 2022-09-10 | | [3da66e] Add file via upload |
| python-code | 2022-09-10 | | [de8753] Delete new |
| .gitignore | 2022-08-29 | | [ddae7b] Initial commit |
| LICENSE | 2022-08-29 | | [ddae7b] Initial commit |
| README.md | 2022-10-17 | | [35dff3] Update README.md |

Nguyen Thanh Binh, Minh N. H. Nguyen, Le Thi My Hanh, and Nguyen Thanh Binh.
ml-Codesmell: A code smell prediction dataset for machine learning approaches. In Proceedings of The 11th International Symposium On Information And Communication Technology (SOICT ’2022).
This project proposes the ml-Codesmell dataset, created by analysing source code and extracting a large number of source code metrics together with many labelled code smells. The dataset has been used to train machine learning algorithms to predict code smells, and it is expected to be useful for research projects on code smell prediction based on machine learning approaches.
The dataset includes the following two folders:
This folder stores the two data items of the project, as follows:
The file project_catalogue.csv contains the links of the open-source project catalogue. The source code of the projects in the catalogue is cloned to local storage, where it is analysed to extract source code metrics.
The ml-codesmell dataset is compressed as a split ZIP archive in three parts: ml-codesmell.zip, ml-codesmell.z01, and ml-codesmell.z02. To use the dataset, keep all three parts in the same folder and extract ml-codesmell.zip.
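Python's standard `zipfile` module does not handle multi-disk ZIP files, so a split archive like this one needs an external tool. The following is a minimal sketch, not part of the project's own code: it checks that all three parts are present and then calls the `7z` command-line tool, assuming it is installed and on the PATH. The function names `missing_parts` and `extract_dataset` are hypothetical helpers introduced for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Part names as listed in this README.
PARTS = ("ml-codesmell.zip", "ml-codesmell.z01", "ml-codesmell.z02")


def missing_parts(folder):
    """Return the names of the archive parts that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in PARTS if not (folder / name).is_file()]


def extract_dataset(folder, out_dir):
    """Extract the split archive using the external 7z tool (an assumption)."""
    missing = missing_parts(folder)
    if missing:
        raise FileNotFoundError(f"missing archive parts: {missing}")
    if shutil.which("7z") is None:
        raise RuntimeError("7z was not found on PATH")
    # `7z x <archive> -o<dir>` extracts the archive, keeping paths, into <dir>.
    subprocess.run(
        ["7z", "x", str(Path(folder) / PARTS[0]), f"-o{out_dir}"],
        check=True,
    )
```

A call such as `extract_dataset("ml-codesmell", "data")` would then unpack the dataset into a `data` folder; both paths here are placeholders to adapt to the local layout.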
This folder contains the following two files:
If you have any questions or problems using the dataset, please contact support by email at thanhbinh@cdtb.edu.vn.