Name | Modified | Size | Downloads / Week |
---|---|---|---|
source code | 2025-06-20 | ||
DataPrep_v0.2.0.exe | 2025-06-20 | 43.3 MB | |
About DataPrep v0.2.pdf | 2025-06-20 | 482.6 kB | |
Screenshot 2.png | 2025-06-20 | 86.8 kB | |
Screenshot 1.png | 2025-06-20 | 3.5 kB | |
README.md | 2025-06-20 | 2.2 kB | |
LICENSE | 2025-06-20 | 2.4 kB | |
DataPrep_v0.2-User_Manual.png | 2025-06-20 | 172.0 kB | |
DataPrep_v0.2-User_Manual.pdf | 2025-06-20 | 97.1 kB | |
Totals: 9 Items | | 44.2 MB | 2 |
## DataPrep v0.2 ##
DataPrep is a GUI-based Python application for preprocessing tabular data. It is designed to remove features with missing values, features with zero variance, and highly inter-correlated features. Users can load CSV/Excel files, configure the preprocessing settings, and save the processed data.
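The first two filters (missing values and zero variance) can be sketched with pandas and scikit-learn's `VarianceThreshold`, both listed under Sources. This is a minimal illustration, not DataPrep's own code: the helper name `drop_missing_and_constant` is hypothetical, and dropping every column that contains at least one missing value is an assumed policy.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def drop_missing_and_constant(X: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: drop columns with missing values, then low-variance columns.

    Assumes every column of X is numeric (index/dependent columns already set aside).
    """
    # Assumed policy: any column containing a NaN is removed entirely.
    X = X.dropna(axis=1, how="any")
    # VarianceThreshold keeps features whose variance is strictly greater than
    # `threshold`; threshold=0.0 removes only constant (zero-variance) columns.
    selector = VarianceThreshold(threshold=threshold)
    selector.fit(X)
    return X.loc[:, selector.get_support()]


# Made-up example data.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [5.0, np.nan, 7.0, 8.0],  # contains a missing value
    "c": [1.0, 1.0, 1.0, 1.0],     # zero variance
    "d": [0.1, 0.4, 0.2, 0.9],
})
print(list(drop_missing_and_constant(data).columns))  # ['a', 'd']
```

Raising `threshold` above zero turns the same call into a general low-variance filter.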
Features
- Load CSV (preferred) or Excel files for processing.
- Select index and dependent columns.
- Choose correlation method (Pearson, Spearman, Kendall).
- Set correlation and covariance cutoff values.
- Choose a processing method (Remove Null Vars, SIMPLE, SPARC, or SWAPCo).
- Remove columns with missing values and low-variance features.
- Save processed data in CSV/Excel format.
- About and Info sections for user guidance.
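Correlation-based removal with a selectable method (Pearson, Spearman, Kendall) and a cutoff can be sketched with `pandas.DataFrame.corr`. The greedy rule below (scan columns left to right and drop any column whose absolute correlation with an already-kept column exceeds the cutoff) is an assumption for illustration; it is not necessarily what the SIMPLE, SPARC, or SWAPCo methods do, and `drop_correlated` is a hypothetical name.

```python
import pandas as pd


def drop_correlated(X: pd.DataFrame, method: str = "pearson", cutoff: float = 0.95) -> pd.DataFrame:
    """Hypothetical greedy filter: keep a column only if it is not highly
    correlated with any column kept before it."""
    # DataFrame.corr accepts method="pearson", "spearman" or "kendall"
    # (Kendall needs SciPy installed).
    corr = X.corr(method=method).abs()
    kept = []
    for col in X.columns:
        # Assumed rule: drop `col` if |r| > cutoff with any already-kept column.
        if all(corr.loc[col, k] <= cutoff for k in kept):
            kept.append(col)
    return X[kept]


# Made-up example data.
X = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.1, 6.0, 8.2, 10.0],  # nearly 2 * x1
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
})
print(list(drop_correlated(X, method="spearman", cutoff=0.9).columns))  # ['x1', 'x3']
```

Because the scan is order-dependent, the first column of a correlated pair is the one that survives; other tie-breaking rules (for example, keeping the feature more correlated with the dependent column) are equally plausible.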
GUI Overview
- File Selection: Browse and load the input data file.
- Index Column & Dependent Column: Dropdowns to select columns.
- Correlation Method: Choose Pearson, Spearman, or Kendall.
- Correlation & Covariance Cutoff: Set threshold values.
- Processing Method: Select from Remove Null Vars, SIMPLE, SPARC, or SWAPCo.
- Preprocessing & Saving: Buttons to start preprocessing and to save the processed data.
- About & Info: Displays additional information about the application.
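Behind the File Selection and Save controls, loading and saving by file type might look like the sketch below. The helper names `load_table` and `save_table` are hypothetical, and choosing the pandas reader or writer from the file extension is an assumption; `pandas.read_excel` and `DataFrame.to_excel` also need an Excel engine (such as openpyxl for `.xlsx`) to be installed.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path: str) -> pd.DataFrame:
    """Hypothetical loader: pick the pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(path)  # needs an Excel engine (e.g. openpyxl for .xlsx)
    raise ValueError(f"Unsupported file type: {suffix}")


def save_table(df: pd.DataFrame, path: str) -> None:
    """Hypothetical saver; assumes the index column is kept as an ordinary column."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, index=False)
    elif suffix == ".xlsx":
        df.to_excel(path, index=False)  # needs an Excel engine (e.g. openpyxl)
    else:
        raise ValueError(f"Unsupported file type: {suffix}")


# Round-trip a small made-up table through a temporary CSV file.
original = pd.DataFrame({"ID": ["m1", "m2"], "x1": [0.5, 1.5]})
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "reduced.csv")
    save_table(original, csv_path)
    restored = load_table(csv_path)
print(restored.equals(original))
```

CSV needs no extra dependency, which may be one reason it is the preferred input format.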
Usage:
- Browse and select an input CSV/Excel file.
- Choose index and dependent columns.
- Set correlation method, correlation cutoff, and covariance cutoff values.
- Select a processing method (Remove Null Vars, SIMPLE, SPARC, or SWAPCo).
- Click Start Preprocessing to process the data.
- Save the processed data using the Save Reduced Data button.
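The usage steps can be approximated headlessly as a single pandas pipeline. This is a sketch under stated assumptions, not DataPrep's implementation: `run_pipeline` is a hypothetical name, only missing-value and zero-variance removal are applied as placeholder filters (SIMPLE, SPARC, and SWAPCo are not reproduced), and the dependent column is set aside so that no filter can remove it. The column names in the example data are made up.

```python
import pandas as pd


def run_pipeline(df: pd.DataFrame, index_col: str, dependent_col: str) -> pd.DataFrame:
    """Hypothetical end-to-end run: set the index, protect the dependent column,
    filter the remaining feature columns, then reattach the dependent column."""
    data = df.set_index(index_col)
    y = data[dependent_col]
    X = data.drop(columns=[dependent_col])
    # Placeholder filters (assumed): drop columns with any missing value,
    # then columns with zero variance (only one unique value).
    X = X.dropna(axis=1)
    X = X.loc[:, X.nunique() > 1]
    # Reattach the dependent column as the last column (aligned on the index).
    return X.assign(**{dependent_col: y})


# Made-up example data.
raw = pd.DataFrame({
    "ID": ["m1", "m2", "m3"],
    "pIC50": [5.1, 6.3, 7.0],
    "f1": [0.2, 0.5, 0.9],
    "f2": [1.0, None, 3.0],  # missing value
    "f3": [4.0, 4.0, 4.0],   # zero variance
})
reduced = run_pipeline(raw, index_col="ID", dependent_col="pIC50")
print(reduced)
```

Separating the dependent column before filtering keeps the target intact even if it happens to be constant or strongly correlated with a feature.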
Sources:
https://pandas.pydata.org/docs/
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold
https://docs.python.org/3/library/tkinter
Developed by
SUVANKAR BANERJEE suvankarbanerjee1995@gmail.com
Research Scholar, Natural Science Laboratory, Division of Medicinal Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata-700060, INDIA.