Active Learning is a Python-based research framework developed by Google for experimenting with and benchmarking various active learning algorithms. It provides modular tools for running reproducible experiments across different datasets, sampling strategies, and machine learning models. The system allows researchers to study how models can improve labeling efficiency by selectively querying the most informative data points rather than relying on uniformly sampled training sets. The main experiment runner (run_experiment.py) supports a wide range of configurations, including batch sizes, dataset subsets, model selection, and data preprocessing options. It includes several established active learning strategies such as uncertainty sampling, k-center greedy selection, and bandit-based methods, while also allowing for custom algorithm implementations. The framework integrates with both classical machine learning models (SVM, logistic regression) and neural networks.
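To make the idea of selective querying concrete, here is a minimal sketch of margin-based uncertainty sampling in plain Python. It is not the framework's own implementation: the function name `margin_sampling`, its parameters, and the toy probabilities are assumptions made for illustration. The sketch ranks unlabeled points by the gap between their two most likely classes and requests labels for the smallest gaps.

```python
def margin_sampling(probabilities, batch_size):
    """Pick the samples whose top-two class probabilities are closest.

    Hypothetical helper for illustration; assumes each row holds at
    least two class probabilities predicted by some trained model.
    """
    margins = []
    for index, probs in enumerate(probabilities):
        # Sort descending and keep the two most likely classes.
        best, second = sorted(probs, reverse=True)[:2]
        margins.append((best - second, index))
    # A small margin means the model is least sure which class wins.
    margins.sort()
    return [index for _, index in margins[:batch_size]]


# Toy predicted class probabilities for four unlabeled points (made up).
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],
    [0.55, 0.44, 0.01],
    [0.34, 0.33, 0.33],  # nearly uniform, smallest margin
]
selected = margin_sampling(pool_probs, batch_size=2)
print(selected)
```

In a full active learning loop, the selected indices would be sent for labeling, added to the training set, and the model retrained before the next round.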
Features
- Modular experimentation framework for active learning research
- Supports multiple datasets, along with models including SVMs, logistic regression, and CNNs
- Implements a variety of active learning strategies such as margin sampling and k-center greedy
- Allows flexible configuration of parameters such as batch size, warm start ratio, and noise control
- Easy integration of new models and sampling methods through an extensible API
- Provides comprehensive benchmarking and analysis tools for experimental comparison