BoxMOT is an open-source framework that provides modular implementations of state-of-the-art multi-object tracking algorithms for computer vision applications. The project follows the tracking-by-detection paradigm: a vision model detects objects in each frame, and a tracker associates those detections across frames so that each object keeps a consistent identity throughout the video sequence. Its pluggable architecture lets developers combine different object detectors with multiple tracking algorithms without modifying the core codebase. The framework integrates with detection, segmentation, and pose estimation models that produce bounding box outputs. It also includes evaluation tools and benchmarking pipelines for testing tracking performance on standard datasets such as MOT17 and MOT20. Different performance modes let users trade computational efficiency against tracking accuracy according to application requirements.
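To make the tracking-by-detection idea concrete, here is a minimal sketch in plain Python. It is not BoxMOT's API or any of its trackers: `GreedyIoUTracker`, `iou`, and the `iou_threshold` parameter are hypothetical names chosen for illustration, and the greedy IoU matching stands in for the more capable association methods real trackers use. Detections are assumed to arrive as `(x1, y1, x2, y2)` boxes from some external detector.

```python
from dataclasses import dataclass, field
from itertools import count

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class GreedyIoUTracker:
    """Hypothetical toy tracker: greedy IoU matching, no motion model."""

    iou_threshold: float = 0.3
    tracks: dict[int, Box] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def update(self, detections: list[Box]) -> list[tuple[int, Box]]:
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            (
                (iou(box, det), tid, j)
                for tid, box in self.tracks.items()
                for j, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned: dict[int, int] = {}  # detection index -> track id
        used_tracks: set[int] = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in used_tracks or j in assigned:
                continue
            assigned[j] = tid
            used_tracks.add(tid)

        # Unmatched detections start new tracks; unmatched tracks are dropped.
        new_tracks: dict[int, Box] = {}
        results: list[tuple[int, Box]] = []
        for j, det in enumerate(detections):
            tid = assigned.get(j)
            if tid is None:
                tid = next(self._ids)
            new_tracks[tid] = det
            results.append((tid, det))
        self.tracks = new_tracks
        return results


tracker = GreedyIoUTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)])
print([tid for tid, _ in frame1])  # [1, 2]
print([tid for tid, _ in frame2])  # [2, 1]
```

The second frame lists the boxes in the opposite order, yet each box keeps the identity it was given in the first frame. A practical tracker would typically add a motion model, keep unmatched tracks alive for a few frames, and solve the assignment optimally (for example with SciPy's `linear_sum_assignment`) rather than greedily.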
Features
- Pluggable architecture supporting multiple tracking algorithms
- Integration with object detection, segmentation, and pose estimation models
- Benchmarking pipelines for standard multi-object tracking datasets
- Performance modes balancing speed and tracking accuracy
- Support for appearance-based and motion-based tracking strategies
- Reusable detection and embedding pipelines for efficient experimentation
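As an illustration of how appearance-based and motion-based cues can be combined, the sketch below blends an overlap cost with an embedding distance into a single association cost. This is a generic technique, not a description of how any BoxMOT tracker computes its costs; `cosine_distance`, `fused_cost`, and the `motion_weight` parameter are hypothetical names, and the embeddings are assumed to come from some external re-identification model.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def fused_cost(
    iou_score: float,
    track_embedding: Sequence[float],
    detection_embedding: Sequence[float],
    motion_weight: float = 0.5,  # assumed blend factor, not a BoxMOT setting
) -> float:
    """Weighted sum of a motion cost (1 - IoU) and an appearance cost."""
    motion_cost = 1.0 - iou_score
    appearance_cost = cosine_distance(track_embedding, detection_embedding)
    return motion_weight * motion_cost + (1.0 - motion_weight) * appearance_cost


# Same position and same appearance: lowest possible cost.
print(fused_cost(1.0, [1.0, 0.0], [1.0, 0.0]))  # 0.0
# No overlap and orthogonal embeddings: highest cost for this example.
print(fused_cost(0.0, [1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because the embeddings enter only as plain vectors, they can be computed once per detection and reused across several tracker configurations, which is the kind of experimentation the reusable embedding pipeline is meant to support.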