Quick Summary
Spotlight is an open-source platform built to help teams curate unstructured data for machine learning. It provides interactive tools that let subject-matter experts and data practitioners work together more easily, accelerating the process of preparing training datasets and improving model outcomes.
Connecting Your Data and Tools
Spotlight plugs into your existing environment with minimal setup. Load an existing DataFrame with a single command and continue using the libraries and tooling you already prefer. Its flexible templates make it simple to spin up interactive views for mixed-modality datasets (text, images, audio, etc.), and those templates can be saved and reused to keep curation consistent across projects.
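As a rough illustration of the single-command load, the sketch below assumes the tool described here is the open-source `renumics-spotlight` Python package (installable with `pip install renumics-spotlight`) and that pandas is available. The column names and file paths are made up for the example; the image paths would need to point at real files for the viewer to render them.

```python
def build_demo_rows():
    """Return a tiny mixed-modality table (text plus image paths).

    The paths are hypothetical placeholders, not files shipped with any tool.
    """
    return [
        {"text": "a dog barking", "image": "images/dog.jpg", "label": "dog"},
        {"text": "a cat purring", "image": "images/cat.jpg", "label": "cat"},
        {"text": "rain on a roof", "image": "images/rain.jpg", "label": "rain"},
    ]


def launch_viewer(rows):
    """Open the rows in an interactive view.

    Assumption: the `renumics-spotlight` package and pandas are installed.
    Imports are kept local so the helper above works without them.
    """
    import pandas as pd
    from renumics import spotlight

    df = pd.DataFrame(rows)
    # The dtype mapping asks the viewer to treat the "image" column as
    # image files instead of plain strings.
    spotlight.show(df, dtype={"image": spotlight.Image})
```

Calling `launch_viewer(build_demo_rows())` should start a local viewer for the table. The DataFrame stays an ordinary pandas object, so the rest of an existing workflow can keep using it unchanged.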
Collaborative Curation Workflow
The platform emphasizes tight collaboration between domain experts and data engineers. Interactive labeling interfaces and shared views reduce friction in annotation, while built-in best-practice templates capture repeatable processes so teams don’t have to reinvent the same steps each time they curate data.
Supporting a Data-Centric Approach
Spotlight is designed for iterative, data-first ML development. It helps teams systematically refine training sets, track changes, and rerun curation cycles, which shortens iteration times and lowers the chance of costly mistakes during model development.
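To make "track changes" between curation cycles concrete, here is a minimal, tool-independent sketch that compares two versions of a labeled dataset. It is not a Spotlight feature or API; the `id` key, the row layout, and the function names are assumptions chosen for the example.

```python
import hashlib
import json


def row_fingerprint(row):
    """Hash a row's content so any edit to it is detectable."""
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_versions(old_rows, new_rows, key="id"):
    """Report which row ids were added, removed, or changed."""
    old = {r[key]: row_fingerprint(r) for r in old_rows}
    new = {r[key]: row_fingerprint(r) for r in new_rows}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Version 1: the dataset before a curation pass.
v1 = [
    {"id": 1, "text": "a dog barking", "label": "dog"},
    {"id": 2, "text": "a cat purring", "label": "dog"},  # mislabeled
    {"id": 3, "text": "static noise", "label": "unknown"},
]

# Version 2: label fixed on row 2, row 3 dropped, row 4 added.
v2 = [
    {"id": 1, "text": "a dog barking", "label": "dog"},
    {"id": 2, "text": "a cat purring", "label": "cat"},
    {"id": 4, "text": "rain on a roof", "label": "rain"},
]

report = diff_versions(v1, v2)
print(report)  # {'added': [4], 'removed': [3], 'changed': [2]}
```

Keeping a report like this for each cycle gives a simple audit trail of what a curation pass actually changed before the model is retrained.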
Advantages for ML Projects
- Streamlines teamwork between analysts, annotators, and subject experts.
- Reduces project risk by standardizing curation practices and improving dataset quality.
- Speeds up development cycles through reusable templates and fast integration.
Pricing Tiers
- Enterprise edition: customizable deployments and workflow integrations for large organizations with specific needs.
- Professional edition: focused on high-quality dataset creation and advanced curation capabilities.
- Community edition: free option for individuals and teams getting started with open-source curation tools.
Getting Started
To begin, install Spotlight, then load your dataset with a single line of code. Pick one of the provided templates to create an interactive view, invite collaborators, and iterate on labels and examples. The tooling is built to fit into existing ML pipelines, so you can move quickly from curation to training.
Technical
- Web App
- Full