Polymarket Data is a data engineering pipeline that collects, processes, and structures trading activity from the Polymarket prediction market ecosystem into analyzable datasets. It runs as a multi-stage pipeline that combines off-chain API data with on-chain event data: it first fetches market metadata such as questions, outcomes, and trading volumes, then scrapes order-filled events from a GraphQL-based subgraph, and finally transforms those raw events into structured trade-level records with calculated prices and directions. Together, these stages let users reconstruct full trading activity, including markets, order events, and executed trades. The pipeline runs incrementally and resumes automatically after an interruption, which makes it suitable for long-running data collection without duplicated or lost data.
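The final stage, turning an order-filled event into a trade with a price and a direction, can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes hypothetical event fields (`makerAssetId`, `takerAssetId`, `makerAmountFilled`, `takerAmountFilled`), that asset id `"0"` marks the cash side of the fill, that amounts are integers scaled by 10^6, and that direction is reported from the maker's perspective.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumptions for illustration only (not confirmed by the project):
COLLATERAL_ASSET_ID = "0"   # asset id "0" denotes the cash (collateral) side
SCALE = Decimal(10) ** 6    # raw amounts are integers with 6 decimal places


@dataclass(frozen=True)
class Trade:
    token_id: str          # outcome token that changed hands
    maker_direction: str   # "BUY" or "SELL", from the maker's perspective
    price: Decimal         # cash paid per outcome token
    size: Decimal          # number of outcome tokens
    cash: Decimal          # total cash exchanged


def event_to_trade(event: dict) -> Trade:
    """Convert one raw order-filled event into a structured trade record."""
    maker_asset = str(event["makerAssetId"])
    taker_asset = str(event["takerAssetId"])
    maker_amount = Decimal(event["makerAmountFilled"]) / SCALE
    taker_amount = Decimal(event["takerAmountFilled"]) / SCALE

    if maker_asset == COLLATERAL_ASSET_ID:
        # Maker gives cash and receives outcome tokens: the maker is buying.
        direction, token_id = "BUY", taker_asset
        cash, size = maker_amount, taker_amount
    elif taker_asset == COLLATERAL_ASSET_ID:
        # Maker gives outcome tokens and receives cash: the maker is selling.
        direction, token_id = "SELL", maker_asset
        cash, size = taker_amount, maker_amount
    else:
        raise ValueError("neither side of the fill is the collateral asset")

    if size == 0:
        raise ValueError("cannot price a fill with zero token size")

    return Trade(token_id, direction, cash / size, size, cash)
```

Using `Decimal` rather than `float` keeps prices exact when they are later aggregated or compared. If the taker's perspective is preferred, the two direction labels simply swap.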
Features
- Multi-stage pipeline for markets, events, and trade processing
- Incremental updates with automatic resume from checkpoints
- Integration of off-chain API data with on-chain event data
- Automatic discovery of missing or newly created markets
- Structured trade generation with pricing and direction logic
- Robust error handling with retries and rate limit management
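
The checkpointed resume and retry behavior listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation. The names `fetch_after` and `handle_batch`, the JSON checkpoint layout, and the use of an event `timestamp` as the cursor are all hypothetical.

```python
import json
import os
import tempfile
import time


def load_checkpoint(path: str) -> int:
    """Return the last processed timestamp, or 0 if no checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return int(json.load(fh)["last_timestamp"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, last_timestamp: int) -> None:
    """Write the checkpoint atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"last_timestamp": last_timestamp}, fh)
    os.replace(tmp_path, path)


def with_retries(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def run_incremental(fetch_after, handle_batch, checkpoint_path, **retry_kwargs):
    """Fetch batches newer than the checkpoint until the source is drained.

    fetch_after(cursor) is a hypothetical callable returning a list of events
    with a "timestamp" field greater than cursor; an empty list means done.
    Returns the number of events processed in this run.
    """
    cursor = load_checkpoint(checkpoint_path)
    total = 0
    while True:
        batch = with_retries(lambda: fetch_after(cursor), **retry_kwargs)
        if not batch:
            return total
        handle_batch(batch)
        cursor = max(int(event["timestamp"]) for event in batch)
        # Advance the checkpoint only after the batch was handled.
        save_checkpoint(checkpoint_path, cursor)
        total += len(batch)
```

Because the checkpoint advances only after a batch has been handled, a crash causes at most one batch to be re-fetched, so `handle_batch` should tolerate seeing the same events twice. A timestamp-only cursor can also miss events that share a timestamp across a batch boundary; a real pipeline would likely need a tie-breaker such as an event id.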