Smallpond
A lightweight data processing framework built on DuckDB and 3FS
smallpond is a lightweight distributed data processing framework built by DeepSeek, designed to scale DuckDB workloads over clusters using their 3FS (Fire-Flyer File System) backend. The idea is to preserve DuckDB’s fast analytics engine but lift it from single-node to multi-node settings, giving you the ability to operate on large datasets (e.g. petabyte scale) without moving to a heavyweight system like Spark. Users write Python-like code (via DataFrame APIs or SQL strings) to express their transformations; behind the scenes, tasks are scheduled (often via Ray) and pushed into DuckDB instances operating on partitioned data. ...