Smallpond

Smallpond is a lightweight distributed data processing framework built by DeepSeek. It scales DuckDB workloads across a cluster, using DeepSeek's 3FS (Fire-Flyer File System) as the storage backend. The goal is to keep DuckDB's fast analytics engine while lifting it from a single node to many, so that large datasets (for example, at petabyte scale) can be processed without moving to a heavyweight system such as Spark. Users express transformations in Python, through a DataFrame API or SQL strings; behind the scenes, tasks are scheduled (often via Ray) and pushed to DuckDB instances that each operate on one partition of the data. Because 3FS is optimized for random access and high throughput, smallpond can use it to shuffle data, repartition, and share intermediate results across nodes.
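
The execution model described above (partition the data, run the same SQL on each partition in its own engine instance, then combine the results) can be illustrated with a small self-contained sketch. This is not smallpond's API: the standard-library sqlite3 module stands in for DuckDB, a plain loop stands in for the Ray scheduler, and the helper names hash_partition and run_sql_on_partition, the prices table, and the sample rows are all hypothetical.

```python
import sqlite3
import zlib
from collections import defaultdict


def hash_partition(rows, key_index, npartitions):
    """Assign each row to a partition using a stable hash of its key column."""
    partitions = defaultdict(list)
    for row in rows:
        # zlib.crc32 is deterministic across runs, unlike the built-in hash() on str.
        bucket = zlib.crc32(str(row[key_index]).encode("utf-8")) % npartitions
        partitions[bucket].append(row)
    return [partitions[i] for i in range(npartitions)]


def run_sql_on_partition(rows, query):
    """Load one partition into its own in-memory engine and run the query on it."""
    conn = sqlite3.connect(":memory:")  # stand-in for a per-task DuckDB instance
    try:
        conn.execute("CREATE TABLE prices (ticker TEXT, price REAL)")
        conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical sample data: (ticker, price)
rows = [
    ("AAA", 10.0), ("BBB", 7.5), ("AAA", 12.0),
    ("CCC", 3.0), ("BBB", 9.0), ("CCC", 4.5),
]
query = "SELECT ticker, MIN(price), MAX(price) FROM prices GROUP BY ticker"

# Hashing on the GROUP BY column keeps every ticker inside a single partition,
# so the per-partition results can simply be concatenated.
partitions = hash_partition(rows, key_index=0, npartitions=3)
results = []
for part in partitions:  # a real scheduler would run these tasks in parallel
    results.extend(run_sql_on_partition(part, query))
results.sort()

for ticker, low, high in results:
    print(ticker, low, high)
```

The key design point is that partitioning on the grouping column makes each partition independent, which is what lets a single-node engine be reused unchanged for each task.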

Features

Distributed extension of DuckDB: support for running SQL / DataFrame operations across nodes
Uses 3FS as the shared data backend to manage data storage and shuffle operations
APIs for transformations via SQL strings or Python functions (map, partial_sql)
Support for repartitioning by number of partitions, row count, or hash on a column
Two execution modes: a high-level dynamic mode (Ray-based) and a low-level static graph mode
Optimized for large-scale workloads (benchmarked at ~100 TiB sorting)
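
The three repartitioning strategies listed above (by number of partitions, by row count, and by hash on a column) can be sketched in plain Python. The helpers split_into_n, split_by_row_count, and split_by_hash, along with the sample rows, are hypothetical illustrations of the ideas and are not part of smallpond.

```python
import zlib


def split_into_n(rows, npartitions):
    """Split rows into npartitions contiguous chunks of near-equal size."""
    base, extra = divmod(len(rows), npartitions)
    chunks, start = [], 0
    for i in range(npartitions):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more row
        chunks.append(rows[start:start + size])
        start += size
    return chunks


def split_by_row_count(rows, max_rows):
    """Split rows into chunks that hold at most max_rows rows each."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def split_by_hash(rows, column, npartitions):
    """Send each row to the partition chosen by a stable hash of one column."""
    chunks = [[] for _ in range(npartitions)]
    for row in rows:
        bucket = zlib.crc32(str(row[column]).encode("utf-8")) % npartitions
        chunks[bucket].append(row)
    return chunks


# Hypothetical sample data: ten rows spread over three hosts.
rows = [{"host": f"h{i % 3}", "bytes": i} for i in range(10)]

by_count = split_into_n(rows, 3)
by_rows = split_by_row_count(rows, 4)
by_hash = split_by_hash(rows, "host", 2)

print([len(c) for c in by_count])  # [4, 3, 3]
print([len(c) for c in by_rows])   # [4, 4, 2]
print([sorted({r["host"] for r in c}) for c in by_hash])
```

Splitting by count or by rows balances partition sizes, while hashing on a column co-locates all rows that share a key, which matters for grouped aggregations and joins.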

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Frameworks

Registered

13 hours ago
