Data and tools for generating and inspecting OLMo pre-training data
Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX
Docker image used to run data processing workloads
A curated list of data mining papers about fraud detection
Efficient library for processing 3D data
Unified programming model for Batch and Streaming
Training data (data labeling, annotation, workflow) for all data types
Data-Centric Pipelines and Data Versioning
Blazing-fast Data-Wrangling toolkit
OpenGL Mathematics (GLM)
A simple interface for working with TeX documents
Data Science Guide With Videos And Materials
Official HDF5® Library Repository
A ranked list of awesome Python open-source libraries
Miller is like awk, sed, cut, join, and sort for name-indexed data
Instill Core is a full-stack AI infrastructure tool for data
Addax is a versatile open-source ETL tool
A GPU-accelerated library containing highly optimized building blocks
Production-ready data processing made easy and shareable
A distributed and extensible workflow scheduler platform
Flink CDC is a streaming data integration tool
Spatial data processing for geomodeling
Analyzing, storing and visualizing big data, scientifically
A network event stream processing system, in Clojure
A standalone, large-scale, open project for 2D/3D image processing