Data processing for and with foundation models
Polymarket Data Retriever that fetches, processes, and structures data
SDG is a specialized framework
An end-to-end Data Scientist
ExtractThinker is a Document Intelligence library for LLMs
Software for processing and analyzing airborne measurements.
Deep Research framework, combining language models with tools
Python Stream Processing
Data Science Guide With Videos And Materials
Python ETL framework for stream processing, real-time analytics, LLM
A curated list of data mining papers about fraud detection
Docker image used to run data processing workloads
Training data (data labeling, annotation, workflow) for all data types
Open source libraries and APIs to build custom preprocessing pipelines
AI-Powered Data Processing: Use LOTUS to process all of your datasets
Production-ready data processing made easy and shareable
The lxml XML toolkit for Python
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Data and tools for generating and inspecting OLMo pre-training data
Turn WiFi signals into real-time human pose estimation and detection
Instill Core is a full-stack AI infrastructure tool for data
MineContext is your proactive context-aware AI partner
Cloud-native open source data warehouse for analytics and AI queries
A multi-cloud framework for big data analytics
Extract schema, statistics and entities from datasets