Data processing for and with foundation models
An end-to-end Data Scientist
Python Stream Processing
ExtractThinker is a Document Intelligence library for LLMs
Software to process and analyze airborne measurements.
Training data (data labeling, annotation, workflow) for all data types
Open source libraries and APIs to build custom preprocessing pipelines
Python ETL framework for stream processing, real-time analytics, and LLM pipelines
A curated list of data mining papers about fraud detection
Data Science Guide With Videos And Materials
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Docker image used to run data processing workloads
Production-ready data processing made easy and shareable
The lxml XML toolkit for Python
Data and tools for generating and inspecting OLMo pre-training data
Extract schema, statistics and entities from datasets
A multi-cloud framework for big data analytics
Superlinked is a Python framework for AI Engineers
Fast and customizable framework for automatic ML model creation
Easy-to-use and high-performance NLP and LLM framework
OCRmyPDF adds an OCR text layer to scanned PDF files
A Repo For Document AI
Hub of ready-to-use datasets for ML models
The Classical Language Toolkit
Public opinion analysis system