Data processing for and with foundation models
An end-to-end Data Scientist
Python Stream Processing
ExtractThinker is a Document Intelligence library for LLMs
Training data (data labeling, annotation, workflow) for all data types
Open source libraries and APIs to build custom preprocessing pipelines
A curated list of data mining papers about fraud detection
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Data and tools for generating and inspecting OLMo pre-training data
A GPU-accelerated library containing highly optimized building blocks
Text mining using tidy tools
Extract schema, statistics and entities from datasets
Analyzing, storing and visualizing big data, scientifically
ArrayFire, a general-purpose GPU library
A free, open-source, and cross-platform big data analytics framework
Superlinked is a Python framework for AI Engineers
Fast and customizable framework for automatic ML model creation
OCRmyPDF adds an OCR text layer to scanned PDF files
Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper
Easy-to-use and high-performance NLP and LLM framework
An image processing library written entirely in JavaScript for Node
LLM-based data scientist, AI-native data application
The Classical Language Toolkit
Efficient few-shot learning with Sentence Transformers
A Repo For Document AI