A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
ExtractThinker is a Document Intelligence library for LLMs
Structured data extraction and instruction calling with ML, LLM
Document (PDF, Word, PPTX ...) extraction and parse API
ContextGem: Effortless LLM extraction from documents
A high-quality tool for converting PDF to Markdown and JSON
Open source NLP guide with models, methods, and real use cases
No-code LLM Platform to launch APIs and ETL Pipelines
Make websites accessible for AI agents
Document content and metadata extraction microservice
Python Audio Analysis Library: Feature Extraction, Classification
A Simple and Universal Swarm Intelligence Engine
Did you say you like data?
End-to-end pipeline converting generative videos
Synthetic data curation for post-training and data extraction
NLP Cloud serves high-performance pre-trained or custom models for NER
The highest-scoring AI memory system ever benchmarked
An on-premises, OCR-free unstructured data extraction
An open and fair framework for everyone to build AI agents
kaldi-asr/kaldi is the official location of the Kaldi project
The no-nonsense RAG chunking library
Your Fully-Automated Personal AI Assistant
OCR software, free and offline
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
Open-source evaluation toolkit for large multi-modality models (LMMs)