Parse files for optimal RAG
Accurate × Fast × Comprehensive
Generate audiobooks from EPUBs, PDFs and text with captions
DeepCode: Open Agentic Coding
Research-oriented chatbot framework
LongBench v2 and LongBench (ACL '25 & '24)
OCR model for complex documents with layout-aware structured outputs
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Library for OCR-related tasks powered by Deep Learning
Enhances Tesseract OCR output using LLMs (local or API)
ExtractThinker is a Document Intelligence library for LLMs
Interact with your documents using the power of GPT
Big Model Application Development Practice 1
Semantic search and workflows for medical/scientific papers
A Heterogeneous Benchmark for Information Retrieval
BISHENG is an open LLM DevOps platform for next-generation apps
ContextGem: Effortless LLM extraction from documents
An open-source RAG-based tool for chatting with your documents
The official Python client for the Hugging Face Hub
Ready-to-use OCR with 80+ supported languages
Revolutionizing Database Interactions with Private LLM Technology
Visual Causal Flow
Leaderboard Comparing LLM Performance at Producing Hallucinations
Topic Modelling for Humans
Open Source Generative Process Automation