A Repo For Document AI
Assist in organizing your piles of documents
ExtractThinker is a Document Intelligence library for LLMs
A Heterogeneous Benchmark for Information Retrieval
A persistent, network-resilient, full-text search library
Semantic search and workflows for medical/scientific papers
Apache OpenNLP
Haystack is an open source NLP framework to interact with your data
State-of-the-art Multilingual Question Answering research
Common Resource Grep
Chinese synonyms: a toolkit for chatbots and intelligent question answering
Transforms PDF, Documents and Images into Enriched Structured Data
TextRank implementation for Python 3
AiLearning: data analysis and machine learning in practice