A Repo For Document AI
Assist in organizing your piles of documents
Text mining using tidy tools
A persistent, network-resilient, full-text search library
Semantic search and workflows for medical/scientific papers
ExtractThinker is a Document Intelligence library for LLMs
A Heterogeneous Benchmark for Information Retrieval
Apache OpenNLP
State-of-the-art Multilingual Question Answering research
Common Resource Grep
Chinese synonyms toolkit for chatbots and intelligent question answering
Transforms PDF, Documents and Images into Enriched Structured Data
TextRank implementation for Python 3
AiLearning, data analysis plus machine learning practice
NLP tool for statistical analysis of words, sentences, and documents
JSON-based text search project in Java
A corpus that could be of help for researchers working on Arabic NLP