indexing chinese texts free download

Anna’s Archive

Comprehensive search engine for books, papers, comics, magazines

Anna’s Archive is a large-scale open-source search engine and data aggregation platform designed to index and provide access to a vast collection of books, academic papers, comics, magazines, and other digital texts through a unified interface. The project includes all the infrastructure required to run a full instance locally or in production, combining web servers, databases, and search indexing systems into a scalable architecture. It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. ...

Downloads: 83 This Week

Last Update: 2026-03-23

PageIndex

Document Index for Vectorless, Reasoning-based RAG

PageIndex is an innovative open-source framework that reimagines retrieval-augmented generation (RAG) by eliminating conventional vector similarity search and instead building hierarchical semantic indexes that mirror a document’s natural structure. Rather than chunking text and embedding it into a vector database, PageIndex constructs a tree-structured index — similar to a detailed, AI-enhanced table of contents — that a large language model can traverse to locate the most relevant sections...

Downloads: 3 This Week

Last Update: 6 days ago
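
The tree-structured index described above can be illustrated with a small sketch. This is not PageIndex's API: the `Node` class, the `score` function, and the sample tree are all hypothetical, and a simple word-overlap count stands in for the relevance judgment that a large language model would make at each level.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a table-of-contents style index (hypothetical structure)."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def score(node: Node, query: str) -> int:
    """Stand-in for an LLM relevance judgment: count distinct shared words."""
    query_words = set(query.lower().split())
    node_words = set(f"{node.title} {node.summary}".lower().split())
    return len(query_words & node_words)


def descend(root: Node, query: str) -> list[str]:
    """Walk from the root to a leaf, following the best-scoring child each time."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(child, query))
        path.append(node.title)
    return path


# Illustrative document tree; the titles and summaries are made up.
root = Node("Annual Report", "company overview", [
    Node("Financials", "revenue costs and profit figures", [
        Node("Revenue", "quarterly revenue by region"),
        Node("Costs", "operating costs and headcount"),
    ]),
    Node("Governance", "board members and risk policy", [
        Node("Board", "board members and committees"),
    ]),
])

path = descend(root, "quarterly revenue by region")
print(" > ".join(path))
```

No text is chunked or embedded here; the query is resolved by descending the document's own hierarchy, which is the idea the description attributes to PageIndex.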

shuyuan

Reading book source

shuyuan is a project oriented around reading and knowledge consumption, especially targeting large-scale text content such as books, articles, or educational material. The name suggests “academy” or “study hall,” and the tool aims to help users ingest, organize, and manage reading content — possibly offering features like text parsing, annotation, metadata generation, translation, or storage for later reference. The repository is set up to support document ingestion, indexing, and maybe some...

Downloads: 0 This Week

Last Update: 2025-11-28

PyCAPGE

PyCAPGE - Python Classic Adventure Point and Click Game Engine

PyCAPGE (Python Classic Adventure Point and Click Game Engine) is a versatile, open-source framework designed for creating retro-style 2D graphic adventures using Python and Pygame. Inspired by the golden age of SCUMM games, it features a customizable 9-verb interface and robust inventory management. Key features include a Scene Manager supporting parallax scrolling, walk-behind masks, and depth-based character scaling. It implements intelligent Pathfinding to navigate complex...

Downloads: 0 This Week

Last Update: 2026-01-04

cocoNLP

A Chinese information extraction tool

cocoNLP is a lightweight natural-language processing toolkit geared toward practical information extraction from raw text, especially for Chinese and mixed Chinese–English content. Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. ...

Downloads: 0 This Week

Last Update: 2025-11-05

Search Results for "indexing chinese texts"

Showing 5 open source projects for "indexing chinese texts"