FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and references to pretrained models, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and assorted lexicons (e.g., sensitive words, emotion dictionaries, stopwords). It also links to academic papers, open-source model implementations, and practical utilities such as word segmentation and text-cleaning scripts. The project is community-driven and frequently updated with contributed resources, and it is widely used in both academic and applied NLP work. Its value lies in providing not only tools but also curated, domain-specific data that can be hard to find elsewhere.
Features
- Massive collection of Chinese NLP corpora and lexicons
- Sentiment and emotion dictionaries tailored for Chinese language tasks
- Datasets for text classification, named entity recognition (NER), and knowledge graph construction
- Curated stopword lists, sensitive word lists, and slang dictionaries
- References to pretrained models and implementations for Chinese NLP
- Continuously updated, community-driven resource hub
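
As a rough illustration of how the stopword and sensitive-word lists mentioned above might be used, here is a minimal Python sketch. It assumes a wordlist is a UTF-8 text with one entry per line, which is a common layout but is not confirmed for any particular file in the collection. The function names (`load_wordlist`, `remove_stopwords`, `mask_sensitive`) and the sample entries are hypothetical, and the input is assumed to be already segmented into tokens by a separate word segmentation tool.

```python
import io


def load_wordlist(stream):
    """Read one entry per line, skipping blank lines and '#' comment lines.

    Assumption: the list is plain text with one word per line.
    """
    words = set()
    for line in stream:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            words.add(entry)
    return words


def remove_stopwords(tokens, stopwords):
    """Drop tokens that appear in the stopword set, preserving order."""
    return [t for t in tokens if t not in stopwords]


def mask_sensitive(text, sensitive_words, mask="*"):
    """Replace each sensitive word with mask characters of the same length.

    Longer entries are handled first so that a short entry contained in a
    longer one does not leave the longer one partially masked.
    """
    for word in sorted(sensitive_words, key=len, reverse=True):
        text = text.replace(word, mask * len(word))
    return text


# Hypothetical in-memory lists standing in for downloaded files.
stopwords = load_wordlist(io.StringIO("的\n了\n是\n"))
sensitive = load_wordlist(io.StringIO("# hypothetical entries\n坏词\n"))

# Tokens are assumed to come from a separate word segmentation step.
tokens = ["今天", "的", "天气", "是", "晴天"]
filtered = remove_stopwords(tokens, stopwords)

masked = mask_sensitive("这是一个坏词示例", sensitive)
```

With a real file, the `io.StringIO(...)` call could be swapped for `open(path, encoding="utf-8")`. Simple substring replacement is enough for a small list; a large sensitive-word list would likely call for a trie or Aho-Corasick matcher instead.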