"Jaba" Chinese word segmentation, do the best Python Chinese word segmentation component. Four word segmentation modes are supported. Precise mode, which tries to cut the sentence most precisely, suitable for text analysis. Full mode, scans all the words that can be formed into words in the sentence, the speed is very fast, but the ambiguity cannot be resolved. The search engine mode, on the basis of the precise mode, divides the long words again to improve the recall rate, which is suitable for word segmentation in search engines. The paddle mode uses the PaddlePaddle deep learning framework to train the sequence labeling (bidirectional GRU) network model to achieve word segmentation. Also supports part-of-speech tagging. To use paddle mode, you need to install paddlepaddle-tiny, pip install paddlepaddle-tiny==1.6.1. Currently paddle mode supports jieba v0.40 and above. For versions below jieba v0.40, please upgrade jieba, pip install jieba --upgrade.

Features

  • Although jieba can recognize new words on its own, adding new words yourself ensures a higher accuracy rate
  • Developers can specify their own custom dictionaries to include words that are not in jieba's built-in dictionary
  • Dictionaries can be modified dynamically at runtime
  • Keyword extraction based on the TF-IDF and TextRank algorithms
  • The Inverse Document Frequency (IDF) corpus used for TF-IDF keyword extraction can be switched to a custom corpus file
  • Dynamic programming is used to find the maximum-probability path, i.e. the most likely segmentation based on word frequencies
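The TextRank idea behind keyword extraction can be sketched as follows: build a graph whose nodes are words and whose edge weights count how often two words appear within a small window, then run a PageRank-style iteration and rank words by score. This is a minimal sketch, not jieba's code: it takes an already-segmented token list, skips the part-of-speech filtering jieba applies, and the window of 5, damping factor of 0.85 and iteration count are assumed values. The function name `textrank_keywords` is hypothetical; jieba's own entry point is `jieba.analyse.textrank`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def textrank_keywords(
    tokens: Sequence[str],
    window: int = 5,        # assumed: link tokens that fall within `window` consecutive tokens
    damping: float = 0.85,  # assumed PageRank damping factor
    iterations: int = 30,   # assumed fixed iteration count
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank pre-segmented tokens with a weighted TextRank (PageRank) iteration."""
    # Undirected graph: edge weight = number of times two tokens share a window.
    weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                weights[a][b] += 1.0
                weights[b][a] += 1.0
    if not weights:
        return []
    out_sum = {node: sum(nbrs.values()) for node, nbrs in weights.items()}
    score = {node: 1.0 for node in weights}
    for _ in range(iterations):
        # Each node collects score from its neighbors in proportion to edge weight.
        score = {
            node: (1 - damping) + damping * sum(
                w / out_sum[nbr] * score[nbr] for nbr, w in weights[node].items()
            )
            for node in weights
        }
    top = max(score.values())
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    # Normalize so the best keyword scores 1.0.
    return [(word, s / top) for word, s in ranked[:top_k]]


if __name__ == "__main__":
    tokens = ["结巴", "分词", "结巴", "词性", "结巴", "关键词"]
    # "结巴" co-occurs with every other token, so it ranks first.
    print(textrank_keywords(tokens))
```

Because the graph is built only from co-occurrence, TextRank needs no external corpus, which is the practical difference from IDF-based scoring.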
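The TF-IDF side of keyword extraction, including the swappable IDF corpus, can be illustrated with a short sketch. A word's score is its share of the multi-character tokens multiplied by its IDF, and words missing from the IDF table fall back to the table's median IDF. The plain-text `word idf` per-line format, the median fallback, and the names `load_idf` and `tfidf_keywords` are assumptions for this sketch; in jieba the corresponding calls are `jieba.analyse.extract_tags` and `jieba.analyse.set_idf_path`.

```python
import io
from collections import Counter
from typing import Dict, Iterable, List, TextIO, Tuple


def load_idf(stream: TextIO) -> Tuple[Dict[str, float], float]:
    """Parse assumed 'word idf' lines; return the IDF table and its (upper) median value."""
    idf: Dict[str, float] = {}
    for line in stream:
        parts = line.split()
        if len(parts) == 2:
            idf[parts[0]] = float(parts[1])
    values = sorted(idf.values())
    median = values[len(values) // 2] if values else 0.0
    return idf, median


def tfidf_keywords(
    tokens: Iterable[str],
    idf: Dict[str, float],
    median_idf: float,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score tokens by term frequency times IDF; words missing from the table use the median."""
    counts = Counter(t for t in tokens if len(t.strip()) > 1)  # drop one-character tokens
    total = sum(counts.values())
    if total == 0:
        return []
    scores = {w: c / total * idf.get(w, median_idf) for w, c in counts.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical custom IDF corpus; in practice this would be read from a file.
    custom_corpus = io.StringIO("分词 2.0\n词典 5.0\n关键词 8.0\n")
    idf, median = load_idf(custom_corpus)
    tokens = ["分词", "分词", "分词", "词典", "关键词", "新词", "的"]
    # 关键词 appears once but has the highest IDF, so it outranks the frequent 分词.
    print(tfidf_keywords(tokens, idf, median))
```

Because the IDF table is plain data, loading a different corpus file changes the ranking without touching the scoring code, which is the point of making the corpus path configurable.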

License

MIT License

Additional Project Details

Operating Systems

Linux, Windows

Programming Language

Python

Related Categories

Python Word Processors, Python Languages Software, Python Deep Learning Frameworks

Registered

2022-02-18