cpDetector is a proxy for codepage detection of documents. It delegates to multiple detector instances, each of which tries to detect the codepage using a different technique. A command line executable is included that sorts documents by codepage.
Features
- Extendable framework for detection strategies
- Byte order mark detection
- ASCII detection
- Guessing strategy (jchardet, based on the Mozilla code page detection)
- XML header detection
- HTML header detection
- Command line interface for transcoding / detecting / sorting (by codepage) trees of files
- See comparison: http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html
- Fast: http://tinyurl.com/cpdetector-icu-performance
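The delegation idea behind the feature list can be sketched as a chain of strategies that are asked in turn until one returns a result. The following is a minimal illustration, not cpDetector's actual API: the names `CodepageSketch`, `Strategy`, `BOM`, `ASCII` and `detect` are hypothetical, and only byte order mark and ASCII detection are shown. As a simplification, it does not distinguish the UTF-32LE byte order mark from the UTF-16LE one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Illustrative only: these names are hypothetical and are not cpDetector's API.
final class CodepageSketch {

    /** A detection strategy: returns a charset, or empty if it cannot decide. */
    interface Strategy {
        Optional<Charset> detect(byte[] data);
    }

    /** Recognizes the UTF-8, UTF-16BE and UTF-16LE byte order marks. */
    static final Strategy BOM = data -> {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return Optional.of(StandardCharsets.UTF_8);
        if (startsWith(data, 0xFE, 0xFF)) return Optional.of(StandardCharsets.UTF_16BE);
        if (startsWith(data, 0xFF, 0xFE)) return Optional.of(StandardCharsets.UTF_16LE);
        return Optional.empty();
    };

    /** Accepts the data as US-ASCII only if every byte is below 0x80. */
    static final Strategy ASCII = data -> {
        for (byte b : data) {
            if (b < 0) return Optional.empty(); // high bit set: not 7-bit ASCII
        }
        return Optional.of(StandardCharsets.US_ASCII);
    };

    /** Strategies in the order they are consulted; cheap, certain checks first. */
    static final List<Strategy> STRATEGIES = List.of(BOM, ASCII);

    /** The proxy: delegates to each strategy and returns the first answer. */
    static Optional<Charset> detect(byte[] data) {
        for (Strategy s : STRATEGIES) {
            Optional<Charset> result = s.detect(data);
            if (result.isPresent()) return result;
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(withBom));
        System.out.println(detect("plain".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The order of the strategies matters in such a design: inexpensive checks that give a definite answer come first, and a statistical guessing strategy would typically be placed last as a fallback.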
License
Mozilla Public License 1.1 (MPL 1.1)