cpDetector is a proxy for codepage detection of documents. It delegates to multiple detector instances, each of which tries to detect the codepage using a different technique. A command-line executable is included that sorts documents by codepage.
- Extendable framework for detection strategies
- Byte order mark detection
- ASCII detection
- Guessing strategy (jchardet, based on the Mozilla code page detection)
- XML header detection
- HTML header detection
- Command line interface for transcoding / detecting / sorting (by codepage) trees of files
- See comparison: http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html
- Fast: http://tinyurl.com/cpdetector-icu-performance
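
The delegation idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not cpDetector's actual API: the names `CodepageSketch`, `Detector`, `detectBom`, `detectAscii` and `detect` are hypothetical, and only the byte order mark and ASCII strategies from the feature list are shown. Each strategy either names a charset or declines, and the proxy returns the first definite answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a detection proxy; these names are NOT taken from cpDetector.
final class CodepageSketch {

    /** One detection strategy: returns a charset, or empty if it cannot decide. */
    interface Detector {
        Optional<Charset> detect(byte[] data);
    }

    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Byte order mark detection (UTF-32 marks are omitted to keep the sketch short). */
    static Optional<Charset> detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    /** ASCII detection: succeeds only if every byte is below 0x80. */
    static Optional<Charset> detectAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) { // a negative signed byte means the high bit is set
                return Optional.empty();
            }
        }
        return Optional.of(StandardCharsets.US_ASCII);
    }

    /** Strategies in the order they are tried: cheap and exact ones first. */
    static final List<Detector> CHAIN =
            List.of(CodepageSketch::detectBom, CodepageSketch::detectAscii);

    /** The proxy: asks each strategy in turn and returns the first definite answer. */
    static Optional<Charset> detect(byte[] data) {
        for (Detector detector : CHAIN) {
            Optional<Charset> result = detector.detect(data);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(plain));
        System.out.println(detect(withBom));
    }
}
```

In a fuller chain, slower or less certain strategies (header parsing, statistical guessing) would presumably be placed after the exact ones, so that a guess is made only when nothing more reliable applies.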