cpDetector / News: Recent posts

cpDetector 1.0.10 released

cpDetector is a Java library for detecting the encoding of documents, which is needed wherever text is transmitted over a network. The project is currently the top Google result for the term "codepage detection". Note: From now on, Java 1.5 is required.

Version 1.0.10 is a major bugfix release.

* Fixed a crash in command-line mode when an invalid declared charset (the empty "" charset) was found.
* Fixed the return code of the command-line tool (CodepageProcessor), which was always 0 even when an error occurred.
* Fixed a bug that prevented input streams from being reset after detection.
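Detection has to read the leading bytes of a stream, yet the caller usually wants to read the same stream again from the start. The sketch below shows the general mark/reset pattern with plain java.io; `ResettableDetection`, `peekPrefix` and the 4096-byte limit are hypothetical names and values chosen for illustration, not cpDetector's API or its actual fix:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

final class ResettableDetection {
    // Hypothetical upper bound on how many bytes a detection step inspects.
    static final int PREFIX_LIMIT = 4096;

    // Reads up to PREFIX_LIMIT leading bytes for detection, then rewinds the
    // stream so the caller can read the whole document from its first byte.
    static byte[] peekPrefix(BufferedInputStream in) throws IOException {
        in.mark(PREFIX_LIMIT);
        byte[] buf = new byte[PREFIX_LIMIT];
        int total = 0;
        int n;
        while (total < buf.length
                && (n = in.read(buf, total, buf.length - total)) != -1) {
            total += n;
        }
        // Valid because no more than PREFIX_LIMIT bytes were read since mark().
        in.reset();
        return Arrays.copyOf(buf, total);
    }
}
```

A caller wraps its raw stream in a `BufferedInputStream`, hands the returned prefix to whatever detector it uses, and then reads the same stream from the beginning.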

Posted by Achim Westermann 2011-12-04

cpDetector 1.0.9 released

cpDetector is a Java library for detecting the encoding of documents, which is needed wherever text is transmitted over a network. The project is currently the top Google result for the term "codepage detection".

Version 1.0.9 is a major bugfix release and fixes two issues in command-line batch mode:
* The switch to skip moving undetected documents works again.
* No attempt is made any more to transcode undetected documents (such attempts caused exceptional program flow).
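The second fix amounts to a guard: only documents with a detected charset are transcoded. A minimal sketch with plain JDK classes, assuming (hypothetically) that an undetected document is reported as `null`; `SafeTranscoder` is an illustrative name and cpDetector's command-line tool may signal "undetected" differently:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class SafeTranscoder {
    // Re-encodes raw bytes from the detected charset into UTF-8.
    // Returns Optional.empty() when detection failed (detected == null)
    // instead of attempting a transcoding with no source charset.
    static Optional<byte[]> toUtf8(byte[] raw, Charset detected) {
        if (detected == null) {
            return Optional.empty(); // undetected: leave the document untouched
        }
        String text = new String(raw, detected);
        return Optional.of(text.getBytes(StandardCharsets.UTF_8));
    }
}
```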

Posted by Achim Westermann 2011-11-16

cpDetector 1.0.8 released

cpDetector is a Java library for detecting the encoding of text, which is needed wherever information is transmitted over a network. The project is currently the top Google result for the term "codepage detection".

Version 1.0.8 is a stability release: it fixes byte order mark (BOM) detection and an incompatibility with OpenJDK. It also requires Java 1.5 now.
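BOM detection compares the first bytes of a document against the known Unicode byte order marks. The one subtle point is ordering: the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE), so the longer marks must be tested first. A minimal sketch of the technique; `BomSniffer` is a hypothetical name and this is not cpDetector's code:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {
    // Returns the charset implied by a byte order mark at the start of data,
    // or null if no BOM is present. UTF-32 marks are checked before UTF-16
    // because FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
    static Charset fromBom(byte[] b) {
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return Charset.forName("UTF-32BE");
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return Charset.forName("UTF-32LE");
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return null;
    }

    // Compares leading bytes as unsigned values against the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```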

Changelog:
https://sourceforge.net/projects/cpdetector/files/cpdetector/history.txt/view

Posted by Achim Westermann 2010-06-26

IBM Alphaworks article

cpDetector was "kind of" mentioned in an article by IBM Alphaworks about Apache Tika and Apache Nutch: http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html

Posted by Achim Westermann 2010-06-16

cpdetector vs. ICU performance

Follow this link for a minimal performance comparison (one document) between cpDetector and ICU:

http://tinyurl.com/cpdetector-icu-performance

Posted by Achim Westermann 2009-08-20

cpDetector 1.0.7 released

Version 1.0.7 of cpDetector fixes several severe release bugs of version 1.0.6.
The release structure has changed: cpdetector.jar no longer contains third-party library files. The public functions that were missing are included again. The ProGuard shrinker has been updated from version 3.8 to 4.2.

Changelog:
http://sourceforge.net/project/shownotes.php?release_id=254536

Download:
https://sourceforge.net/project/showfiles.php?group_id=114421

Posted by Achim Westermann 2008-06-17

cpDetector 1.0.6 released

A minor bugfix release of cpDetector, the configurable Java framework for code page detection of documents, has been made.
Thanks to the ProGuard shrinker, the cpdetector jar is now more than ten times smaller. Logging to System.out has also been removed. All packages have been renamed with the prefix info.monitorenter.
Thanks to moss for the bug reports.

Changelog:
http://sourceforge.net/project/shownotes.php?release_id=254536

Posted by Achim Westermann 2008-06-14

cpDetector in comparison

Fred Eaker published a blog article in which he compares eight different code page detection libraries. One of cpDetector's available strategies (charset detection by parsing) performed quite well. Read more at:
http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html

Posted by Achim Westermann 2007-05-04

cpDetector 1.0.5 released

cpDetector is a configurable Java framework for code page detection of textual documents. It may be used to allow applications like browsers, file-sharing software or search engines to correctly handle documents received over a network.
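A "configurable framework" of this kind can be pictured as an ordered chain of detection strategies: each one either answers or passes, the first answer wins, and cheap strategies can be configured to run before expensive ones. A minimal sketch of that chain-of-responsibility pattern; the `Detector` interface and `DetectorChain` class are hypothetical illustrations, not cpDetector's actual types:

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy interface: returns null if it cannot decide.
interface Detector {
    Charset detect(byte[] prefix);
}

final class DetectorChain implements Detector {
    private final List<Detector> detectors = new ArrayList<>();

    // Appends a strategy; the order of add() calls is the order of evaluation.
    DetectorChain add(Detector d) {
        detectors.add(d);
        return this; // allows fluent configuration
    }

    @Override
    public Charset detect(byte[] prefix) {
        for (Detector d : detectors) {
            Charset result = d.detect(prefix);
            if (result != null) {
                return result; // first strategy with an answer wins
            }
        }
        return null; // no strategy recognized the document
    }
}
```

Because the chain itself implements `Detector`, chains can be nested, and reconfiguring the detection order is just a matter of changing the `add` calls.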

The new version 1.0.5 is a bugfix release. Severe errors such as a potential infinite loop and incorrect file handling have been fixed.

Changelog:
http://sourceforge.net/project/shownotes.php?release_id=254536

Posted by Achim Westermann 2007-04-21

cpDetector awakes

After more than a year without a new release or any work on cpDetector, two recently reported severe bugs will lead to a new release soon. Work on the project has already begun. Along with the bugfixes, new code conventions and improved Java documentation will be released.

Posted by Achim Westermann 2006-10-18

cpDetector 1.04 released

cpDetector is a configurable Java framework for code page detection of textual documents.

The new version 1.04 is a stability release. A bug in the ANT build has been fixed, and the FIT document encoding test has been documented.

Posted by Achim Westermann 2005-03-02

cpdetector 1.03 released

cpdetector is a configurable Java framework for code page detection of textual documents. It may be used to allow applications like browsers, file-sharing software or search engines to correctly handle data received over a network.
The new version 1.03 adds a feature that guesses a single charset from the possible results, which improved the test results. A best-practice command-line tool [CharsetPrinter] has been added. Furthermore, this release marks the start of FIT testing (http://fit.c2.com/) integrated with ANT. See the changelog for more information about FIT for cpdetector.
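When a statistical detector reports several candidate charsets, one simple way to narrow them down is to keep only candidates that decode the bytes without errors and take the first one in a preference order. The sketch below does this with the JDK's strict `CharsetDecoder`; the class name and the preference-order heuristic are assumptions for illustration and not necessarily how cpdetector 1.03 makes its choice:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;

final class CandidatePicker {
    // Returns the first candidate (in the given preference order) that decodes
    // the data without malformed or unmappable input, or null if none does.
    static Charset pick(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(data));
                return cs;
            } catch (CharacterCodingException e) {
                // This candidate cannot represent the data; try the next one.
            }
        }
        return null;
    }
}
```

Putting strict multi-byte encodings such as UTF-8 ahead of single-byte ones such as ISO-8859-1 matters here, because almost any byte sequence decodes as ISO-8859-1.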

Posted by Achim Westermann 2004-12-14

cpDetector welcomes new Developer

Demian downloaded the project and quickly contributed a best-practice solution that integrates cpdetector (http://cpdetector.sourceforge.net/doc/javadoc/index.html?cpdetector/CharsetPrinter.html).

Posted by Achim Westermann 2004-10-25

cpdetector makes it to sourceforge's front page!

This was not intended for the front page; it is a diary entry that marks a milestone for cpdetector. Started in July 2004, the project has steadily grown into a recognized name in information mining and internationalization. This is the result not only of development but also of promotional work. For its problem domain, cpdetector now has a high (the highest) ranking in search engines and is visited regularly (not only after announcements).
Special thanks go to sourceforge.net, freshmeat.net, i18ngurus.com and Magda Danish of the Unicode Consortium.
In the future, more feedback (so far there has been almost none) would really help to improve quality.

Posted by Achim Westermann 2004-10-21

cpdetector 1.02 released

cpdetector is a configurable Java framework for code page detection of textual documents. It may be used to allow applications like browsers, file-sharing software or search engines to correctly handle data received over a network. The new version 1.02 is the first result of quality assurance work: it fixes two severe bugs and adds new features (XML DTDs, test/build automation and a new ASCII fallback detection implementation).
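An ASCII fallback check is cheap: if no inspected byte has its high bit set, the data is valid US-ASCII (and therefore also readable as UTF-8 or ISO-8859-1). A minimal sketch of that check; `AsciiFallback` is a hypothetical name and this illustrates the idea rather than cpdetector's implementation:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class AsciiFallback {
    // Returns US-ASCII if every byte is in the 7-bit range 0x00..0x7F,
    // otherwise null so that another strategy can decide.
    static Charset detect(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return null; // high bit set: not plain ASCII
            }
        }
        return StandardCharsets.US_ASCII;
    }
}
```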

Posted by Achim Westermann 2004-10-21

cpdetector 1.01 (stable) released

cpdetector is a configurable Java framework for code page detection of textual documents. It may be used to allow applications like browsers, file-sharing software or search engines to correctly handle data received over a network.

With the new version 1.01, cpdetector uses the encoding declaration in XML documents. This speeds up handling of mostly XML input, because the fallback to jchardet guessing/exclusion is more expensive.
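Reading the XML declaration is cheap because only the start of the document has to be inspected. A minimal sketch using a regular expression over a prefix decoded byte-for-byte as ISO-8859-1, assuming an ASCII-compatible document without a leading BOM (UTF-16/32 input needs BOM handling first); `XmlDeclarationSniffer` and the 512-byte limit are hypothetical, and cpdetector's own parser may work differently:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlDeclarationSniffer {
    // Matches <?xml ... encoding="NAME" at the very start of the document;
    // the NAME pattern follows the XML 1.0 EncName production.
    private static final Pattern DECL = Pattern.compile(
        "\\A<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared charset, or null if there is no declaration or the
    // name is unknown to this JVM (the caller then falls back to guessing).
    static Charset declaredCharset(byte[] data) {
        int len = Math.min(data.length, 512);
        // ISO-8859-1 maps every byte to a char, so decoding never fails here.
        String head = new String(Arrays.copyOf(data, len), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (!m.find()) return null;
        String name = m.group(1);
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }
}
```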

Posted by Achim Westermann 2004-09-24

cpdetector documentation released.

Documentation for cpdetector, a Java code page detection framework, is available on the new website:
http://cpdetector.sourceforge.net.

Posted by Achim Westermann 2004-08-21

cpDetector 1.0 released.

The initial version of cpDetector has been released. cpDetector is an extensible and highly configurable codepage detection framework that ships with detection implementations based on Mozilla's codepage guessing algorithm and an ANTLR grammar-based parser that searches for the charset attribute in HTML pages.
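To illustrate what the HTML strategy looks for: the charset attribute usually appears in a `<meta>` tag. cpDetector 1.0 uses an ANTLR grammar for this; the sketch below substitutes a plain regular expression, which is simpler but far less robust (it ignores comments, scripts and full attribute-quoting rules). `HtmlCharsetSniffer` and the 2048-byte limit are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlCharsetSniffer {
    // Finds charset=NAME inside a <meta ...> tag, case-insensitively, e.g.
    // <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    // or <meta charset="utf-8">.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\b[^>]*?\\bcharset\\s*=\\s*[\"']?([A-Za-z0-9._:+-]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name as written, or empty if none is found.
    static Optional<String> declaredCharsetName(byte[] page) {
        String head = new String(Arrays.copyOf(page, Math.min(page.length, 2048)),
                                 StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Returning the raw name rather than a `Charset` leaves it to the caller to decide what to do with names the JVM does not support.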

An executable is included that sorts documents by their codepage.

See the release notes at: http://sourceforge.net/project/shownotes.php?release_id=254536

Posted by Achim Westermann 2004-07-20
