Doc2Html is a command-line program that strips the HTML files produced by Word (by opening the document and saving it as HTML), leaving plain text plus a minimum of HTML code. It also has a mode to convert data between different character sets: DOS, Windows-1250 and ISO-8859
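The character set conversion mode can be illustrated with a minimal sketch. This is not Doc2Html's own code. The function name `convert_charset` is hypothetical, and the codec names `cp852` (standing in for "DOS") and `iso8859_2` (standing in for "ISO-8859") are assumptions, chosen only because they are the Central European counterparts of Windows-1250.

```python
def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source charset and re-encode them in the target.

    Characters that do not exist in the target charset are replaced by '?'.
    """
    text = data.decode(source)
    return text.encode(target, errors="replace")


# Sample text stored as Windows-1250 bytes.
original = "Příliš žluťoučký kůň"
sample = original.encode("cp1250")

# Assumed mappings: "ISO-8859" -> iso8859_2, "DOS" -> cp852.
as_iso = convert_charset(sample, "cp1250", "iso8859_2")
as_dos = convert_charset(sample, "cp1250", "cp852")
```

Going through Unicode as an intermediate form keeps the sketch short: each charset only needs a decoder and an encoder, rather than a dedicated table for every source and target pair.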
Adapt is a data conversion language developed in 1984 by Norman W. Molhant and Christophe Dupriez. It has been used in many circumstances, it has translated itself into many programming environments, and it should now evolve toward modern environments such as Java.