From: Michael B. <bu...@im...> - 2005-07-31 15:24:37
> If I have utf-8 databases, dictfmt removes the hyphens for the index.
> Dictd still works fine then, but serpento doesn't.

The current FreeDict-dictd-database-format-converter (whose output never
made it into a release) creates two index entries - one with hyphens and
one without, so it doesn't have to care whether the index needs them or
not. Maybe you can use it for some inspiration (it is written in perl).

FreeDict also has a test script, which will look up all words from an
index (and report the results). Databases converted with the FreeDict
converter pass that test, so it seems to work. It does not generate any
00-database-charset entry...

Nevertheless, I will use the "c5" format and dictfmt in the future, so
all features would be FreeDict's and all but dictfmt's :)

> If foldoc would be converted to utf8, what would happen to all the
> special characters which seem to be part of ascii range then?

I think in non-allchars-mode (whether utf8 or not) the index should
contain only spaces and alphanumeric characters, i.e. characters for
which isalnum() (or iswalnum() for unicode) returns true.

> ... utf-8 dictionaries all have one empty line at the
> beginning of the dict file and
>
> 00databaseutf8 A B
>
> in the index file
>
> 00-database-utf8 seems not to be accepted in the index (seems the
> same reason as above) and is not used in the dict file as entry.
>
> Why these differences between other databases and utf8 ones?
>
> wouldn't it be more useful having 00-database-utf8 (or allchars or
> whatever) in index and dict file?

I don't understand your question. You already say that 00-database-utf8
is not used in the .dict file as an entry. dictd will look it up and,
depending on its presence, configure itself accordingly. So the
definition of "00-database-utf8" in the .dict file doesn't matter. What
would it mean to have 00-database-utf8 in the index file?

Michael
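
To make the two ideas above concrete (an index key that keeps only alphanumeric characters and spaces, and a converter that writes both the original and the stripped headword), here is a minimal sketch in Python. It is not the code of dictfmt or of the FreeDict converter; the function names `fold_headword` and `index_keys` are made up for illustration, and Python's `str.isalnum()` stands in for iswalnum().

```python
def fold_headword(word: str) -> str:
    """Drop every character that is neither alphanumeric nor a space.

    str.isalnum() is Unicode-aware, so it plays the role that
    iswalnum() would play in C (an assumption of this sketch, not
    a statement about what dictfmt actually calls).
    """
    return "".join(ch for ch in word if ch.isalnum() or ch == " ")


def index_keys(word: str) -> list[str]:
    """Return the headword as written plus its folded form.

    Hypothetical helper: emitting both keys means a lookup succeeds
    whether or not the client strips hyphens before searching.
    """
    keys = [word]
    folded = fold_headword(word)
    # Skip the folded form if it is empty or identical to the original.
    if folded and folded != word:
        keys.append(folded)
    return keys


if __name__ == "__main__":
    for headword in ("well-known", "00-database-utf8", "plain word"):
        print(headword, "->", index_keys(headword))
```

Under this folding rule "00-database-utf8" becomes "00databaseutf8", which matches the form quoted from the index file above.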