From: Piotr B. <ba...@o2...> - 2008-10-08 13:45:23
Hi Aleksey,

Aleksey Cheusov writes:

>> I have a well-formed c5 file, with 00-database-short defined properly.

> General c5 input should not contain 00-database-short headword
> just like any other 00-database-xxxx headwords.
> But they may appear from dictunformat and are used especially
> by dictfmt -t to REgenerate dictionary databases.

The "dictfmt -t" functionality is also used by FreeDict .c5 files. It is so useful to be able to gather all the metadata from the dictionary header, format it, place it under 00-database-info, and then pretend the c5 comes from dictunformat. I recall from the list archives that FreeDict has not been discussed here recently, so I'm glad I can tell you about this.

>> If I input it to "dictfmt -t", the 00-short is nicely placed below the
>> 00-info, right before the first real headword.

> Before or after - do not rely on this. A position of headwords in
> .index file is dictated by dictd search algorithm (binary search).

Right, I keep underestimating the index, thanks :-)

>> If I input the same file to "dictfmt -t -s 'new_short'", the result is
>> that, despite the "-t" option (recall that "-t" entails
>> "--without-headword"), dictfmt places the actual "00-database-short"
>> headword in the dictionary

> This is expected behaviour. Command line option -s overrides
> 00-database-short given on input.

OK, so let me reformulate: dictunfmt does not place an explicit "00-database-short" headword in the c5 file; it relies only on the .index. Why does the "-s" option not do the same? (This is just my curiosity, mind you. I'm not implying that it is bad; I'm only wondering about the reasons for the difference, hoping to learn more.)

> A purpose of --without-headword is different. Run dictfmt -c5 with
> this option and without it and then compare dict's (or dictunformat's)
> output.

You're right (of course!): the system headwords do not get duplicated like the normal headwords. That's one more thing straightened out, thanks again.
>> Why doesn't dictfmt do a simple substitution of the 00-short inherited
>> from .c5 by the new_short supplied on the cmdline?

> First, see above. 00-database-xxx headwords normally should not appear
> in c5 files. Second, they MAY appear only for REgenerating dictionary
^^^^

I just hope you are OK with FreeDict grabbing this opportunity to handle metadata well. It was ingenious of whoever came up with the idea (Michael, Horst?) to emulate unformatted c5 files.

Best,
Piotr