From: Piotr B. <ba...@o2...> - 2008-10-16 20:16:20
[again, I sent this privately only; resending to the list, with a minor edit]

Hello Andrew and Others,

Andrew Dougherty writes:
> While browsing legal torrent download sites, I came across Wikipedia
> as an Mdict file. I tried to convert it but no luck. So I just wrote
> some perl scripts to convert the Wikipedia dump to a dict file(s).
> This is taking a while. I think I will generate 2 versions, one the
> full version and the other using automatic summarization to reduce its
> size.

Sounds nice. I know that {en,fr,de,es,it}wiki is accessible via DICT, from aioe.org, but I have no idea how current their databases are. Does anyone here know anything about that? It would be good to have, let's say, a weekly dump available as .dict -- I'm thinking of the promotional aspect of this, at least. But I have no idea how aioe.org does that and whether they roll up their own databases or use someone else's dumps.

> Is there any interest in this? Has this been done to death?

It hasn't been done to death, I don't recall any discussion on this in the list archives. As for interest... I know that Aleksey concentrates on the software, and that Rik doesn't (apparently) have the time for the web site and the like. Maybe the ftp server at dict.org could (?) host Andrew's wiki-dicts, if those at aioe.org are obsolete? I don't know how this is handled and I have no idea who took (takes?) care of updating DICT databases, and whether any such person ever existed.

> Also, I've made a slightly larger Irish dictionary than freedict's
> irish english dict using some sources on the web. I doubt they are
> free to reuse.

So -- just making sure -- we can't hope to get this for distribution via freedict?

Ah, on a slightly different though related note: I've been thinking of grabbing wiktionaries for freedict, as separate language-pair packages (they have rather amateurish lexicographic control, on the average, so are not to be fully trusted, but on the other hand, why not reuse them for a noble purpose...) -- has anyone done any manipulation of the XML output of wiktionaries and can offer some pointers on that, please? Conceptually, it seems rather trivial, but maybe there are some obstacles that I am not aware of.

Thanks,

Piotr