From: Navdeep B. <pop...@jp...> - 2001-04-10 13:02:56
This is my reply to "Re: [dict-beta] Formatted properly?" from Radovan Garabik <ga...@me...>:

Hey, is the original content in ASCII, or did I funk up the encoding when I converted it to a DICT database? Here are the entries you mentioned from the original content:

Goncz, Arpad Hung. aut. & polit.; pres. of Hungary 1990-- _1922--
Yuri I (Yuri Dolgorukii) Rus. grand prince of Kiev 1149-1150, 1150, 1155-1157; founded Moscow 1147 _1090?-1157

Looks like the diacritics were already missing. I'll convert the text to UTF-8 anyway, but I don't think it contains any "special" characters.

About the index file: I intentionally changed the name order, because users are more likely to type in names with the first name first. If someone would implement a database format like the one Rik mentioned in a recent post, I would be able to include both versions.

While we're talking about database format additions, why not a category property? Database headers (see previous message) could be used to send the categories to the client (or maybe a new command completely unrelated to my headers idea), and a new command could be used to limit a database to certain categories, for example:

X-CATS factbook asia,europe

Every time the database is searched after this command, only entries in those categories will be returned. Would anybody else find something like this useful?

--
I first became aware of this message 4/10/01, 12:17:37 PM, it is now 4/10/01, 12:20:16 PM. That's a response time of 2 Minutes, and 39 Seconds.
--
Navdeep Bains
http://bains.hypermart.net
ba...@ma...
--
Your Message:
>From: Radovan Garabik <ga...@me...>
>To: dic...@di...
>Subject: Re: [dict-beta] Formatted properly?
>Date: Tue, 10 Apr 2001 10:35:02 +0200
>User-Agent: Mutt/1.3.15i
>
>
>I wrote:
>> well... this mail was written in UTF8, at least I'll
>> test my MUA, I hope it appears ok :-)
>
>It did not appear ok...
>this should be correct version
>(sorry for the inconvenience)
>
>On Mon, Apr 09, 2001 at 01:00:04PM -0700, Navdeep Bains wrote:
>> I haven't received any emails about the biographical dictionary, can I assume it's been formatted properly? does it work correctly?
>
>well, it is ok, I have just some nitpicks (I know it does not concern you,
>but I am going to tell it anyway :-))
>
>citing from README:
>    Each entry begins with the
>    person's name, the most significant name (in modern Western culture, this
>    means the last name) coming first.
>
>This is valid for database file, but index file seems
>to be organized First name - Second name
>This is an inconsistency that (imho) should be corrected
>
>Names are in ASCII, which is fine for English but bad
>for other languages...
>Diacritics are simply stripped, and once missing, one cannot
>know how to put them back (was it Arpad Gonz or Arpd Gnz?)
>(at least German names use Okish way of using ae, oe, ue)
>ASCII-only index for searching is OK, but there should be at
>least some way to show TRUE name in database (hint: UTF8 :-))
>This is even worse for names originating in non-latin script,
>and Russian names use rather inconsistent transliteration
>See transliteration of Юрий Долгорукий
>    Name: Yuri I (Yuri Dolgorukii)
>Either Yuri Dolgoruki, or Yurii Dolgorukii, but NOT Yuri Dolgorukii,
>it just does not make sense
>
>(un)fortunately, I cannot comment on Arabic or Chinese :-)
>(though.... isn't pinyin the "official" way of transcribing Chinese
>names? shouldn't it be Mao Zedong instead of Mao Tse-Tung? - and
>why is it not Mao Tsetung anyway?)
>
>
>--
> -----------------------------------------------------------
>| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
>| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
> -----------------------------------------------------------
>Antivirus alert: file .signature infected by signature virus.
>Hi! I'm a signature virus! Copy me into your signature file to help me spread!
>
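
To make the X-CATS idea from the reply above a bit more concrete, here is a rough Python sketch of how a server might keep a per-connection category filter and apply it to searches. Everything in it is an assumption for illustration: X-CATS is only a proposed command and not part of the DICT protocol, the Entry and Session names are made up, the substring match stands in for whatever search strategy a real server uses, and the three factbook entries are invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One database entry tagged with categories (hypothetical layout)."""
    headword: str
    text: str
    categories: frozenset


@dataclass
class Session:
    """Per-connection state: database name -> active category filter."""
    filters: dict = field(default_factory=dict)

    def handle_x_cats(self, line):
        # Assumed command form: "X-CATS <database> <cat1>,<cat2>,..."
        parts = line.split(maxsplit=2)
        if len(parts) != 3 or parts[0].upper() != "X-CATS":
            raise ValueError(f"malformed X-CATS command: {line!r}")
        _, database, cats = parts
        self.filters[database] = frozenset(
            c.strip().lower() for c in cats.split(",") if c.strip()
        )

    def search(self, database, entries, word):
        # With no filter set for this database, every match is returned.
        allowed = self.filters.get(database)
        hits = []
        for entry in entries:
            if word.lower() not in entry.headword.lower():
                continue  # simple substring match, stand-in for real strategies
            if allowed is not None and not (entry.categories & allowed):
                continue  # entry is in none of the selected categories
            hits.append(entry)
        return hits


# Invented sample data, only to exercise the filter.
SAMPLE = [
    Entry("Japan", "(entry text)", frozenset({"asia"})),
    Entry("Hungary", "(entry text)", frozenset({"europe"})),
    Entry("Jamaica", "(entry text)", frozenset({"americas"})),
]

session = Session()
session.handle_x_cats("X-CATS factbook asia,europe")
print([e.headword for e in session.search("factbook", SAMPLE, "ja")])  # ['Japan']
```

Keeping the filter in the session rather than in the database means each client can narrow the same database differently, and a search issued before any X-CATS command behaves exactly as it does today.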