Re: [Libaps-general] Printer name selection
From: Waldo B. <ba...@kd...> - 2000-06-14 00:25:02
On Tue, 13 Jun 2000, you wrote:
> Hi Waldo,
>
> > E.g. with a printcap entry like:
> >
> > lp|ap|arpa|ucbarpa|LA-180 DecWriter III:\
> >
> > I get "ap" as name. I would rather be able to show "LA-180 DecWriter III"
> > to the user.
>
> Coincidentally, this is something that I've been working on today. The
> heuristic that I'm using is, when a printcap entry has multiple alias
> names, to prefer the first one that contains a space character as the
> primary name for that printer. Do you think this rule of thumb will cover
> most situations? The alternative that I thought of was to select the
> longest of the names.

Maybe you could differentiate between "primary name" and "human readable
name". I can imagine that CUPS or so allows you to set a description for a
printer. Such a description could then be used as the "human readable name".

> > Slightly related, all strings are represented as "const char *". Assuming
> > that in some cases these strings are user defined, it would be useful to
> > define in the API what the character encoding of such a string is
> > supposed to be. UTF-8 seems a good candidate to me.
>
> I can certainly see the use in this. I have to admit that I am not
> personally an expert in different character encodings nor
> localization/globalization issues in general. Can you give me more
> information on what this character set is, and what alternatives we could
> potentially specify?

The idea is that there are more characters than those defined by ASCII, and
that there are even more characters than fit in a single byte. A character
encoding defines a way to map a set of characters to byte values. The
simplest ones are those that map a single character to a single byte value.
Examples of such encodings are ASCII, latin-1, latin-2, etc. It is obvious
that such an encoding can never define more than 256 characters. It is
equally obvious that there are characters which have a byte value defined in
one encoding but not in another.

Since this started to become a bit of a mess, people invented Unicode.
Unicode defines a lot of characters and gives them all a unique value,
typically a value that fits in 16 bits. Since a lot of computer programs
think of text as sequences of bytes (char), an encoding mechanism has been
introduced that encodes these 16-bit Unicode values into one or more(!)
bytes in a sort of clever way; this is called UTF-8.

If you have characters in a certain encoding it is always possible to
convert them to Unicode. E.g. you can translate every latin-1 string to a
corresponding Unicode string. The other way around is not true: a Unicode
string might contain characters that don't have a representation in latin-1.
With UTF-8 you do not have this problem. (A small conversion sketch follows
below.)

Unfortunately not everyone today uses UTF-8; some still use e.g. latin-1
(because if you only write English texts you don't really need these other
characters much). A user's "locale" defines which encoding this user
typically uses, e.g. for filenames and such.

Assuming that you don't want to discriminate against anyone who happens to
use a certain range of characters, the options for defining an encoding in
an interface are basically limited to UTF-8, because it can represent every
possible character, or "the encoding specified in the locale", since that is
the one the user has chosen. Note that the user could have chosen to use
UTF-8 in his locale. E.g. all filenames are typically encoded with "the
encoding specified in the locale".
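To make the latin-1 example concrete, here is a minimal sketch (not part of
the libaps API; the function name is made up for illustration) of converting
a latin-1 string to UTF-8. It can never fail, because every latin-1 byte
value is also a Unicode code point; values above 0x7F simply become two
UTF-8 bytes instead of one:

    #include <stdlib.h>
    #include <string.h>

    /* Convert a latin-1 string to UTF-8.  Caller frees the result.
     * Returns NULL on allocation failure. */
    char *latin1_to_utf8(const char *latin1)
    {
        const unsigned char *in = (const unsigned char *)latin1;
        /* Worst case: every input byte becomes two output bytes. */
        char *utf8 = malloc(2 * strlen(latin1) + 1);
        char *out = utf8;

        if (utf8 == NULL)
            return NULL;
        for (; *in != '\0'; in++) {
            if (*in < 0x80) {
                *out++ = (char)*in;                   /* ASCII: one byte */
            } else {
                *out++ = (char)(0xC0 | (*in >> 6));   /* leading byte    */
                *out++ = (char)(0x80 | (*in & 0x3F)); /* continuation    */
            }
        }
        *out = '\0';
        return utf8;
    }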
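And a sketch of how a program can find out what "the encoding specified in
the locale" actually is and convert text from it to UTF-8, using the
standard nl_langinfo() and iconv() interfaces (assumes a POSIX system; the
sample string and fixed-size buffer are only placeholders):

    #include <iconv.h>     /* iconv_open(), iconv(), iconv_close() */
    #include <langinfo.h>  /* nl_langinfo() */
    #include <locale.h>    /* setlocale() */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char inbuf[] = "text in the locale's encoding";
        char outbuf[256];
        char *in = inbuf;
        char *out = outbuf;
        size_t inleft = strlen(inbuf);
        size_t outleft = sizeof(outbuf) - 1;
        iconv_t cd;

        setlocale(LC_ALL, "");  /* pick up the user's locale settings */
        printf("locale encoding: %s\n", nl_langinfo(CODESET));

        /* Converting *to* UTF-8 always works; converting the other way
         * (e.g. UTF-8 -> latin-1) can fail for characters that the
         * target encoding cannot represent. */
        cd = iconv_open("UTF-8", nl_langinfo(CODESET));
        if (cd == (iconv_t)-1) {
            perror("iconv_open");
            return 1;
        }
        if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1)
            perror("iconv");
        *out = '\0';
        printf("as UTF-8: %s\n", outbuf);
        iconv_close(cd);
        return 0;
    }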
The problem with using "the encoding specified in the locale" is that the
string might originate from another system or another user with a different
locale; you then have to convert it to the locale in use by this user, and
in such a conversion you might lose information.

For a more accurate description of this topic see http://www.unicode.org.

Cheers,
Waldo
--
Make way, KDE/Linux is coming to a desktop near you!