Re: [Libaps-general] Printer name selection
From: Waldo B. <ba...@kd...> - 2000-06-14 00:25:02
On Tue, 13 Jun 2000, you wrote:
> Hi Waldo,
>
> > E.g. with a printcap entry like:
> >
> > lp|ap|arpa|ucbarpa|LA-180 DecWriter III:\
> >
> > I get "ap" as name. I would rather be able to show "LA-180 DecWriter III"
> > to the user.
>
> Coincidentally, this is something that I've been working on today. The
> heuristic that I'm using is, when a printcap entry has multiple alias
> names, to prefer the first one that contains a space character as the
> primary name for that printer. Do you think this rule of thumb will cover
> most situations? The alternative that I thought of was to select the
> longest of the names.

Maybe you could differentiate between "primary name" and "human readable
name". I can imagine that CUPS or so allows you to set a description for a
printer. Such a description could then be used as the "human readable name".

> > Slightly related, all strings are represented as "const char *". Assuming
> > that in some cases these strings are user defined, it would be useful to
> > define in the API what the character encoding of such a string is
> > supposed to be. UTF-8 seems a good candidate to me.
>
> I can certainly see the use in this. I have to admit that I am not
> personally an expert in different character encodings nor
> localization/globalization issues in general. Can you give me more
> information on what this character set is, and what alternatives we could
> potentially specify?

The idea is that there are more characters than those defined by ASCII, and
that there are even more characters than fit in a single byte. A character
encoding defines a way to map a set of characters to byte values. The
simplest ones are those that map a single character to a single byte value.
Examples of such encodings are ASCII, latin-1, latin-2, etc. It is obvious
that such an encoding can never define more than 256 characters. It is
equally obvious that there are characters which have a byte value defined in
one encoding but not in another.

Since this started to become a bit of a mess, people invented Unicode.
Unicode defines a lot of characters and gives them all a unique value,
typically a value that fits in 16 bits. Since a lot of computer programs
think of text as sequences of bytes (char), an encoding mechanism has been
introduced that encodes these 16-bit Unicode values into one or more(!)
bytes in a sort of clever way; this is called UTF-8.

If you have characters in a certain encoding it is always possible to
convert them to Unicode. E.g. you can translate every latin-1 string to a
corresponding Unicode string. The other way around is not true: a Unicode
string might contain characters that don't have a representation in latin-1.
With UTF-8 you do not have this problem. (A small conversion sketch follows
below.)

Unfortunately not everyone today uses UTF-8; some still use e.g. latin-1
(because if you only write English texts you don't really need these other
characters much). A user's "locale" defines which encoding this user
typically uses, e.g. for filenames and such.

Assuming that you don't want to discriminate against anyone who happens to
use a certain range of characters, the options for defining an encoding in
an interface are basically limited to UTF-8, because it can represent every
possible character, or "the encoding specified in the locale", since that is
the one the user has chosen. Note that the user could have chosen to use
UTF-8 in his locale. E.g. all filenames are typically encoded with "the
encoding specified in the locale".
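To make the latin-1 example concrete, here is a minimal sketch (not part of
the libaps API; the function name is made up for illustration) of converting
a latin-1 string to UTF-8. It can never fail, because every latin-1 byte
value is also a Unicode code point; values above 0x7F simply become two
UTF-8 bytes instead of one:

    #include <stdlib.h>
    #include <string.h>

    /* Convert a latin-1 string to UTF-8.  Caller frees the result.
     * Returns NULL on allocation failure. */
    char *latin1_to_utf8(const char *latin1)
    {
        const unsigned char *in = (const unsigned char *)latin1;
        /* Worst case: every input byte becomes two output bytes. */
        char *utf8 = malloc(2 * strlen(latin1) + 1);
        char *out = utf8;

        if (utf8 == NULL)
            return NULL;
        for (; *in != '\0'; in++) {
            if (*in < 0x80) {
                *out++ = (char)*in;                   /* ASCII: one byte */
            } else {
                *out++ = (char)(0xC0 | (*in >> 6));   /* leading byte    */
                *out++ = (char)(0x80 | (*in & 0x3F)); /* continuation    */
            }
        }
        *out = '\0';
        return utf8;
    }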
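And a sketch of how a program can find out what "the encoding specified in
the locale" actually is and convert text from it to UTF-8, using the
standard nl_langinfo() and iconv() interfaces (assumes a POSIX system; the
sample string and fixed-size buffer are only placeholders):

    #include <iconv.h>     /* iconv_open(), iconv(), iconv_close() */
    #include <langinfo.h>  /* nl_langinfo() */
    #include <locale.h>    /* setlocale() */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char inbuf[] = "text in the locale's encoding";
        char outbuf[256];
        char *in = inbuf;
        char *out = outbuf;
        size_t inleft = strlen(inbuf);
        size_t outleft = sizeof(outbuf) - 1;
        iconv_t cd;

        setlocale(LC_ALL, "");  /* pick up the user's locale settings */
        printf("locale encoding: %s\n", nl_langinfo(CODESET));

        /* Converting *to* UTF-8 always works; converting the other way
         * (e.g. UTF-8 -> latin-1) can fail for characters that the
         * target encoding cannot represent. */
        cd = iconv_open("UTF-8", nl_langinfo(CODESET));
        if (cd == (iconv_t)-1) {
            perror("iconv_open");
            return 1;
        }
        if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1)
            perror("iconv");
        *out = '\0';
        printf("as UTF-8: %s\n", outbuf);
        iconv_close(cd);
        return 0;
    }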
The problem with using "the encoding specified in the locale" is that the
string might originate from another system or another user with a different
locale; you then have to convert it to the locale in use by this user, and
in such a conversion you might lose information.

For a more accurate description of this topic see http://www.unicode.org.

Cheers,
Waldo
--
Make way, KDE/Linux is coming to a desktop near you!