From: Simon B. <Sim...@un...> - 2002-07-01 20:29:37
David Goodger (go...@us...) wrote:
> > I'd reorder this: (try command line). Try ASCII first, then UTF-8. If
> > ASCII passes, it most likely is ASCII. If not, and UTF-8 passes, it
> > most likely is UTF-8. Then try the locale's encoding.
>
> Out of curiosity, is there any point in trying both ASCII and UTF-8? UTF-8
> is a strict superset of ASCII, so shouldn't checking UTF-8 alone be enough
> for both? If we don't care what the original encoding was (we just want
> Unicode text to process), does explicitly checking for ASCII buy us
> anything?

Hmm - I think checking ASCII first would give us explicit knowledge that
the input actually is ASCII, so we could label the output as ASCII. That
might make it more compatible with older software that doesn't know
about UTF-8 and might spew out weird errors even when the text is plain
ASCII labelled as UTF-8.
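
A minimal sketch of the order discussed above (explicit encoding, then
ASCII, then UTF-8, then the locale's encoding), written for Python 3.
The function name `decode_input` and its `cmdline_encoding` parameter
are hypothetical, not an existing API. The point is that the helper
returns the label of the encoding that succeeded, so pure-ASCII input
comes back as "ascii" rather than "utf-8":

```python
import locale

def decode_input(data, cmdline_encoding=None):
    """Return (text, encoding_label) for a byte string.

    Hypothetical helper: try an explicitly requested encoding first
    (e.g. from a command-line option), then ASCII, then UTF-8, then
    the locale's preferred encoding.
    """
    candidates = []
    if cmdline_encoding:
        candidates.append(cmdline_encoding)
    # ASCII before UTF-8: both succeed on pure-ASCII bytes, but trying
    # ASCII first lets us report the narrower, more compatible label.
    candidates += ["ascii", "utf-8", locale.getpreferredencoding(False)]
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong encoding for these bytes, or an unknown codec name.
            continue
    raise UnicodeError("could not decode input with any candidate encoding")
```

For example, `decode_input(b"plain text")` yields the label "ascii",
while UTF-8 bytes such as `"café".encode("utf-8")` fail the ASCII pass
and come back labelled "utf-8".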
Bye,
Simon
--
Sim...@un... http://www.home.unix-ag.org/simon/