From: Jim B. <ba...@ne...> - 2008-02-07 19:00:07
On Feb 7, 2008, at 1:29 PM, Hilmar Lapp wrote:

> On Feb 7, 2008, at 11:49 AM, Chris Mungall wrote:
>
>> I think the recommendation would be to avoid non-ascii where possible
>> - most downstream consumers of obo files will react unpredictably
>
> I would venture to suggest that meanwhile we live in an age in which
> i) most programming languages and libraries support different
> character encodings perfectly fine (for example, supporting a
> non-ASCII character encoding in Java is simply a matter of passing in
> an additional argument to the file reader constructor), and ii) in
> science we're collaborating globally. Needing to tell collaborators
> how they should specify their native-language names to fit the ASCII
> limitation doesn't feel that good, frankly.
>
> Also, frankly, I would hate to have to entertain an argument of
> OWL/RDF/XML vs OBO on the basis of character encoding support - my
> take is that that argument should be unfounded.
>
> Maybe there is more involved than just putting an 'encoding' tag
> into the header, but it sounds unlikely that it's difficult to
> accommodate?

I think it would be great to have something like "encoding: UTF-8" (or
whatever encoding) in the document header. It could be optional, and
UTF-8 could be assumed by default if no encoding is specified. While I
don't have a good idea of what the consequences for current ontologies
would be if the OBO.jar parser started assuming UTF-8, I think it would
be better than the current situation, which I think depends on the
user's OS and default charset.

Thanks,
Jim

____________________________________________
James P. Balhoff, Ph.D.
National Evolutionary Synthesis Center
2024 West Main St., Suite A200
Durham, NC 27705
USA
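
To make the suggestion concrete, here is a rough Java sketch of how a reader might honor an optional "encoding:" header tag and fall back to UTF-8 when none is given. This is only an illustration under assumptions: the "encoding" tag is the one proposed in this thread and is not part of the OBO format, the class and method names are made up, and this is not how OBO.jar behaves. It also assumes the declared encoding is ASCII-compatible, since the header has to be scanned before the real charset is known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative and not taken from OBO.jar.
final class OboEncodingSketch {

    // Header tag proposed in this thread; NOT part of the OBO format.
    static final String ENCODING_TAG = "encoding:";

    /** Returns the charset declared in the header, or UTF-8 if none is declared. */
    static Charset detectCharset(byte[] raw) {
        // ISO-8859-1 maps every byte to exactly one char, so this probe never
        // fails. It only finds the tag if the real encoding is ASCII-compatible.
        String probe = new String(raw, StandardCharsets.ISO_8859_1);
        for (String line : probe.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("[")) {
                break; // assume the header ends at the first blank line or stanza
            }
            if (trimmed.startsWith(ENCODING_TAG)) {
                String name = trimmed.substring(ENCODING_TAG.length()).trim();
                // Throws an unchecked exception if the name is unknown or illegal.
                return Charset.forName(name);
            }
        }
        return StandardCharsets.UTF_8; // proposed default when no tag is present
    }

    /** Decodes the whole document using the declared (or default) charset. */
    static String decode(byte[] raw) {
        return new String(raw, detectCharset(raw));
    }

    public static void main(String[] args) {
        // A small Latin-1 encoded header that declares its own encoding.
        byte[] sample = "format-version: 1.2\nencoding: ISO-8859-1\nremark: caf\u00e9\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detectCharset(sample));
        System.out.print(decode(sample));
    }
}
```

The point is that the charset is always chosen explicitly, never taken from the platform default. When reading from a file, the same idea would presumably apply by passing the detected charset as the second argument to java.io.InputStreamReader, which is the "additional argument" Hilmar mentions.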