Re: [Obo-format] OBO file character encoding


On Feb 7, 2008, at 1:29 PM, Hilmar Lapp wrote:

> On Feb 7, 2008, at 11:49 AM, Chris Mungall wrote:
>
>> I think the recommendation would be to avoid non-ASCII where possible
>> - most downstream consumers of OBO files will react unpredictably
>
>
> I would venture to suggest that by now we live in an age in which
> i) most programming languages and libraries support different
> character encodings perfectly well (for example, supporting a
> non-ASCII character encoding in Java is simply a matter of passing an
> additional argument to the file reader constructor), and ii) in
> science we're collaborating globally. Needing to tell collaborators
> how they should specify their native-language names to fit the ASCII
> limitation doesn't feel that good, frankly.
>
> Also, frankly, I would hate to have to entertain an argument of
> OWL/RDF/XML vs OBO on the basis of character encoding support - my
> take is that that argument should be unfounded.
>
> Maybe there is more involved than just putting an 'encoding' tag  
> into the header, but it sounds unlikely that it's difficult to  
> accommodate?
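
For reference, the Java point above can be sketched roughly as follows.
This is only an illustration, not code from OBO.jar: the class and
method names are made up, and the sample term name is invented. The
one real API it relies on is the InputStreamReader constructor that
takes a Charset as its second argument.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any OBO library.
final class ExplicitCharsetReader {

    // Reads all lines from the stream, decoding bytes with the given
    // charset instead of the platform default.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Invented sample line containing a non-ASCII character.
        byte[] bytes = "name: Sch\u00e4del\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(
            readLines(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

Decoding the same bytes with a different charset (say ISO-8859-1)
would silently produce different characters, which is the failure mode
a reader gets when the charset is left to the platform default.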

I think it would be great to have something like "encoding: UTF-8" (or
whatever encoding) in the document header.  It could be optional, and
UTF-8 could be assumed by default if no encoding is specified.  While
I don't have a good idea of what the consequences for current
ontologies would be if the OBO.jar parser started assuming UTF-8, I
think it would be better than the current situation, which I think
depends on the OS and default charset of the user's machine.
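
To make the idea concrete, here is a minimal sketch of how a parser
might honor such a tag. Everything in it is an assumption: the
"encoding" header tag is only the proposal from this thread, the class
and method names are hypothetical, and this is not how OBO.jar
currently behaves. It also assumes the header itself is
ASCII-compatible, so it can be scanned before the real charset is
known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the "encoding:" tag is a proposal, not an
// existing part of the OBO format.
final class OboEncodingSniffer {

    private static final String TAG = "encoding:";

    // Scans the header (everything before the first "[Stanza]" line)
    // for an "encoding:" tag. Falls back to UTF-8 if the tag is
    // absent or names a charset this JVM does not know.
    static Charset detect(byte[] data) {
        // ISO-8859-1 maps every byte to a character, so this first
        // pass cannot fail on any input.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        for (String line : raw.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("[")) {
                break; // first stanza reached, header is over
            }
            if (trimmed.startsWith(TAG)) {
                String name = trimmed.substring(TAG.length()).trim();
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // Malformed charset name: fall through to default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the whole file with the detected (or default) charset.
    static String decode(byte[] data) {
        return new String(data, detect(data));
    }
}
```

With this approach a file without the tag is always read as UTF-8,
regardless of the OS or the JVM's default charset, and a file that
declares something else is decoded accordingly.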

Thanks,
Jim

____________________________________________
James P. Balhoff, Ph.D.
National Evolutionary Synthesis Center
2024 West Main St., Suite A200
Durham, NC 27705
USA