[Vim-latex-devel] Re: encoding woes
From: Luc H. <her...@fr...> - 2002-12-20 17:17:07
* On Fri, Dec 20, 2002 at 10:04:43AM -0500, Benji Fisher <be...@me...> wrote:
>     I would like to start work on solving the encoding problems.  I
> think we have to consider all combinations of
>
> 1. 'enc' is latin1 or utf-8
> 2. 'fenc' is empty or latin1 or utf-8
> 3. the text or the "input" place holders or the "output" place holders
>    are outside the 7-bit ASCII range (i.e., they are "funky," to use the
>    technical term ;).
>
> What are the situations where real problems exist?

Can we be certain of the encoding used by the script file? So far, we write our scripts in latin1. But if, for one reason or another, the end user changes something in a script and its encoding changes along the way, can we be sure that iconv(...'latin1') will still work? (The same issue applies to nr2char() & co.)

> One part of the solution is to avoid using funky characters in script
> files.

I couldn't agree more. I'm slowly rewriting my scripts to fix this issue. So now, when I need the markers, for instance, I use the function Marker_Txt(the_comment).

> The only remaining problem, AFAIK, is that funky characters do not
> match themselves in some contexts.  I think I can get around this
> using iconv() as discussed below.

If possible, I'd rather not have to care about the encoding within the jumping functions, but only when we set the buffer variables. Maybe we can write mutators (aka "setters") for the strings used to open and close the marker (aka placeholder). The mutator would take care of converting the funky characters it receives into something that can be written, matched, searched and substituted.

> Luc Hermitte wrote:
> A little experimentation shows that
>
>     :let foo = nr2char(char2nr("\xab"))
>
> has the same effect as
>
>     :let bar = iconv("\xab", "latin1", &enc)
>
> (I.e., ":echo foo == bar" returns 1.)  Note that strlen(foo) is 2, even
> though strlen("\xab") is 1!
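Benji's observation can be checked at the byte level. The following is a minimal sketch in Python rather than Vim script: Python `bytes` stand in for Vim strings, `vim_iconv()` is a hypothetical helper that only mimics Vim's iconv(), and 'enc' is assumed to be utf-8 (the setting under which strlen(foo) is 2).

```python
# Python bytes model Vim strings; 'enc' is assumed to be utf-8.
LATIN1_CHAR = b"\xab"  # "\xab" in a Vim double-quoted string: the single byte 0xAB ("«" in latin1)


def vim_iconv(text: bytes, from_enc: str, to_enc: str) -> bytes:
    """Rough stand-in for Vim's iconv({expr}, {from}, {to}) on byte strings."""
    return text.decode(from_enc).encode(to_enc)


# Like nr2char(char2nr("\xab")) with 'enc'=utf-8: code point 0xAB re-encoded as UTF-8.
foo = chr(LATIN1_CHAR[0]).encode("utf-8")
# Like iconv("\xab", "latin1", &enc).
bar = vim_iconv(LATIN1_CHAR, "latin1", "utf-8")

print(foo == bar, len(foo), len(LATIN1_CHAR))  # True 2 1
print(vim_iconv(foo, "utf-8", "latin1") == LATIN1_CHAR)  # True: converting back works
```

Both routes yield the two bytes 0xC2 0xAB, which is why strlen() reports 2 for a single visible character, and the reverse conversion recovers the original latin1 byte.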
> Thus the solution using nr2char() and
> char2nr() and the solution using iconv() are similar, but iconv() has
> the advantage that it is easy to convert back:
>
>     :echo "\xab" == iconv(foo, &enc, "latin1")
>
> returns 1.

I prefer iconv() (now that you've shown me this function) for converting strings. Let's suppose the end user wishes to use "«¡" and "!»" in utf-8; in that case iconv() is easier to use. But this seems to assume that the file where the opening and closing strings are defined is in latin1. Does having this script file in utf-8 change anything?

So my proposal looks like this (not tested, many bugs expected; I have to go in a few minutes):

    " function! s:SetPlaceholder
    " NB: current_encoding is a placeholder; its value is the open question below.
    function! s:SetMarker(open, close, ...)
      if (a:0 != 0)
        " optional 3rd argument: scope of the variables (b, g or w)
        if (a:1 !~ '^[bgw]$')
          call s:Error('incorrect scope character')
          return
        endif
        let {a:1}:marker_open  = iconv(a:open,  current_encoding, &enc)
        let {a:1}:marker_close = iconv(a:close, current_encoding, &enc)
        " or IMAP_placeholder_left and right ... don't remember exactly
      else
        let b:marker_open  = iconv(a:open,  current_encoding, &enc)
        let b:marker_close = iconv(a:close, current_encoding, &enc)
      endif
    endfunction
    command! -nargs=+ SetMarker call s:SetMarker(<args>)
    " + the jumping functions, which neither use iconv() nor change the
    " current encoding to latin1, even momentarily -- I didn't have to with
    " nr2char(char2nr(...)).

The big problem is to know the exact value to use for current_encoding that will work for:
- a script in latin1 _or_ utf-8 that uses:
      :SetMarker "\xab", "\xbb", 'b'
- SetMarker executed from the command line (in interactive mode),
whatever the current values of &enc and &termencoding are.

Moreover, using this mutator should be easier for end users than having to set two variables.

--
Luc Hermitte
http://hermitte.free.fr/vim/
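On the question of whether a utf-8 script changes anything: the sketch below (Python again, with bytes standing in for Vim strings and 'enc' assumed to be utf-8) suggests that it does. If the mutator hard-codes latin1 as the source encoding, a "«" typed literally in a utf-8 script is already two bytes and gets double-encoded into "Â«". `vim_iconv()` and `guess_source_encoding()` are hypothetical helpers, not Vim functions, and the "valid UTF-8 means utf-8" guess is only a heuristic for current_encoding, not a reliable rule.

```python
# Python bytes model Vim strings; 'encoding' is assumed to be utf-8.
MARKER = "\u00ab"  # the "«" character
enc = "utf-8"  # assumed value of Vim's 'encoding'


def vim_iconv(text: bytes, from_enc: str, to_enc: str) -> bytes:
    """Rough stand-in for Vim's iconv() on byte strings (assumes valid input)."""
    return text.decode(from_enc).encode(to_enc)


def guess_source_encoding(raw: bytes) -> str:
    """Hypothetical heuristic for current_encoding: valid UTF-8 -> utf-8, else latin1.

    Not reliable: some latin1 byte sequences are also valid UTF-8.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin1"


# The same literal "«" as stored in a latin1 script and in a utf-8 script.
from_latin1_script = MARKER.encode("latin1")  # b'\xab'
from_utf8_script = MARKER.encode("utf-8")  # b'\xc2\xab'

# Hard-coding latin1 as the source works for the first but double-encodes the second.
print(vim_iconv(from_latin1_script, "latin1", enc))  # b'\xc2\xab'
print(vim_iconv(from_utf8_script, "latin1", enc))  # b'\xc3\x82\xc2\xab', i.e. "Â«"

# With the guessed source encoding, both end up as the utf-8 bytes of "«".
for raw in (from_latin1_script, from_utf8_script):
    print(vim_iconv(raw, guess_source_encoding(raw), enc) == MARKER.encode(enc))  # True
```

This is one reason Benji's advice to keep funky characters out of script files helps: an escape such as "\xab" is pure ASCII, so it denotes the byte 0xAB whatever the file's encoding. Vim's `:scriptencoding` command, which declares the encoding a sourced script is written in, may also be worth checking here.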