From: <apn...@ya...> - 2025-09-28 17:14:46
Thanks Sergey, that corroborates what I kind of surmised. The doubt I have is what you mention at the bottom – the point of conversion between external and internal does not matter. Once the conversion is done, so is the damage (in the case the assumed system encoding was wrong). But shall wait for Don, I’m sure there was a good reason at the time that may still hold. It’s not the kind of thing that can be an inadvertent error.
/Ashok
From: Dipl. Ing. Sergey G. Brester <se...@us...>
Sent: Thursday, September 25, 2025 5:54 PM
To: apn...@ya...
Cc: tcl...@li...
Subject: Re: [TCLCORE] Questions about Tcl{Get, Set}ProcessGlobalValue functions
IIRC, the encoding argument was added at some point to satisfy the needs of TclpFindExecutable (for conversion of the native executable name). Previously this was the privilege of the TclInitProcessGlobalValueProc handler, which returned the string and encoding on demand, e.g. as used for TclpInitLibraryPath, InitializeEncodingSearchPath etc.
Blame shows [353036774ea2c180] <https://core.tcl-lang.org/tcl/info/353036774ea2c180> where it was introduced initially.
A few additional points:
1. Besides TclSetProcessGlobalValue, there are also init handlers (TclInitProcessGlobalValueProc) that may statically register a getter for the PGV.
2. The conversion with Tcl_UtfToExternal on set was added later in [5de1d4a68b9118b0] <https://core.tcl-lang.org/tcl/info/5de1d4a68b9118b0> to fix the bug "TclGetProcessGlobalValue encodes information twice on Windows" <https://core.tcl-lang.org/tcl/info/3fc3287497>,
however, in my opinion, in a somewhat "wrong" place (not completely compatible, and questionable); but these are internal functions, so never mind.
3. The value would be converted from/to external using encoding, only if it was set.
4. The primary purpose (initially) was conversion on demand if the system encoding changes between set and get (which may be especially important if the encoding dir changes).
I think it is more or less a historic thing which grew over time and got certain "controversial" fixes (that made the initial idea almost redundant).
But because the initial concept was a bit "strange" too, the point in time of the conversion from/to external doesn't really matter.
In my opinion, it would be fully correct to retain original value unchanged in global storage if encoding argument were the name of encoding, and not the encoding pointer.
Regards,
Serg.
25.09.2025 11:06, apnmbx-public--- via Tcl-Core wrote:
I have a question about the TclGetProcessGlobalValue / TclSetProcessGlobalValue pair of functions that I hope someone can answer. These functions are supposed to store values or settings that are shared across all threads in the process.
TL;DR why do the above functions get/set values using the *system* encoding?
As currently implemented, TclSetProcessGlobalValue encodes the Tcl_Obj value passed in using the current system encoding and stores it in a global C struct. It also stores the *original* passed-in Tcl_Obj in a thread-local cache so that its internal representation is not lost for that thread’s usage. Use of epochs ensures that stale values are not used.
When TclGetProcessGlobalValue is called, the encoded value in the global C struct is decoded using the system encoding and the result is passed back to the caller in a new Tcl_Obj which is also stored in that thread’s cache. If this function is called without TclSetProcessGlobalValue having previously set the value, an initializer function is called which returns the initial value along with the encoding used.
The code accounts for the fact that system encoding may change (generally only during initialization when all encodings are not immediately available) by tracking the encodings used and converting appropriately as needed.
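The set/get flow described above can be modelled with a small simulation. This is only a sketch under stated assumptions, not Tcl's actual code: `GlobalSlot`, `ThreadCache`, `pgv_set`, `pgv_get` and `encode_ascii` are hypothetical names, the "system encoding" is assumed to be 7-bit ASCII with '?' as replacement, and two explicit cache structs stand in for two threads so that no real threading or locking is involved.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical shared storage: encoded bytes plus an epoch counter. */
typedef struct { char bytes[64]; unsigned epoch; } GlobalSlot;
/* Hypothetical per-thread cache: last seen value and the epoch it belongs to. */
typedef struct { char value[64]; unsigned epoch; int valid; } ThreadCache;

/* Assumed lossy "system encoding": bytes >= 0x80 become '?'. */
static void encode_ascii(const char *in, char *out, size_t cap)
{
    size_t i = 0;
    for (; in[i] != '\0' && i + 1 < cap; i++)
        out[i] = ((unsigned char)in[i] < 0x80) ? in[i] : '?';
    out[i] = '\0';
}

/* Set: store the encoded form globally, keep the original in the caller's cache.
 * A real multi-threaded version would need a lock around the shared slot. */
static void pgv_set(GlobalSlot *g, ThreadCache *self, const char *value)
{
    encode_ascii(value, g->bytes, sizeof g->bytes);
    g->epoch++;
    snprintf(self->value, sizeof self->value, "%s", value);
    self->epoch = g->epoch;
    self->valid = 1;
}

/* Get: reuse the cached value if its epoch is current, else decode the
 * global bytes (decoding ASCII is the identity here) and cache the result. */
static const char *pgv_get(const GlobalSlot *g, ThreadCache *self)
{
    if (!self->valid || self->epoch != g->epoch) {
        snprintf(self->value, sizeof self->value, "%s", g->bytes);
        self->epoch = g->epoch;
        self->valid = 1;
    }
    return self->value;
}
```

In this model, a cache that called `pgv_set` keeps returning the original string, while any other cache calling `pgv_get` receives whatever survived the encode step.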
My question is - what is the purpose of this encoding / decoding pair when storing and retrieving values? The passed in values are (effectively) internal modified UTF-8 strings. Why not just store and return those? This is not just a question of efficiency but correctness. There are several issues with the current implementation:
* A value being stored may not be representable using the current system encoding. Since the encoding is done using TCL_ENCODING_PROFILE_TCL8, essentially a “corrupted” value is stored in the global struct and returned by
* Likewise, there is potential for further corruption for similar reasons when the system encoding changes and the new system encoding does not support additional characters.
* Further, because the *original* Tcl_Obj remains in the thread that called TclSetProcessGlobalValue, that thread’s perception of the “global” value differs from all other threads (which see the “corrupted” value).
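The first two points can be illustrated with a self-contained round trip. This is a minimal sketch, assuming ISO-8859-1 as a stand-in "system encoding" and '?' as the replacement for unrepresentable characters; the function names are hypothetical, the input is treated as plain UTF-8 rather than Tcl's modified UTF-8, and none of this is the Tcl encoding machinery itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 -> ISO-8859-1; code points above U+00FF are replaced by '?'. */
static size_t utf8_to_latin1_lossy(const char *src, unsigned char *dst, size_t cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;
    assert(cap > 0);
    while (*p != '\0' && n + 1 < cap) {
        unsigned long cp;
        int extra;
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; } /* invalid lead byte */
        p++;
        while (extra-- > 0 && (*p & 0xC0) == 0x80) {
            cp = (cp << 6) | (*p & 0x3F);   /* fold in continuation bytes */
            p++;
        }
        dst[n++] = (cp <= 0xFF) ? (unsigned char)cp : (unsigned char)'?';
    }
    dst[n] = '\0';
    return n;
}

/* ISO-8859-1 -> UTF-8; always lossless. */
static size_t latin1_to_utf8(const unsigned char *src, size_t len, char *dst, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len && n + 2 < cap; i++) {
        if (src[i] < 0x80) {
            dst[n++] = (char)src[i];
        } else {
            dst[n++] = (char)(0xC0 | (src[i] >> 6));
            dst[n++] = (char)(0x80 | (src[i] & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}

/* Returns 1 if the string is unchanged after encode + decode, else 0. */
static int survives_round_trip(const char *utf8)
{
    unsigned char ext[256];
    char back[512];
    size_t len = utf8_to_latin1_lossy(utf8, ext, sizeof ext);
    latin1_to_utf8(ext, len, back, sizeof back);
    return strcmp(utf8, back) == 0;
}
```

A string containing only characters up to U+00FF comes back intact, whereas something like the euro sign (U+20AC) comes back as '?'. Once that has happened, no later change of encoding can recover the original.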
This seems broken to me if the whole purpose was to have global values shared across threads. It neither preserves values, nor shares them correctly. From my perspective, the global value should directly reflect the string representation of the Tcl_Obj passed in. It is the responsibility of the caller to ensure the value is correct. Once in Tcl’s internal representation, changes in system encoding should not matter.
And yet, because there is all this additional explicit machinery for encoding / decoding that has been added, I believe there was some purpose behind it. If so, what was it?
As an aside, I think there are bugs with the sequence of encoding operations as well, e.g. it assumes single byte nul terminators, epochs are checked without any thread synchronization etc. but those are secondary to the questions above.
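To show why assuming a single-byte nul terminator is fragile, here is a small sketch using UTF-16LE as an example of an encoding with multi-byte code units. The helper `utf16_byte_len` and the sample array are hypothetical, and this does not reproduce the Tcl code in question.

```c
#include <stddef.h>
#include <string.h>

/* "ab" in UTF-16LE, terminated by a two-byte 0x0000 code unit. */
static const unsigned char AB_UTF16LE[] = { 'a', 0, 'b', 0, 0, 0 };

/* Byte length of a UTF-16 string: scan two bytes at a time until
 * both bytes of a code unit are zero. */
static size_t utf16_byte_len(const unsigned char *s)
{
    size_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;
    return n;
}
```

Here `strlen((const char *)AB_UTF16LE)` stops at the first zero byte and reports 1, while `utf16_byte_len(AB_UTF16LE)` reports the actual 4 bytes of payload.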
Anybody know the answer to the above?
/Ashok
PS The context for all this is TIP 732 – trying to fully understand Tcl initialization.
_______________________________________________
Tcl-Core mailing list
Tcl...@li... <mailto:Tcl...@li...>
https://lists.sourceforge.net/lists/listinfo/tcl-core