|
From: <apn...@ya...> - 2025-12-05 08:15:33
|
TIP #709: No While I have expressed my dissent many times, for the record, I want to detail the specific problems I have discovered on code review and testing in the last couple of evenings. I would ask TCT members to read through before voting, given that the TIP was put forward as a request for a license approval for a “bug fix” and the TIP itself has zero details, a single line in the Specification, something rather disturbing in my mind for such a consequential change, whether seen as a bug fix or otherwise. Disclaimer: my findings have not been peer-reviewed 😊 Check for yourself or trust Jan’s assertions and vote accordingly. My objections can be broken into a. Philosophical opposition to the use of undocumented Windows data structures and interfaces for very little gain. I have said this before but below I add evidence that Jan’s assertions about stability between releases is false. b. Reservations about quality of implementation based on observed failure cases for even simple Win32 API calls in DLL’s loaded from memory, inadequacy of test coverage, and review of the MemoryModule library. First, with respect to (a), For the benefit of those new to this debate, MemoryModule reimplements parts of the Windows loader using undocumented structures and algorithms. There is no guarantee its behavior matches the real loader across current Windows versions leading to API failures. Examples below. Further, because these internals change between Windows releases—and even between minor updates—any apparent robustness today provides no assurance of correctness tomorrow. On this last point, I have to counter Jan’s confident assertion <https://sourceforge.net/p/tcl/mailman/message/59118711/> that, quoting, The future risk in Windows Changes: I don't see that happen. The changes in Vista were probably either for security reasons, or for extension to x64, both are being used a long time now in various Windows versions. I don't see the need for more changes coming. Next change in the Windows kernel is probably the switch to the Linux kernel. I don’t know the basis for Jan’s comments but a simple Google search or comparison of SDK releases would have shown LDR_DATA_TABLE_ENTRY which is the fundamental structure tracking loaded DLL's in Windows has changed in size and offsets between Vista, Win7, Win8 and then again at least two Win 10. So much for stability of internals. The fact that MemoryModule is a thirdparty library that was basically orphaned in 2015, coincidentally right around the time Win 10 was released, means the onus to track and maintain falls on the Tcl core team. Good luck with that. Since Jan dismissed my concerns around undocumented interfaces as exaggerations <https://sourceforge.net/p/tcl/mailman/message/59116993/> , and will likely do so again, I’ll move on to (b) concrete issues around the implementation and test suite. First the implementation, In my testing of TIP 709 builds, I found even the basic functions GetModuleHandleExW and EnumProcessModules did not work correctly when loading from memory was in effect. They of course work fine with trunk, 9.0 or TIP 741. Jan can verify for himself that it was not pilot error on my part. I had no interest in trying other functions like loading resources if even these basic functions did not work. (There is one function that does work – GetModuleFileName – that I will revisit below) Looking at the code (debugging fails because debugger is oblivious to DLL’s loaded in MemoryModule), the root cause is non-trivial. It does not appear that MemoryModule integrates with the Windows loader structures at all. In particular, my trusty Windows Internals reference tells me DLL’s are tracked in Windows through three linked lists hanging off the process’ PEB. MemoryModule makes no attempt to update these (with good reason probably, access needs to be synchronized with the kernel using locks that are not exposed). The result is failure in Windows functions that cannot locate handles created via MemoryModule. As an aside, this hiding from Windows may be difficulty of implementation or intentional. MemoryModule’s primary use seems to be in malware to hide DLL’s from Windows security tools. Sufficiently widespread that CCCS has a YARA malware rule for MemoryModule on its front page <https://github.com/CybercentreCanada/CCCS-Yara/blob/master/README.md> ! There was one anomaly that aroused my curiosity – the GetModuleFileName{A,W} functions worked, which was unexpected. It turns out that the reason they work is that TIP 709 (i.e. Tcl, not MemoryModule) effectively hooks these two Windows calls via the GetProcAddress callbacks and if the pointer matches its internal handle, returns the corresponding file path. The implications are that to work correctly, Tcl has to similarly hook the entire Windows API that uses DLL module handles!!! And implement their functionality! A monumental task considering that even the very simple GetModuleFileName function implementations in 709 are faulty (they return paths using forward slashes, and in the case of the *A variant return the path in Tcl’s internal TUTF-8 encoding instead of the caller’s ANSI locale). Do we really want Tcl to be in the business of intercepting Windows API’s ? (Note the need to intercept Windows API’s and the lack of integration with Windows loader are two separate issues that need to be solved.) For added amusement, we have the situation that the result of GetModuleFileName is not usable by other Windows API’s. PathFileExists(GetModuleFileName) will return false leaving some programmer head-banging in frustration (not our problem though, they will blame Bill Gates). But why is GetModuleFileName singled out for hooking? That question brings us to the test suite … Considering MemoryModule is taking on the considerable task of the Windows loader, a reasonable test suite must be mandated. I’d rather not have a repeat of the zipfs fiasco <https://core.tcl-lang.org/tcl/rptview/28> . At a very minimum the 15-20 lower level DLL related API must be tested. Instead, the only function tested is …drum roll... GetModuleFileName! Don’t look behind the curtain! The only comment I have on this is that it vaguely reminds me of the old SPEC compiler benchmark rigging 😊 Outside of the public API itself, there is considerable loader functionality behind the scenes that I had listed way back in January. I do not see that being tested. Jan has repeatedly said “extensive tests are currently done”, “all problems have been fixed” and similar. But outside of a reference to LUCK, there is no detail as to what those tests were. Given the above issues, and similar statements being made about TLS issues earlier that were not fixed (by excluding TLS using DLL’s from loading from memory) until I supplied the actual code demonstrating breakage, forgive me for being skeptical. A non-exhaustive list … * DLL related API’s as mentioned earlier. How does “extensive testing” miss basic API calls? * x86 architecture since there are significant differences in PE formats, exception handling mechanisms and the like. I do not see this tested at all (test suite fails but I believe that is because there is no 32-bit test DLL). Ditto for ARM if we officially support ARM. Or the TIP should explicitly say those architectures are not supported. * Exception handling testing is (like the original TLS test case) testing the wrong scenario. It’s loading thrower.dll from disk, which means the exception tables are set up by the Windows loader. The test case needs to be adapted so it is MemoryModule that is responsible for setting up the exception tables. I’ll ignore the fact that the exception test case is very simple. * Exception handling only tests SEH, not C++ exceptions which use a different mechanism. * ASLR, DEP, delay load dependencies… Look at the loader flags in winternl.h for what might break * nested load scenarios (import resolution will potentially load libraries which may themselves load dependencies from zipfs) * multithread scenarios – I believe there is a race condition here. While Tcl’s own structures are under a lock, the call to MemoryModule load function is not. Since MemoryModule itself does not seem to do any synchronization, I suspect concurrent calls from multiple threads can result in the DLL being loaded twice. Which is a no-no. I understand that is a lot. But so is the Windows functionality 709 chooses to re-implement. I anticipate a question from Jan, as he has asked before in this and other instances in one form or another – “But show me which Tcl extensions use these facilities”. I am not going to do that. Not my approach to software quality. Jan did say in his CFV warning: I don't think it's possible to convince Ashok: I answered on all his remarks, even created testcases especially for those situations. Still he keeps having his reservations. No-one is implementing an alternative. Christian Werner is using it, no-one ever filed a bug-report to him. It is in fact true, I cannot be convinced at this point. I take issue with the rest of the comment but that is immaterial. No further development will convince me this is worth doing and have my support in the future. I will not even review, too much time has been spent already. Someone else from the TCT will have to supply an additional Yes vote. Sorry /Ashok |