[pywin32-bugs] [ pywin32-Bugs-1944375 ] PyRegEnumValue fails on i18n systems
From: SourceForge.net <no...@so...> - 2008-05-15 13:55:30
Bugs item #1944375, was opened at 2008-04-16 21:11
Message generated for change (Comment added) made by neverjade
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=551954&aid=1944375&group_id=78018

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: win32
Group: None
Status: Open
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Christopher Nelson (neverjade)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyRegEnumValue fails on i18n systems

Initial Comment:
If you try to enumerate the contents of the registry key:

"SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Control Panel\\Cursors\\Schemes"

pywin32 will fail with a (234, "PyRegEnumValue", ""), which means that RegEnumValue returned with ERROR_MORE_DATA. The method pywin32 currently uses to detect the size of the value name buffer does not work properly on Windows 2003 X64 Japanese.

To fix this bug you must make the code look like the following:

// @pymethod (string,object,type)|win32api|RegEnumValue|Enumerates values of the specified open registry key. The function retrieves the name of one subkey each time it is called.
static PyObject *
PyRegEnumValue( PyObject *self, PyObject *args )
{
    // This value is taken from MSDN docs.
    const DWORD maxValueNameSize=16384;

    HKEY hKey;
    PyObject *obKey;
    int index;
    long rc;
    TCHAR retValueBuf[maxValueNameSize];
    BYTE *retDataBuf;
    DWORD retValueSize = maxValueNameSize;
    DWORD retDataSize=0;
    DWORD typ;

    // @pyparm <o PyHKEY>/int|key||An already open key, or any one of the following win32con constants:<nl>HKEY_CLASSES_ROOT<nl>HKEY_CURRENT_USER<nl>HKEY_LOCAL_MACHINE<nl>HKEY_USERS
    // @pyparm int|index||The index of the key to retrieve.
    if (!PyArg_ParseTuple(args, "Oi:PyRegEnumValue", &obKey, &index))
        return NULL;
    if (!PyWinObject_AsHKEY(obKey, &hKey))
        return NULL;

    // @pyseeapi PyRegEnumValue
    PyW32_BEGIN_ALLOW_THREADS
    rc=RegEnumValue(hKey, index, retValueBuf, &retValueSize, NULL, &typ, NULL, &retDataSize);
    PyW32_END_ALLOW_THREADS

    // Reset because the call above messed it up.
    retValueSize=maxValueNameSize;

    // Don't need to increment because the size returned from RegEnumValue includes any needed terminators.
    retDataBuf= (BYTE * )alloca(retDataSize);
    if ((retDataBuf==NULL)){
        PyErr_NoMemory();
        return NULL;
    }

    rc=RegEnumValue(hKey, index, retValueBuf, &retValueSize, NULL, &typ, retDataBuf, &retDataSize);
    if (rc!=ERROR_SUCCESS) {
        return ReturnAPIError("PyRegEnumValue", rc);
    }

    PyObject *obData=PyWinObject_FromRegistryValue(retDataBuf, retDataSize, typ);
    if (obData==NULL) {
        return NULL;
    }
    PyObject *retVal = Py_BuildValue("NOi", PyWinObject_FromTCHAR(retValueBuf), obData, typ);
    Py_DECREF(obData);
    return retVal;
    // @comm This function is typically called repeatedly, until an exception is raised, indicating no more values.
}

----------------------------------------------------------------------

>Comment By: Christopher Nelson (neverjade)
Date: 2008-05-15 13:55

Message:
Logged In: YES
user_id=13396
Originator: YES

Ah. The .reg file I attached earlier fails properly on a Korean / Japanese system. It seems to work fine on an English system. I suspect this is due to some underlying difference in the windows registry handling code. I don't think it is possible to reproduce this w/o a version of Windows that is localized in this way.

I could probably make an RDP session to such a system available if that would help you or someone else verify this.
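
The attached .reg file is not reproduced in this archive. As a rough sketch of how a comparable test fixture might be generated, the Python below writes a regedit "Version 5.00" file, which is stored as UTF-16 little-endian text with a byte order mark. The key path, value name and file name are hypothetical and chosen only for illustration; they are not taken from the attachment. Per the comment above, importing such a file on an English system may well not trigger the failure.

```python
import codecs
import os
import tempfile

# Hypothetical names for illustration only (not from the thread's attachment).
# The escapes spell a Japanese subkey name and a Japanese value name.
KEY_PATH = "HKEY_CURRENT_USER\\Software\\PyWin32RegTest\\\u30c6\u30b9\u30c8"
VALUE_NAME = "\u5024\u306e\u540d\u524d"


def build_reg_text(key_path, value_name, data):
    """Return the text of a version 5.00 .reg file holding one string value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[%s]" % key_path,
        '"%s"="%s"' % (value_name, data),
        "",
    ]
    return "\r\n".join(lines)


def write_reg_file(path, text):
    """Write the text as UTF-16 little-endian, preceded by a byte order mark."""
    with open(path, "wb") as handle:
        handle.write(codecs.BOM_UTF16_LE)
        handle.write(text.encode("utf-16-le"))


reg_path = os.path.join(tempfile.mkdtemp(), "i18n_test.reg")
write_reg_file(reg_path, build_reg_text(KEY_PATH, VALUE_NAME, "data"))
print(reg_path)
```

The file is written in binary mode so that the byte order does not depend on the machine generating it.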

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2008-05-15 13:25

Message:
Logged In: YES
user_id=14198
Originator: NO

> The test cases won't fail on Latin character sets because they are
> typically 1-byte granular, so 1 TCHAR ends up being 1 byte. The Japanese
> and Korean encodings are not that simple.

Yeah - I was trying to say that if we could use the native Unicode registry functions to write Unicode values that can't be represented in latin1/mbcs, then we would probably be well on the way to reproducing this. But best I can see, that's not currently possible with pywin32/_winreg - although I can't see why win32api and/or _winreg can't be modified to handle Unicode natively in that case.

In the meantime, maybe a .reg file could be used to simulate the same thing - as you mention, the key is probably to end up with characters that can't be represented in 1 byte inside our English registry for test purposes.

Thanks!

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-15 12:39

Message:
Logged In: YES
user_id=13396
Originator: YES

The test cases won't fail on Latin character sets because they are typically 1-byte granular, so 1 TCHAR ends up being 1 byte. The Japanese and Korean encodings are not that simple.

I am also unhappy with setting to a maximum value, so I have revised RegEnumKey and friends to loop when detecting ERROR_MORE_DATA. I will also change that for RegEnumValue.

The Windows API seems a bit fuzzy. However, QueryInfoKey supposedly returns the maximum size of a key and value name in Unicode characters, whereas EnumKey/EnumValue say that they return the exact size in TCHARS. TCHARS are not necessarily unicode characters, which makes it frustrating to interpret. There are a variety of unicode encodings, and they are not necessarily byte_width * 2.
For example, in the output I pasted below the key was apparently 8 TCHARS long, but that translated to a buffer of 12 bytes plus a NULL sequence.

I will look at the CVS version and the link you pasted below and see if I can reproduce the problem with those versions.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2008-05-15 01:03

Message:
Logged In: YES
user_id=14198
Originator: NO

You might like to try out starship.python.net/crew/mhammond/pywin32-210.9.win32-py2.5.exe and see if you can still repro it.
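
The 2008-05-15 12:39 comment above describes revising RegEnumKey to loop when ERROR_MORE_DATA is detected, but the revised code is not shown in this thread. The following is only a sketch of that general grow-and-retry pattern, written in Python so it can run anywhere. The `enum_call` callback and `make_fake_enum` helper are hypothetical stand-ins for RegEnumKeyEx/RegEnumValue, and the doubling strategy and `max_size` cap are assumptions rather than what the revised pywin32 code does.

```python
ERROR_SUCCESS = 0
ERROR_MORE_DATA = 234  # the error code reported throughout this thread


def enum_name_with_retry(enum_call, initial_size=16, max_size=32768):
    """Retry enum_call with a larger buffer while it reports ERROR_MORE_DATA.

    enum_call is a hypothetical stand-in for the registry API: it takes a
    buffer size and returns (rc, name).
    """
    size = initial_size
    while True:
        rc, name = enum_call(size)
        if rc != ERROR_MORE_DATA:
            return rc, name          # success, or some unrelated error
        if size >= max_size:
            return rc, None          # give up instead of growing forever
        size = min(size * 2, max_size)


def make_fake_enum(stored_name):
    """Simulate an enumerator that needs room for the name plus a terminator."""
    def fake_enum(size):
        if size < len(stored_name) + 1:
            return ERROR_MORE_DATA, None
        return ERROR_SUCCESS, stored_name
    return fake_enum


rc, name = enum_name_with_retry(make_fake_enum(b"x" * 40))
print(rc, name)
```

A loop of this kind avoids a hard-coded name size because the size reported by the API is treated as a hint and the buffer simply grows until the call stops failing.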

----------------------------------------------------------------------

Comment By: Roger Upole (rupole)
Date: 2008-05-15 00:03

Message:
Logged In: YES
user_id=771074
Originator: NO

In CVS, win32apimodule.cpp has already been updated to calculate the name buffer size in TCHARs, and can be compiled with UNICODE defined to call the wide-character versions of the API functions. With the current CVS code (compiled with or without UNICODE), I don't get an error while reading the registry data you posted.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2008-05-14 22:42

Message:
Logged In: YES
user_id=14198
Originator: NO

Thanks for the update, but I'm afraid I will not have time for a few days, but a couple of thoughts: does the test case fail on English windows? If the issue is simply that the number of bytes returned is sometimes too small, presumably when multi-byte characters exist, it should be possible to repro by writing Unicode that has a multi-byte representation and trying to read it back.

When you say:

> If you read the documentation for that RegEnumValue closely, you will
> realize that it doesn't actually return what you think.

can you be more specific?

I'd be happy with code that handled the ERROR_MORE_DATA case, but I'm not happy with setting a hard-coded limit to either key or value size, and ideally I'd like *some* indication why the code in question isn't working correctly - as we are discussing the API itself here, I bet we are not the first people to strike this issue.
Hence I'm quite keen to see this fail myself (if for no better reason than I can then submit a patch upstream for Python's _winreg module - they will insist on a test)

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 19:43

Message:
Logged In: YES
user_id=13396
Originator: YES

I have attached the .cpp file containing the program I used to generate these results.
File Added: main.cpp

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 19:42

Message:
Logged In: YES
user_id=13396
Originator: YES

I believe there are two problems:

(1) MS's registry code works strangely
(2) python and pywin32 assume that the functions are returning sizes in bytes, whereas the API documentation clearly says unicode characters (or in some places, TCHARS.)

This happens to work by accident on Latin character set systems b/c they usually use utf-8 or similar 8-bit systems. Korean and Japanese systems do not.

The following is output of a C++ program I wrote to examine the functionality of the registry functions on a 64-bit Japanese localized system:

C:\temp>\\tsclient\dev\registry_test.exe
info: number of subkeys: 2
      maximum length of subkey name: 8
      number of values: 0
      maximum length of value name: 0
info: read key name:
      size before call: 9
      size after call: 5
      buffer growth: 0
      contents: agent
warn: the buffer allocated was not large enough:
      size before call: 9
      size after call: 9
      resizing to: 10
warn: the buffer allocated was not large enough:
      size before call: 10
      size after call: 10
      resizing to: 12
warn: the buffer allocated was not large enough:
      size before call: 12
      size after call: 12
      resizing to: 15
info: read key name:
      size before call: 15
      size after call: 12
      buffer growth: 3
      contents: \x8c\xc2\x82\xcc\x83t\x83@\x83C\x83\x8b
info: done

==

Some explanation:

Size before call is the size of the buffer we allocated in order to retrieve the results of RegEnumKeyEx.
Size after call is what RegEnumKeyEx wrote back INTO the buffer size parameter after the RegEnumKeyEx call.
Resizing to is the size in bytes of the buffer that has been enlarged to potentially fit the contents.
Buffer growth is the increment over the value returned by RegEnumKeyEx in the lpcbName buffer.

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 14:33

Message:
Logged In: YES
user_id=13396
Originator: YES

I should also clarify - this is a problem with key and value NAMES. Not the data.

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:24

Message:
Logged In: YES
user_id=13396
Originator: YES

I have attached a registry key that causes the failure that I posted below for your debugging pleasure.
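
To make the byte-versus-character distinction in the registry_test.exe output above concrete, the short Python sketch below takes the key name bytes exactly as printed there and compares their length as raw bytes with their length once decoded. It assumes the ANSI code page of the Japanese system is 932 (the Shift-JIS family), which the thread does not state explicitly.

```python
# Key name bytes exactly as printed in the registry_test.exe output above.
raw_name = b"\x8c\xc2\x82\xcc\x83t\x83@\x83C\x83\x8b"

# Assumption: the system ANSI code page is 932 (Shift-JIS family).
name = raw_name.decode("cp932")

byte_count = len(raw_name)                   # size needed in an ANSI (char) buffer
char_count = len(name)                       # size needed in a wide-character buffer
utf16_bytes = len(name.encode("utf-16-le"))  # the same name stored as UTF-16

print(byte_count, char_count, utf16_bytes)   # prints: 12 6 12
```

Under that assumption every character in this name occupies two bytes, so a size reported in characters is only half the number of bytes the ANSI call needs, which matches the "size after call: 12" line in the output.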

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:23

Message:
Logged In: YES
user_id=13396
Originator: YES

File Added: opsware_reg.dat

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:22

Message:
Logged In: YES
user_id=13396
Originator: YES

I have also discovered the same problem in RegEnumKey today. The error occurs this way:

Testing: SOFTWARE\Opsware
=========================
agent
Traceback (most recent call last):
  File "c:\temp\testreg3.py", line 21, in ?
    print win32api.RegEnumKey(r2, index)
pywintypes.error: (234, 'RegEnumKey', '\x83f\x81[\x83^\x82\xaa\x82\xb3\x82\xe7\x82\xc9\x82\xa0\x82\xe8\x82\xdc\x82\xb7\x81B')

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:12

Message:
Logged In: YES
user_id=13396
Originator: YES

File Added: PyRegEnumValue.c

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:09

Message:
Logged In: YES
user_id=13396
Originator: YES

File Added: test_pywin32_reg.py

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:04

Message:
Logged In: YES
user_id=13396
Originator: YES

File Added: test_winreg.py

----------------------------------------------------------------------

Comment By: Christopher Nelson (neverjade)
Date: 2008-05-14 13:03

Message:
Logged In: YES
user_id=13396
Originator: YES

Mark, please read the bug description. This happens on i18n systems. For example, Windows 2003 x64 localized in Japanese or Korean. I am sorry that you haven't had any bug reports on this, but I can guarantee you that if you run that code on a w2k3 Japanese or Korean localized system, you will hit that bug.

With respect to your comment that

// Reset because the call above messed it up.
retValueSize=maxValueNameSize;

is wrong, I would agree with you, except that I watched the original code not work. I spent a number of hours tracking this problem down in our production code, and it came back to this function in pywin32. FWIW, _winreg manifests the same problem, in the same way.

If you read the documentation for that RegEnumValue closely, you will realize that it doesn't actually return what you think. Also, on i18n systems, the ascii version of this call simply doesn't work correctly. Partly this is due to size limitations of the API call. In any case, the MSDN documentation does not try to get the key's name size, it simply uses this value. I suspect this is because they know that the ascii version of this call does not work properly.

I am attaching the scripts that fail below.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2008-05-05 12:59

Message:
Logged In: YES
user_id=14198
Originator: NO

I had another look at this and sadly can't repro it with that key on win2003. Either way, the patch as it stands needs work - eg, the code and comment:

// Reset because the call above messed it up.
retValueSize=maxValueNameSize;

is wrong - the whole point of the call above was to determine the size - so the correct fix given that approach would be to remove the first call completely. It also seems this would stop binary values with more than 16384 bytes from working, which, best I can tell, they do now.

So while I don't doubt the Japanese version of 2003 fails, I'm rejecting this as it stands, but still welcome more input on the best way to approach this - eg, the smallest script that fails for you would help - I'm assuming it would be:

import win32api, win32con
key=win32api.RegOpenKey(win32con.HKEY_LOCAL_MACHINE, "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Control Panel\\Cursors\\Schemes", win32con.KEY_READ)
print win32api.RegEnumValue(key, 0)

FWIW, Python's _winreg module still has identical code to win32api and no open bugs on a similar issue, so no one has reported a problem there either (but that still doesn't mean one doesn't exist :)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2008-05-04 11:07

Message:
Logged In: YES
user_id=14198
Originator: NO

Could you please attach either a patch, or the complete source file with the new function (all the indentation is lost above)

Thanks.