Issue: SFCB fails every other EnumerateClasses operation, repeatably. This has been observed so far only when performed on a class that has a subclass.
The invocation that fails most of the time fails with a CIM status code CIM_ERR_FAILED.
Occasionally, it fails with timeout (did not try longer timeouts than 30sec).
I tried this with class CIM_CredentialContext, which has exactly one subclass: IBMSD_CMMCredentialContext.
The command I used was cimcli, without any other parameters for the operation.
SFCB version: 1.3.12
Host: 9.72.217.204
Hm, I tried this locally with CIM_ComputerSystem, which also has only one subclass (on my system). I didn't receive any errors when doing that. Could you attach the XML of your request?
I am able to reproduce what I believe is the same problem on x86. The Class provider is crashing on every other ec call on any base class. The problem does not occur when the cimcli -di option is used, as this causes DeepInheritance=true to be set in the XML for the CIM EnumerateClasses call. The problem does not occur with wbemcli because it sets DeepInheritance=true by default.
The crash does not occur with -di because a slightly different code path is executed in classProviderGz.c. Without -di, the classProviderEnumClasses() function first calls getClass() on the queried class, then calls getChildren() to get a list of subclasses. For each child class found, getClass() is called to get a reference (CMPIConstClass*) to the child class. The result is returned via CMPIReturnInstance(). Then depending on the value of a variable "cid", CMRelease() is called to free the CMPIConstClass*.
With -di, classProviderEnumClasses() calls a function loopOnChildren() immediately following the the call to getClass() on the queried class. The logic of loopOnChildren() is similar to that just described, except that loopOnChildren() is a recursive function, presumably to implement DeepInheritance behavior. Here as well, CMRelease() may be called after the class reference is returned via CMPIReturnInstance().
The difference between the two cases is that CMRelease() always ends up getting called in the former case but not in the latter case. The conditional is the variable "cid". If cid is NULL, CMRelease() is called. In the former case, cid is always NULL. In the latter case, cid appears to be set to the value of the pointer to the current ClassRegister and is NOT NULL.
Although the crash occurs on the 2nd pass (i.e the 2nd call to classProviderEnumClasses()), the problem seems to be "set up" on the 1st pass. If CMRelease() is called on the 1st pass, any reference to the released CMPIConstClass* to the child class will fail on the 2nd pass. The crash actually occurs at the call to CMPIReturnInstance() of the child class, on the 2nd pass.
I do not completely understand the role of the "cid" pointer and so I am unclear why it is set in one case but not the other. In fact, I am unclear exactly where it IS being set. (The only place where I can see: getFirstClass() and getNext(Class), do not appear to be executed in this case). So I am not sure what the correct CMRelease() behavior should be. But I have determined: if the call to CMRelease() is removed, the problem does not occur.
The attached test patch removes the call to CMRelease() in the case where -di is not specified. This fixes the problem on x86. I would like to see if it also fixes it on CMM. This may not be the final solution but it would verify the problem is the same.
A final note: While it is also true that on the 1st pass for a class queried for the first time, the class record is read from disk and on the 2nd pass it is read from cache, this did not turn out to have any bearing. You can call the ec as many times as you like with -di and it will never crash, even though the record is read from cache on all but the first. If you then call ec without -di, the query will be successful ONCE. The next ec call following that, with or without -di, will cause the crash.
For some reason I am unable to attach a file.
Index: classProviderGz.c
RCS file: /cvsroot/sblim/sfcb/classProviderGz.c,v
retrieving revision 1.20
diff -a -u -p -r1.20 classProviderGz.c
--- classProviderGz.c 17 Dec 2009 20:15:48 -0000 1.20
+++ classProviderGz.c 25 Jun 2012 04:24:33 -0000
@@ -731,7 +731,6 @@ static CMPIStatus ClassProviderEnumClass
if (ul) for (child = (char *) ul->ft->getFirst(ul); child; child = (char *) ul->ft->getNext(ul)) {
cls = getClass(cReg,child,&cid);
CMReturnInstance(rslt, (CMPIInstance *) cls);
- if (cid==NULL) CMRelease(cls);
}
} else if (cn && (flgs & CMPI_FLAG_DeepInheritance)) {
loopOnChildren(cReg, cn, rslt);
hellerda, I've assigned you to this bug. You should be able to attach files now.
cid being non-NULL means the class is cached, and therefore the class should not be freed. I was able to see this path using test/xmltest/EnumerateClasses.xml
When ec is called with no classname, as in EnumerateClasses.xml, a slightly different codepath is executed. In this case, classProviderEnumClasses() will call getFirstClass() & getNextClass() and that is where the value of cid is set. cid will be non-NULL only if the class has been previously cached; getFirstClass(), getNextClass() will not cache the class if it is not already.
In the case where the crash occurs, ec is called for a specific classname. In this case classProviderEnumClasses() does not call getFirstClass() but instead will call getClass() with the provided classname. Confusingly, the parameter cid is also passed to getClass() but it is never used. (which gives a compiler warning with -Wextra). From what I can see, the value of cid is never set in this code path, and so will always be NULL.
If cid were to be set anywhere, it seems like it should be in getClass(). But since the behavior of getClass is to cache the class if it is not already, the class will always be cached upon return from getClass(), so cid would never be null in this case, and CMRelease() would never be called. So, getClass() could be modified so it sets cid. But it seems there is no functional difference than just removing the CMRelease() call at that point in classProviderEnumClasses()
I am also trying to understand is how cid IS being set in the case where DeepInheritance=true. This is the case where it executes loopOnChildren(), which is really just calling getClass() recursively. It seems cid is an uninitialized local variable here.
hellerda,
I've looked at it, and removing that CMRelease seems fine to me.
Also, I won't object if you want to do some code cleanup in other places where cid is disused.
On further examination it is clear the CMRelease() is required only following calls to getFirst/NextClass(), as these functions allocate a new CMPIConstClass on each call, when the class is not currently cached. If this object is not freed, subsequent calls will continue to allocate new ones, and memory will be consumed. I confirmed removing the CMRelease() there causes a memory leak.
By contrast, calls to getClass() also allocate a new CMPIConstClass, but in that case the class is always cached prior to return from the function, so it does not seem it should be freed at that point. (It should be freed when it rotates out of the cache as LRU). A CMRelease() on a cached object is in fact what is causing the crash.
I did confirm that setting the value of cid within getClass() also fixes the problem, simply because it prevents the subsequent CMRelease(). But as previously stated, there does not appear to be a need for this check since the class will always be cached upon return from getClass().
Some simple tests so far show there is no memory leak if the CMRelease() is removed at either point following the call to CMRelease(), i.e. with or without DeepInheritance. I will have a new patch shortly.
Do not CMRelease a cached class, remove extranous refs to 'cid'.
Patch committed to CVS HEAD and git master.
patch for SFCB 1.4