There is a bug in data gatherer that causes certain GI (getInstance) queries to fail with an error like "CIM_ERR_NOT_FOUND Gatherer repository reported error". Specifically, a GI query on an "interval" metric will fail when the request is for the most recent instance of the metric value, even though there is enough data in the repository to satisfy the query.
Certain "interval" metrics (e.g. TotalCPUTimePercentage) require at least two repository values to return a result. If a query is performed for a time value corresponding to the most recent sample time, and there are not yet two values in the repository, as may happen when reposd is first started, then the above error is expected. But as long as reposd has been running for at least one full sampling interval, the error should not occur.
The root cause has to do with the way interval metric values are stored and reported. For an interval metric, the value is calculated from two consecutive data points. The timestamp of the reported value could be associated with the earlier data point or the later data point (i.e. the beginning of the interval or the end of the interval). The correct way is to use the end of the interval, and this is what data gatherer does. GI queries for a given time value are calculated from the two most recent data points prior to the requested time. But when the requested time value is that of the most recent data point (i.e. the most recent sample time), reposd fails to retrieve a sufficient number of data points and the query fails.
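The end-of-interval convention described above can be sketched as follows. This is only an illustration, not the reposd code: the function name `interval_value`, the `(time, counter)` sample layout, the cumulative busy-seconds counter, and the choice to count the data point at the requested time itself as eligible are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical sample layout: (sample time in seconds, cumulative busy seconds).
Sample = Tuple[int, float]


def interval_value(samples: List[Sample], requested_ts: int) -> Tuple[int, float]:
    """Return (timestamp, percentage) for the interval ending at requested_ts."""
    # Keep only data points at or before the requested time, oldest first.
    eligible = sorted(s for s in samples if s[0] <= requested_ts)
    if len(eligible) < 2:
        # An interval metric cannot be computed from a single data point.
        raise LookupError("need two data points to compute an interval metric")
    (t0, c0), (t1, c1) = eligible[-2], eligible[-1]
    # The value is reported with the timestamp of the END of the interval (t1).
    return t1, 100.0 * (c1 - c0) / (t1 - t0)


# Three samples taken 60 seconds apart; the most recent sample time is 120.
repository = [(0, 0.0), (60, 30.0), (120, 45.0)]
print(interval_value(repository, 120))  # needs the samples at 60 and 120
```

In this sketch a request for the most recent sample time (120) succeeds because both the sample at 60 and the sample at 120 are selected. The failure described in this report corresponds to the case where only one of those two data points is retrieved.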
This bug is related to, but is not a regression from, [bugs:#2111].
That bug fixed the interval metric reporting so that the timestamp of the reported value is associated with the end of the interval, as described above. Prior to that, there was an inconsistency in the way interval and non-interval metrics were reported; interval metrics were incorrectly reported with the timestamp of the previous sample time. Now, both interval and non-interval metrics are reported with the timestamp of the most recent sample.
But, because of this change, the GI bug is more likely to be seen. The reason is: an EI or EIN query on the interval metric will now report the most recent timestamp, but a query done for that timestamp, when performed immediately (i.e. prior to the next sample time), will fail.
This is most easily demonstrated with the cimcli gi query in "interactive" mode. In this case, cimcli first does an EI query to retrieve a list of instances, then allows the user to select an instance from the list. cimcli then performs the GI query against that instance:
cimcli -i -l localhost -n root/cimv2 gi Linux_ProcessorMetricValue
If the user selects one of the "TotalCPUTimePercentage" instances, the query will always fail. But if the user pauses for a period of at least one sampling interval (typically 60s) before selecting the instance (that is, allow cimcli to display the list, wait, then select the instance) the query will succeed. The reason is: at this point, you are no longer requesting the latest value.
The reason the bug was not seen here prior to [bugs:#2111] is: previously the EI query returned the older timestamp for the interval metric, so the subsequent GI, even if done immediately, was never for the latest value.
The fix is to have reposd always select two instances from the repository in the case of an interval metric, so that the value can be properly calculated.
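The idea behind the fix can be sketched as below. This is a hypothetical illustration and not the actual patch: `select_points`, its parameters, and the in-memory list standing in for the repository are assumed names chosen for the example.

```python
from typing import List, Tuple

# Hypothetical sample layout: (sample time in seconds, raw value).
Sample = Tuple[int, float]


def select_points(samples: List[Sample], requested_ts: int,
                  is_interval: bool) -> List[Sample]:
    """Select the data points needed to answer a query for requested_ts."""
    # An interval metric always needs two points; a point metric needs one.
    wanted = 2 if is_interval else 1
    # Consider only data points at or before the requested time, oldest first.
    eligible = sorted(s for s in samples if s[0] <= requested_ts)
    # Take the newest `wanted` points, even when requested_ts is the latest sample.
    return eligible[-wanted:]


repository = [(0, 0.0), (60, 30.0), (120, 45.0)]
print(select_points(repository, 120, is_interval=True))
print(select_points(repository, 120, is_interval=False))
```

If fewer than two points exist (for example, just after startup), the function returns a shorter list, so the caller still has to check the length before computing an interval value.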
Finally, note that GI queries where the requested instance timestamp is "0" (i.e. request the latest value) always succeed, as long as reposd has been running for at least one interval as described above. Such queries take a slightly different code path, which automatically selects a sufficient number of values. For example:
cimcli -l localhost -n root/cimv2 gi 'Linux_ProcessorMetricValue.MetricDefinitionId="TotalCPUTimePercentage.188",InstanceID="TotalCPUTimePercentage.188.Processor0.myhost.0"'