From: O. S. <oli...@on...> - 2016-04-11 09:11:05
So will those SDR errors become fatal now?

> I think it would make sense to abort at that point. I have also seen
> sensor requests where it got all the SDRs but only showed the
> reservation ID error for each just to have that error at the end
> again. I will move on to the jump start as soon as I am sure there are
> no more cases for me where it has an incomplete sensor list and says
> "successful".
>
> > The reservation ID error only applies to one SDR, because isensor then
> > re-acquires it for the next SDR.
> > I guess we could abort everything at that point, but if you are seeing
> > those errors, then moving to the jumpstart is the only reliable
> > solution, since the firmware is being asked for SDR reservations from
> > multiple sources.
> >
> > On Thu, Mar 31, 2016 at 9:14 AM, Oliver Stöneberg <oli...@on...> wrote:
> > > The -25 error sticks now after the change you did yesterday and I
> > > only had two incomplete sensor lists with "successful" in the past
> > > three hours. I usually have dozens of those per hour. Both appear to
> > > be variants of the reservation ID error, which isn't propagated as
> > > an exit code yet.
> > >
> > >> I am not surprised by this - it's always the reservation ID. Could
> > >> you also make this error message appear at the end as well?
> > >>
> > >> I think if this error actually makes the command fail and the -25
> > >> actually sticks, that should finally cover all the issues I am aware
> > >> of. That would also enable me to finally start using the jump-start
> > >> next week.
> > >>
> > >> Thanks
> > >> Oliver
> > >>
> > >> > The log shows this key error message:
> > >> > 0000 GetSDR error 0xc5 Reservation ID cancelled or invalid, rlen=16
> > >> >
> > >> > So there is some other program trying to get the SDRs from this same
> > >> > system, and grabbing the reservation ID away.
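Andy's point that a cancelled reservation (completion code 0xc5) costs only one SDR, together with Oliver's request that the error still make the command fail, can be illustrated with a minimal C sketch. This is not ipmiutil's code: the function summarize_sdr_pass and its array-of-completion-codes input are hypothetical stand-ins for the real per-record GetSDR calls, and the re-reservation is only counted, not performed.

```c
#include <stdio.h>

/* Completion code quoted in the log: reservation ID cancelled or invalid. */
#define CC_RESV_CANCELLED 0xC5

/*
 * Hypothetical sketch: cc[i] is the IPMI completion code returned for SDR i
 * (0 = success). A cancelled reservation costs only that one record, since a
 * new reservation would be taken for the next record, but the first failure
 * is kept so the pass as a whole cannot end up reported as "successful".
 * Returns 0 if every record was read, else the first non-zero code.
 */
int summarize_sdr_pass(const int *cc, int n, int *records_ok, int *re_reserves)
{
    int first_err = 0;

    *records_ok = 0;
    *re_reserves = 0;
    for (int i = 0; i < n; i++) {
        if (cc[i] == 0) {
            (*records_ok)++;
            continue;
        }
        fprintf(stderr, "SDR %04x: completion code 0x%02x\n",
                (unsigned)i, (unsigned)cc[i]);
        if (first_err == 0)
            first_err = cc[i];  /* sticky: later successes do not clear it */
        if (cc[i] == CC_RESV_CANCELLED)
            (*re_reserves)++;   /* stand-in for re-issuing Reserve SDR Repository */
    }
    return first_err;
}
```

Keeping the first non-zero code, rather than whatever the last record returned, is what lets a pass that recovers from a lost reservation still end with a failing exit status.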
> > >> >
> > >> >
> > >> > On Tue, Mar 29, 2016 at 1:25 PM, Oliver Stöneberg <oli...@on...> wrote:
> > >> > > I still see incomplete sensor lists with successful calls and the
> > >> > > "malformed" error in the output.
> > >> > > It seems setting the rc in GetSDR() is not enough. It seems you
> > >> > > either need to add another rc != 0 check there or abort the loop in
> > >> > > case rc is not 0, or it will be overwritten by the result of the next
> > >> > > ipmi_cmd_mc() call.
> > >> > >
> > >> > > I also attached a debug log of a sensor request where it doesn't
> > >> > > start with the proper index. Since it is too big for the list I put
> > >> > > it up on the web - http://www.pastefile.com/I19qDA
> > >> > >
> > >> > >
> > >> > >> Oliver,
> > >> > >>
> > >> > >> Sorry you were sick. Glad you are back.
> > >> > >> I just now committed the changes to SVN and GIT.
> > >> > >>
> > >> > >> Andy
> > >> > >>
> > >> > >> On Tue, Mar 29, 2016 at 3:50 AM, Oliver Stöneberg <oli...@on...> wrote:
> > >> > >> > Hi Andy,
> > >> > >> >
> > >> > >> > I was out sick the whole last week and I just checked SVN and there are
> > >> > >> > no changes. Seems like you didn't commit that -25 thing yet.
> > >> > >> >
> > >> > >> > Greetings
> > >> > >> > Oliver
> > >> > >> >
> > >> > >> >> Hi Andy,
> > >> > >> >>
> > >> > >> >> thanks for the usual fast reply and solution. I will give it a spin
> > >> > >> >> on Monday. I am pretty sure it's the reservation ID thing since we
> > >> > >> >> are already getting lots of those. The servers I am seeing this with
> > >> > >> >> are Cisco and IBM brand, so I doubt it's a firmware issue. I will
> > >> > >> >> update them to the latest version anyways.
> > >> > >> >>
> > >> > >> >> I also didn't go with the jumpstart file yet since I wanted to have
> > >> > >> >> all the issues reported in the normal non-jumpstart case so I can be
> > >> > >> >> sure it will be fine.
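The rc problem Oliver describes can be shown with a small C sketch, assuming each array element stands in for the return value of one ipmi_cmd_mc() call made while a single SDR is read in chunks. Both functions are hypothetical illustrations, not ipmiutil's GetSDR(): the first mirrors the reported bug, the second aborts the loop as he suggests.

```c
/*
 * Hypothetical sketch: chunk_rc[i] stands in for the value returned by the
 * i-th ipmi_cmd_mc() call made while reading one SDR in chunks.
 */

/* Mirrors the reported bug: rc is reassigned on every pass, so a later
 * successful call overwrites an earlier failure. */
int read_chunks_overwrites_rc(const int *chunk_rc, int nchunks)
{
    int rc = 0;

    for (int i = 0; i < nchunks; i++)
        rc = chunk_rc[i];
    return rc;
}

/* Suggested fix: abort the loop as soon as rc != 0 so the caller sees the
 * failure. *chunks_ok counts the chunks read before the error. */
int read_chunks_keeps_rc(const int *chunk_rc, int nchunks, int *chunks_ok)
{
    int rc = 0;

    *chunks_ok = 0;
    for (int i = 0; i < nchunks; i++) {
        rc = chunk_rc[i];
        if (rc != 0)
            break;              /* stop here so rc is not overwritten */
        (*chunks_ok)++;
    }
    return rc;
}
```

With a sequence such as {0, -25, 0, 0}, the first version returns 0 and the second returns -25 after one good chunk.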
> > >> > >> >>
> > >> > >> >> The reason I wanted the normal sensor reading to work is that there's
> > >> > >> >> no user interaction involved regarding the sensor reading. We use
> > >> > >> >> ipmiutil inside an asset management system to fetch the data and you
> > >> > >> >> just add the server to the system with the required data and that's
> > >> > >> >> it. So the jumpstart creation would be automatic on the first sensor
> > >> > >> >> fetching. So we need to be sure the file created is actually complete
> > >> > >> >> and correct.
> > >> > >> >>
> > >> > >> >> I will check which kind of server it is on Monday as well.
> > >> > >> >>
> > >> > >> >> Greetings and Thanks
> > >> > >> >> Oliver
> > >> > >> >>
> > >> > >> >> > Oliver,
> > >> > >> >> >
> > >> > >> >> > If the log shows debug output "sdr[57] off=0, expected 18, got 4" then
> > >> > >> >> > the code wrote this message to stderr 3 lines later regardless of
> > >> > >> >> > debug or not: "SDR record 57 is malformed, length 4 is less than
> > >> > >> >> > minimum 18", so there had to be an error message.
> > >> > >> >> > However, the fact that the last line says 'successfully' and it did
> > >> > >> >> > not return an error code should be fixed.
> > >> > >> >> > I'll change it to return error -25 in this case for the next release.
> > >> > >> >> >
> > >> > >> >> > Root cause:
> > >> > >> >> > Originally, we had only seen this problem reading OEM SDRs which
> > >> > >> >> > didn't have readings and were last in the list, so it was not
> > >> > >> >> > considered a functional error, but this firmware behavior breaks
> > >> > >> >> > getting the sensor readings. I can think of two reasons that the
> > >> > >> >> > firmware might be giving this error: (1) the firmware is internally
> > >> > >> >> > busy doing some other function that interferes with sending the SDR
> > >> > >> >> > buffer, (2) the SDR reservation ID was acquired by another process
> > >> > >> >> > while the SDR buffer was in process.
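The length check behind "expected 18, got 4" can be sketched in C as below. The assumption, inferred from the logs in this thread rather than taken from ipmiutil's source, is that each Get SDR response holds a 2-byte next-record ID followed by the requested bytes: ilen=16 gives sz=18 and ilen=12 gives sz=14 in the good run. The function check_sdr_chunk and the macro ERR_SDR_MALFORMED are hypothetical; only the value -25 comes from the thread.

```c
#include <stdio.h>

/* -25 is the value proposed in the thread; the macro name is hypothetical. */
#define ERR_SDR_MALFORMED (-25)

/*
 * Hypothetical check for one Get SDR response chunk. Assumes the response
 * holds a 2-byte next-record ID followed by the requested bytes. A short
 * chunk means the offset to the next SDR cannot be trusted, so the caller
 * should stop reading SDRs and return this error instead of "success".
 */
int check_sdr_chunk(unsigned record_id, int offset, int requested, int got)
{
    int expected = 2 + requested;

    if (got < expected) {
        fprintf(stderr, "sdr[%x] off=%d, expected %d, got %d\n",
                record_id, offset, expected, got);
        return ERR_SDR_MALFORMED;
    }
    return 0;
}
```

Called with the values from the failing log (record 0x57, offset 0, 16 requested, 4 received), it returns -25; with the good run's sz=18 and sz=14 chunks it returns 0.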
> > >> > >> >> >
> > >> > >> >> > Note that the jumpstart SDR file will always be the same for every
> > >> > >> >> > system of the same model (same motherboard), so you could capture/dump
> > >> > >> >> > it once per model to know that it is correct.
> > >> > >> >> > Or, on each system, dump it before the first pass (when the firmware
> > >> > >> >> > is not busy, so it should be fine), then reuse it each time (if the
> > >> > >> >> > dumped SDR file exists).
> > >> > >> >> >
> > >> > >> >> > Andy
> > >> > >> >> >
> > >> > >> >> > On Thu, Mar 17, 2016 at 3:59 PM, Oliver Stöneberg <oli...@on...> wrote:
> > >> > >> >> > > The problem is that I don't get any error message. It stops at the
> > >> > >> >> > > sensor and says "completed successfully". I do see such errors as
> > >> > >> >> > > well, but not in every case where the list is incomplete.
> > >> > >> >> > >
> > >> > >> >> > > The jump start would help, but if the list is incomplete during the
> > >> > >> >> > > jump start creation that would still be a problem.
> > >> > >> >> > >
> > >> > >> >> > >> In the one that was incomplete, reading the SDRs failed due to this error:
> > >> > >> >> > >> ipmi_cmd SDR[57] off=0 ilen=16 status=0 cc=0 sz=4
> > >> > >> >> > >> sdr[57] off=0, expected 18, got 4
> > >> > >> >> > >> GetSDR[0057] next=ffff (len=2): 01 80
> > >> > >> >> > >>
> > >> > >> >> > >> And it would have displayed a corresponding message to stderr like this:
> > >> > >> >> > >> "SDR record 57 is malformed, length 4 is less than minimum 18"
> > >> > >> >> > >> It is supposed to stop reading SDRs at this point because it cannot
> > >> > >> >> > >> trust the pointer/offset to the next SDR.
> > >> > >> >> > >> The firmware did not return an error, but returned a bad data block
> > >> > >> >> > >> for the first chunk of SDR 0057.
> > >> > >> >> > >>
> > >> > >> >> > >> whereas the successful one has this debug output:
> > >> > >> >> > >> ipmi_cmd SDR[57] off=0 ilen=16 status=0 cc=0 sz=18
> > >> > >> >> > >> ...
> > >> > >> >> > >> ipmi_cmd SDR[57] off=48 ilen=12 status=0 cc=0 sz=14
> > >> > >> >> > >> GetSDR[0057] next=58 (len=60): 57 00 51 01 37 20 00 56 04 12 67 40 0d
> > >> > >> >> > >> 6f 03 00 03 00 03 00 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >> > >> >> > >> 00 00 00 00 00 00 00 00 00 00 00 cc 48 44 44 31 39 5f 53 54 41 54 55
> > >> > >> >> > >> 53
> > >> > >> >> > >> GetSDR[0057]: ret = 0, next=58
> > >> > >> >> > >>
> > >> > >> >> > >> The sensor 0057 points to this on the successful one:
> > >> > >> >> > >> 0057 | Full | Drive Slot | 56 | HDD19_STATUS | Warn-lo | 0.00 na
> > >> > >> >> > >>
> > >> > >> >> > >> If the firmware doesn't return the SDR data, ipmiutil cannot control
> > >> > >> >> > >> that. However, using a jumpstart file of the SDRs would be a good
> > >> > >> >> > >> option to prevent having to read the SDRs from the firmware each time,
> > >> > >> >> > >> but just get the readings.
> > >> > >> >> > >>
> > >> > >> >>
> > >> > >> >>
> > >> > >> >>
> > >> > >> >> ------------------------------------------------------------------------------
> > >> > >> >> Transform Data into Opportunity.
> > >> > >> >> Accelerate data analysis in your applications with
> > >> > >> >> Intel Data Analytics Acceleration Library.
> > >> > >> >> Click to learn more.
> > >> > >> >> http://pubads.g.doubleclick.net/gampad/clk?id=278785231&iu=/4140
> > >> > >> >> _______________________________________________
> > >> > >> >> ipmiutil-developers mailing list
> > >> > >> >> ipm...@li...
> > >> > >> >> https://lists.sourceforge.net/lists/listinfo/ipmiutil-developers
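For reference, the 60-byte GetSDR[0057] dump from the successful run quoted in this thread decodes cleanly as an IPMI Full Sensor Record. The C sketch below, with the hypothetical names decode_full_sdr and struct sdr_summary, reads only a few fields, assuming the Full Sensor Record layout from the IPMI v2.0 specification. The record length byte 0x37 (55) plus the 5-byte header gives the logged len=60, sensor number 0x56 matches the "56" column, sensor type 0x0d is Drive Slot (Bay), and the type/length byte 0xcc announces a 12-character 8-bit ASCII ID string, "HDD19_STATUS". The 2 data bytes (01 80) returned in the failing run are rejected because they cannot even hold the 5-byte header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few fields of an IPMI Full Sensor Record (record type 0x01). */
struct sdr_summary {
    unsigned record_id;     /* bytes 0-1, least significant byte first */
    unsigned record_type;   /* byte 3 */
    size_t   total_len;     /* 5-byte header + remaining length in byte 4 */
    unsigned sensor_number; /* byte 7 */
    unsigned sensor_type;   /* byte 12 */
    char     id_string[17]; /* up to 16 characters, NUL-terminated */
};

/*
 * Hypothetical decoder (offsets are 0-based). Returns 0 on success, or -1
 * if the buffer is too short, is not a Full Sensor Record, or has an ID
 * string that does not fit. The encoding bits (7:6) of the type/length byte
 * are not checked; the string is copied as 8-bit characters.
 */
int decode_full_sdr(const uint8_t *buf, size_t len, struct sdr_summary *out)
{
    if (len < 5)
        return -1;                      /* not even a complete SDR header */
    out->record_id   = (unsigned)buf[0] | ((unsigned)buf[1] << 8);
    out->record_type = buf[3];
    out->total_len   = 5 + (size_t)buf[4];
    if (out->record_type != 0x01 || out->total_len < 48 || len < out->total_len)
        return -1;

    out->sensor_number = buf[7];
    out->sensor_type   = buf[12];

    size_t n = buf[47] & 0x1f;          /* bits 4:0 = string length */
    if (n > 16 || 48 + n > out->total_len)
        return -1;
    memcpy(out->id_string, buf + 48, n);
    out->id_string[n] = '\0';
    return 0;
}
```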