From: Buriez, P. <pat...@in...> - 2010-05-31 14:02:27
Hi Antoine,

For further details on what happened on the TS and why the SIPp instance exited, you need to look at the SIPp logfile. It should be located in the /home/aro/test/ directory, and its name is <SIPp process id>_errors.log.

Regards,
Patrice

-----Original Message-----
From: Antoine Roly [mailto:ant...@gm...]
Sent: Monday, May 31, 2010 3:16 PM
To: Buriez, Patrice
Cc: sip...@li...
Subject: RE: sipp ims bench

Hi Patrice,

I have another question. I launched a test this morning, and it ended with an error (don't worry, it is not related to any timer or VM configuration). In the terminal of the manager I see:

14:53:50.228| sailfin_uac 00 (S+F)= 18607 F= 39 IHS= 0.20960%
14:53:50.228| *IHS ALL* (S+F)= 18607 F= 39 IHS= 0.20960%
14:53:50.303|CPU0 18439557ms: 45.4545 MT: 3631924 MF: 2486792
14:53:51.198|<TS1> ERROR_REPORT 0xfff
14:53:51.198|<TS1> ERROR_REPORT 0xfff
14:53:51.199|<TS1> ERROR_REPORT 0xfff
...
14:53:51.218|<TS1> ERROR_REPORT 0xfff
14:53:51.218|<TS1> ERROR_REPORT 0xfff
14:53:51.313|CPU0 18440567ms: 70.5882 MT: 3631924 MF: 2486792
14:53:52.313|CPU0 18441567ms: 35.0000 MT: 3631924 MF: 2486792
14:53:52.595|TS1 deregistered
14:53:52.595|shutdown
14:53:52.595|Closing CPU0 connection...
14:53:52.596|please wait...
14:53:54.597|Manager exit with rc=0

It is not the first time a test has ended this way. Do you have an idea where the error could come from? Is it a network error or something like that? The IHS is not high, and the SUT is not overloaded at all.

Regards,

A.

On Friday 28 May 2010 at 14:37 +0100, Buriez, Patrice wrote:
> Hi Antoine,
>
> There is obviously something wrong with the clock on your setup.
> Just take a look at the leftmost timestamp in manager.log: it shows that the manager ran from 10:31:37.295 to 10:39:26.425. However, it also shows on lines 472, 730 and 1912 that the clock suddenly and temporarily jumped 4398 seconds (+01:13:18) into the future.
> This problem is further confirmed in report.xml, for example:
> <!-- update step at 12 2010-05-28 10:33:22.823 [4481546ms]-->
> <!-- update step at 18 2010-05-28 10:33:22.823 [83500ms]-->
> For your information, the YMDHMS and the ms timestamps are printed from different variables, and the ms timestamp is sometimes too high by 4398046ms. This is the same difference as in your previous report.xml files, so the problem seems to be very reproducible.
> Checking the source code, I tracked this problem down to getmilliseconds() in utils.cpp and to the gettimeofday() system call. The root cause of the problem is most probably the system clock transiently jumping into the future for some reason... When that happens, the manager thinks that the step is completed and increases the load to the next step.
>
> It is very likely that your system clock problem originates from a disagreement between the hypervisor and the virtual machine about what time it is. By the way, what hypervisor are you using?
> I will let you further investigate the HV and VM clock mis-configuration. You already killed ntpd and ptpd on the VM, but there is probably a synchronization feature in the HV that regularly adjusts the clock on the VM... Or maybe yet another time-related daemon on the VM incorrectly moves the clock ahead and the HV (almost) immediately restores the correct time...
> To evidence the problem and to confirm that it has been solved, you can write a simple C program that calls gettimeofday() in a loop, prints the tv_sec field of the timeval structure, and checks whether it has suddenly decreased (showing that it was incorrectly increased in the previous loop iteration). In order to avoid 100% CPU usage, you could add a sleep() inside the loop, but since this system call is also time-based, it might actually prevent your program from showing the problem!
>
> I don't mean that IMS Bench SIPp is not supported in a VM, but we never tested this setup.
> In fact, we usually do the opposite: in order to benchmark our SUTs, we use a stack of at least 4 physical servers on which we run at least 4 SIPp TS instances. Anyway, it might run in a VM, provided the generated load is not too high, but you will probably lose some precision in the results. The major prerequisite, on a VM or on a physical server, is that the clock is linear and monotonic, which is not your case at the moment.
> Regarding clock precision, the installation guide at http://sipp.sourceforge.net/ims_bench/reference.html#Pre-requisites recommends recompiling the kernel with the timer frequency set to 1000Hz. Some distributions come with a kernel already configured that way, and otherwise maybe you recompiled your kernel accordingly on the VM. But whatever the configuration of the VM, it still depends on the configuration of the underlying HV, which might well be out of your control.
>
> I guess that your current goal is to validate whether IMS Bench SIPp can be used to benchmark SailFin, and to check that the generated reports contain the information that you expect. For that purpose, I understand your choice of running in a VM with a low load, because you plan to validate features rather than full performance. I also guess that, once you have validated the tools, you will deploy a real test setup using one or more dedicated physical servers in order to benchmark the real performance of your SailFin SUT.
> If that's the case, and unless the VM/HV clock mis-configuration is really obvious, there is no real value in spending your time making it work in a VM, because the final setup would use physical servers anyway. Instead, I would suggest that you temporarily use a real physical machine: a low-end server or even a desktop should be good enough to validate the features under a low load.
>
> Regarding the "segmentation fault" problem, I think that it is related to the clock issue, because we obviously assume that the clock is monotonic and we make decisions based on the amount of time spent. If the time difference is sometimes negative, then we might take wrong decisions, such as deleting an object which could later be accessed at another point in the code, when the time difference is correct again... So I wouldn't worry too much about it until the clock issue is solved.
>
> Regards,
> Patrice
>
> -----Original Message-----
> From: Antoine Roly [mailto:ant...@gm...]
> Sent: Friday, May 28, 2010 10:47 AM
> To: Buriez, Patrice
> Cc: sip...@li...
> Subject: RE: sipp ims bench
>
> Hi Patrice,
>
> I've tried some tests with InitialSAPS set to an even value, but the results are still strange, and the SAPS increases more than expected. The files you asked for (report and manager.log) are attached.
>
> If the problem could come from the clock, I'm going to investigate in that direction. The TS and manager are running in a virtual machine; maybe something is wrong there... It shouldn't be, and I've never had a problem with it, but you never know...
>
> Regards,
>
> A.
>
> On Thursday 27 May 2010 at 18:35 +0100, Buriez, Patrice wrote:
> > Hi Antoine,
> >
> > This is really weird: the [ms] timestamp in report.xml still moves back and forth, while the "YMD HMS.ms" seems correct!
> > Because of that transiently wrong time reference, the load is increased too often. That's why you got 60 instead of 5.
> >
> > Can you do one more try, with InitialSAPS set to an even value, or to any multiple of (StirSteps+1)?
> > Please also attach the manager.log file.
> > It's OK to run the manager and TS on the same computer.
> >
> > Regards,
> > Patrice
> >
> > -----Original Message-----
> > From: Antoine Roly [mailto:ant...@gm...]
> > Sent: Thursday, May 27, 2010 6:14 PM
> > To: Buriez, Patrice
> > Cc: sip...@li...
> > Subject: RE: sipp ims bench
> >
> > Hi Patrice,
> >
> > I've "svn co"'d revision 587 and killed ptpd and ntpd. I haven't seen anything weird when I compiled the software (make rmtl, ossl and mgr, as in the doc).
> >
> > The manager and the TS are on the same host, so I suppose it's OK to run the test without both ntpd and ptpd, but I had to set MaxTimeOffset to 0. I don't know if this can have an important negative impact on the test (other than on the time in the report, of course).
> >
> > I've made several tests today, and the results are strange. Almost all tests end correctly (i.e. without a seg fault, but with weird results); some tests end with a seg fault like in a previous mail.
> >
> > Here are the 3 files from the latest test... In this one, the SAPS increased more than expected and overloaded SailFin. As you can see in the report, the requested load of the first step was 5, but the mean value is 60!!! I don't understand why the SAPS increases so much. GSL is working, and I think the software uses it to generate traffic, so...
> >
> > Obviously there's something wrong, maybe in the way I'm using the bench, I don't know... Is it possible it's not working as expected due to the very low values I'm using (for InitialSAPS, SAPSincreaseAmount, ...)? I suppose not, but... Or because I have only a single TS running on the same host as the manager, and without ntpd or ptpd?
> >
> > Regards,
> >
> > A.
> >
> > On Wednesday 26 May 2010 at 17:54 +0100, Buriez, Patrice wrote:
> > > Hi Antoine,
> > >
> > > I investigated the files you sent.
> > > The report.xml file suggests that the time reference is moving back and forth.
> > > I see several possible reasons for that:
> > >
> > > - Are you running ntpd and ptpd at the same time?
> > > If that's the case, kill at least one of them, or even both, and try again.
> > >
> > > - The "Segmentation fault" suggests that something is going really bad.
> > > Maybe the stack got corrupted...
> > > Try a "make clean", then "make", and check for errors and warnings. Anything weird there?
> > >
> > > - We might have a regression in IMS Bench SIPp.
> > > Get revision 587 and try again with this first version that supports SailFin:
> > > svn co -r 587 https://sipp.svn.sourceforge.net/svnroot/sipp/sipp/branches/ims_bench ims_bench-587
> > >
> > > Regards,
> > > Patrice
> > >
> > > -----Original Message-----
> > > From: Antoine Roly [mailto:ant...@gm...]
> > > Sent: Wednesday, May 26, 2010 2:12 PM
> > > To: Buriez, Patrice
> > > Subject: sipp ims bench
> > >
> > > Hi Patrice,
> > >
> > > Here are the files you asked for.
> > >
> > > For this test, only one instance of SIPp was running, on the same host as the manager. I suppose this is not a problem. Of course, the SUT was another host.
> > >
> > > Thanks in advance
> > >
> > > Regards,
> > >
> > > Antoine
>
> ---------------------------------------------------------------------
> Intel Corporation NV/SA
> Rond point Schuman 6, B-1040 Brussels
> RPM (Bruxelles) 0415.497.718.
> Citibank, Brussels, account 570/1031255/09
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
> ---------------------------------------------------------------------