Re: [Ocf-linux-users] talitos driver should be preemtive
From: Kim P. <kim...@fr...> - 2010-05-27 07:00:18
On Wed, 26 May 2010 20:32:49 +0200
" ALEXANDRU IONUT GRAMA" <ai....@al...> wrote:

> Hello sirs! I really appreciate your fast answer, thank you very much for
> the answers.
>
> First, I think you should know some characteristics of my system and
> software layer.
>
> I use a 2.6.21-rc2 kernel with the following options:
> Kernel options --->
> Timer frequency (300 HZ) --->
> Preemption Model (Preemptible Kernel (Low-Latency Desktop)) --->
> [*] Preempt The Big Kernel Lock
> [*] Kernel support for ELF binaries
> As I understand it, those options give the kernel the preemption
> feature. OCF has been built as modules, so:
>
> Loadable module support --->
> [*] Enable loadable module support
> [*] Module unloading
> [*] Automatic kernel module loading
> Cryptographic options --->
> OCF Configuration --->
> <M> OCF (Open Cryptograhic Framework)
> <M> cryptodev (user space support)
> <M> cryptosoft (software crypto engine)
> <M> talitos (HW crypto engine)
> (The other options are disabled)
>
> After applying the patch to Openssl-0.9.8n, I made some changes in
> cryptodev, uncommenting the parts related to
> --with-cryptodev-digest. (I understood from looking at the code that the
> cryptodev-digest feature doesn't work, but I activated it for
> testing.) After doing that, I compiled Openssl with these options:
> powerpc-linux-gcc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT
> -DDSO_ -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DDSO_DLFCN
> -DHAVE_DLFCN_H -DENGINE_DYNAMIC_SUPPORT -Os
>
> For loading and unloading modules, I use insmod (or rmmod) with the
> modules ocf, cryptodev, cryptosoft and talitos.
>
> After that, depending on whether I benchmark with or without talitos, I do
> the insmod or rmmod with ocf, cryptodev, cryptosoft and talitos.
> The benchmarking is done with the time command, so for each execution we
> obtain the time consumed in user mode (U), in system mode (S) and the total
> elapsed time as real time (R):
> $ time openssl enc -e -aes-128-cbc -salt -in kkkk -out kkkk.enc -pass
> pass:'micontrasenalarguisima' -engine cryptodev
>
> kkkk is a file that was previously generated with dd from /dev/urandom and
> contains 10 megabytes of random data. The measurements I'm going to show you
> are the minimum of the results produced by a sequence of 50 executions
> of each time command. We have checked that the average is very close to the
> minimum, so the minimum is a very good representation of the best
> possible performance.
>
> Command/Times          R(secs)  U(secs)  S(secs)  R-(U+S)*
> 1)with crypto engine   3.4      0.13     3.08     ~=0.026
> 2)crypto by software   3.59     2.29     1.09     ~=0.056
>
> The first command (1) is executed with "-engine cryptodev" and the
> modules loaded, and the second one (2) is executed with the modules removed
> and without "-engine cryptodev", so in software.
>
> * The R-(U+S) given is the average of that computation for each
> individual measurement, so it denotes some minimum and constant
> background activity in the system that constantly enlarges the elapsed
> time measured.
>
> Our first surprise is that the total elapsed time with and without the
> engine is very similar (case 1 ~= case 2). We can deduce that the performance of
> the main processor and of the crypto-processor is very similar. It's
> surprising to have a crypto-processor that is not faster than the main CPU, but
> it could be understood if the main CPU could perform other tasks in
> parallel. So we have repeated those benchmarks with a CPU-consuming
> process (an infinite loop) running in the background, in order to check whether
> the CPU can really work in parallel. The results follow:
>
> Command/Times          R(secs)  U(secs)  S(secs)  R-(U+S)*
> 3)with crypto engine   6.7      0.12     3.09     ~=3.436
> 4)crypto by software   6.25     2.31     0.69     ~=3.262
> +100% CPU in background.
>
> Those figures point to something amazing: it's much faster, cheaper, and
> simpler to have the CPU without the crypto-processor!!!!!!

you're not utilizing the crypto h/w at all - the talitos driver
needs to be modified to support the 8272's SEC 1.x. The -engine
cryptodev results are the results from using the kernel's built-in
software crypto algorithm implementations, because you have loaded
the cryptosoft module.

It should be straightforward to convert talitos to support
SEC 1.x h/w; it has a different ring buffer mechanism (which,
if I knew more about, I'd be able to tell you whether it allowed
simultaneous ciphers and hashes)...

Kim

> We can see that the time used to encrypt the data without cryptodev (case
> 4) is 6.25. This is expected because the CPU is shared (50/50) between
> the openssl process and the other background process, and the openssl
> process needs 2.31+0.69=3.00 secs to run, and 6.25 is ~= 2*3.00
> secs. Also, the time spent encrypting the data (U+S) is more or less
> the same regardless of the existence of the background process, (case 1
> ~= case 3) and (case 2 ~= case 4). And here comes what is strange:
> processing the data with a background process should take a little more real
> time than doing it without the bg process, but not double! (case 3:
> 6.7 > 2*(0.12+3.09)). Suppose that, for example, of those 3.09 secs,
> 1.00 is the CPU loading data to the crypto-processor and the other 2.09
> secs is waiting for it to finish; then we would expect something more
> similar to 2*(0.12+1.00)+2.09 secs = 4.33 secs of real time.
> But it looks like there is no parallelism between the CPU and the
> crypto-processor, and the CPU is waiting for the crypto-processor to
> finish without freeing the CPU, as shown in the two following time
> graphs:
>
> The time graph with parallel execution should be like this:
>
> openssl in CPU       | = = = = = = =
> other proc in CPU    |= = = = ====== ====== ======
> -------------------------------------------
> openssl in Crypto-PU | ====== ====== ======
>
>
> Instead of the previous graph, I think that the time graph is
> something like this:
>
> openssl in CPU       | = = = = = = =
> other proc in CPU    |= = = = = = = = = = = = = = = = = = = = =
> -------------------------------------------
> openssl in Crypto-PU | = = = = = = = = = = = = = = = = = =
>
> In summary, our questions are:
> Why does case 3 give 6.7 secs instead of much less, as expected?
> Is the first time graph correct?
> What can we do to fix it?
>
>
> Best regards,
> Alexandru.
>
>
> On 25/5/2010, "David McCullough" <dav...@mc...>
> wrote:
>
> >
> >Jivin Kim Phillips lays it down ...
> >> On Wed, 26 May 2010 07:55:51 +1000
> >> David McCullough <dav...@Mc...> wrote:
> >>
> >> >
> >> > Jivin Kim Phillips lays it down ...
> >> > > On Mon, 24 May 2010 23:14:55 +0200
> >> > > " ALEXANDRU IONUT GRAMA" <ai....@al...> wrote:
> >> > >
> >> > > > Hello. My name is Alexandru and I'm doing my final degree project on
> >> > > > porting Linux to an embedded device (a router) that uses a Freescale
> >> > > > 8272. The issue I need to solve is providing IPsec on the router. The
> >> > > > software that I used is Linux 2.6.19 + Quagga + Openssl + ipsec-tools.
> >> > > > The point was that the processor (8272) came with a crypto-processor
> >> > > > embedded in it that should help in the encryption process. I've found
> >> > > > this excellent project that provides support for hardware encryption
> >> > > > with Cryptodev + Talitos driver.
> >> > > > Also, the patch for Openssl works
> >> > > > perfectly and I've obtained the feature that I need.
> >> > > >
> >> > > > After making some benchmarks I've discovered that talitos is not
> >> > > > preemptive. The crypto-processor (SEC) should perform the
> >> > > > encryption/decryption operations and let the processor idle; the scheduler should
> >> > > > be called and let another process become the "active process". After
> >> > > > the crypto-processor finishes the job, it should say "I'm done!" with an
> >> > > > IRQ signal, and the encryption process that needs the encrypted data
> >> > > > should be activated and continue by getting the encrypted data from the
> >> > > > memory address where the crypto-processor wrote it.
> >> > > >
> >> > > > Well, the behaviour of talitos seems not to be like that. It looks like
> >> > > > the processor is waiting for the crypto-processor to finish, and after
> >> > > > that it gets the encrypted data. That wastes the processor's time
> >> > > > while it is waiting for the crypto-processor to finish. Maybe I'm wrong,
> >> > > > but the benchmarks look like this (2 processes means 1 encrypting and another
> >> > > > doing an infinite loop a=1+1):
> >> > > >                      no CD(R)  no CD(U)  no CD (S)
> >> > > > 1 process crypting   0.36      0.13      0.20
> >> > > > 2 processes(1+1*)    0.72      0.13      0.20
> >> > > > It looks normal that, without Cryptodev, the user and system times are the
> >> > > > same but the real time is double, because there's another process
> >> > > > requiring the CPU.
> >> > > >
> >> > > > Benchmarking the system with Cryptodev I've obtained more or less the
> >> > > > same times (much more system time than without it), but it's not
> >> > > > exactly double, it's 0.02 less (0.36*2 - 0.02). That is, it is
> >> > > > improving the time, but not really as much as I expected. And that is
> >> > > > because the crypto-processor doesn't leave the processor free.
> >> >
> >> > Ok, I think the problem may be how you are benchmarking it.
> >> > What commands
> >> > are you running to benchmark it ? How are you measuring the CPU usage ?
> >> >
> >> > OCF has no busy waits and I am fairly confident that the talitos driver
> >> > doesn't busy wait for anything, but Kim would know best.
> >>
> >> it doesn't.
> >>
> >> > > > I want to add preemption to talitos; is anyone already working on it?
> >> > > > May I help?
> >> > >
> >> > > I believe this is due to the wait_event_interruptible call in cryptodev.
> >> > >
> >> > > Also note that there are other pre-emption issues due to openssl having
> >> > > a synchronous crypto api (at least last I checked) - that tends to not
> >> > > jive well with asynchronous crypto h/w, such as what you are using.
> >> >
> >> > Can you recall any details as to how a synchronous userspace API was causing
> >> > kernel preemption issues ?
> >>
> >> not a kernel pre-emption issue per se; I just wanted to mention it
> >> makes it harder to overcome serializing the overhead of sending the
> >> request to h/w and back. Also, newer talitos h/w can perform ciphers
> >> and hashes simultaneously (I'm not sure if the 8272 can do that though).
> >
> >But the 8272 still has a queue for crypto requests right ? Which means you
> >can have several outstanding requests to the HW at any point ?
> >
> >As long as the HW can queue requests and doesn't busy wait, OCF will
> >scale over multiple processes/threads/CPU's, at least to a point where
> >it can be explained by bus bandwidth, userspace copy overhead or something :-)
> >
> >We'll just have to wait and see how Alexandru is testing it,
> >
> >Cheers,
> >Davidm
> >
> >--
> >David McCullough, dav...@mc..., Ph:+61 734352815
> >McAfee - SnapGear http://www.mcafee.com http://www.uCdot.org
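
The elapsed-time estimates discussed in this thread can be rechecked with a small sketch. It is not part of the original messages. The function name shared_cpu_elapsed is hypothetical, the 1.00 s / 2.09 s split of system time is Alexandru's own "for example" guess, and the model simply assumes the background loop takes exactly half of the CPU.

```python
def shared_cpu_elapsed(cpu_busy_secs, offload_wait_secs=0.0):
    """Rough elapsed-time estimate when a CPU hog gets half of the CPU.

    cpu_busy_secs:     time the openssl process really needs the CPU (U + S)
    offload_wait_secs: time spent waiting on a device while the CPU is free
                       for the other process (assumed not to be slowed down)
    """
    return 2 * cpu_busy_secs + offload_wait_secs


# Figures quoted in the thread (cases 3 and 4, with the CPU hog running).
case3_user, case3_sys, case3_real = 0.12, 3.09, 6.7
case4_user, case4_sys, case4_real = 2.31, 0.69, 6.25

# Case 4: pure software crypto, everything is CPU time.
software = shared_cpu_elapsed(case4_user + case4_sys)

# Case 3 if the CPU is held for the whole system time (no overlap).
no_overlap = shared_cpu_elapsed(case3_user + case3_sys)

# Case 3 under the hypothetical split: 1.00 s of CPU work in the kernel,
# 2.09 s waiting for the device with the CPU released.
with_overlap = shared_cpu_elapsed(case3_user + 1.00, offload_wait_secs=2.09)

print(f"case 4 model       {software:.2f} s (measured {case4_real})")
print(f"case 3, no overlap {no_overlap:.2f} s (measured {case3_real})")
print(f"case 3, overlap    {with_overlap:.2f} s")
```

Under this model the measured 6.7 s for case 3 sits next to the no-overlap estimate rather than the 4.33 s overlap estimate. That is what one would expect if, as Kim points out, the cryptodev engine was running the cryptosoft software implementation rather than the SEC hardware.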