Re: [Ocf-linux-users] talitos driver should be preemtive
From: A. I. G. <ai....@al...> - 2010-05-27 00:10:23
On 26/5/2010, "Kim Phillips" <kim...@fr...> wrote:
>On Wed, 26 May 2010 20:32:49 +0200
>" ALEXANDRU IONUT GRAMA" <ai....@al...> wrote:
>
>> Hello sirs! I really appreciate your fast answers, thank you very much
>> for them.
>>
>> First, I think you should know some characteristics of my system and
>> software layer.
>>
>> I use a 2.6.21-rc2 kernel with the following options:
>> Kernel options --->
>> Timer frequency (300 HZ) --->
>> Preemption Model (Preemptible Kernel (Low-Latency Desktop)) --->
>> [*] Preempt The Big Kernel Lock
>> [*] Kernel support for ELF binaries
>> As I understand it, those options make the kernel preemptible.
>> OCF has been built as modules, so:
>>
>> Loadable module support --->
>> [*] Enable loadable module support
>> [*] Module unloading
>> [*] Automatic kernel module loading
>> Cryptographic options --->
>> OCF Configuration --->
>> <M> OCF (Open Cryptograhic Framework)
>> <M> cryptodev (user space support)
>> <M> cryptosoft (software crypto engine)
>> <M> talitos (HW crypto engine)
>> (The other options are disabled)
>>
>> After applying the patch to Openssl-0.9.8n, I made some changes in
>> cryptodev, uncommenting the parts related to
>> --with-cryptodev-digest. (From reading the code I understood that the
>> cryptodev-digest feature doesn't work, but I activated it for
>> testing.) After that, I compiled Openssl with these options:
>> powerpc-linux-gcc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT
>> -DDSO_ -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DDSO_DLFCN
>> -DHAVE_DLFCN_H -DENGINE_DYNAMIC_SUPPORT -Os
>>
>> For loading and unloading modules, I use insmod (or rmmod) with the
>> modules ocf, cryptodev, cryptosoft and talitos.
>>
>> After that, depending on whether I benchmark with or without talitos, I
>> do the insmod or rmmod with ocf, cryptodev, cryptosoft and talitos.
>> The benchmarking is done with the time command, so for each execution we
>> obtain the time consumed in user mode (U) and system mode (S), and also
>> the total elapsed time as real time (R):
>> $ time openssl enc -e -aes-128-cbc -salt -in kkkk -out kkkk.enc -pass
>> pass:'micontrasenalarguisima' -engine cryptodev
>>
>> kkkk is a file previously generated with dd from /dev/urandom; it
>> contains 10 megabytes of random data. The measurements I'm going to show
>> you are the minimum of the results produced by a sequence of 50
>> executions of each time command. We have checked that the average is
>> very close to the minimum, so the minimum is a very good representation
>> of the best possible performance.
>>
>> Command/Times           R(secs)  U(secs)  S(secs)  R-(U+S)*
>> 1) with crypto engine   3.4      0.13     3.08     ~=0.026
>> 2) crypto by software   3.59     2.29     1.09     ~=0.056
>>
>> The first command (1) is executed with "-engine cryptodev" and the
>> modules loaded; the second one (2) is executed with the modules removed
>> and without "-engine cryptodev", so in software.
>>
>> * The R-(U+S) given is the average of that computation for each
>> individual measurement, so it denotes some minimum and constant
>> background activity in the system that constantly enlarges the elapsed
>> time measured.
>>
>> Our first surprise is that the total elapsed time with and without the
>> engine is very similar (case 1 ~= case 2). We can deduce that the
>> performance of the main processor and of the crypto-processor is very
>> similar. It's surprising to have a crypto-processor that is not faster
>> than the main CPU, but it could be understood if the main CPU could
>> perform other tasks in parallel. So we have repeated those benchmarks
>> with a CPU-consuming process (an infinite loop) running in the
>> background, in order to check whether the CPU can really work in
>> parallel.
>> The results follow:
>>
>> Command/Times           R(secs)  U(secs)  S(secs)  R-(U+S)*
>> 3) with crypto engine   6.7      0.12     3.09     ~=3.436
>> 4) crypto by software   6.25     2.31     0.69     ~=3.262
>> +100% CPU in background.
>>
>> Those figures point to something amazing: it's much faster, cheaper, and
>> simpler to have the CPU without the crypto-processor!!!!!!
>
>you're not utilizing the crypto h/w at all - the talitos driver needs
>to be modified to support the 8272's SEC 1.x. The -engine cryptodev
>results are the results from using the kernel's built-in software
>crypto algorithm implementations, because you have loaded the
>cryptosoft module.
>
>It should be straightforward to convert talitos to support SEC 1.x h/w;
>it has a different ring buffer mechanism (which, if I knew more about,
>I'd be able to tell you whether it allowed simultaneous ciphers and
>hashes)...
>
>Kim

Thank you Kim, I didn't know I shouldn't load the cryptosoft module. In this guide ( http://www.docunext.com/wiki/My_Notes_on_Patching_2.6.22_with_OCF#The_Results ), the author uses cryptosoft, and I thought that I should load it.

I've found a guide to SEC 1.x at this website: http://cache.freescale.com/files/32bit/doc/user_guide/SEC1SWUG.pdf?fpsp=1&WT_TYPE=Users%20Guides&WT_VENDOR=FREESCALE&WT_FILE_FORMAT=pdf&WT_ASSET=Documentation
It gives me the values that I should send to the crypto-processor to perform the proper operation, but I don't know the meaning of the symbols used in the talitos source code.

I want to adapt talitos to be fully compatible with the SEC 1.x architecture and, if my changes to the code are appropriate for the project, contribute them to the OCF project to provide integration with the SEC 1.x branch. So, David and Kim, when you have some time, could you please give me some explanation of the meaning of the symbols and functions that you use in talitos? It will be very much appreciated!!!

Thank you all, you shed light on my project!

Alexandru.
¿David
>
>> We can see that the time used to encrypt the data without cryptodev
>> (case 4) is 6.25. This is expected because the CPU is shared (50/50)
>> between the openssl process and the background process, and the openssl
>> process needs 2.31+0.69=3.00 secs to run, and 6.25 is ~= 2*3.00
>> secs. Also, the time to encrypt the data (U+S) is more or less the same
>> regardless of the background process (case 1 ~= case 3 and
>> case 2 ~= case 4). And here comes the strange part: processing the data
>> with a background process should take a little more real time than
>> doing it without the bg process, but not double! (case 3:
>> 6.7 > 2*(0.12+3.09)). Suppose that, for example, of those 3.09 secs,
>> 1.00 is the CPU loading data into the crypto-processor and the other
>> 2.09 secs is waiting for it to finish; then we would expect something
>> more like 2*(0.12+1.00)+2.09 secs = 4.33 secs of real time. But it
>> looks like there is no parallelism between the CPU and the
>> crypto-processor, and the CPU waits for the crypto-processor to
>> finish without being freed, as shown in the two following time
>> graphs:
>>
>> The time graph with parallel execution should be like this:
>>
>> openssl in CPU       | = = = = = = =
>> other proc in CPU    |= = = = ====== ====== ======
>> -------------------------------------------
>> openssl in Crypto-PU | ====== ====== ======
>>
>>
>> Instead of the previous graph, I think the time graph is
>> something like this:
>>
>> openssl in CPU       | = = = = = = =
>> other proc in CPU    |= = = = = = = = = = = = = = = = = = = = =
>> -------------------------------------------
>> openssl in Crypto-PU | = = = = = = = = = = = = = = = = = =
>>
>> In summary, our questions are:
>> Why does case 3 give 6.7 secs instead of much less, as expected?
>> Is the first time graph correct?
>> What can we do to fix it?
>>
>>
>> Best regards,
>> Alexandru.
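The arithmetic in the quoted message above can be written out as a small model. This is only a sketch in Python under the message's own assumptions: the CPU is shared 50/50 with the background loop, and the split of the 3.09 s of system time into 1.00 s of CPU work and 2.09 s of waiting is the hypothetical example from the text, not a measurement. The function name and its parameters are made up for illustration.

```python
def expected_real_time(user, sys_busy, hw_wait, cpu_share=0.5, wait_frees_cpu=True):
    """Estimate the elapsed time of a job that competes for the CPU.

    user, sys_busy: CPU seconds the job really executes.
    hw_wait: seconds spent waiting for the crypto hardware.
    cpu_share: fraction of the CPU the job gets (0.5 with one competing loop).
    wait_frees_cpu: True if the job sleeps while waiting, False if it holds the CPU.
    """
    if wait_frees_cpu:
        # Only the real CPU work is stretched by sharing; the wait overlaps
        # with the other process.
        return (user + sys_busy) / cpu_share + hw_wait
    # The wait also occupies the CPU, so everything is stretched.
    return (user + sys_busy + hw_wait) / cpu_share


# Case 3 figures; the 1.00 s / 2.09 s split is the hypothetical from the message.
overlapped = expected_real_time(0.12, 1.00, 2.09, wait_frees_cpu=True)
serialized = expected_real_time(0.12, 1.00, 2.09, wait_frees_cpu=False)
print(f"overlapped: {overlapped:.2f} s, serialized: {serialized:.2f} s, measured: 6.70 s")
```

With these inputs the model gives about 4.33 s if the wait frees the CPU and about 6.42 s if it does not. The measured 6.7 s is close to the second figure, which is also what one would see if all 3.09 s were real CPU work.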
>>
>>
>> On 25/5/2010, "David McCullough" <dav...@mc...>
>> wrote:
>>
>> >
>> >Jivin Kim Phillips lays it down ...
>> >> On Wed, 26 May 2010 07:55:51 +1000
>> >> David McCullough <dav...@Mc...> wrote:
>> >>
>> >> >
>> >> > Jivin Kim Phillips lays it down ...
>> >> > > On Mon, 24 May 2010 23:14:55 +0200
>> >> > > " ALEXANDRU IONUT GRAMA" <ai....@al...> wrote:
>> >> > >
>> >> > > > Hello. My name is Alexandru and I'm doing my final degree project on
>> >> > > > porting Linux to an embedded device (a router) that uses a Freescale
>> >> > > > 8272. The problem I have to solve is providing IPsec on the router. The
>> >> > > > software I use is Linux 2.6.19 + Quagga + Openssl + ipsec-tools.
>> >> > > > The point is that the processor (8272) comes with an embedded
>> >> > > > crypto-processor that should help with encryption. I've found
>> >> > > > this excellent project that provides hardware encryption support
>> >> > > > with Cryptodev + the Talitos driver. Also, the patch for Openssl works
>> >> > > > perfectly and I've obtained the feature that I need.
>> >> > > >
>> >> > > > After running some benchmarks I've discovered that talitos is not
>> >> > > > preemptive. The crypto-processor (SEC) should perform the
>> >> > > > encryption/decryption operations and leave the processor idle; the
>> >> > > > scheduler should be called and let another process become the "active
>> >> > > > process". After the crypto-processor finishes the job, it should say
>> >> > > > "I'm done!" with an IRQ signal, and the process that needs the
>> >> > > > encrypted data should be woken up and continue by reading the encrypted
>> >> > > > data from the memory address where the crypto-processor wrote it.
>> >> > > >
>> >> > > > Well, the behaviour of talitos seems not to be like that.
>> >> > > > It looks like
>> >> > > > the processor is waiting for the crypto-processor to finish, and after
>> >> > > > that it gets the encrypted data. That wastes processor time
>> >> > > > while it is waiting for the crypto-processor to finish. Maybe I'm wrong,
>> >> > > > but the benchmarks look like this (2 processes means 1 encrypting and
>> >> > > > another doing an infinite loop a=1+1):
>> >> > > >                       no CD(R)  no CD(U)  no CD(S)
>> >> > > > 1 process crypting    0.36      0.13      0.20
>> >> > > > 2 processes(1+1*)     0.72      0.13      0.20
>> >> > > > Without Cryptodev it looks normal that the user and system times are
>> >> > > > the same but the real time is double, because there's another process
>> >> > > > requiring the CPU.
>> >> > > >
>> >> > > > Benchmarking the system with Cryptodev I've obtained more or less the
>> >> > > > same times (much more system time than without it), but it's not
>> >> > > > exactly double, it's 0.02 less (0.36*2 - 0.02). So it does
>> >> > > > improve the time, but not nearly as much as I expected. And that is
>> >> > > > because the crypto-processor doesn't leave the processor free.
>> >> >
>> >> > Ok, I think the problem may be how you are benchmarking it. What commands
>> >> > are you running to benchmark it ? How are you measuring the CPU usage ?
>> >> >
>> >> > OCF has no busy waits and I am fairly confident that the talitos driver
>> >> > doesn't busy wait for anything, but Kim would know best.
>> >>
>> >> it doesn't.
>> >>
>> >> > > > I want to add preemption to talitos; is anyone already working on it?
>> >> > > > May I help?
>> >> > >
>> >> > > I believe this is due to the wait_event_interruptible call in cryptodev.
>> >> > >
>> >> > > Also note that there are other pre-emption issues due to openssl having
>> >> > > a synchronous crypto api (at least last I checked) - that tends to not
>> >> > > jive well with asynchronous crypto h/w, such as what you are using.
>> >> > Can you recall any details as to how a synchronous userspace API was causing
>> >> > kernel preemption issues ?
>> >>
>> >> not a kernel pre-emption issue per se; I just wanted to mention it
>> >> makes it harder to overcome serializing the overhead of sending the
>> >> request to h/w and back. Also, newer talitos h/w can perform ciphers
>> >> and hashes simultaneously (I'm not sure if the 8272 can do that though).
>> >
>> >But the 8272 still has a queue for crypto requests right ? Which means you
>> >can have several outstanding requests to the HW at any point ?
>> >
>> >As long as the HW can queue requests and doesn't busy wait, OCF will
>> >scale over multiple processes/threads/CPU's, at least to a point where
>> >it can be explained by bus bandwidth, userspace copy overhead or something :-)
>> >
>> >We'll just have to wait and see how Alexandru is testing it,
>> >
>> >Cheers,
>> >Davidm
>> >
>> >--
>> >David McCullough, dav...@mc..., Ph:+61 734352815
>> >McAfee - SnapGear http://www.mcafee.com http://www.uCdot.org
>>
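David's point about busy waits can be checked from user space by comparing elapsed time with consumed CPU time, the same idea as comparing R with U+S in the benchmarks above. The sketch below is plain Python, not OCF or talitos code; `blocking_wait` and `busy_wait` are hypothetical stand-ins for a driver that sleeps until an interrupt arrives and one that spins until the hardware is done.

```python
import time


def measure(wait):
    """Return (real, cpu) seconds consumed by calling wait()."""
    real0, cpu0 = time.perf_counter(), time.process_time()
    wait()
    return time.perf_counter() - real0, time.process_time() - cpu0


def blocking_wait(seconds=0.2):
    # Sleeping hands the CPU back to the scheduler, so almost no CPU time
    # is charged to this process while it waits.
    time.sleep(seconds)


def busy_wait(seconds=0.2):
    # Spinning keeps the CPU occupied until the deadline passes, so the
    # wait shows up as CPU time.
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


for name, fn in (("blocking", blocking_wait), ("busy", busy_wait)):
    real, cpu = measure(fn)
    print(f"{name:8s} real={real:.2f}s cpu={cpu:.2f}s")
```

A wait that sleeps should show real time well above CPU time, while a spinning wait should show the two roughly equal. The exact numbers depend on the machine and its load.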