Re: [Ocf-linux-users] OCF userland support
From: Egor N. M. <eg...@pa...> - 2007-10-11 18:29:18
David,

I was able to switch to unlocked_ioctl, and things improved quite a bit.
I suppose this should be included in the next OCF release, although there
is a chance that it will expose bugs in some drivers, as it did in mine.

The problem that I saw with ocf-bench turned out to be a bug in my driver
after all: I accidentally had one of the variables declared static in
*_process(), and it was being modified by other threads.

Actually, it seems the BKL has been in ioctl for a while. I looked in the
2.4 tree, and it is there.

Another issue is that I think the throughput calculation in cryptotest.c
is not correct, at least not for an SMP environment. To calculate the
throughput, we want the total data size processed divided by the real time
it took to process it. So nops should always be multiplied by the number
of threads. Also, to calculate the time we are interested in real time,
not the time that the processes spent running divided by the number of
threads. The latter gives incorrect results on SMP systems with more than
one thread. Ideally we would synchronize the threads, but for now I take
the delta between the first process's start time and the last process's
stop time as the execution time.

David McCullough wrote:

>Jivin Egor N. Martovetsky lays it down ...
>
>>David - thanks for a quick response. I have comments in line.
>>
>>David McCullough wrote:
>>
>>>Jivin Egor N. Martovetsky lays it down ...
>>>
>>>>David,
>>>>
>>>>I noticed that the throughput I get when using cryptotest or OpenSSL speed
>>>>is much worse than what I get using ocf-bench. I also don't get much
>>>>improvement, if any, when running multiple threads of the above
>>>>programs.
>>>>
>>>ocf-bench runs in kernel mode, the data does not need to be
>>>copied from user space to kernel space and back. This makes a massive
>>>difference to performance.
>>>
>>>All user apps need to pass their data through to the kernel and back.
>>>Unfortunately we don't have a zero copy API for OCF (yet ;-)
>>>
>>>Basically, for OCF accelerated user space, you need to be using larger
>>>packets to help overcome the overheads of the user-kernel-user copies,
>>>but it will never be as good as in-kernel crypto with a zero copy
>>>interface.
>>>
>>Yes, I was aware of the copying, and it explains some of the performance
>>degradation that I see with a single-threaded user space program vs.
>>kernel mode. As you point out, the performance of a user space program
>>gets better relative to kernel mode as the packet size is increased.
>>However, in ocf-bench I can keep the CPU 100% utilized submitting and
>>processing done packets, while a single thread of cryptotest is unable
>>to do that, so I tried to run a few threads, and saw the throughput
>>get worse.
>>
>
>All that makes sense, except that it got worse, see below.
>
>>>>It seems that this is a result of using ioctl (vs unlocked_ioctl) to
>>>>access /dev/crypto, which would only allow one process doing crypto
>>>>at a given time. Is that a known problem and are there plans to fix it?
>>>>
>>>I wasn't aware that ioctl would prevent multiple processes from working
>>>in parallel. I have seen performance improvements with multiple threads
>>>on 2.4 systems. Haven't checked on 2.6
>>>
>>In the 2.6 kernel the do_ioctl() function in fs/ioctl.c takes the kernel
>>lock before calling the device's ioctl.
>>
>
>Ok, that is just plain ugly :-( This used to be ok and I obviously missed
>the addition of unlocked_ioctl and the BKL.
>
>It should be safe to switch cryptodev across to unlocked_ioctl since
>that is what the code expects (and gets on other kernels/systems).
>
>>Since cryptodev ioctl submits a packet and waits for completion before
>>returning, effectively only one request can be processed at a given
>>time, and I am not able to take advantage of multiple crypto channels
>>executing in parallel.
>>
>>>>Also, is ocf-bench SMP safe?
>>>>I had to set CRYPTO_F_BATCH in the crp_flags to make it work,
>>>>otherwise with CRYPTO_F_CBIMM it would not work in SMP mode.
>>>>
>>>I have never thought about nor checked whether ocf-bench is SMP safe.
>>>Which OCF driver are you using when doing your tests? It could explain
>>>a few things.
>>>
>>It's my own driver, for a new PA Semi chip, and since it is still under
>>development - yes, it can explain a few things. :)
>>
>
>I was more interested in whether it was cryptosoft or one of the HW
>drivers. Generally the HW drivers work better with immediate callbacks
>as there is still a "gap" between the callin and callback.
>
>When your completion call is run before you have returned from the
>initial request, your code needs to be a lot more careful ;-)
>
>Unfortunately OCF hasn't had a huge amount of SMP focus. I have run it
>on SMP machines using hifn drivers, but not that often. So you may hit
>some other SMP issues.
>
>>But in this case, I don't think so, because in general it works fine,
>>and ocf-bench works fine in nosmp mode, or with CRYPTO_F_BATCH mode,
>>which makes the completions go through a callback queue that is
>>protected by spinlocks, as opposed to immediate callbacks.
>>
>
>If you get a handle on what is happening let me know, it would be nice
>to get it fixed.
>
>Cheers,
>Davidm

-- 
Egor N. Martovetsky
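
For reference, the throughput calculation described at the top of this
message could be sketched roughly as below. This is only an illustration
under stated assumptions, not the actual cryptotest.c change: the
struct thread_times record and the smp_throughput() helper are
hypothetical names, and it assumes every thread performs the same nops
operations on buffers of the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical per-thread record; cryptotest.c keeps its own bookkeeping. */
struct thread_times {
    struct timeval start;   /* wall-clock time the thread began its loop */
    struct timeval stop;    /* wall-clock time the thread finished */
};

/* Convert a timeval to seconds as a double. */
static double tv_to_sec(const struct timeval *tv)
{
    return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}

/*
 * Aggregate throughput in bytes per second.
 * Total data = nops * nthreads * size (assumes identical work per thread).
 * Elapsed time = latest stop minus earliest start across all threads,
 * i.e. real time, not summed CPU time divided by the thread count.
 */
double smp_throughput(const struct thread_times *t, size_t nthreads,
                      unsigned long nops, size_t size)
{
    assert(t != NULL && nthreads > 0);

    double first = tv_to_sec(&t[0].start);
    double last = tv_to_sec(&t[0].stop);

    for (size_t i = 1; i < nthreads; i++) {
        double s = tv_to_sec(&t[i].start);
        double e = tv_to_sec(&t[i].stop);
        if (s < first)
            first = s;
        if (e > last)
            last = e;
    }

    double elapsed = last - first;
    if (elapsed <= 0.0)
        return 0.0;     /* avoid dividing by zero on degenerate input */

    return (double)nops * (double)nthreads * (double)size / elapsed;
}
```

Because the threads are not synchronized, the span from first start to
last stop includes any stagger between them, so the result is a slightly
pessimistic estimate rather than an exact figure.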