
#18 Dual-Core/SMP parallelization support

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-03-19
Created: 2006-01-25
Creator: Anonymous
Private: No

Hi,

isn't it possible to spread the en/decrypting work over
more than one processor? I am not familiar with the
low-level consequences at the kernel level, but in theory
a block-wise cipher should be easily parallelizable. In my
opinion, this would make a lot of sense given the current
spread of dual-core PCs.

Yours,
Holger

Discussion

  • Jari Ruusu

    Jari Ruusu - 2006-01-26

    Logged In: YES
    user_id=238645

    The following applies to both 2.4 and 2.6 kernel
    versions: writes are already parallelized to some
    extent, reads are not. If the writing process is able
    to allocate an internal buffer in which to store the
    ciphertext, then encryption is done in the context of
    the writing process, regardless of which CPU it
    happens to run on. If the writing process is not able
    to allocate an internal buffer, the encryption work is
    pushed to the loop helper thread. Decryption is always
    handled by the loop helper thread. Currently there is
    only one helper thread per loop device.
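
    In rough terms, the dispatch policy is something like
    the following sketch (a simplified, hypothetical model
    of what is described above, not the actual loop driver
    code; all names are illustrative):

        #include <stdio.h>
        #include <stdbool.h>

        /* Hypothetical model of the dispatch policy described above. */
        typedef enum { READ_REQ, WRITE_REQ } req_type;

        static const char *dispatch(req_type type, bool buffer_available)
        {
            if (type == WRITE_REQ && buffer_available)
                /* Encrypt in the writing process's own context, on
                   whatever CPU that process happens to run on. */
                return "encrypt in caller context";
            /* Reads, and writes that could not get a buffer, go to the
               single per-device helper thread. */
            return "queue for the one helper thread";
        }

        int main(void)
        {
            printf("write, buffer ok : %s\n", dispatch(WRITE_REQ, true));
            printf("write, no buffer : %s\n", dispatch(WRITE_REQ, false));
            printf("read             : %s\n", dispatch(READ_REQ, false));
            return 0;
        }

    Because reads always funnel through that one helper
    thread, read-side decryption cannot use a second CPU.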

    In many cases with modern processors, the AES
    implementation, especially the AMD64-optimized
    assembler implementation, is already fast enough to
    exceed disk data transfer speed even on one processor
    core.

    Do you have a disk system that is fast enough to fully
    utilize one core of a modern processor running
    loop-AES?

     
  • Nobody/Anonymous

    Logged In: NO

    Hi jariruusu,

    I'm running loop-aes on a small server with two Opteron 244
    processors and a 6-disk hardware RAID. The plain RAID can
    deliver 60-100 MB/s during a sustained read of a 16G file
    and can write about 40 MB/s (same file size). A loop-aes
    partition on the same array reads and writes at around
    25-30 MB/s. I've rarely seen more than one busy processor
    during I/O-intensive tasks, even if I start multiple
    concurrent disk transfers.

    I'm currently running kernel 2.6.14 (x86_64) and loop-aes 3.1b.

    Yours,
    Holger

     
  • Jari Ruusu

    Jari Ruusu - 2006-01-28

    Logged In: YES
    user_id=238645

    Speed of assembler AES implementation on 1.6 GHz Opteron:
    key length 128 bits, encrypt speed 1106.6 Mbits/s
    key length 128 bits, decrypt speed 1107.0 Mbits/s
    key length 192 bits, encrypt speed 932.3 Mbits/s
    key length 192 bits, decrypt speed 933.3 Mbits/s
    key length 256 bits, encrypt speed 807.8 Mbits/s
    key length 256 bits, decrypt speed 813.7 Mbits/s

    Speed of assembler MD5 implementation on 1.6 GHz Opteron:
    md5 IV speed 2367.1 Mbits/s

    Combining the above 128 bit AES and MD5 figures, one
    1.6 GHz Opteron core should be able to handle about
    89 MB/s (see the rough calculation below). The numbers
    above are for crypto operations in userspace and do
    not include any file system, loop driver, block layer,
    or disk waiting overhead.
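
    The 89 MB/s figure follows from treating the MD5 IV
    computation and AES encryption as back-to-back steps
    on the same data; a minimal sketch of that arithmetic,
    assuming that reading and that MB here means MiB:

        #include <stdio.h>

        int main(void)
        {
            double aes_mbit_s = 1106.6;  /* 128-bit AES encrypt, from above */
            double md5_mbit_s = 2367.1;  /* MD5 IV computation, from above  */

            /* Per-bit times add, so the combined throughput is the
               harmonic combination of the two rates. */
            double combined = 1.0 / (1.0 / aes_mbit_s + 1.0 / md5_mbit_s);
            double mib_s = combined * 1e6 / 8.0 / (1024.0 * 1024.0);

            /* Prints roughly 754.1 Mbit/s = 89.9 MiB/s. */
            printf("combined: %.1f Mbit/s = %.1f MiB/s\n", combined, mib_s);
            return 0;
        }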

    If you want to improve loop-AES performance, you can
    try these optimizations (a combined example is
    sketched after the list):

    1) Try using the deadline I/O scheduler (boot with the
    elevator=deadline kernel parameter). The deadline I/O
    scheduler may reduce situations where the loop driver
    has to wait for I/O to complete on the underlying
    device.

    2) Try using the built-in loop driver by applying the
    kernel patch that is included in the loop-AES tarball.
    This should reduce TLB cache misses and improve
    performance a little bit.

    3) Try using larger page pre-allocation. For the
    module version, add an "options loop lo_prealloc=512"
    line to /etc/modprobe.conf, or alternatively, if you
    are using the built-in loop driver via the kernel
    patch, add the "lo_prealloc=512" kernel parameter.
    Larger pre-allocation may reduce situations where the
    loop driver has to wait for I/O to complete on the
    underlying device.

    4) Try using 128 bit AES keys instead of 256 bit. 128
    bit keys use a smaller number of rounds and are a
    little bit faster.
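
    For example, combining 1) and 3) in the built-in
    driver case, the kernel boot command line could look
    something like this (illustrative only; the root=
    entry and any other parameters depend on the system):

        root=/dev/sda1 ro elevator=deadline lo_prealloc=512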

     
  • Jari Ruusu

    Jari Ruusu - 2006-03-19
    • status: open --> closed
     
  • Jari Ruusu

    Jari Ruusu - 2006-03-19

    Logged In: YES
    user_id=238645

    Adding support for multiple worker threads would mean
    rewriting much of the loop code; the nasty problem is
    getting barrier ordering right. As of this writing, I
    do not plan to make such changes myself. If someone
    sends a patch for that, I will seriously consider
    merging it. But for now... I am closing this feature
    request.