From: Hubertus F. <fr...@wa...> - 2002-02-11 19:37:47
|
Enclosed is a prototype implementation of fast user level interprocess synchronization for linux 2.4.17+ and 2.5.2+). I attached the ulocks.tar.bz2 file which contains an (at this time optional) kernel patch, a module implementing the required kernel functionality, the user level portions and an elaborate test program. -- Hubertus Franke <fr...@wa...> ----------------------------------------------------------------------- There have been many acronyms been used for this including: fast semaphores user level locking fast user level interprocess synchronization (FULIPS) I shall use the short-term FULIPS from here on. As discussed in earlier messages on this topic it is desirable to have multi-process applications synchronize efficiently. Most notably databases would like this functionality. At the current time such synchronization relies on standard SysV IPC mechanisms (most commomly SysV semaphores). The disadvantage of using these routines is that in the uncontested cases the overhead of a system call and the kernel functionality is incurred. Instead in FULIPS we allow communication over a shared memory region (e.g. SysV shmseg, or shared mmaped file). A user level datastructure is allocated in that region and accessible to all processes/tasks that have access to that memory. Then in the uncontested case, the lock can be acquird via atomic operations and without entering the kernel. In the contested case, the kernel has to be entered to resolve the waiting and wakeup cases. As can be seen in the RESULTS file/attachment, substantial performance improvements can be achieved. In the following I express some of my opinions regarding issues. Please feel free to provide some constructive criticism on these positions. For Linux to support various middlewares efficiently a good and comprehensive FULIPS support is necessary. So far I have only seen proposals for fast semaphores/mutexes. IHMO-1: For the same reasons multiple reader/single writer locks ------- (rwlocks) were introduced in the kernel, they should also be introduced in a FULIPS library/kernel. In the proposal made by Linus sometime in 5/2001 timeframe, he advocated to locate a kernel pointer/handle and an access signature into the user lock datastructure. Presenting this handle/signature combination is sufficient to gain access to the kernel object handling the contention cases. Though the initial proposal exits a process trying to attempt access with a false handle/signature pair, this proposal is not tight. IMHO-2: We should utilize the memory protection mechanisms in place. ------- In particular in my implementation I pass the virtual address of the user lock into the kernel and I verify access to it. To improve performance a hash table is maintained to map <mm,vaddr> to the kernel object. I maintain hash access statistics and the cases I have tried show on average 1-3 hash chains traversals. In the previous proposal the kernel object related to the user lock are allocated apriori. While one can argue that this is exactly what is done for SysV locks and FULIPS only provide a fast path access to such functionality one can also made a contrary point for dynamic allocation on demand. Physical memory in the system is allocated to accessed virtual pages only and mostly so on demand. Errors with respect to accessibility are determined at access time not at creation time. So why not apply the same logic to the kernel part of user level locks. IMHO-3: Creating the kernel object can be deferred until the first contention ------- is experienced, rather then requiring apriori allocation. Though I don't feel strongly about this position, this is what I implemented in the current code. Through simple restructuring, the code can be adopted to require mandatory creation and lock attachments. e.g lock_create for the first and lock_attach for the following processes. IMHO-4: We should provide build in per/lock statistics. ------- This was easily accomplished in my code and virtually did not create Since I really would like to see a standardized API for FULIPS supported in the kernel and glibc.a I like to solicitate opinion on my stated IMHO-1..4. IMPLEMENTATION -------------- At the current time I structured the code such that it can be loaded as a module rather than requiring constant kernel modification. The absolute minimum required from the kernel is a callback function and a mechanism to invoke it when a VMA with allocated ulocks in the kernel calls munmap(). In that case we need to pass control back to the kernel module implementing the kernel part of FULIPS. The code provided here consists of the following components: (a) KERNEL-PATCH: (optional) a patch to 2.4.17 to support FULIPS in the an external module. It implements the callback function mentioned above. Also define MAP_SEMAPHORE, however currently the libc.a filters unknown flags before passing the flags to the mmap/shmat system call. so this is not effective until libc is recompiled. The requirement of this patch can be bypassed if one is willing to intercept the vm_ops. Unfortunately due to the shm_vm_ops usage, the shm_detach behavior changes slightly, marking the segment for destruction only at application termination and not at shm_detach time. My current module (see below) supports both approaches, so one doesn't need to modify the kernel to experiment with this subsystem. To do so, goto src/kulocks.c #line 56 and do not define USE_VANILLA_KERNEL. (b) KULOCKS Module: src/kulocks.c the dynamically loadable module maintaining kernel related data structures the code to deal with th and a proc file system to access some internal statistics. (c) Include Files: src/include user level definitions and operations. it supports general fast semaphores spinning semaphores with timeout (finite spinning time). It also supports multiple reader/single writer locks. We structured this include directory to mimic the include directory of the kernel. This way we can copy everything right over to the kernel once design is accepted and integration is required. (d) user library: src/ulock.c ulock.c simply provides some support for the spinning initialization and slow path spin implementation, that ultimately should/could move to libc.a. A library is created from this and must be linked with applications. (e) ULOCKFLEX: a reasonable sophisticated test program to verify integrity of once implementation as well as performance Some more comments <asm/ulocks.h> currently is only provided for the i386 architecture. There are basically only a few atomic operations required: decrement, increment, compare_and_swap These can be taken almost identically from the kernel section ensuring that they are SMP enabled. ISSUES: ------- Error handling can become a headache in this scenario. For instance if a process get interrupted while waiting in the kernel, the status word in user space will be out of sync with the state of the semaphores in the kernel. Further operations will compromise the integrity of the program. Handling possible errors in user land can add complexity. I opted therefore to make interrupted system call to restart. That leaves the following error codes to be returned. ENOMEM: Insufficient kernel memory to allocate internal objects (here Linus's approach where there is only one kernel lock object and no references and hash entries does seem advantageous) EFAULT: illegal address was specified system call was issued directly without a wrapper functions, otherwise the wrapper functions would have SEGV EACCES: MAP_SEMAPHORE not specified on the VMA (currently not active). EINVAL: lock type unknown All in all, it seems that the integrity of the program is already compromised (other than the ENOMEM case) and those errors should results in a program check anyway. The other option would be to return the error and repair the status word in the lock routine. Comments on this issue. ULOCKFLEX: ---------- Integrity and performance measurement tool. This tool will exercise the user locks infrastructure. I intend to submit this to the Linux Test Project, once the proper interface for user level locks has been fixed/accepted. To many options to describe here. For fast semaphores (non-spinning) I implemented also the same functionality using SysV semaphores. Simply execute klockflex instead of ulockflex. In essense the tool reports total throughput and lock contention (latter only if compiled in via the -DLOCK_STATISTICS flag). It also verifies whether the locking is working properly. It uses mean locking hold times and non-hold times and uniforms distributions of [ 0.5 .. 1.5 ] * mean to create some randomness in the access and locking events. It allows the specification of number of task (either processes or threads), the number of locks to use, the type of locking (semaphore or rwlocks) and their related parameters. One can specify the type of shared memory (shmat or memory mapped files) and the number of such shared memory objects to be used. It can check for statistical significance when run The file <RESULTS> shows a collection of some runs that demonstrate the potential performance gains that can be obtained by utilizing a FULIPS subsystem. --------------------------------------------------------- What to do to try this out: 1) either (a) patch the kernel and comment out the line "#define USE_VANILLA_KERNEL" in src/kulocks.c#56 (b) or do nothing and this step 2) in top directory issue> make 3) insmod src/kulocks.o 4) cd ulockflex; doit (to run a comformance test) ---------------------------------------------------------- I'd appreciate your feedback. -- Hubertus Franke <fr...@wa...> |