From: C. A. M. <an...@ma...> - 2024-05-16 19:21:30
The metadata cache size was hard-coded to 8 metadata blocks. A large
parallel workload can cause a lot of spinlock thrashing in
squashfs_cache_get if the number of metadata blocks is smaller than the
number of parallel metadata reads, as the decompression time can keep
the metadata cache full, and squashfs_cache_get uses a simple spinlock
to synchronize the cache.

Allow the cache size to be tuned by adding
CONFIG_SQUASHFS_METADATA_CACHE_SIZE, which defaults to the old
hard-coded value of 8. A good setting for systems with plenty of memory
would be something a bit larger than the expected number of parallel
readers on a single squashfs. For highly memory-constrained systems, a
smaller setting may be appropriate.

This issue was discovered on an embedded system where a large
performance drop in boot times was noticed when the system was moved
from an 8 core (4 physical) machine to a 16 core (8 physical) machine.
It was discovered that much CPU time was being spun away in the
spin_lock call in squashfs_cache_get. This was because the metadata
cache is fixed at 8 entries, and having more cores allowed more
parallel file system walks (which happens to be a part of one of our
service start scripts for each of our many parallel services). Whenever
a cache entry is released, all waiting cores are awakened to attempt to
grab a cache entry, so those cores fight over the spinlock just to find
out they are not going to get one.

While this commit isn't a general solution, it does provide a simple
way for one to configure their kernel to alleviate the performance
issue. A better solution would be to use a less CPU-intensive and
preemptible synchronization method, and to wake up only one waiter when
one cache entry becomes free.
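
As a rough illustration of the sizing advice above, the following bash
sketch turns an expected number of parallel readers into a candidate
value and the memory it would hold. It assumes the online CPU count as
the default reader count and an arbitrary 25% headroom for "a bit
larger"; neither figure comes from the patch, and the helper names are
made up for this example. The memory figure counts only the 8 KiB
metadata blocks themselves and assumes the cache is allocated once per
mounted squashfs.

```shell
#!/usr/bin/env bash
# Sketch: estimate a metadata cache size and its memory cost per mount.
# Assumptions (not from the patch): readers default to the online CPU
# count, and the 25% headroom is an arbitrary reading of "a bit larger".

METADATA_BLOCK_KIB=8   # one SquashFS metadata block is 8 KiB

# Print a suggested entry count for $1 expected parallel readers.
suggest_cache_size() {
    local readers=$1
    echo $(( readers + (readers + 3) / 4 ))   # readers plus ~25%, rounded up
}

# Print the memory held by $1 cached metadata blocks, in KiB.
cache_kib() {
    echo $(( $1 * METADATA_BLOCK_KIB ))
}

# Hypothetical usage: pass the reader count, or fall back to online CPUs.
readers=${1:-$(getconf _NPROCESSORS_ONLN)}
size=$(suggest_cache_size "$readers")
echo "readers=$readers CONFIG_SQUASHFS_METADATA_CACHE_SIZE=$size" \
     "(~$(cache_kib "$size") KiB of metadata blocks per mount)"
```

The result is only a starting point; the value that actually removes
the contention should be confirmed by measurement on the target system.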

Others have pointed out this issue:

http://lkml.iu.edu/hypermail/linux/kernel/1805.0/01702.html

The following describes a similar issue on the data cache, and points
out many of the same technical issues with squashfs_cache_get:

https://chrisdown.name/2018/04/17/kernel-adventures-the-curious-case-of-squashfs-stalls.html

A simple way to reproduce and measure the time for various parallel
workloads (assuming a fairly large number of directories and files in
the squashfs):

time ( N=16; for ((i=0;i<$N;++i)); \
       do find /path/to/mounted/squash/ -print > /dev/null & done; \
       for ((i=0;i<$N;++i)); do wait; done)

On one system, with N=8, the loop above takes 1 second of elapsed time,
but on the same system with N=16, it takes 13 seconds (when 2 would be
a reasonable scale up).
---
 fs/squashfs/Kconfig       | 20 ++++++++++++++++++++
 fs/squashfs/squashfs_fs.h |  2 +-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index 60fc98bdf4212..311e3141df9af 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -264,3 +264,23 @@ config SQUASHFS_FRAGMENT_CACHE_SIZE
 
 	  Note there must be at least one cached fragment.  Anything
 	  much more than three will probably not make much difference.
+
+config SQUASHFS_METADATA_CACHE_SIZE
+	int "Number of metadata blocks cached" if SQUASHFS_EMBEDDED
+	depends on SQUASHFS
+	default "8"
+	help
+	  By default SquashFS caches the last 8 metadata blocks read from the
+	  filesystem.  A metadata block is 8KiB.  Increasing this amount may
+	  mean SquashFS has to re-read metadata less often from disk, at the
+	  expense of extra system memory.  Decreasing this amount will mean
+	  SquashFS uses less memory at the expense of extra reads from disk.
+
+	  Note there must be at least one cached metadata block.  A setting too
+	  low with a large parallel workload can cause a lot of spinlock
+	  thrashing in squashfs_cache_get.  A good setting for the metadata
+	  cache size is something a bit larger than the number of expected
+	  parallel metadata reads.  When booting with multiple services on a
+	  single squashfs on a machine with a lot of cores, a higher setting
+	  than the default will net a large performance improvement by avoiding
+	  spinlock thrashing.
diff --git a/fs/squashfs/squashfs_fs.h b/fs/squashfs/squashfs_fs.h
index 95f8e89017689..c4e32358f922c 100644
--- a/fs/squashfs/squashfs_fs.h
+++ b/fs/squashfs/squashfs_fs.h
@@ -202,7 +202,7 @@ static inline int squashfs_block_size(__le32 raw)
 #define SQUASHFS_XATTR_OFFSET(A)	((unsigned int) ((A) & 0xffff))
 
 /* cached data constants for filesystem */
-#define SQUASHFS_CACHED_BLKS		8
+#define SQUASHFS_CACHED_BLKS		CONFIG_SQUASHFS_METADATA_CACHE_SIZE
 
 /* meta index cache */
 #define SQUASHFS_META_INDEXES	(SQUASHFS_METADATA_SIZE / sizeof(unsigned int))
-- 
2.34.1
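
For comparing cache sizes, the reproduction loop from the commit
message can be generalized to sweep several values of N in one run.
This is a minimal bash sketch, not part of the patch: the function
names walk_parallel and sweep are illustrative, the mount path is a
placeholder, and find's error output is discarded so that only the
elapsed time is captured. It relies on bash's `time` keyword and
TIMEFORMAT variable.

```shell
#!/usr/bin/env bash
# Sketch: time N parallel `find` walks over a tree for several values of N.

# Start $1 parallel walks over directory $2 and wait for all of them.
walk_parallel() {
    local n=$1 dir=$2 i
    for ((i = 0; i < n; ++i)); do
        find "$dir" -print > /dev/null 2>&1 &
    done
    wait    # with no arguments, waits for every background job started here
}

# Print one "N elapsed_seconds" line for each N given after the directory.
sweep() {
    local dir=$1 n elapsed
    shift
    local TIMEFORMAT=%R    # make the `time` keyword print only real seconds
    for n in "$@"; do
        elapsed=$( { time walk_parallel "$n" "$dir"; } 2>&1 )
        echo "$n $elapsed"
    done
}

# Hypothetical usage: ./sweep.sh /path/to/mounted/squash/ 1 2 4 8 16 32
if (( $# > 0 )); then
    sweep "$@"
fi
```

Running the same sweep on kernels built with different
CONFIG_SQUASHFS_METADATA_CACHE_SIZE values shows where the elapsed time
stops scaling with N. Later runs may benefit from data cached by
earlier ones, so the numbers are best compared between kernels rather
than between rows of a single run.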