From: ljsebald <ljs...@us...> - 2023-09-16 02:24:01
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "A pseudo Operating System for the Dreamcast.".

The branch, master has been updated
       via  468b40885ea409f01bc5eab897f55a0f8c11b5c6 (commit)
      from  f4ccdacf521f130f7daddc488d7e81b1ed1a5a48 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 468b40885ea409f01bc5eab897f55a0f8c11b5c6
Author: Falco Girgis <gyr...@gm...>
Date:   Fri Sep 15 21:22:34 2023 -0500

    Enable TLS Support for C/C++ (#111)

    * Enable TLS Support

    * Minor fixes for TLS Support

    Added an include causing the toolchain to fail to build
    Fixed a copy/paste error in the example Makefile
    Changed example to use printf
    Added cast to GBR pointer to remove warning

    * Change printf() to dbglog() in thread.c

    * Modify compiler_tls.c example to work with GCC4

    * Add CFLAG to specify tls-model

    * Initialize Thread Control Block

    * Added more test coverage to C/C++ TLS example

    - Added coverage for main thread's TLS data getting flexed
    - Added arbitrary alignments for flexing tdata and tbss alignment

    * Enable TLS Support

    * Minor fixes for TLS Support

    Added an include causing the toolchain to fail to build
    Fixed a copy/paste error in the example Makefile
    Changed example to use printf
    Added cast to GBR pointer to remove warning

    * Modify compiler_tls.c example to work with GCC4

    * Add CFLAG to specify tls-model

    * Initialize Thread Control Block

    * Fix issue with TLS on main thread

    GBR register needed to be initialized after setting up the
    thread control block for the main thread.

    * Added alignment check to compiler_tls example

    * Made every thread run TLS alignment tests

    * Got aligned TLS partway working

    - added TLS exec model (local) to CMake toolchain
    - added .tdata and .tbss alignment variables to .rodata in DC linker
      script
    - made KOS threads *attempt* to create TLS storage with proper alignment
      (still WIP)
    - updated compiler_tls example to reproduce issues with aligned TLS
      that still need to be resolved

    * C/C++ TLS FINALLY WORKS!!!

    - Finally figured out how to align the segments properly based on ELF
      TLS data segment information given through the DC linker script
    - Finished and fleshed out the compiler_tls test to reproduce all of the
      issues and serve as a potentially automatable test suite.

    * Added linker script alignment for NAOMI

    - Added the same update to the NAOMI linker script that the DC got for
      .TDATA and .TBSS alignment
    - Updated CHANGELOG with Colton's and my TLS work

    * EMERGENCY bugfix for when .tdata/.tbss are empty!

    Holy crap, the optimizer in release (non-debug) builds will totally
    optimize away zero-size checks, because the source of the check is the
    memory address of a linker-script exported variable... and why would
    the compiler think an address could be NULL? Had to make sizes volatile
    to circumvent this in release builds.

    * Cleaned up example, buffed up threads

    - Cleaned up some styling issues with example
    - Noticed I wasn't properly getting return values from thd_join() nor
      was I checking its return code properly
    - Noticed BlueCrab was spawning 600(!!!!) threads in his "once_init"
      threading test, so I figured I should stress this one with 200.

    * Apply suggestions from code review

    Co-authored-by: Lawrence Sebald <ljs...@us...>

    * Fixed formatting for tls segment alloc kernel code

    * Swapped aligned_alloc() out for memalign()

    * Removed extra space in assignment

    ---------

    Co-authored-by: Colton Pawielski <cep...@mt...>
    Co-authored-by: Lawrence Sebald <ljs...@us...>

-----------------------------------------------------------------------

Summary of changes:
 doc/CHANGELOG                                      |   1 +
 environ_dreamcast.sh                               |   2 +-
 examples/dreamcast/basic/threading/Makefile        |   3 +
 .../basic/threading/compiler_tls/Makefile          |  31 ++++
 .../basic/threading/compiler_tls/compiler_tls.c    | 203 +++++++++++++++++++++
 include/kos/thread.h                               |  18 +-
 kernel/thread/thread.c                             | 105 +++++++++++
 utils/cmake/dreamcast.toolchain.cmake              |   2 +-
 utils/dc-chain/scripts/gcc-pass1.mk                |   1 -
 utils/dc-chain/scripts/gcc-pass2.mk                |   1 -
 utils/ldscripts/shlelf-naomi.xc                    |  26 ++-
 utils/ldscripts/shlelf.xc                          |  26 ++-
 12 files changed, 408 insertions(+), 11 deletions(-)
 create mode 100644 examples/dreamcast/basic/threading/compiler_tls/Makefile
 create mode 100644 examples/dreamcast/basic/threading/compiler_tls/compiler_tls.c

diff --git a/doc/CHANGELOG b/doc/CHANGELOG
index d00c6d1..6ec3a10 100644
--- a/doc/CHANGELOG
+++ b/doc/CHANGELOG
@@ -189,6 +189,7 @@ KallistiOS version 2.1.0 -----------------------------------------------
 - DC  Added DMA YUV converter path. Adjust some name of related #defines.
       Added yuv examples [AB]
 - *** Added GCC builtin functions for supporting all of C11 atomics [FG]
+- *** Added toolchain and KOS support for C/C++ compiler-level TLS [CP && FG]

 KallistiOS version 2.0.0 -----------------------------------------------
 - DC  Broadband Adapter driver fixes [Megan Potter == MP]
diff --git a/environ_dreamcast.sh b/environ_dreamcast.sh
index ec68262..cb092da 100644
--- a/environ_dreamcast.sh
+++ b/environ_dreamcast.sh
@@ -1,7 +1,7 @@
 # KallistiOS environment variable settings. These are the shared pieces
 # for the Dreamcast(tm) platform.

-export KOS_CFLAGS="${KOS_CFLAGS} -ml -m4-single-only -ffunction-sections -fdata-sections -matomic-model=soft-imask"
+export KOS_CFLAGS="${KOS_CFLAGS} -ml -m4-single-only -ffunction-sections -fdata-sections -matomic-model=soft-imask -ftls-model=local-exec"
 export KOS_AFLAGS="${KOS_AFLAGS} -little"

 if [ x${KOS_SUBARCH} = xnaomi ]; then
diff --git a/examples/dreamcast/basic/threading/Makefile b/examples/dreamcast/basic/threading/Makefile
index 91f63df..e76cb63 100644
--- a/examples/dreamcast/basic/threading/Makefile
+++ b/examples/dreamcast/basic/threading/Makefile
@@ -5,6 +5,7 @@
 #

 all:
+	$(KOS_MAKE) -C compiler_tls
 	$(KOS_MAKE) -C general
 	$(KOS_MAKE) -C rwsem
 	$(KOS_MAKE) -C recursive_lock
@@ -14,6 +15,7 @@ all:
 	$(KOS_MAKE) -C atomics

 clean:
+	$(KOS_MAKE) -C compiler_tls clean
 	$(KOS_MAKE) -C general clean
 	$(KOS_MAKE) -C rwsem clean
 	$(KOS_MAKE) -C recursive_lock clean
@@ -23,6 +25,7 @@ clean:
 	$(KOS_MAKE) -C atomics clean

 dist:
+	$(KOS_MAKE) -C compiler_tls dist
 	$(KOS_MAKE) -C general dist
 	$(KOS_MAKE) -C rwsem dist
 	$(KOS_MAKE) -C recursive_lock dist
diff --git a/examples/dreamcast/basic/threading/compiler_tls/Makefile b/examples/dreamcast/basic/threading/compiler_tls/Makefile
new file mode 100644
index 0000000..6e3e692
--- /dev/null
+++ b/examples/dreamcast/basic/threading/compiler_tls/Makefile
@@ -0,0 +1,31 @@
+# KallistiOS ##version##
+#
+# basic/threading/compiler_tls/Makefile
+#
+# Copyright (C) 2001 Megan Potter
+# Copyright (C) 2023 Colton Pawielski
+#
+
+EXAMPLE_NAME=compiler_tls
+
+all: rm-elf $(EXAMPLE_NAME).elf
+
+include $(KOS_BASE)/Makefile.rules
+
+OBJS = $(EXAMPLE_NAME).o
+
+clean:
+	-rm -f $(EXAMPLE_NAME).elf $(OBJS)
+
+rm-elf:
+	-rm -f $(EXAMPLE_NAME).elf
+
+$(EXAMPLE_NAME).elf: $(OBJS)
+	$(KOS_CC) $(KOS_CFLAGS) $(KOS_LDFLAGS) -o $(EXAMPLE_NAME).elf -ftls-model=local-exec $(KOS_START) $(OBJS) $(DATAOBJS) $(OBJEXTRA) $(KOS_LIBS) -Wl,-Map=$(EXAMPLE_NAME).map
+
+run: $(EXAMPLE_NAME).elf
+	$(KOS_LOADER) $(EXAMPLE_NAME).elf
+
+dist:
+	-rm -f $(OBJS)
+	$(KOS_STRIP) $(EXAMPLE_NAME).elf
diff --git a/examples/dreamcast/basic/threading/compiler_tls/compiler_tls.c b/examples/dreamcast/basic/threading/compiler_tls/compiler_tls.c
new file mode 100644
index 0000000..25be21a
--- /dev/null
+++ b/examples/dreamcast/basic/threading/compiler_tls/compiler_tls.c
@@ -0,0 +1,203 @@
+/* KallistiOS ##version##
+
+   compiler_tls.c
+
+   Copyright (C) 2023 Colton Pawielski
+   Copyright (C) 2023 Falco Girgis
+
+   A simple example showing off thread local variables
+
+   This example launches several threads that access variables
+   placed in the TLS segment by the compiler. The compiler
+   is then able to generate trivial lookups based on the GBR
+   register which holds the address to the current threads's
+   control block.
+
+   This example also doubles as a validator for KOS and the
+   toolchain's TLS implementation by verifying proper behavior
+   of each TLS segment with various different kinds of data
+   and alignment requirements.
+
+ */
+
+#include <kos.h>
+#include <stdbool.h>
+#include <stdlib.h>
+
+#if (__GNUC__ <= 4)
+/* GCC4 only supports using TLS with the __thread identifier,
+   even when passed the -std=c99 flag */
+#define thread_local __thread
+#else
+/* Newer versions of GCC use C11's _Thread_local to specify TLS */
+#define thread_local _Thread_local
+#endif
+
+typedef struct {
+    uint8_t inner[3];
+} Align4;
+
+typedef struct {
+    uint8_t inner[3];
+} Align16;
+
+/* Various types of thread-local variables, coming from both the .TDATA and .TBSS
+   segments with both manual over-alignment and default alignment.
+
+   NOTE: For the sake of validation, these must be declared volatile, otherwise the
+   compiler will optimize away conditionals checking for values that it knows should be
+   constant!
+*/
+static thread_local volatile _Alignas(4) Align4 tls_buff4 = {.inner = {2, 2, 2}};
+static thread_local volatile _Alignas(16) Align16 tls_buff16 = {.inner = {1, 1, 1}};
+static thread_local volatile uint16_t tls_uint16[256] = { 0 };
+static thread_local volatile _Alignas(32) uint32_t tbss_test = 0;
+static thread_local volatile _Alignas(32) char tls_string[] = { "abcdefghijklmnopqrstuvwxyz012345" };
+static thread_local volatile uint32_t tdata_test = 0x5A;
+
+/*
+   Main thread function. Puts each thread through an array of
+   tests on its own set of thread-local data to ensure proper
+   initialization, alignment, and uniqueness.
+ */
+void *thd(void *v) {
+    int i;
+    int id = (int)v;
+    int ret = 0;
+
+    printf("Started Thread %d\n", id);
+
+    /* Ensure zero-initialized .TBSS data have the correct initial
+       values and are unique to each thread. */
+    for(i = 0; i < 5; i++) {
+        printf("Thread[%d]\tbss_test = 0x%lX\n", id, tbss_test);
+        tbss_test++;
+        thd_sleep(50);
+    }
+
+    if(tbss_test != 5) {
+        fprintf(stderr, "TBSS data check failed!\n");
+        ret = -1;
+    }
+
+    /* Ensure value-initialized .TDATA data have the correct initial
+       values and are unique to each thread. */
+    for(i = 0; i < 5; i++) {
+        printf("Thread[%d]\ttdata_test = 0x%lX\n", id, tdata_test);
+        tdata_test++;
+        thd_sleep(50);
+    }
+
+    if(tdata_test != 0x5F) {
+        fprintf(stderr, "TDATA data check failed!\n");
+        ret = -1;
+    }
+
+    /* Ensure default-aligned .TBSS data is initialized properly. */
+    for(i = 0; i < 256; ++i) {
+        if(tls_uint16[i] != 0) {
+            fprintf(stderr, "tls_uint16[%d] failed!\n", i);
+            ret = -1;
+            break;
+        }
+    }
+
+    /* Ensure manually over-aligned .TDATA data is initialized properly. */
+    if(strcmp((const char *)tls_string, "abcdefghijklmnopqrstuvwxyz012345")) {
+        fprintf(stderr, "tls_string check failed: %s\n", (const char *)tls_string);
+        ret = -1;
+    }
+
+    /* Check if at least one byte has been offset improperly
+       within either of these two oddly-sized structures which could
+       be potentially misaligned.
+
+       Thanks to the DevkitPro 3DS guys for creating this test case
+       to flex their own TLS implementation.
+    */
+
+    bool reproduced = false;
+
+    printf("[");
+    for(i = 0; i < 3; i++) {
+        if(tls_buff4.inner[i] != 2)
+            reproduced = true;
+
+        printf("%d, ", tls_buff4.inner[i]);
+    }
+
+    printf("]\n");
+
+    printf("[");
+    for(i = 0; i < 3; i++) {
+        if(tls_buff16.inner[i] != 1)
+            reproduced = true;
+
+        printf("%d, ", tls_buff16.inner[i]);
+    }
+
+    printf("]\n");
+
+    if(reproduced) {
+        fprintf(stderr, "Bug has been reproduced!\n");
+        ret = -1;
+    }
+    else {
+        printf("There has been no issue!\n");
+    }
+
+    printf("Finished Thread %d\n", id);
+
+    /* Return the result back to the main function. */
+    return (void *)ret;
+}
+
+int main(int argc, char **argv) {
+    /* This is ridiculous, but lets do it anyway. */
+    const int thread_count = 200;
+    kthread_t *threads[thread_count];
+    int i, ret, result = 0;
+
+    printf("Starting Threads\n");
+
+    /* Create a bunch of threads and put each through
+       the same series of tests on their own (hopefully)
+       independent set of thread-local variables. */
+    for(i = 0; i < thread_count; i++) {
+        threads[i] = thd_create(0, thd, (void *)i + 1);
+    }
+
+    /* Put the main thread through the same tests as thread 0. */
+    ret = (int)thd((void *)0);
+    printf("Thread[0] returned: %d\n", ret);
+
+    if(ret == -1)
+        result = -1;
+
+    /* Wait for each thread to return with its test result. */
+    for(i = 0; i < thread_count; i++) {
+        int res = thd_join(threads[i], (void**)&ret);
+
+        if(res < 0) {
+            fprintf(stderr, "Thread[%d] failed to join: %d\n", i + 1, res);
+            result = -1;
+        }
+        else {
+            printf("Thread[%d] returned: %d\n", i + 1, ret);
+
+            if(ret == -1)
+                result = -1;
+        }
+    }
+
+    printf("Threads Finished!\n");
+
+    if(result == -1) {
+        fprintf(stderr, "\n\n***** TLS TEST FAILED! *****\n\n");
+        return EXIT_FAILURE;
+    }
+    else {
+        printf("\n\n***** TLS TEST SUCCESS! *****\n\n");
+        return EXIT_SUCCESS;
+    }
+}
diff --git a/include/kos/thread.h b/include/kos/thread.h
index 83dbba0..6a74ecb 100644
--- a/include/kos/thread.h
+++ b/include/kos/thread.h
@@ -17,6 +17,7 @@ __BEGIN_DECLS
 #include <arch/irq.h>
 #include <sys/queue.h>
 #include <sys/reent.h>
+#include <stdint.h>

 #include <stdint.h>

@@ -105,6 +106,18 @@ TAILQ_HEAD(ktqueue, kthread);
 LIST_HEAD(ktlist, kthread);
 /* \endcond */

+/** \brief   Control Block Header
+    \ingroup threads
+
+    Header preceeding the static TLS data segments as defined by
+    the SH-ELF TLS ABI (version 1). This is what the thread pointer
+    (GBR) points to for compiler access to thread-local data.
+*/
+typedef struct tcbhead {
+    void *dtv;               /**< \brief Dynamic TLS vector (unused) */
+    uintptr_t pointer_guard; /**< \brief Pointer guard (unused) */
+} tcbhead_t;
+
 /** \brief   Structure describing one running thread.
     \ingroup threads

@@ -184,10 +197,13 @@ typedef struct kthread {
     /** \brief  Our reent struct for newlib. */
     struct _reent thd_reent;

-    /** \brief  Thread-local storage.
+    /** \brief  OS-level thread-local storage.
         \see    kos/tls.h   */
     struct kthread_tls_kv_list tls_list;

+    /** \brief Compiler-level thread-local storage. */
+    tcbhead_t* tcbhead;
+
     /** \brief  Return value of the thread function.
         This is only used in joinable threads. */
     void *rv;
diff --git a/kernel/thread/thread.c b/kernel/thread/thread.c
index 18843ba..d134e04 100644
--- a/kernel/thread/thread.c
+++ b/kernel/thread/thread.c
@@ -10,6 +10,7 @@
 #include <string.h>
 #include <malloc.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <reent.h>
 #include <errno.h>
 #include <kos/thread.h>
@@ -21,6 +22,7 @@
 #include <arch/irq.h>
 #include <arch/timer.h>
 #include <arch/arch.h>
+#include <assert.h>

 /*

@@ -35,6 +37,16 @@ also using their queue library verbatim (sys/queue.h).

 */

+/* TLS Section ELF data - exported from linker script. */
+extern int _tdata_start, _tdata_size;
+extern int _tbss_size;
+extern long _tdata_align, _tbss_align;
+
+/* Utility function for aligning an address or offset. */
+static inline size_t align_to(size_t address, size_t alignment) {
+    return (address + (alignment - 1)) & ~(alignment - 1);
+}
+
 /*****************************************************************************/
 /* Thread scheduler data */

@@ -327,6 +339,88 @@ int thd_remove_from_runnable(kthread_t *thd) {
     return 0;
 }

+/* Creates and initializes the static TLS segment for a thread,
+   composed of a Thread Control Block (TCB), followed by .TDATA,
+   followed by .TBSS, very carefully ensuring alignment of each
+   subchunk.
+*/
+static void *thd_create_tls_data(void) {
+    size_t align, tdata_offset, tdata_end, tbss_offset,
+           tbss_end, align_rem, tls_size;
+
+    tcbhead_t *tcbhead;
+    void *tdata_segment, *tbss_segment;
+
+    /* Cached and typed local copies of TLS segment data for sizes,
+       alignments, and initial value data pointer, exported by the
+       linker script.
+
+       SIZES MUST BE VOLATILE or the optimizer on non-debug builds will
+       optimize zero-check conditionals away, since why would the
+       address of a variable be NULL? (Linker script magic, it can be.)
+    */
+    const volatile size_t tdata_size = (size_t)(&_tdata_size);
+    const volatile size_t tbss_size = (size_t)(&_tbss_size);
+    const size_t tdata_align = tdata_size ? (size_t)_tdata_align : 1;
+    const size_t tbss_align = tbss_size ? (size_t)_tbss_align : 1;
+    const uint8_t *tdata_start = (const uint8_t *)(&_tdata_start);
+
+    /* Each subsegment of the requested memory chunk must be aligned
+       by the largest segment's memory alignment requirements.
+    */
+    align = 8; /* tcbhead_t has to be aligned by 8. */
+    if(tdata_align > align)
+        align = tdata_align; /* .TDATA segment's alignment */
+    if(tbss_align > align)
+        align = tbss_align; /* .TBSS segment's alignment */
+
+    /* Calculate the sizing and offset location of each subsegment. */
+    tdata_offset = align_to(sizeof(tcbhead_t), align);
+    tdata_end = tdata_offset + tdata_size;
+    tbss_offset = align_to(tdata_end, tbss_align);
+    tbss_end = tbss_offset + tbss_size;
+
+    /* Calculate final aligned size requirement. */
+    align_rem = tbss_end % align;
+    tls_size = tbss_end;
+
+    if(align_rem)
+        tls_size += (align - align_rem);
+
+    /* Allocate combined chunk with calculated size and alignment. */
+    tcbhead = memalign(align, tls_size);
+    assert(tcbhead);
+    assert(!((uintptr_t)tcbhead % 8));
+
+    /* Since we aren't using either member within it, zero out tcbhead. */
+    memset(tcbhead, 0, sizeof(tcbhead_t));
+
+    /* Initialize .TDATA */
+    if(tdata_size) {
+        tdata_segment = (uint8_t *)tcbhead + tdata_offset;
+
+        /* Verify proper alignment. */
+        assert(!((uintptr_t)tdata_segment % tdata_align));
+
+        /* Initialize tdata_segment with .tdata bytes from ELF. */
+        memcpy(tdata_segment, tdata_start, tdata_size);
+    }
+
+    /* Initialize .TBSS */
+    if(tbss_size) {
+        tbss_segment = (uint8_t *)tcbhead + tbss_offset;
+
+        /* Verify proper alignment. */
+        assert(!((uintptr_t)tbss_segment % tbss_align));
+
+        /* Zero-initialize tbss_segment. */
+        memset(tbss_segment, 0, tbss_size);
+    }
+
+    /* Return segment head: this is what GBR points to. */
+    return tcbhead;
+}
+
 /* New thread function; given a routine address, it will create a new kernel
    thread with the given attributes.
    When the routine returns, the thread will exit. Returns the new thread struct. */
@@ -383,6 +477,9 @@ kthread_t *thd_create_ex(const kthread_attr_t *restrict attr,

     nt->stack_size = real_attr.stack_size;

+    /* Create static TLS data */
+    nt->tcbhead = thd_create_tls_data();
+
     /* Populate the context */
     params[0] = (uint32_t)routine;
     params[1] = (uint32_t)param;
@@ -392,6 +489,8 @@ kthread_t *thd_create_ex(const kthread_attr_t *restrict attr,
                    ((uint32_t)nt->stack) + nt->stack_size,
                    (uint32_t)thd_birth, params, 0);

+    /* Set Thread Pointer */
+    nt->context.gbr = (uint32_t)nt->tcbhead;
     nt->tid = tid;
     nt->prio = real_attr.prio;
     nt->flags = THD_DEFAULTS;
@@ -475,6 +574,9 @@ int thd_destroy(kthread_t *thd) {
     /* Free its stack */
     free(thd->stack);

+    /* Free static TLS segment */
+    free(thd->tcbhead);
...<truncated>...


hooks/post-receive
-- 
A pseudo Operating System for the Dreamcast.