From: Vegard N. <veg...@gm...> - 2016-05-21 13:51:14
Hi people,

I'm having some trouble using current_thread_info() during UML early boot. Sometimes it works just fine, but often I get segfaults because current_thread_info() is returning an invalid pointer. It looks random: 0x202118, 0x1003e0003, 0xd33b90b3, 0x6db043, etc.

I found an earlier thread which described the same problem: http://permalink.gmane.org/gmane.linux.uml.devel/14642

However, I think the patch there is a bit hacky and papers over an underlying bug, since it just uses is_kernel_addr() before deciding whether to return the pointer from current_thread_info() or not. The fact that the crash is random leads me to think it's some sort of race during the UML boot.

Does anybody fully understand what's going on here, and why it returns those invalid (seemingly random) values? If the problem is that we're on the wrong stack, can we switch stacks earlier during boot or something to make current_thread_info() always return a valid thread_info pointer?

Thanks,
Vegard
From: Thomas M. <th...@m3...> - 2016-05-21 18:44:03
Hi,

Mhh. Strange. Do you have a stack trace of a call to current_thread_info() that ends up with a wrong value? I wonder where it originates.

With kind regards
Thomas
From: Richard W. <ric...@gm...> - 2016-05-21 21:49:29
On Sat, May 21, 2016 at 3:51 PM, Vegard Nossum <veg...@gm...> wrote:
> Hi people,
>
> I'm having some trouble with using current_thread_info() during UML
> early boot. Sometimes it works just fine, but often I get segfaults
> because current_thread_info() is returning an invalid pointer. It
> looks random: 0x202118, 0x1003e0003, 0xd33b90b3, 0x6db043, etc.

Where do you use it?
Can you rule out a bad compiler optimization? (We had such an issue a few years ago.)

> Does anybody understand fully what's going on here, why it returns
> those invalid (seemingly random) values? If the problem is that we're
> on a wrong stack, can we switch stacks earlier during boot or
> something to make current_thread_info() always return a valid
> thread_info pointer?

Can't say much without more details.

--
Thanks,
//richard
From: Vegard N. <veg...@gm...> - 2016-05-22 15:40:03
On 21 May 2016 at 20:18, Thomas Meyer <th...@m3...> wrote:
> On 21.05.2016 at 15:51, Vegard Nossum <veg...@gm...> wrote:
>> I'm having some trouble with using current_thread_info() during UML
>> early boot. Sometimes it works just fine, but often I get segfaults
>> because current_thread_info() is returning an invalid pointer. It
>> looks random: 0x202118, 0x1003e0003, 0xd33b90b3, 0x6db043, etc.
>
> Mhh. Strange. Do you have a stack trace to call to current thread info which ends up with a wrong value. I wonder from were it originates.

One such trace would be:

#2 0x000000006026652c in snprintf (buf=<optimized out>, size=<optimized out>, fmt=<optimized out>) at lib/vsprintf.c:2181
#3 0x00000000600046f8 in setup_env_path () at arch/um/os-Linux/main.c:109
#4 main (argc=3, argv=0x7ffc3d8c23e8, envp=<optimized out>) at arch/um/os-Linux/main.c:125

I wonder why setup_env_path() ends up calling the kernel's snprintf(); I thought that it would be using the glibc snprintf() at this point?

Vegard
From: Thomas M. <th...@m3...> - 2016-05-23 18:42:50
On Sunday, 22.05.2016 at 17:39 +0200, Vegard Nossum wrote:
> One such trace would be:
>
> #2 0x000000006026652c in snprintf (buf=<optimized out>,
> size=<optimized out>, fmt=<optimized out>) at lib/vsprintf.c:2181
> #3 0x00000000600046f8 in setup_env_path () at arch/um/os-Linux/main.c:109
> #4 main (argc=3, argv=0x7ffc3d8c23e8, envp=<optimized out>) at
> arch/um/os-Linux/main.c:125
>
> I wonder why setup_env_path() ends up calling the kernel's snprintf(),
> I thought that it would be using the glibc snprintf() at this point?

Mhh. Good question!

Doing a make ARCH=um V=1 arch/um/os-Linux/main.o results in:

  gcc -Wp,-MD,arch/um/os-Linux/.main.o.d -Wall -Wundef -Wstrict-prototypes \
    -Wno-trigraphs -fno-strict-aliasing -fno-common \
    -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 \
    -mcmodel=large -fno-builtin -m64 -funit-at-a-time -D__arch_um__ \
    -Dvmap=kernel_vmap -Din6addr_loopback=kernel_in6addr_loopback \
    -Din6addr_any=kernel_in6addr_any -Dstrrchr=kernel_strrchr \
    -D_LARGEFILE64_SOURCE -fno-delete-null-pointer-checks -O2 \
    --param=allow-store-data-races=0 -fno-reorder-blocks -fno-ipa-cp-clone \
    -fno-partial-inlining -Wframe-larger-than=1024 -fno-stack-protector \
    -Wno-unused-but-set-variable -fno-omit-frame-pointer \
    -fno-optimize-sibling-calls -fno-var-tracking-assignments -g -gdwarf-4 \
    -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow \
    -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes \
    -Werror=date-time -Werror=incompatible-pointer-types -DCC_HAVE_ASM_GOTO \
    -I./arch/um/include/shared -I./arch/x86/um/shared \
    -I./arch/um/include/shared/skas -D_FILE_OFFSET_BITS=64 \
    -idirafter ./include -idirafter ./include -D__KERNEL__ -D__UM_HOST__ \
    -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -include ./include/linux/kern_levels.h \
    -include user.h -c -o arch/um/os-Linux/main.o arch/um/os-Linux/main.c

So it includes user.h and is under os-Linux. I guess it should actually call the glibc version; I'm not sure why it doesn't.

With kind regards
Thomas
From: Richard W. <ric...@gm...> - 2016-06-12 20:11:36
On Sun, May 22, 2016 at 5:39 PM, Vegard Nossum <veg...@gm...> wrote:
> One such trace would be:
>
> #2 0x000000006026652c in snprintf (buf=<optimized out>,
> size=<optimized out>, fmt=<optimized out>) at lib/vsprintf.c:2181
> #3 0x00000000600046f8 in setup_env_path () at arch/um/os-Linux/main.c:109
> #4 main (argc=3, argv=0x7ffc3d8c23e8, envp=<optimized out>) at
> arch/um/os-Linux/main.c:125
>
> I wonder why setup_env_path() ends up calling the kernel's snprintf(),
> I thought that it would be using the glibc snprintf() at this point?

That early, you can use neither current() nor any other core kernel facility, since the kernel has not started yet.
So the current thread_info struct points to garbage.

--
Thanks,
//richard
From: Vegard N. <veg...@gm...> - 2016-06-12 20:59:35
On 12 June 2016 at 22:11, Richard Weinberger <ric...@gm...> wrote:
>> I wonder why setup_env_path() ends up calling the kernel's snprintf(),
>> I thought that it would be using the glibc snprintf() at this point?
>
> That early you cannot use current() nor any other core kernel stuff
> since the kernel has not started so far.
> So, the current thread info struct points to garbage.

Yes, I know. I think setup_env_path() should call the libc snprintf rather than the kernel one; can you explain how to do that properly?

Thanks,
Vegard
From: Richard W. <ri...@no...> - 2016-06-12 21:05:16
On 12.06.2016 at 22:59, Vegard Nossum wrote:
> Yes, I know. I think setup_env_path() should call the libc snprintf
> rather than the kernel one, can you explain how to do that properly?

Currently UML sets up nasty maps for known namespace clashes, e.g.:

KBUILD_CFLAGS += $(CFLAGS) $(CFLAGS-y) -D__arch_um__ \
	$(ARCH_INCLUDE) $(MODE_INCLUDE) -Dvmap=kernel_vmap \
	-Din6addr_loopback=kernel_in6addr_loopback \
	-Din6addr_any=kernel_in6addr_any -Dstrrchr=kernel_strrchr

A much better approach would be having a real linker scope.
Some time ago I posted some thoughts on that:
https://lkml.org/lkml/2015/11/19/758

Due to -ENOTIME this never materialized, though. ;-(

Thanks,
//richard
From: Vegard N. <veg...@gm...> - 2016-06-12 21:41:40
On 12 June 2016 at 23:05, Richard Weinberger <ri...@no...> wrote:
> Currently UML sets up nasty maps for known namespaces clashes.
> i.e.
> KBUILD_CFLAGS += $(CFLAGS) $(CFLAGS-y) -D__arch_um__ \
> $(ARCH_INCLUDE) $(MODE_INCLUDE) -Dvmap=kernel_vmap \
> -Din6addr_loopback=kernel_in6addr_loopback \
> -Din6addr_any=kernel_in6addr_any -Dstrrchr=kernel_strrchr

I see... nice and hacky ;-) I'll try the same for snprintf and see if that works around my bug.

> A much better approach would be having a real linker scope.
> Some time ago I posted some thoughts on that:
> https://lkml.org/lkml/2015/11/19/758
>
> Due to -ENOTIME this never materialized, though. ;-(

Cool, objcopy -G/--keep-global-symbol(s) seems like a good solution. Doesn't look like it should be too difficult. I might give it a try.

Thanks!
Vegard
From: Richard W. <ri...@no...> - 2016-06-12 22:55:22
On 12.06.2016 at 23:41, Vegard Nossum wrote:
> Cool, objcopy -G/--keep-global-symbol(s) seems like a good solution.
> Doesn't look like it should be too difficult. I might give it a try.

Not really difficult, but unpleasant. ;-)
IIRC, the last time I looked I figured that UML's source structure and build process would need a big rework to achieve that.

It would be cool if you could work on it; I'll happily assist as far as my spare time permits.

Thanks,
//richard