From: Henry N. <Henry.Ne@Arcor.de> - 2007-03-26 07:46:37
Attachments:
BugCheck-8187cb3c.txt
|
With initrd, coLinux shuts down on the "halt" command and sometimes crashes at "power off".
Tested build 20070101 from SF on Vista in VMware.

coLinux starts with the shipped initrd file:
  colinux-daemon kernel=vmlinux initrd=initrd.gz root=/dev/ram0

Here are the STOP codes:
  0x80000003 (0xc0000046, 0x8187cb3c, 0x922eb6e4, 0x00000000)
  0x80000003 (0xc0000046, 0x8187cb3c, 0x90f6f6e4, 0x00000000)
  0x80000003 (0xc0000046, 0x8187cb3c, 0x8fcff6e4, 0x00000000)

From reading the Bugcheck Analysis, the call to *KeReleaseMutex* goes wrong somewhere from linux+0x9732, in the wrapper function co_os_mutex_release.

Looking at the stack, in the file src/colinux/kernel/monitor.c at line 1156, the function co_monitor_refdown seems to make the call with an invalid argument?
  co_os_mutex_release(manager->lock); <-- line 1156

I attach the Bugcheck Analysis, with comments for linux.sys in the stack.

An idea I have: in colinux/os/winnt/kernel/lowlevel/mutex.c, co_os_mutex_acquire does not check the return code from KeWaitForMutexObject. Can this function return without holding the mutex, so that KeReleaseMutex then fails?

I will add a coLinux KeBugCheck for the error case of KeWaitForMutexObject.

-- Henry |
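The idea in this message (check the result of the wait before assuming the mutex is held) can be sketched in portable C. This is only a user-space model under stated assumptions: fake_mutex_t, wait_status_t, fake_wait_for_mutex, checked_acquire and checked_release are hypothetical names, not coLinux or Windows kernel APIs, and the status values merely stand in for NTSTATUS wait results. For reference, 0xc0000046 in the STOP parameters is the NTSTATUS value STATUS_MUTANT_NOT_OWNED, which is consistent with releasing a mutex the caller does not hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; they only stand in for NTSTATUS wait results. */
typedef enum { WAIT_OK, WAIT_ALERTED, WAIT_TIMED_OUT } wait_status_t;

/* Hypothetical stand-in for a kernel mutex, so the sketch runs in user space. */
typedef struct {
    bool held;                      /* true while the caller owns the mutex */
    wait_status_t next_wait_result; /* what the simulated wait will return */
} fake_mutex_t;

/* Simulated wait: ownership is granted only when the wait succeeds. */
static wait_status_t fake_wait_for_mutex(fake_mutex_t *m)
{
    assert(m != NULL);
    if (m->next_wait_result == WAIT_OK)
        m->held = true;
    return m->next_wait_result;
}

/* Acquire wrapper that reports a failed wait instead of ignoring it. */
static bool checked_acquire(fake_mutex_t *m)
{
    return fake_wait_for_mutex(m) == WAIT_OK;
}

/* Release wrapper: refuses to release a mutex that was never acquired. */
static bool checked_release(fake_mutex_t *m)
{
    assert(m != NULL);
    if (!m->held)
        return false; /* in the driver, this is the release that goes wrong */
    m->held = false;
    return true;
}
```

With a wait that does not succeed, checked_acquire returns false and a later checked_release is rejected; in the driver, the equivalent place to report the error would be the acquire wrapper, as proposed above.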
From: Henry N. <Henry.Ne@Arcor.de> - 2007-04-11 08:19:53
|
Henry Nestler wrote:
> With initrd, Colinux is shutting down from command "halt" and crashed
> some times on the "power off".
> Tested build 20070101 from SF on Vista in VMware.
>
> CoLinux starts with the shipped initrd file:
> colinux-daemon kernel=vmlinux initrd=initrd.gz root=/dev/ram0
>
> Here are STOP code:
> 0x80000003 (0xc0000046, 0x8187cb3c, 0x922eb6e4, 0x00000000)
>
> From reading Bugcheck Analysis, the call to *KeReleaseMutex* is wrong
> somewhere from linux+0x9732, the wrapper function co_os_mutex_release.
>
> From looking the stack, in the file src/colinux/kernel/monitor.c at
> line 1156, in the function co_monitor_refdown calls with an invalid arg?
> co_os_mutex_release(manager->lock); <-- line 1156

Problem found in src/colinux/kernel/monitor.c(1134):
The function co_monitor_refdown is called recursively via
send_monitor_end_messages, co_manager_send_eof,
co_os_manager_userspace_eof, co_manager_close, co_manager_close_

The problem is after src/colinux/kernel/monitor.c(1111):

static void send_monitor_end_messages(co_monitor_t *cmon)
{
        co_manager_open_desc_t opened;
        int i;

        co_os_mutex_acquire(cmon->connected_modules_write_lock);
        for (i = 0; i < CO_MONITOR_MODULES_COUNT; i++) {
                opened = cmon->connected_modules[i];
                if (!opened)
                        continue;

1122:           co_manager_send_eof(cmon->manager, opened);
---> From here on, "cmon" can point into wrong or empty memory,
---> and the following calls can fail.
1123:           cmon->connected_modules[i] = NULL;
1124:           co_manager_close(cmon->manager, opened);
        }
        co_os_mutex_release(cmon->connected_modules_write_lock);
}

co_rc_t co_monitor_refdown(co_monitor_t *cmon, bool_t user_context, bool_t monitor_owner)
{
        ...
        if (end_messages)
1159:           send_monitor_end_messages(cmon);

        if (destroy)
1162:           return co_monitor_destroy(cmon, monitor_owner);

        return CO_RC(OK);
}

In the worst case, send_monitor_end_messages (1159) calls co_manager_send_eof (1122), which indirectly calls co_monitor_refdown again, and that nested call frees the memory for "cmon" through co_monitor_destroy (1162).
Execution then continues at line 1123 with a dangling "cmon", which is the bug.

How can we fix it?

-- Henry |
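The sequence described in this message can be replayed in a small user-space model. This is a simplified sketch, not the coLinux source: model_monitor_t, model_refdown, model_send_eof and model_send_end_messages are hypothetical names, a "destroyed" flag stands in for freeing the struct (so the model itself never touches freed memory), and the reference counts are assumed values chosen to trigger the worst case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MODULES 2

/* Hypothetical model of the monitor; "destroyed" replaces the real free. */
typedef struct {
    int  refcount;
    bool destroyed;
    int  writes_after_destroy;     /* each one would be a use-after-free */
    bool connected[MODEL_MODULES];
} model_monitor_t;

static void model_refdown(model_monitor_t *mon, bool owner);

/* Sending EOF closes the module's handle, which drops one reference. */
static void model_send_eof(model_monitor_t *mon)
{
    model_refdown(mon, false);
}

static void model_send_end_messages(model_monitor_t *mon)
{
    for (int i = 0; i < MODEL_MODULES; i++) {
        if (!mon->connected[i])
            continue;
        model_send_eof(mon);             /* may re-enter model_refdown */
        if (mon->destroyed)
            mon->writes_after_destroy++; /* corresponds to line 1123 */
        mon->connected[i] = false;
    }
}

static void model_refdown(model_monitor_t *mon, bool owner)
{
    bool destroy = false;

    assert(mon != NULL);
    if (mon->refcount > 0 && --mon->refcount == 0)
        destroy = true;                  /* decrement happens first */
    if (owner)
        model_send_end_messages(mon);    /* corresponds to line 1159 */
    if (destroy)
        mon->destroyed = true;           /* corresponds to line 1162 */
}
```

Starting with three references (the owner plus two modules) and calling model_refdown as the owner, the second nested call drops the count to zero and marks the monitor destroyed while the loop in model_send_end_messages is still running, so the loop performs one write after destruction.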
From: Henry N. <Henry.Ne@Arcor.de> - 2007-04-12 06:53:59
Attachments:
send_monitor_end_messages.patch
|
Henry Nestler wrote:
> Henry Nestler wrote:
>> With initrd, Colinux is shutting down from command "halt" and crashed
>> some times on the "power off".
>> Tested build 20070101 from SF on Vista in VMware.
>>
>> CoLinux starts with the shipped initrd file:
>> colinux-daemon kernel=vmlinux initrd=initrd.gz root=/dev/ram0
>>
>> Here are STOP code:
>> 0x80000003 (0xc0000046, 0x8187cb3c, 0x922eb6e4, 0x00000000)
>>
>> From reading Bugcheck Analysis, the call to *KeReleaseMutex* is wrong
>> somewhere from linux+0x9732, the wrapper function co_os_mutex_release.
>>
>> From looking the stack, in the file src/colinux/kernel/monitor.c at
>> line 1156, in the function co_monitor_refdown calls with an invalid arg?
>> co_os_mutex_release(manager->lock); <-- line 1156
>
> Problem found in src/colinux/kernel/monitor.c(1134):
> Function co_monitor_refdown is called recursive via
> send_monitor_end_messages, co_manager_send_eof,
> co_os_manager_userspace_eof, co_manager_close, co_manager_close_
>
> [...]
> In a worst case send_monitor_end_messages (1159) calls
> co_manager_send_eof (1122), indirectly co_monitor_refdown and with
> co_monitor_destroy (1162) freed the memory for "cmon". Then goes
> continue at line 1123 with bug.

Fixed.
It was not a Vista-only problem. It occurs when running under heavy CPU and memory load.

Today's autobuild includes this patch.

George, please apply the patch to the stable branch.

send_monitor_end_messages.patch:
* Don't destroy the monitor struct from co_monitor_refdown if
  send_monitor_end_messages is active from another recursive call.
  Bugfix for BSOD on halt/shutdown under Vista.

-- Henry |
From: Henry N. <Henry.Ne@Arcor.de> - 2007-04-13 07:50:05
|
Henry Nestler wrote:
>>> From looking the stack, in the file src/colinux/kernel/monitor.c at
>>> line 1156, in the function co_monitor_refdown calls with an invalid arg?
>>> co_os_mutex_release(manager->lock); <-- line 1156
>>
>> Problem found in src/colinux/kernel/monitor.c(1134):
>> Function co_monitor_refdown is called recursive via
>> send_monitor_end_messages, co_manager_send_eof,
>> co_os_manager_userspace_eof, co_manager_close, co_manager_close_
>>
>> [...]
>> In a worst case send_monitor_end_messages (1159) calls
>> co_manager_send_eof (1122), indirectly co_monitor_refdown and with
>> co_monitor_destroy (1162) freed the memory for "cmon". Then goes
>> continue at line 1123 with bug.
>
> Fixed.
> It was not a Vista only problem. It was a problem for running under
> heavy cpu and memory usage.

I'm sorry. The patch was an improvement, but not a permanent fix.
There is a new crash at the same position in the new source:

co_rc_t co_monitor_refdown(co_monitor_t *cmon, bool_t user_context, bool_t monitor_owner)
{
        ...
                if (monitor_owner)
                        end_messages = PTRUE;
        }
        co_os_mutex_release(manager->lock);

        /* Don't destroy monitor, if we are in sending end messages! */
1159:   if (cmon->sending_monitor_end_messages) <---!crash!
                return CO_RC(OK);

        if (end_messages)
                send_monitor_end_messages(cmon);

-- Henry |
From: Henry N. <Henry.Ne@Arcor.de> - 2007-04-19 20:00:17
|
Hello,

the call graphs [1;2] show the recursion for co_monitor_refdown.

The graphs do not show macros and inline functions; they were generated from a debugging build of linux.sys 0.8.0.

With full.graph [3] you can generate your own graphs with 'gengraph'.

I created them with CodeViz [4] version 1.0.11 using these lines:
  genfull -g cobjdump -o full.graph-objdump -f linux.sys
  gengraph -g full.graph-objdump -r -f co_monitor_refdown
  gengraph -g full.graph-objdump -r -f co_monitor_free

[1] http://www.henrynestler.com/colinux/codeviz/co_monitor_refdown.png
[2] http://www.henrynestler.com/colinux/codeviz/co_monitor_free.png
[3] http://www.henrynestler.com/colinux/codeviz/full.graph-20070411.gz
[4] http://www.csn.ul.ie/~mel/projects/codeviz/

-- Henry |
From: Henry N. <Henry.Ne@Arcor.de> - 2007-04-24 08:02:33
Attachments:
monitor-refcount-decrement.patch
|
Second try to fix the BSOD on halt/shutdown under Vista.

* Don't destroy the monitor struct from co_monitor_refdown if
  send_monitor_end_messages is active from another recursive call.
  Bugfix for BSOD on halt/shutdown under Vista.
* co_monitor_refdown: Move 'cmon->refcount--' close to the end of the function.
  Recursive calls from send_monitor_end_messages can then no longer arrive with a refcount of 0.
  Do nothing and return at the top of the function if refcount <= 0.
* After the call to co_monitor_refdown: setting opened->monitor_owner to false
  is safer.

Warning!
Changes to this basic code can make it very unstable, and the patch has not been tested for long.

For me, it passed 100 boots and halts under XP, and 1000 boots+halts under Vista, with the following command line:
  colinux-daemon kernel=vmlinux mem=128 initrd=initrd.gz \
  cobd0=Debian-3.0r0.ext3.100mb.img root=/dev/cobd0 eth0=slirp
(Without this patch, this command line crashed after the 5th to 10th boot.)

The daily autobuild includes this patch:
http://www.henrynestler.com/colinux/autobuild/devel-20060424/

-- Henry |
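The first two points of this patch description can be illustrated with a small user-space model. This is a sketch of the idea only and is not taken from monitor-refcount-decrement.patch: fixed_monitor_t, fixed_refdown and fixed_send_end_messages are hypothetical names, a "destroyed" flag stands in for freeing the struct, and the starting reference counts are assumed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIXED_MODULES 2

/* Hypothetical model of the monitor; "destroyed" replaces the real free. */
typedef struct {
    int  refcount;
    bool sending_end_messages;     /* set while end messages are being sent */
    bool destroyed;
    int  writes_after_destroy;     /* stays 0 if the ordering is safe */
    bool connected[FIXED_MODULES];
} fixed_monitor_t;

static void fixed_refdown(fixed_monitor_t *mon, bool owner);

static void fixed_send_end_messages(fixed_monitor_t *mon)
{
    mon->sending_end_messages = true;
    for (int i = 0; i < FIXED_MODULES; i++) {
        if (!mon->connected[i])
            continue;
        fixed_refdown(mon, false);       /* models the indirect recursive call */
        if (mon->destroyed)
            mon->writes_after_destroy++;
        mon->connected[i] = false;
    }
    mon->sending_end_messages = false;
}

static void fixed_refdown(fixed_monitor_t *mon, bool owner)
{
    assert(mon != NULL);
    if (mon->refcount <= 0)
        return;                          /* nothing left to release */
    if (owner && !mon->sending_end_messages)
        fixed_send_end_messages(mon);    /* caller's reference is still held */
    if (mon->refcount > 0)
        mon->refcount--;                 /* decrement moved near the end */
    if (mon->refcount == 0 && !mon->sending_end_messages)
        mon->destroyed = true;           /* never destroy while sending */
}
```

In this model the owner's reference is still counted while the end messages are sent, so nested calls from the loop cannot be the ones that destroy the monitor. Starting with three references, or with an under-counted two, the monitor is marked destroyed only after the loop has finished, and no write after destruction is recorded.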
From: George P B. <geo...@gm...> - 2007-04-24 20:23:54
|
Henry Nestler wrote:
> Second try to fix the BSOD on halt/shutdown under Vista.
>
> * Don't destroy monitor struct from co_monitor_refdown, if
> send_monitor_end_messages is active from an other recursive call.
> Bugfix for BSOD on halt/shutdown under Vista.
> * co_monitor_refdown: Move 'cmon->refcount--' very close near the end.
> Recursions from send_monitor_end_messages can now not coming with 0.
> Do nothing and return in top of function, if refcount <= 0.
> * After call of co_monitor_refdown: Set opened->monitor_owner to false
> is more safe.
>
> Warning!
> Change on this basic code can be very unstable and it was not long
> tested.
>
> For me, it passed 100 boots and halt under XP, and 1000 boots+halts
> under Vista with follow command line:
> colinux-daemon kernel=vmlinux mem=128 initrd=initrd.gz \
> cobd0=Debian-3.0r0.ext3.100mb.img root=/dev/cobd0 eth0=slirp
> (This line crashed after 5th to 10th boot without this patch.)
>
> The daily autobuild includes this patch:
> http://www.henrynestler.com/colinux/autobuild/devel-20060424/

Sounds good. It would be nice to clean up that call graph so that it is not so convoluted (messed up and interdependent like that). We should do some light testing of shutting down normal images now, as well as restarting instead of halting, and probably even some light testing of use as a service and shutting down the service. I'm not saying that you and I should do this testing alone. This would be a good place to call for anyone using Vista, using services, or depending on the restart feature to pitch in, try this build in these situations, and report back.

I'm in a bind. I really wanted to release 0.7.1 as stable, but I was holding off a little bit for a fix for this. It could be that the fix is unstable enough that we should go ahead and release stable 0.7.1 with this known issue, then, after some testing in 0.8.0, backport the fix to 0.7.1 as a bug-fix release later. What are your thoughts here?

Thanks,
George |
From: Henry N. <Henry.Ne@Arcor.de> - 2007-04-26 08:49:07
|
George P Boutwell wrote:
> Henry Nestler wrote:
>> Second try to fix the BSOD on halt/shutdown under Vista.
>>
>> * Don't destroy monitor struct from co_monitor_refdown, if
>> send_monitor_end_messages is active from an other recursive call.
>> Bugfix for BSOD on halt/shutdown under Vista.
>> * co_monitor_refdown: Move 'cmon->refcount--' very close near the end.
>> Recursions from send_monitor_end_messages can now not coming with 0.
>> Do nothing and return in top of function, if refcount <= 0.
>> * After call of co_monitor_refdown: Set opened->monitor_owner to false
>> is more safe.
>>
>> Warning!
>> Change on this basic code can be very unstable and it was not long
>> tested.
>>
>> For me, it passed 100 boots and halt under XP, and 1000 boots+halts
>> under Vista with follow command line:
>> colinux-daemon kernel=vmlinux mem=128 initrd=initrd.gz \
>> cobd0=Debian-3.0r0.ext3.100mb.img root=/dev/cobd0 eth0=slirp
>> (This line crashed after 5th to 10th boot without this patch.)
>>
>> The daily autobuild includes this patch:
>> http://www.henrynestler.com/colinux/autobuild/devel-20060424/
>
> Sounds good. It would be nice to clean-up that call graph to not be
> so convoluted (messed-up and interdependent like that). We, should do
> some light testing of shutting down normal images now, as well as
> restarting instead of halting, and probably even a light testing on use
> as a service and shutting down the service. I'm not saying that you and
> I should do this testing alone. This would be a good place to call for
> anyone using Vista, using services, or depending on the restart feature
> to pitch in and try this build for these situations and report back.
>
> I'm in a bind. I really wanted to release 0.7.1 as stable, but I was
> holding off a little bit for a fix for this. Could be the fix for this
> is unstable enough, that we should go ahead and release stable 0.7.1
> with this known issue, then after some testing in 0.8.0 back port this
> fix to 0.7.1 as a bug fix release later. What are your thoughts here?

I fully agree with George. This change is very close to the core of coLinux memory management.

Please test this build in your environments. It is better to have many different situations tested before we release it.

Here are binary downloads.

Stable snapshot:
http://www.henrynestler.com/colinux/testing/stable-0.7.1/20070425-Snapshot/stable-coLinux-20070425.exe

Devel snapshot:
http://www.henrynestler.com/colinux/testing/devel-0.8.0/20070425-Snapshot/devel-coLinux-20070425.exe
http://www.colinux.org/snapshots (0.8.0 is the same there)

Look in the download directory for source, changelogs and ZIP updates.

-- Henry |