#77 Concurrency issue with deferred disposal

Milestone: v1.0 (example)
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-05-21
Created: 2014-05-11
Creator: Arnout Engelen
Private: No

See also https://sourceforge.net/p/notion/mailman/message/32300649/

I am using notion from Debian, with collapse.lua from here:
https://github.com/dkogan/notion-scripts/blob/master/scripts/collapse.lua
(it moves all clients to the same frame and closes all other frames,
like C-x 1 in Emacs).

The last upgrade broke collapse.lua: using it now causes notion to
segfault.

It seems notion doesn't cope with ioncore.defer( rqclose ) and
ioncore.defer( move the managed clients ) being called concurrently.
My guess is that the order in which ioncore executes deferred calls
changed, and it now tries to close the region before, or concurrently
with, moving the clients.

See the mailing list message for a temporary fix to the script.

The valgrind trace, taken with a freshly git-cloned notion, follows.

==15052== Conditional jump or move depends on uninitialised value(s)
==15052== at 0x41E90E: region_may_dispose (region.c:479)
==15052== by 0x41ED61: region_rqdispose (region.c:512)
==15052== by 0x4349AF: mainloop_execute_deferred_on_list (defer.c:191)
==15052== by 0x416D44: ioncore_mainloop (event.c:228)
==15052== by 0x41519B: main (notion.c:281)
==15052==
==15052== Jump to the invalid address stated on the next line
==15052== at 0x0: ???
==15052== by 0x41E916: region_may_dispose (region.c:479)
==15052== by 0x41ED61: region_rqdispose (region.c:512)
==15052== by 0x4349AF: mainloop_execute_deferred_on_list (defer.c:191)
==15052== by 0x416D44: ioncore_mainloop (event.c:228)
==15052== by 0x41519B: main (notion.c:281)
==15052== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==15052==
==15052==
==15052== Process terminating with default action of signal 11 (SIGSEGV)
==15052== Bad permissions for mapped region at address 0x0
==15052== at 0x0: ???
==15052== by 0x41E916: region_may_dispose (region.c:479)
==15052== by 0x41ED61: region_rqdispose (region.c:512)
==15052== by 0x4349AF: mainloop_execute_deferred_on_list (defer.c:191)
==15052== by 0x416D44: ioncore_mainloop (event.c:228)
==15052== by 0x41519B: main (notion.c:281)
==15052==
==15052== HEAP SUMMARY:
==15052== in use at exit: 1,707,258 bytes in 13,962 blocks
==15052== total heap usage: 42,694 allocs, 28,732 frees, 5,525,276 bytes allocated
==15052==
==15052== LEAK SUMMARY:
==15052== definitely lost: 224 bytes in 1 blocks
==15052== indirectly lost: 176 bytes in 2 blocks
==15052== possibly lost: 0 bytes in 0 blocks
==15052== still reachable: 1,706,858 bytes in 13,959 blocks
==15052== suppressed: 0 bytes in 0 blocks
==15052== Rerun with --leak-check=full to see details of leaked memory
==15052==
==15052== For counts of detected and suppressed errors, rerun with: -v
==15052== Use --track-origins=yes to see where uninitialised values come from
==15052== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 3 from 3)
==15067== Memcheck, a memory error detector
==15067== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==15067== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==15067== Command: /home/moy/local/usr/bin/notion
==15067==

Discussion

  • Arnout Engelen
    2014-05-11

    So it appears that do_execute (libmainloop/defer.c:177) is run with a NULL object.

    The comment suggests that this is an illegal situation. Indeed, defer_watch_handler is registered to remove the deferred action from the list when the object on which the action is registered is destroyed.

    libtu/obj.c:248 does set the object of the watch to NULL and may call handler() after that. Something might be going wrong there.

     
  • Arnout Engelen
    2014-05-14

    Should be fixed by e33387cc78508935c7ae6a5a1384be9fea584e17

     
  • Arnout Engelen
    2014-05-21

    • status: open --> closed-fixed
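
The failure mode discussed in the first comment (a deferred action outliving the object it was registered on) can be illustrated with a small standalone sketch. This is not notion's code and not the change made in e33387cc78508935c7ae6a5a1384be9fea584e17. The names Obj, Deferred, defer_action, obj_destroyed, run_deferred and mark_visited are all hypothetical, and the approach shown (clearing the target pointer on destruction and skipping such entries at execution time) is only one possible way to keep a deferred queue safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a region/object; not the libtu Obj type. */
typedef struct Obj {
    int visits;
} Obj;

typedef void ActionFn(Obj *obj);

/* One queued action. 'obj' plays the role of a watch: it is reset to
 * NULL when the target is destroyed before the action has run. */
typedef struct Deferred {
    Obj *obj;
    ActionFn *fn;
    struct Deferred *next;
} Deferred;

static Deferred *queue_head = NULL;
static Deferred **queue_tail = &queue_head;

/* Append an action to the queue. Returns 1 on success, 0 on failure. */
int defer_action(Obj *obj, ActionFn *fn)
{
    assert(fn != NULL);
    Deferred *d = malloc(sizeof *d);
    if (d == NULL)
        return 0;
    d->obj = obj;
    d->fn = fn;
    d->next = NULL;
    *queue_tail = d;
    queue_tail = &d->next;
    return 1;
}

/* Must be called before 'obj' goes away: invalidates every pending
 * action that still targets it. */
void obj_destroyed(const Obj *obj)
{
    for (Deferred *d = queue_head; d != NULL; d = d->next) {
        if (d->obj == obj)
            d->obj = NULL;
    }
}

/* Run all queued actions in FIFO order, skipping entries whose target
 * has been destroyed. Returns the number of actions actually executed. */
int run_deferred(void)
{
    int executed = 0;
    while (queue_head != NULL) {
        Deferred *d = queue_head;
        /* Unlink first, so the action may safely defer further work. */
        queue_head = d->next;
        if (queue_head == NULL)
            queue_tail = &queue_head;
        if (d->obj != NULL) {
            d->fn(d->obj);
            executed++;
        }
        free(d);
    }
    return executed;
}

/* Example action used to observe which targets were reached. */
void mark_visited(Obj *obj)
{
    obj->visits++;
}
```

In this sketch, an action queued on an object that is destroyed before the queue is drained is silently dropped instead of being invoked on a NULL target. An action may also call obj_destroyed on another object while the queue is being drained, which corresponds to the close-then-move ordering guessed at in the report.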