Menu

#1817 CCache can cause DB inconsistency

Core
open-remind
nobody
9
2010-03-10
2009-03-25
sherif
No

CCache.java has multiple problems, the most serious of which is what is given in Summary.
The problems are:-
1. It allows multiple threads to use HashMap, which is unsynchronized.
2. It allows objects to accumulate till an arbitrary expiry time (default - every 2 hours),
at which time all map entries are removed. This can cause out of memory failure
as references to large objects are cached.
3. This arbitrary expiry can cause two threads to create two independent instances
of a Persistent Object and operate independently, resulting in using stale data and
losing altogether the modifications done by one thread!. Bugs due to this will appear
at random and can be extremely difficult to track down.

The fix is to eliminate the need for a forced expiry by using softreferences to objects
in the map, allowing them to be garbage collected. In addition, all calls to HashMap
should be wrapped in synchronization blocks. Since callers of CCache expect
strong references, the mapping should be completely handled inside CCache.java
A reset cache call shouldl now only remove stale entries and not clear the whole
cache.

I am attaching a file with above changes. The whole file is attached instead of a diff
due to the highly pervasive nature of change.

For complete safety, some additional changes should be made in those who call this
class, but since most of those should result in a fast fail (exceptions) and the chances
are highly remote and since I am not an expert in those areas, I have not done the changes.
But, for completeness, here is what is required.
1. Those who do a .get and then if null, insert an entry should do this within a synchronization block.
2. There are some code which do a CCache.clear to (I think) make sure that stale entries are not
used by others (Have seen in this delete code). They should do a remove of the entry, rather
than clear. For backward compatibility, I have retained the behaviour of clear call in CCache.
After fixing the callers, this call should be made to throw an exception.

Discussion

  • Teo Sarca

    Teo Sarca - 2009-03-25

    I've tested a little in one of our deployments and i got first error when i've logout & login :
    Exception in thread "AWT-EventQueue-0" java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
    at java.util.HashMap$EntryIterator.next(HashMap.java:834)
    at java.util.HashMap$EntryIterator.next(HashMap.java:832)
    at org.compiere.util.CCache.cleanup(CCache.java:206)
    at org.compiere.util.CCache.reset(CCache.java:140)
    at org.compiere.util.CacheMgt.reset(CacheMgt.java:113)
    at org.compiere.util.Env.reset(Env.java:173)
    at org.compiere.util.Env.logout(Env.java:116)
    at org.compiere.apps.AEnv.logout(AEnv.java:695)
    at org.compiere.apps.AMenu.logout(AMenu.java:545)
    at org.compiere.apps.AEnv.actionPerformed(AEnv.java:381)
    at org.compiere.apps.AMenu.actionPerformed(AMenu.java:612)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:357)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1225)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1266)
    at java.awt.Component.processMouseEvent(Component.java:6134)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3265)
    at java.awt.Component.processEvent(Component.java:5899)
    at java.awt.Container.processEvent(Container.java:2023)
    at java.awt.Component.dispatchEventImpl(Component.java:4501)
    at java.awt.Container.dispatchEventImpl(Container.java:2081)
    at java.awt.Component.dispatchEvent(Component.java:4331)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4301)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:3965)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:3895)
    at java.awt.Container.dispatchEventImpl(Container.java:2067)
    at java.awt.Window.dispatchEventImpl(Window.java:2458)
    at java.awt.Component.dispatchEvent(Component.java:4331)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:599)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

    Best regards,
    Teo Sarca - www.arhipac.ro

     
  • sherif

    sherif - 2009-03-25

    Thanks Teo for testing this. Since I was testing mostly with the webui, I did not hit this easily. When java is started in -server mode, it delays garbage collection.

    The bug is in line 216. I should have used the Iterator's remove method instead of the HashMap's.
    Just change super.remove(key) to iter.remove()

     
  • sherif

    sherif - 2009-03-26

    I am uploading a new file with one more bug fix (and some minor clean up of code).
    Certain callers of CCache is using it as a pure hashmap, rather than a cache
    by passing in an expireMinutes <= 0. In those cases, we cannot use softreference
    as these objects should not be garbage collected.

    This would probably mean that the reset() calls from such callers is really meant to
    be HashMap.clear() If we change this in CCache, then other callers to reset will all
    need to be changed to use cleanup(). I believe it is better to change the semantics of
    reset to purge. Those who really want all mappings to go should call clear, as Map
    documentation of clear is clear :-). The only place I found where such a change
    seems really needed is MSysConfig in resetCache(). But, then again, I am new to
    adempiere and would like one of the oldies to look at this.

     
  • sherif

    sherif - 2009-03-26

    Bug fix for m_expire = = 0 and cleanup

     
  • Carlos Ruiz

    Carlos Ruiz - 2010-03-10

    it seems like an interesting issue to review from architectural POV

    the inconsistency mentioned is hard to demonstrate - but the explanation is supportive as a potential cause of errors hard to trace

     
  • Carlos Ruiz

    Carlos Ruiz - 2010-03-10
    • milestone: --> Core
    • priority: 5 --> 9
    • status: open --> open-remind
     
  • sherif

    sherif - 2010-03-11

    SInce you are reviewing this issue, let me just pass some more observations.
    I have not really worked on a proper solution for this, but hopefully you will be able to come up with an architectural soln or suggestions.

    1. The soft references are too heavy and slows down the system and I reverted back pretty soon on that. (I also realised that it is not a great solution , due to #2 below)
    2. Ensuring that only 1 entry exists does not solve the issue of simultaneous updates and mix ups. Now 2 threads can set values and if one thread aborts (example gets an error/deadlock etc.), then its changes are not undone. On the other hand, having a separate instance for every one solves this better, which of course defeats the purpose of ccache.
    So, it looks as if the best solution is to use ccache only when the code is using the object in read only mode.
    3. The lack of synchronization while using HashMap is indeed a real issue and need to be fixed in code.

     
  • Tobias Schöneberg

    Patch for synchronized CCache

     
  • Tobias Schöneberg

    Hi,
    I uploaded a patch that just addresses problem 1 (unsynchronized). I wrote it after I got a ConcurrentModificationException in MTable.get() and I was going to open a new tracker, when I found you tracker here.

    As I wrote, the downside is that the patch only solves problem 1, but on the other hand, it's quite simple :-).
    Have a look if you like.

    Best regards,
    Tobi

     

Log in to post a comment.