Menu

#64 newLogger() does not cache its value, resulting in contention in multithreaded environments

General
closed
nobody
None
5
2015-10-24
2013-01-08
No

Source.newLogger() makes a call to the getLogger() method of the specified (or auto-detected) logging implementation at each Source instance creation.

getLogger() in many cases accesses a globally locked hash table that is not designed for heavy multithreaded access. We are parsing thousands of pages per second on multiple threads, so our thread dumps show that very often the threads get stuck in contention:

"VisitingThread-63" prio=10 tid=0x00007f6c0f2f6800 nid=0x1a7e waiting for monitor entry [0x00007f5adeb73000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.Hashtable.get(Hashtable.java:458)
- waiting to lock <0x00007f64e62a7210> (a java.util.Hashtable)
at ch.qos.logback.classic.LoggerContext.getLogger(LoggerContext.java:132)
at ch.qos.logback.classic.LoggerContext.getLogger(LoggerContext.java:49)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:270)
at net.htmlparser.jericho.LoggerProviderSLF4J.getLogger(LoggerProviderSLF4J.java:29)
at net.htmlparser.jericho.LoggerFactory.getLogger(LoggerFactory.java:27)
at net.htmlparser.jericho.Source.newLogger(Source.java:1677)
at net.htmlparser.jericho.Source.<init>(Source.java:147)
at net.htmlparser.jericho.StreamedSource.<init>(StreamedSource.java:232)</init></init>

Presently the only solution is to disable logging altogether, which is a patch but it is not completely satisfactory. We can see two solutions:

1) Cache the returned logger, which, being always associated with the same name, is static in every logging framework I am aware of.

2) Having a Source/StreamedSource constructor in which it is possible to specify an SLF4J logger that will be used to avoid a call to newLogger(). SLF4J can log on essentially any other existing backend, so it seems a reasonable choice.

Discussion

  • Martin Jericho

    Martin Jericho - 2013-01-08

    Hi Sebastiano,

    I'm not convinced it's a good idea to internally and statically cache an arbitrary 3rd party object that may keep expensive resources open.

    You can however easily work around this problem as follows:

    Config.LoggerProvider=LoggerProvider.DISABLED;
    Logger sharedLogger=LoggerProvider.SLF4J.getLogger("net.htmlparser.jericho");

    Then for each of your created StreamSource objects:
    streamedSource.setLogger(sharedLogger);

    Cheers
    Martin

     
  • Martin Jericho

    Martin Jericho - 2013-01-08
    • status: unread --> closed-rejected
    • milestone: -->
     
  • Sebastiano Vigna

    I agree with you, but the problem of your solution is that people won't know. I discovered the problem because I looked at thread dumps—this is not how it should happen. Caching the logger is standard practice—most times I've seen stuff like

    private static final Logger LOGGER = LoggerFactory.getLogger( FetchedHttpResponse.class );
    

    I realize this is somehow of a wild guess. But the proposed solution is very twisted: actually, I would think that disabling logging altogether would disable logging even if I set an explicit logger.

    Having a constructor with an explicity SFL4J logger would make it possible to write a Javadoc explaining that the construtor's purpose is to avoid retrieving a logger each time an instance is created. Personally, as I've seen the idiom above everywhere, I would have never thought that a logger would be retrieved at each instantiation. Nothing wrong, it's just a common practice, not a rule, but nonetheless it took us some investigation to understand what was happening.

    My 2€¢. The parser is yours :).

     
  • Sebastiano Vigna

    Ouch. Sourceforge has given me error 500 but clearly is receiving my posts.

     
  • Martin Jericho

    Martin Jericho - 2013-01-08

    I don't want to add new constructors as that would require 6 new constructors on the Source class to complement the existing 6 constructors.

    I will however look at caching the logger internally in each of the standard providers. It does seem to be common practice to instantiate these Logger objects statically so it should be safe to cache them statically.

    Thanks for your suggestions.

     
  • Martin Jericho

    Martin Jericho - 2013-01-08
    • status: closed-rejected --> pending
     
  • Martin Jericho

    Martin Jericho - 2013-01-10
    • status: pending --> open-fixed
     
  • Martin Jericho

    Martin Jericho - 2013-01-10

    Fixed in version 3.4.

    Until version 3.4 is officially released, the development version is available here:
    http://jericho.htmlparser.net/temp/jericho-html-3.4-dev.zip

     
  • Martin Jericho

    Martin Jericho - 2015-10-24
    • status: open-fixed --> closed
    • Group: --> General
     
  • Martin Jericho

    Martin Jericho - 2015-10-24

    This bug has now been closed - Version 3.4 released.

     

Log in to post a comment.