Source.newLogger() calls the getLogger() method of the specified (or auto-detected) logging implementation every time a Source instance is created.
In many cases getLogger() accesses a globally locked hash table that is not designed for heavy multithreaded access. We parse thousands of pages per second on multiple threads, and our thread dumps very often show threads blocked on that lock:
"VisitingThread-63" prio=10 tid=0x00007f6c0f2f6800 nid=0x1a7e waiting for monitor entry [0x00007f5adeb73000]
java.lang.Thread.State: BLOCKED (on object monitor)
- waiting to lock <0x00007f64e62a7210> (a java.util.Hashtable)
At present the only workaround is to disable logging altogether, which is not completely satisfactory. We can see two possible solutions:
1) Cache the returned logger, which, being always associated with the same name, is static in every logging framework I am aware of.
2) Add a Source/StreamedSource constructor that accepts an SLF4J logger, so that no call to newLogger() is needed. SLF4J can log to essentially any existing backend, so it seems a reasonable choice.
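
To make the two proposals concrete, here is a minimal sketch of what we have in mind. It is not the library's code: CachedLogger, ParsedDocument and the logger name "example.parser" are hypothetical names used only for illustration, and java.util.logging stands in for SLF4J so that the example has no external dependencies.

```java
import java.util.logging.Logger;

// Hypothetical helper (not part of the library): looks the logger up once and reuses it.
final class CachedLogger {
    // Assumed logger name, for illustration only.
    static final String NAME = "example.parser";

    // Initialization-on-demand holder: the JVM runs this lookup once, on first
    // access, so later calls to get() never touch the logging framework's registry.
    private static final class Holder {
        static final Logger INSTANCE = Logger.getLogger(NAME);
    }

    private CachedLogger() {}

    static Logger get() {
        return Holder.INSTANCE;
    }
}

// Hypothetical stand-in for Source/StreamedSource showing both proposals.
final class ParsedDocument {
    private final Logger logger;

    // Proposal 1: the default constructor reuses the cached logger.
    ParsedDocument() {
        this(CachedLogger.get());
    }

    // Proposal 2: the caller supplies the logger, so no lookup happens at all.
    ParsedDocument(Logger logger) {
        this.logger = logger;
    }

    Logger logger() {
        return logger;
    }
}
```

With either variant, creating an instance no longer goes through the framework's getLogger() on the hot path; the only lookup is the one performed when the holder class is first initialized.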