
#222 Exception could not start logger

2.5
open
nobody
None
5
2024-05-23
2024-05-22
Bruno
No

Using JobScheduler V2.6.6 on a Kubernetes cluster.

When I increase the CPU of the Controller, I get an error:

2024-05-22T06:09:06,490 info js7.data.event.SnapshotableStateBuilder - Recovered last EventId is 1716357810417000/2024-05-22T06:03:30.417Z, emitted 5:36min ago (970 snapshot objects and 2 events, 1.2GB read in 38s)
2024-05-22T06:09:06,695 ERROR js7.common.system.startup.JavaMain - org.apache.pekko.ConfigurationException: Could not start logger due to [org.apache.pekko.ConfigurationException: Logger specified in config can't be loaded [org.apache.pekko.event.slf4j.Slf4jLogger] due to [org.apache.pekko.event.Logging$LoggerInitializationException: Logger log1-Slf4jLogger did not respond with LoggerInitialized, sent instead [TIMEOUT]]]
org.apache.pekko.ConfigurationException: Could not start logger due to [org.apache.pekko.ConfigurationException: Logger specified in config can't be loaded [org.apache.pekko.event.slf4j.Slf4jLogger] due to [org.apache.pekko.event.Logging$LoggerInitializationException: Logger log1-Slf4jLogger did not respond with LoggerInitialized, sent instead [TIMEOUT]]]
at org.apache.pekko.event.LoggingBus.startDefaultLoggers(Logging.scala:176) ~[org.apache.pekko.pekko-actor_2.13-1.0.1.jar:1.0.1]

Discussion

  • Andreas

    Andreas - 2024-05-23

    Hi Bruno,

    The exception indicates that a timeout is exceeded while instantiating the Slf4jLogger. This can happen when the Controller needs a long time to read its journal. In the situation at hand the Controller takes 38s to read a 1.2 GB journal file, which should be done in 2-3s.
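
    The timeout in question is Pekko's logger startup timeout (pekko.logger-startup-timeout, default 5s). As a stopgap only - it merely masks the slow journal read - this timeout could be raised via a Java system property, assuming the Controller start script or container passes a JAVA_OPTIONS environment variable to the JVM (please verify for your deployment):

        # Hypothetical stopgap: raise Pekko's logger startup timeout so the
        # Slf4jLogger survives a slow journal recovery. Assumes JAVA_OPTIONS
        # is honored by the JS7 Controller start script/container.
        export JAVA_OPTIONS="$JAVA_OPTIONS -Dpekko.logger-startup-timeout=60s"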

    For the number of orders in use, both the size of the journal and the time needed to read it are excessive. In a meeting we identified the main factor to be the large log output created by jobs (>100 MB per order log).

    An additional factor is the period of 2 days for which such orders exist, for example when failed jobs are retried. Such orders prevent the journal from shrinking and can be removed only after completion.

    If the Controller is restarted in between, it floods JOC Cockpit with the events stored in its journal. This creates high load on JOC Cockpit's Java heap, which had to be increased to >5 GB to cope with the number of events.
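
    For illustration, such a heap increase might look as follows on Kubernetes, assuming the JOC Cockpit image reads a JAVA_OPTIONS environment variable; the deployment name "joc" is a placeholder for your actual one:

        # Hypothetical sketch: give JOC Cockpit ~6 GB of Java heap via JAVA_OPTIONS.
        # The deployment name "joc" is a placeholder for your cluster.
        kubectl set env deployment/joc JAVA_OPTIONS="-Xmx6g"
        # Raise the pod's memory limit accordingly (e.g. 7-8 GiB) so the heap
        # fits inside the container limit.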

    In summary, we see a sizing problem with jobs that create large log output for orders that persist for a longer period, such as two days. The system can be stabilized by assigning more Java heap space to the Controller and to JOC Cockpit. The better approach, however, is to reduce the log output of repeated jobs, which fills up the Controller's journal and is detrimental to performance.
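
    As a sketch of that direction, a shell job can keep verbose tool output out of the order log by writing it to a file and printing only a short summary; the tool name, path and flags below are placeholders:

        # Hypothetical shell job: verbose output goes to a file instead of
        # stdout/stderr, so it does not inflate the order log and the
        # Controller's journal.
        LOGFILE="/var/log/myjob/run-$(date +%Y%m%dT%H%M%S).log"
        my_batch_tool --verbose >"$LOGFILE" 2>&1
        rc=$?
        echo "Job finished with exit code $rc; detailed output in $LOGFILE"
        exit $rc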

    Best regards
    Andreas

     
