#121 Daemon crash when restarted with in-progress jobs

Status: fixed
Updated: 2014-07-30
Created: 2014-07-28
Private: No

I recently installed the latest version of Netdisco (2.028012) and I'm seeing some strange behavior. I ran a full discovery and, for the most part, everything seemed to work. But then I noticed that some discover jobs in the queue (only 9 out of a large number) were not completing: they just alternated between "running on ..." and "queued" over and over again. I left them for a very long time with no change.

The daemon was still running at that time. I then restarted the daemon, expecting the jobs to kick off. They attempted to, but then the daemon crashed and I saw the following in the log:

[2183] 2014-07-24 02:07:01 warn App::Netdisco 2.028012 backend
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::SQLite::st execute failed: UNIQUE constraint failed: admin.job [for Statement "INSERT INTO admin ( action, debug, device, entered, finished, job, log, port, started, status, subaction, type, userip, username) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )"] at /home/netdisco/perl5/lib/perl5/App/Netdisco/Daemon/LocalQueue.pm line 17

netdisco-daemon-fg: caught signal 'DIE', exiting

As you can see, this killed the daemon, and the jobs were still left in the "running on" state in the web app.
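
For readers unfamiliar with this class of error, here is a minimal sketch of how a UNIQUE constraint on a job column can make a plain INSERT fail when the same job is queued twice. It is an illustration only, not Netdisco's code (which is Perl using DBIx::Class): the three-column `admin` table and the `enqueue` and `enqueue_once` helpers are hypothetical, and the ticket does not say how the problem was actually fixed. `INSERT OR IGNORE` is shown merely as one possible way to tolerate a duplicate.

```python
import sqlite3

# Hypothetical, simplified stand-in for a local job queue table.
# The real Netdisco schema is not shown in this ticket.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admin (job INTEGER UNIQUE, action TEXT, status TEXT)")


def enqueue(job_id, action):
    """Plain INSERT: raises sqlite3.IntegrityError if job_id is already present."""
    conn.execute(
        "INSERT INTO admin (job, action, status) VALUES (?, ?, ?)",
        (job_id, action, "queued"),
    )


def enqueue_once(job_id, action):
    """INSERT OR IGNORE: leaves the table unchanged if job_id is already present."""
    conn.execute(
        "INSERT OR IGNORE INTO admin (job, action, status) VALUES (?, ?, ?)",
        (job_id, action, "queued"),
    )


enqueue(42, "discover")

# Queueing the same job a second time violates the UNIQUE constraint.
try:
    enqueue(42, "discover")
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    duplicate_error = str(exc)

# The tolerant variant does not raise and does not add a second row.
enqueue_once(42, "discover")
row_count = conn.execute("SELECT COUNT(*) FROM admin").fetchone()[0]
```

In this sketch an unhandled `IntegrityError` would end the process, much as the uncaught exception in the log above ends the daemon; catching the error or skipping the duplicate keeps it running.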

I've tried restarting the daemon a few times, but I still see the same behavior: it starts up, tries to run the jobs, hits that error, and stops running.

I did see the same error message posted recently by someone else on the list who had just updated, but I believe it was stated that it could be safely ignored. Seeing this behavior, I'm not sure that's the case.

Here's my system info in case it is needed:

App::Netdisco: 2.028012
DB Schema: v38
Dancer: 1.3126
Bootstrap: 2.3.1
PostgreSQL: PostgreSQL 9.3.4 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7), 64-bit. DBI 1.63, DBD::Pg 2.19.3
SNMP::Info: 3.18
Perl: 5.018002

52 devices with 1,144 interfaces using 259 IPs
107 layer 2 links between devices
1,435 nodes in 2,702 entries
1,665 IPs in 1,665 entries

Discussion

  • Oliver Gorwits

    Oliver Gorwits - 2014-07-30
    • labels: --> Bug, Daemon
    • status: new --> fixed
    • assigned_to: Oliver Gorwits
     
  • Oliver Gorwits

    Oliver Gorwits - 2014-07-30

    fixed in 2.028013