Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working


Thanks. Would I ever have to run a “vacuum full” on the DB by any chance, or should that be an automatic process?

Auto Vacuum is set to on
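
One way to check whether autovacuum is actually reaching the tables is to look at PostgreSQL's per-table statistics view. This is a minimal sketch, assuming it is run inside "netdisco-do psql"; it only reports and changes nothing, and the LIMIT of 10 is an arbitrary choice:

```sql
-- Confirm the autovacuum setting as seen by the running server.
SHOW autovacuum;

-- Dead-tuple counts and the last manual/automatic vacuum per table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table with a large n_dead_tup and an old or empty last_autovacuum would suggest that autovacuum is not keeping up with it.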

From: Christian Ramseyer <ram...@ne...>
Date: Monday, 31 January 2022 at 10:34 pm
To: alcatron <alc...@gm...>, net...@li... <net...@li...>, Jethro Binks <jet...@st...>
Subject: Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working

On 31.01.22 12:56, alcatron wrote:
> Thanks Christian. Are the commands you mentioned just run at the psql
> command line?
>

Yes exactly. You can start the command line interface with "netdisco-do
psql".

> For some reason, ever since I cleaned the device_skip table, the netdisco
> PostgreSQL folder has grown dramatically in size, by an extra 15 GB
> within 2 weeks.
>
> I see this directory taking up the space -
> /var/lib/pgsql/12/data/base/16386 and a lot of other files in there.
>
> I had a look at the netdisco tables and I can't see any table that big in
> size, so I'm not really sure why the PostgreSQL data keeps growing so
> dramatically on disk.

You should see what uses the space with the first query from here:
https://wiki.postgresql.org/wiki/Disk_Usage

This will include indexes and TOAST tables; the space is probably used
there rather than by the actual table object.
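
As a rough sketch of the same idea (not the wiki query itself), something like the following splits each table's footprint into heap, TOAST and indexes. It assumes the Netdisco tables live in the "public" schema, as in the listing quoted below; the column aliases are made up for illustration.

```sql
-- Per-table breakdown: main heap, TOAST plus auxiliary maps, indexes, total.
SELECT c.relname                                      AS table_name,
       pg_size_pretty(pg_relation_size(c.oid))        AS heap_size,
       pg_size_pretty(pg_table_size(c.oid)
                      - pg_relation_size(c.oid))      AS toast_and_maps,
       pg_size_pretty(pg_indexes_size(c.oid))         AS index_size,
       pg_size_pretty(pg_total_relation_size(c.oid))  AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'          -- ordinary tables only
  AND n.nspname = 'public'     -- assumption: Netdisco's schema
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```

If index_size is far larger than heap_size for a table, index bloat is a likely candidate for the missing space.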

Cheers
Christian

>
>  Schema |            Name            | Type  |  Owner   |    Size    | Description
> --------+----------------------------+-------+----------+------------+-------------
>  public | admin                      | table | netdisco | 173 MB     |
>  public | community                  | table | netdisco | 224 kB     |
>  public | dbix_class_schema_versions | table | netdisco | 40 kB      |
>  public | device                     | table | netdisco | 3312 kB    |
>  public | device_ip                  | table | netdisco | 34 MB      |
>  public | device_module              | table | netdisco | 895 MB     |
>  public | device_port                | table | netdisco | 1656 MB    |
>  public | device_port_log            | table | netdisco | 48 kB      |
>  public | device_port_power          | table | netdisco | 124 MB     |
>  public | device_port_properties     | table | netdisco | 354 MB     |
>  public | device_port_ssid           | table | netdisco | 17 MB      |
>  public | device_port_vlan           | table | netdisco | 1084 MB    |
>  public | device_port_wireless       | table | netdisco | 6776 kB    |
>  public | device_power               | table | netdisco | 1760 kB    |
>  public | device_skip                | table | netdisco | 5544 kB    |
>  public | device_vlan                | table | netdisco | 67 MB      |
>  public | log                        | table | netdisco | 8192 bytes |
>  public | netmap_positions           | table | netdisco | 288 kB     |
>  public | node                       | table | netdisco | 317 MB     |
>  public | node_ip                    | table | netdisco | 2084 MB    |
>  public | node_monitor               | table | netdisco | 8192 bytes |
>  public | node_nbt                   | table | netdisco | 4328 kB    |
>  public | node_wireless              | table | netdisco | 16 MB      |
>  public | oui                        | table | netdisco | 2160 kB    |
>  public | process                    | table | netdisco | 8192 bytes |
>  public | sessions                   | table | netdisco | 48 kB      |
>  public | statistics                 | table | netdisco | 200 kB     |
>  public | subnets                    | table | netdisco | 1296 kB    |
>  public | topology                   | table | netdisco | 48 kB      |
>  public | user_log                   | table | netdisco | 600 kB     |
>  public | users                      | table | netdisco | 48 kB      |
>
> *From: *Christian Ramseyer <ram...@ne...>
> *Date: *Thursday, 20 January 2022 at 12:45 am
> *To: *alcatron <alc...@gm...>,
> net...@li...
> <net...@li...>, Jethro Binks
> <jet...@st...>
> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped
> working
>
>
>
> On 19.01.22 14:00, alcatron wrote:
>> As for picking up on the error, I saw this in the netdisco-backend log.
>> I believe the device_skip table was getting so big that it was running
>> out of memory processing it; the device_skip table was about 162 MB.
>>
>> I'm sure this will happen again in the next 2-3 months when the
>> device_skip table builds up. Perhaps it's some kind of bug where it can
>> only handle a device_skip table up to a certain size?
>
> It's weird how it would get that big, as IIRC it keeps only one record
> per device in your DB at most. Is this including indexes? They might
> become quite big, since Postgres can create some "bloat" under our
> insert/delete pattern.
>
> device_skip is just used to not poll unreachable devices over and over
> again, there is no important data in there. So if in doubt,
>
> delete from device_skip;
> vacuum analyze device_skip;
> reindex table device_skip;
>
> should allow for a fresh start.
>
> There are also the max_deferrals and retry_after options to control the
> skip behaviour. I don't think it will affect the table size much though.
> https://github.com/netdisco/netdisco/wiki/Configuration#workers
>
> If you're getting these issues regularly I'd definitely experiment with
> the Postgres memory settings a bit, starting at work_mem.
>
> Cheers
> Christian
>
>
>>
>> Both of these in the netdisco-backend.log were referring to items in
>> “device_skip”. I looked through lots of logged data and found when it
>> started not working.
>>
>> DETAIL:  Failed on request of size 284 in memory context
>> "CacheMemoryContext". [for Statement "SELECT me.backend, me.device,
>> me.actionset, me.deferrals, me.last_defer FROM device_skip me WHERE ( (
>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\',
>> 2=\'10.1.1.1\'] at
>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 261
>>
>> '}, 'DBIx::Class::Exception' )
>>
>> [18851] 2022-01-11 01:30:43 error bless( {'msg' =>
>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
>> execute failed: ERROR:  out of memory
>>
>> DETAIL:  Failed on request of size 8344 in memory context
>> "MessageContext". [for Statement "SELECT me.backend, me.device,
>> me.actionset, me.deferrals, me.last_defer FROM device_skip me WHERE ( (
>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\',
>> 2=\'10.1.1.2\'] at
>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 261
>>
>> '}, 'DBIx::Class::Exception' )
>>
>> *From: *alcatron <alc...@gm...>
>> *Date: *Wednesday, 19 January 2022 at 10:14 pm
>> *To: *Christian Ramseyer <ram...@ne...>,
>> net...@li... <net...@li...>
>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped
>> working
>>
>> Hi Christian, thank you for the tips.
>>
>> I found what the problem is: it was crashing and not going past a
>> certain object in the “device_skip” table in the database.
>>
>> I truncated that table in psql and let it re-populate, and that fixed
>> the automatic discovery, arpnip, macsuck etc.
>>
>> I have found that after a while, perhaps 2-3 months, something happens
>> in the “device_skip” table and halts these processes, and then I need to
>> clear it to make it work again. I had a similar issue a few months
>> back, and then I remembered what I did.
>>
>> Muris
>>
>> *From: *Christian Ramseyer <ram...@ne...>
>> *Date: *Tuesday, 18 January 2022 at 12:20 pm
>> *To: *alcatron <alc...@gm...>,
>> net...@li... <net...@li...>
>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped
>> working
>>
>> Hi
>>
>>   >  could not connect to
>>   > server: No such file or directory/
>>
>> This would be very concerning, meaning that Postgres is not running at
>> all. But since you seem to have the web frontend running that is
>> probably not the case currently, so I wouldn't worry too much. Might be
>> an old log entry.
>>
>>
>>   > Failed on request of size 16 in memory context
>>   > "MessageContext".
>>
>> That on the other hand might be the issue. Postgres uses all kinds of
>> memory parameters; if one of them is too small, the total amount of RAM
>> in the server doesn't matter much.
>>
>> I have had various issues with huge and clogged-up discovery queues over
>> the years. As a first measure I'd try to:
>>
>> stop netdisco-backend
>> restart Postgres, connect to the database with "netdisco-do psql" and in
>> there run a "delete from admin;".
>> for good measure, also run "reindex table admin;"
>> restart netdisco-backend
>>
>> This sounds dangerous but admin is in fact just the queue of actions to
>> be done, so no important data will be lost.
>>
>> Also a "select count(*) from admin" first might be interesting, to see
>> how many rows are in there. If it's an absurdly high number (millions)
>> you can run e.g. "create table admin_backup as select * from admin;" for
>> analysis later.
>>
>> If you're still getting the memory errors afterwards and it still
>> doesn't work, I'd try to configure the memory parameters with this
>> assistant, using the "online transaction processing" db type.
>> https://pgtune.leopard.in.ua/#/about
>>
>>
>> Cheers
>> Christian
>>
>>
>>
>> On 17.01.22 22:03, alcatron wrote:
>>> Hi all, just wanting to ask your thoughts on what could be causing
>>> netdisco to suddenly stop performing auto discovery tasks.
>>>
>>> It seems only arpnip is working via scheduled tasks, while
>>> discovery/macsuck have stopped running automatically. If I go to the
>>> device manually in the web interface and trigger
>>> discovery/arpnip/macsuck, it works fine on the device.
>>>
>>> Nothing has changed on the system, which has been running for a few
>>> months now, and suddenly the auto discovery is partly broken.
>>>
>>> If I go to the backend log I see errors like the ones below. The server
>>> is running and operational, as I can still trigger discovery etc.
>>> manually.
>>>
>>> The server is not out of memory, as it has about 16 GB with plenty
>>> still unused, which is not what the messages are indicating.
>>>
>>> Thanks for any assistance 😊
>>>
>>> /DBIx::Class::Schema::Versioned::_on_connect(): Your DB is currently
>>> unversioned. Please call upgrade on your schema to sync the DB. at
>>> /home/netdisco/perl5/lib/perl5/DBICx/Sugar.pm line 121/
>>>
>>> /DBIx::Class::Storage::DBI::catch {...} (): DBI Connection failed: DBI
>>> connect('dbname=netdisco','netdisco',...) failed: could not connect to
>>> server: No such file or directory/
>>>
>>> /            Is the server running locally and accepting/
>>>
>>> /            connections on Unix domain socket
>>> "/var/run/postgresql/.s.PGSQL.5432"? at
>>> /home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1639. at
>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 50/
>>>
>>> /[25756] error bless( {'msg' =>
>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
>>> execute failed: ERROR:  out of memory/
>>>
>>> /DETAIL:  Failed on request of size 16 in memory context
>>> "MessageContext". [for Statement "SELECT me.job, me.entered, me.started,
>>> me.finished, me.device, me.port, me.action, me.subaction, me.status,
>>> me.username, me.userip, me.log, me.debug, me.device_key FROM admin me
>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 1=\'186421742\'] at
>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 267/
>>>
>>> /'}, 'DBIx::Class::Exception' )/
>>>
>>> /[25781] 2022-01-11 01:33:53 error bless( {'msg' =>
>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
>>> execute failed: ERROR:  out of memory/
>>>
>>> /DETAIL:  Failed on request of size 16 in memory context
>>> "MessageContext". [for Statement "SELECT me.job, me.entered, me.started,
>>> me.finished, me.device, me.port, me.action, me.subaction, me.status,
>>> me.username, me.userip, me.log, me.debug, me.device_key FROM admin me
>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 1=\'186420514\'] at
>>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 267/
>>>
>>>
>>>
>>> _______________________________________________
>>> Netdisco mailing list
>>> net...@li...
>>> https://sourceforge.net/p/netdisco/mailman/netdisco-users/
>>
>> --
>> Christian Ramseyer, netnea ag
>> Network Management. Security. OpenSource.
>> https://www.netnea.com
>> Phone: +41 79 644 77 64
>>
>
> --
> Christian Ramseyer, netnea ag
> Network Management. Security. OpenSource.
> https://www.netnea.com
> Phone: +41 79 644 77 64
>

--
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64
