Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

||In the meantime, we've made a wiki paragraph describing it:

https://github.com/netdisco/netdisco/wiki/Database-Tips#unreasonable-database-size-index-bloat

in short, either

reindex table concurrently $tablename;

or

reindex database concurrently netdisco;

to do the whole db at once.

Cheers
Christian

||
On 10.02.22 03:16, Muris wrote:
>
> I meant to say in my previous email I did a Vacuum Full, which fixed 
> the issue with size and dB and everything is back to being responsive.
>
> Christian, can you tell me how to perform the Reindex Concurrently? 
> What command do you exactly issue?
>
> I tried to execute but not sure if im doing It right.
>
> Cheers
>
> *From: *alcatron <alc...@gm...>
> *Date: *Wednesday, 2 February 2022 at 19:51
> *To: *Christian Ramseyer <ram...@ne...>, 
> "net...@li..." 
> <net...@li...>, Jethro Binks 
> <jet...@st...>
> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly 
> stopped working
>
> Thanks, I did a autovacuum and I reduced the DB size back down, and 
> saved 15gig.
>
> How do you execute reindex concurrently correctly on the db?
>
> *From: *Christian Ramseyer <ram...@ne...>
> *Date: *Monday, 31 January 2022 at 11:44 pm
> *To: *alcatron <alc...@gm...>, 
> net...@li... 
> <net...@li...>, Jethro Binks 
> <jet...@st...>
> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly 
> stopped working
>
> For us, the autovacuum works fine, so I never do vacuum full. But you
> can do it if you're totally out of space and need to reclaim some from
> Postgres.
>
> The only thing that needs additional intervention is the indexes, we do
> a REINDEX CONCURRENTLY once a week, as described here:
> https://www.postgresql.org/docs/current/routine-reindex.html
>
> Apparently this "index bloat" should be less of an issue in Postgres 14
> but I haven't gotten around to try it yet.
>
>
>
> On 31.01.22 13:38, alcatron wrote:
> > Thanks, would I ever have to do a “vacuum full” to the db by any chance
> > or should it be automatic process?
> >
> > Auto Vacuum is set to on
> >
> > *From: *Christian Ramseyer <ram...@ne...>
> > *Date: *Monday, 31 January 2022 at 10:34 pm
> > *To: *alcatron <alc...@gm...>,
> > net...@li...
> > <net...@li...>, Jethro Binks
> > <jet...@st...>
> > *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly 
> stopped
> > working
> >
> >
> >
> > On 31.01.22 12:56, alcatron wrote:
> >> Thanks Christian, those commands you mentioned is that just at the 
> psql
> >> command line?
> >>
> >
> > Yes exactly. You can start the command line interface with "netdisco-do
> > psql".
> >
> >
> >> For some reason ever since I cleaned this device_skip table the 
> netdisco
> >> postgresql folder has grown dramatically in size by an extra 15gig
> >> within 2 weeks.
> >>
> >> I see this directory taking up the space -
> >> /var/lib/pgsql/12/data/base/16386 and lot of other files in there.
> >>
> >> I had a look at the netdisco tables and I cant see any table that 
> big in
> >> size, so im not really sure why the psql has dramatically keeps
> >> increasing in disk size ?
> >
> >
> > You should see what uses the space with the first query from here:
> > https://wiki.postgresql.org/wiki/Disk_Usage
> > <https://wiki.postgresql.org/wiki/Disk_Usage>
> >
> > This will include indexes and TOAST tables, the space is probably used
> > there instead of the actual table object.
> >
> > Cheers
> > Christian
> >
> >
> >
> >
> >>
> >> Schema |            Name            | Type  | Owner   |    Size    |
> >> Description
> >>
> >> 
> --------+----------------------------+-------+----------+------------+-------------
> >>
> >> public | admin                      | table | netdisco | 173 MB     |
> >>
> >>   public | community                  | table | netdisco | 224 kB     |
> >>
> >>   public | dbix_class_schema_versions | table | netdisco | 40 kB      |
> >>
> >>   public | device                     | table | netdisco | 3312 kB    |
> >>
> >>   public | device_ip                  | table | netdisco | 34 MB      |
> >>
> >>   public | device_module              | table | netdisco | 895 MB     |
> >>
> >>   public | device_port                | table | netdisco | 1656 MB    |
> >>
> >>   public | device_port_log            | table | netdisco | 48 kB      |
> >>
> >>   public | device_port_power          | table | netdisco | 124 MB     |
> >>
> >>   public | device_port_properties     | table | netdisco | 354 MB     |
> >>
> >>   public | device_port_ssid           | table | netdisco | 17 MB      |
> >>
> >>   public | device_port_vlan           | table | netdisco | 1084 MB    |
> >>
> >>   public | device_port_wireless       | table | netdisco | 6776 kB    |
> >>
> >>   public | device_power               | table | netdisco | 1760 kB    |
> >>
> >>   public | device_skip                | table | netdisco | 5544 kB    |
> >>
> >>   public | device_vlan                | table | netdisco | 67 MB      |
> >>
> >>   public | log                        | table | netdisco | 8192 bytes |
> >>
> >>   public | netmap_positions           | table | netdisco | 288 kB     |
> >>
> >>   public | node                       | table | netdisco | 317 MB     |
> >>
> >>   public | node_ip                    | table | netdisco | 2084 MB    |
> >>
> >>   public | node_monitor               | table | netdisco | 8192 bytes |
> >>
> >>   public | node_nbt                   | table | netdisco | 4328 kB    |
> >>
> >>   public | node_wireless              | table | netdisco | 16 MB      |
> >>
> >>   public | oui                        | table | netdisco | 2160 kB    |
> >>
> >>   public | process                    | table | netdisco | 8192 bytes |
> >>
> >>   public | sessions                   | table | netdisco | 48 kB      |
> >>
> >>   public | statistics                 | table | netdisco | 200 kB     |
> >>
> >>   public | subnets                    | table | netdisco | 1296 kB    |
> >>
> >>   public | topology                   | table | netdisco | 48 kB      |
> >>
> >>   public | user_log                   | table | netdisco | 600 kB     |
> >>
> >>   public | users                      | table | netdisco | 48 kB      |
> >>
> >> *From: *Christian Ramseyer <ram...@ne...>
> >> *Date: *Thursday, 20 January 2022 at 12:45 am
> >> *To: *alcatron <alc...@gm...>,
> >> net...@li...
> >> <net...@li...>, Jethro Binks
> >> <jet...@st...>
> >> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly 
> stopped
> >> working
> >>
> >>
> >>
> >> On 19.01.22 14:00, alcatron wrote:
> >>> As for picking up on the error, I saw this in the netdisco-backend 
> log.
> >>> I believe the device_skip table was getting so big it was running 
> out of
> >>> memory processing it, the device skip table was like 162MB
> >>>
> >>> Im sure this will happen again in the next 2-3 months when the
> >>> device_skip table builds up. Perhaps its some kind of bug it can only
> >>> handle a device_skip table of a certain size?
> >>
> >> It's weird how it would get that big, as IIRC it keeps only one record
> >> per device in your DB at most. Is this including indexes? They might
> >> become quite big, since Postgres can create some "bloat" under our
> >> insert/delete pattern.
> >>
> >> device_skip is just used to not poll unreachable devices over and over
> >> again, there is no important data in there. So if in doubt,
> >>
> >> delete from device_skip;
> >> vacuum analyze device_skip;
> >> reindex table device skip;
> >>
> >> should allow for a fresh start.
> >>
> >> There are also the max_deferrals and retry_after options to control the
> >> skip behaviour. I don't think it will affect the table size much 
> though.
> >> https://github.com/netdisco/netdisco/wiki/Configuration#workers
> > <https://github.com/netdisco/netdisco/wiki/Configuration#workers>
> >> <https://github.com/netdisco/netdisco/wiki/Configuration#workers
> > <https://github.com/netdisco/netdisco/wiki/Configuration#workers>>
> >>
> >> If you're getting these issues regularly I'd definitely experiment with
> >> the Postgres memory settings a bit, starting at work_mem.
> >>
> >> Cheers
> >> Christian
> >>
> >>
> >>>
> >>> Both of these in the netdisco-backend.log were referring to items 
> in the
> >>> “device_skip”, I looked through lots of logged data and found when it
> >>> started not working.
> >>>
> >>> DETAIL:  Failed on request of size 284 in memory context
> >>> "CacheMemoryContext". [for Statement "SELECT me.backend, me.device,
> >>> me.actionset, me.deferrals, me.last_defer FROM device_skip me 
> WHERE ( (
> >>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\',
> >>> 2=\'10.1.1.1\'] at
> >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm 
> line 261
> >>>
> >>> '}, 'DBIx::Class::Exception' )
> >>>
> >>> [18851] 2022-01-11 01:30:43 error bless( {'msg' =>
> >>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: 
> DBD::Pg::st
> >>> execute failed: ERROR:  out of memory
> >>>
> >>> DETAIL:  Failed on request of size 8344 in memory context
> >>> "MessageContext". [for Statement "SELECT me.backend, me.device,
> >>> me.actionset, me.deferrals, me.last_defer FROM device_skip me 
> WHERE ( (
> >>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\',
> >>> 2=\10.1.1.2\'] at
> >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm 
> line 261
> >>>
> >>> '}, 'DBIx::Class::Exception' )
> >>>
> >>> *From: *alcatron <alc...@gm...>
> >>> *Date: *Wednesday, 19 January 2022 at 10:14 pm
> >>> *To: *Christian Ramseyer <ram...@ne...>,
> >>> net...@li... 
> <net...@li...>
> >>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly 
> stopped
> >>> working
> >>>
> >>> Hi Christian, thankyou for the tips.
> >>>
> >>> I found what the problem is, it was crashing and not going past a
> >>> certain object in the “device_skip” table in the database.
> >>>
> >>> I truncated that field in psql, and let it re-populate and that fixed
> >>> the automatic discovery and arpnip/macsuck etc.
> >>>
> >>> I have found after a while perhaps 2-3 months something happens in 
> the
> >>> “device_skip” table and halts these processes then I need to clear 
> it to
> >>> make it work again. I remember I had this similar issue a few months
> >>> back, then I remembered what I did.
> >>>
> >>> Muris
> >>>
> >>> *From: *Christian Ramseyer <ram...@ne...>
> >>> *Date: *Tuesday, 18 January 2022 at 12:20 pm
> >>> *To: *alcatron <alc...@gm...>,
> >>> net...@li... 
> <net...@li...>
> >>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly 
> stopped
> >>> working
> >>>
> >>> Hi
> >>>
> >>>   >  could not connect to
> >>>   > server: No such file or directory/
> >>>
> >>> This would be very concerning, meaning that Postgres is not running at
> >>> all. But since you seem to have the web frontend running that is
> >>> probably not the case currently, so I wouldn't worry too much. 
> Might be
> >>> an old log entry.
> >>>
> >>>
> >>>   > Failed on request of size 16 in memory context
> >>>   > "MessageContext".
> >>>
> >>> That on the other hand might be the issue. Postgres uses all kinds of
> >>> memory parameters, if one of them is too small the total GB of RAM
> >>> sticks in the server don't matter much.
> >>>
> >>> I had various issues with huge and clogged up discovery queues 
> over the
> >>> years, as a first measure I'd try to:
> >>>
> >>> stop netdisco-backend
> >>> restart Postgres, connect to the database with "netdisco-do psql" 
> and in
> >>> there run a "delete from admin;".
> >>> for good measure, also run "reindex table admin;"
> >>> restart netdisco-backend
> >>>
> >>> This sounds dangerous but admin is in fact just the queue of 
> actions to
> >>> be done, so no important data will be lost.
> >>>
> >>> Also a "select count(*) from admin" first might be interesting, to see
> >>> how many rows are in there. If it's an absurdly high number (millions)
> >>> you can run e.g. "create table admin_backup as select * from 
> admin;" for
> >>> analysis later.
> >>>
> >>> If you're still getting the memory errors afterwards and it still
> >>> doesn't work, I'd try to configure the memory parameters with this
> >>> assistant, using the "online transaction processing" db type.
> >>> https://pgtune.leopard.in.ua/#/about 
> <https://pgtune.leopard.in.ua/#/about>
> >> <https://pgtune.leopard.in.ua/#/about
> > <https://pgtune.leopard.in.ua/#/about>>
> >> <https://pgtune.leopard.in.ua/#/about
> >> <https://pgtune.leopard.in.ua/#/about
> > <https://pgtune.leopard.in.ua/#/about>>>
> >>>
> >>>
> >>> Cheers
> >>> Christian
> >>>
> >>>
> >>>
> >>> On 17.01.22 22:03, alcatron wrote:
> >>>> Hi all, just wanting to ask your thoughts on what could be causing
> >>>> netdisco to suddenly stop performing auto discovery tasks.
> >>>>
> >>>> Seems only arpnip is working via scheduled tasks, but 
> discovery/macsuck
> >>>> has halted to auto perform. If I go manually to the device on web
> >>>> interface and trigger the auto discovery/arpnip/macsuck it works 
> fine on
> >>>> the device.
> >>>>
> >>>> Nothing has changed on system, running for a few months now, and
> >>>> suddenly the auto discovery is broken partly.
> >>>>
> >>>> If I go to the backend log I see error like this below. The 
> server is
> >>>> running and operational as I can still perform the manual to get
> >>>> discovery etc
> >>>>
> >>>> The server is not out of memory as it has like 16GB and still plenty
> >>>> unused not what the messages are indicating..
> >>>>
> >>>> Thanks for any assistance 😊
> >>>>
> >>>> /DBIx::Class::Schema::Versioned::_on_connect(): Your DB is currently
> >>>> unversioned. Please call upgrade on your schema to sync the DB. at
> >>>> /home/netdisco/perl5/lib/perl5/DBICx/Sugar.pm line 121/
> >>>>
> >>>> /DBIx::Class::Storage::DBI::catch {...} (): DBI Connection 
> failed: DBI
> >>>> connect('dbname=netdisco','netdisco',...) failed: could not 
> connect to
> >>>> server: No such file or directory/
> >>>>
> >>>> /            Is the server running locally and accepting/
> >>>>
> >>>> /            connections on Unix domain socket
> >>>> "/var/run/postgresql/.s.PGSQL.5432"? at
> >>>> /home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 
> 1639. at
> >>>> 
> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm 
> line 50/
> >>>>
> >>>> //
> >>>>
> >>>> /[25756] error bless( {'msg' =>
> >>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: 
> DBD::Pg::st
> >>>> execute failed: ERROR:  out of memory/
> >>>>
> >>>> /DETAIL:  Failed on request of size 16 in memory context
> >>>> "MessageContext". [for Statement "SELECT me.job, me.entered, 
> me.started,
> >>>> me.finished, me.device, me.port, me.action, me.subaction, me.status,
> >>>> me.username, me.userip, me.log, me.debug, me.device_key FROM 
> admin me
> >>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 
> 1=\'186421742\'] at
> >>>> 
> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm 
> line 267/
> >>>>
> >>>> /'}, 'DBIx::Class::Exception' )/
> >>>>
> >>>> /[25781] 2022-01-11 01:33:53 error bless( {'msg' =>
> >>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: 
> DBD::Pg::st
> >>>> execute failed: ERROR:  out of memory/
> >>>>
> >>>> /DETAIL:  Failed on request of size 16 in memory context
> >>>> "MessageContext". [for Statement "SELECT me.job, me.entered, 
> me.started,
> >>>> me.finished, me.device, me.port, me.action, me.subaction, me.status,
> >>>> me.username, me.userip, me.log, me.debug, me.device_key FROM 
> admin me
> >>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 
> 1=\'186420514\'] at
> >>>> 
> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm 
> line 267/
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Netdisco mailing list
> >>>> net...@li...
> >>>> https://sourceforge.net/p/netdisco/mailman/netdisco-users/
> > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/>
> >> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/
> > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/>>
> >>> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/
> >> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/
> > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/>>>
> >>>
> >>> --
> >>> Christian Ramseyer, netnea ag
> >>> Network Management. Security. OpenSource.
> >>> https://www.netnea.com <https://www.netnea.com> 
> <https://www.netnea.com
> > <https://www.netnea.com>> <https://www.netnea.com
> >> <https://www.netnea.com <https://www.netnea.com>>>
> >>> Phone: +41 79 644 77 64
> >>>
> >>
> >> --
> >> Christian Ramseyer, netnea ag
> >> Network Management. Security. OpenSource.
> >> https://www.netnea.com <https://www.netnea.com> 
> <https://www.netnea.com
> > <https://www.netnea.com>>
> >> Phone: +41 79 644 77 64
> >>
> >
> > --
> > Christian Ramseyer, netnea ag
> > Network Management. Security. OpenSource.
> > https://www.netnea.com <https://www.netnea.com>
> > Phone: +41 79 644 77 64
> >
>
> -- 
> Christian Ramseyer, netnea ag
> Network Management. Security. OpenSource.
> https://www.netnea.com
> Phone: +41 79 644 77 64
>

-- 
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64

Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working

Full-featured enterprise network management tool

Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working