|
From: alcatron <alc...@gm...> - 2022-01-31 12:38:27
|
Thanks, would I ever have to do a “vacuum full” to the db by any chance or should it be automatic process? Auto Vacuum is set to on From: Christian Ramseyer <ram...@ne...> Date: Monday, 31 January 2022 at 10:34 pm To: alcatron <alc...@gm...>, net...@li... <net...@li...>, Jethro Binks <jet...@st...> Subject: Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped working On 31.01.22 12:56, alcatron wrote: > Thanks Christian, those commands you mentioned is that just at the psql > command line? > Yes exactly. You can start the command line interface with "netdisco-do psql". > For some reason ever since I cleaned this device_skip table the netdisco > postgresql folder has grown dramatically in size by an extra 15gig > within 2 weeks. > > I see this directory taking up the space - > /var/lib/pgsql/12/data/base/16386 and lot of other files in there. > > I had a look at the netdisco tables and I cant see any table that big in > size, so im not really sure why the psql has dramatically keeps > increasing in disk size ? You should see what uses the space with the first query from here: https://wiki.postgresql.org/wiki/Disk_Usage This will include indexes and TOAST tables, the space is probably used there instead of the actual table object. Cheers Christian > > Schema | Name | Type | Owner | Size | > Description > > --------+----------------------------+-------+----------+------------+------------- > > public | admin | table | netdisco | 173 MB | > > public | community | table | netdisco | 224 kB | > > public | dbix_class_schema_versions | table | netdisco | 40 kB | > > public | device | table | netdisco | 3312 kB | > > public | device_ip | table | netdisco | 34 MB | > > public | device_module | table | netdisco | 895 MB | > > public | device_port | table | netdisco | 1656 MB | > > public | device_port_log | table | netdisco | 48 kB | > > public | device_port_power | table | netdisco | 124 MB | > > public | device_port_properties | table | netdisco | 354 MB | > > public | device_port_ssid | table | netdisco | 17 MB | > > public | device_port_vlan | table | netdisco | 1084 MB | > > public | device_port_wireless | table | netdisco | 6776 kB | > > public | device_power | table | netdisco | 1760 kB | > > public | device_skip | table | netdisco | 5544 kB | > > public | device_vlan | table | netdisco | 67 MB | > > public | log | table | netdisco | 8192 bytes | > > public | netmap_positions | table | netdisco | 288 kB | > > public | node | table | netdisco | 317 MB | > > public | node_ip | table | netdisco | 2084 MB | > > public | node_monitor | table | netdisco | 8192 bytes | > > public | node_nbt | table | netdisco | 4328 kB | > > public | node_wireless | table | netdisco | 16 MB | > > public | oui | table | netdisco | 2160 kB | > > public | process | table | netdisco | 8192 bytes | > > public | sessions | table | netdisco | 48 kB | > > public | statistics | table | netdisco | 200 kB | > > public | subnets | table | netdisco | 1296 kB | > > public | topology | table | netdisco | 48 kB | > > public | user_log | table | netdisco | 600 kB | > > public | users | table | netdisco | 48 kB | > > *From: *Christian Ramseyer <ram...@ne...> > *Date: *Thursday, 20 January 2022 at 12:45 am > *To: *alcatron <alc...@gm...>, > net...@li... > <net...@li...>, Jethro Binks > <jet...@st...> > *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped > working > > > > On 19.01.22 14:00, alcatron wrote: >> As for picking up on the error, I saw this in the netdisco-backend log. >> I believe the device_skip table was getting so big it was running out of >> memory processing it, the device skip table was like 162MB >> >> Im sure this will happen again in the next 2-3 months when the >> device_skip table builds up. Perhaps its some kind of bug it can only >> handle a device_skip table of a certain size? > > It's weird how it would get that big, as IIRC it keeps only one record > per device in your DB at most. Is this including indexes? They might > become quite big, since Postgres can create some "bloat" under our > insert/delete pattern. > > device_skip is just used to not poll unreachable devices over and over > again, there is no important data in there. So if in doubt, > > delete from device_skip; > vacuum analyze device_skip; > reindex table device skip; > > should allow for a fresh start. > > There are also the max_deferrals and retry_after options to control the > skip behaviour. I don't think it will affect the table size much though. > https://github.com/netdisco/netdisco/wiki/Configuration#workers > <https://github.com/netdisco/netdisco/wiki/Configuration#workers> > > If you're getting these issues regularly I'd definitely experiment with > the Postgres memory settings a bit, starting at work_mem. > > Cheers > Christian > > >> >> Both of these in the netdisco-backend.log were referring to items in the >> “device_skip”, I looked through lots of logged data and found when it >> started not working. >> >> DETAIL: Failed on request of size 284 in memory context >> "CacheMemoryContext". [for Statement "SELECT me.backend, me.device, >> me.actionset, me.deferrals, me.last_defer FROM device_skip me WHERE ( ( >> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\', >> 2=\'10.1.1.1\'] at >> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 261 >> >> '}, 'DBIx::Class::Exception' ) >> >> [18851] 2022-01-11 01:30:43 error bless( {'msg' => >> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st >> execute failed: ERROR: out of memory >> >> DETAIL: Failed on request of size 8344 in memory context >> "MessageContext". [for Statement "SELECT me.backend, me.device, >> me.actionset, me.deferrals, me.last_defer FROM device_skip me WHERE ( ( >> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\', >> 2=\10.1.1.2\'] at >> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 261 >> >> '}, 'DBIx::Class::Exception' ) >> >> *From: *alcatron <alc...@gm...> >> *Date: *Wednesday, 19 January 2022 at 10:14 pm >> *To: *Christian Ramseyer <ram...@ne...>, >> net...@li... <net...@li...> >> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped >> working >> >> Hi Christian, thankyou for the tips. >> >> I found what the problem is, it was crashing and not going past a >> certain object in the “device_skip” table in the database. >> >> I truncated that field in psql, and let it re-populate and that fixed >> the automatic discovery and arpnip/macsuck etc. >> >> I have found after a while perhaps 2-3 months something happens in the >> “device_skip” table and halts these processes then I need to clear it to >> make it work again. I remember I had this similar issue a few months >> back, then I remembered what I did. >> >> Muris >> >> *From: *Christian Ramseyer <ram...@ne...> >> *Date: *Tuesday, 18 January 2022 at 12:20 pm >> *To: *alcatron <alc...@gm...>, >> net...@li... <net...@li...> >> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly stopped >> working >> >> Hi >> >> > could not connect to >> > server: No such file or directory/ >> >> This would be very concerning, meaning that Postgres is not running at >> all. But since you seem to have the web frontend running that is >> probably not the case currently, so I wouldn't worry too much. Might be >> an old log entry. >> >> >> > Failed on request of size 16 in memory context >> > "MessageContext". >> >> That on the other hand might be the issue. Postgres uses all kinds of >> memory parameters, if one of them is too small the total GB of RAM >> sticks in the server don't matter much. >> >> I had various issues with huge and clogged up discovery queues over the >> years, as a first measure I'd try to: >> >> stop netdisco-backend >> restart Postgres, connect to the database with "netdisco-do psql" and in >> there run a "delete from admin;". >> for good measure, also run "reindex table admin;" >> restart netdisco-backend >> >> This sounds dangerous but admin is in fact just the queue of actions to >> be done, so no important data will be lost. >> >> Also a "select count(*) from admin" first might be interesting, to see >> how many rows are in there. If it's an absurdly high number (millions) >> you can run e.g. "create table admin_backup as select * from admin;" for >> analysis later. >> >> If you're still getting the memory errors afterwards and it still >> doesn't work, I'd try to configure the memory parameters with this >> assistant, using the "online transaction processing" db type. >> https://pgtune.leopard.in.ua/#/about > <https://pgtune.leopard.in.ua/#/about> > <https://pgtune.leopard.in.ua/#/about > <https://pgtune.leopard.in.ua/#/about>> >> >> >> Cheers >> Christian >> >> >> >> On 17.01.22 22:03, alcatron wrote: >>> Hi all, just wanting to ask your thoughts on what could be causing >>> netdisco to suddenly stop performing auto discovery tasks. >>> >>> Seems only arpnip is working via scheduled tasks, but discovery/macsuck >>> has halted to auto perform. If I go manually to the device on web >>> interface and trigger the auto discovery/arpnip/macsuck it works fine on >>> the device. >>> >>> Nothing has changed on system, running for a few months now, and >>> suddenly the auto discovery is broken partly. >>> >>> If I go to the backend log I see error like this below. The server is >>> running and operational as I can still perform the manual to get >>> discovery etc >>> >>> The server is not out of memory as it has like 16GB and still plenty >>> unused not what the messages are indicating.. >>> >>> Thanks for any assistance 😊 >>> >>> /DBIx::Class::Schema::Versioned::_on_connect(): Your DB is currently >>> unversioned. Please call upgrade on your schema to sync the DB. at >>> /home/netdisco/perl5/lib/perl5/DBICx/Sugar.pm line 121/ >>> >>> /DBIx::Class::Storage::DBI::catch {...} (): DBI Connection failed: DBI >>> connect('dbname=netdisco','netdisco',...) failed: could not connect to >>> server: No such file or directory/ >>> >>> / Is the server running locally and accepting/ >>> >>> / connections on Unix domain socket >>> "/var/run/postgresql/.s.PGSQL.5432"? at >>> /home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1639. at >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 50/ >>> >>> // >>> >>> /[25756] error bless( {'msg' => >>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st >>> execute failed: ERROR: out of memory/ >>> >>> /DETAIL: Failed on request of size 16 in memory context >>> "MessageContext". [for Statement "SELECT me.job, me.entered, me.started, >>> me.finished, me.device, me.port, me.action, me.subaction, me.status, >>> me.username, me.userip, me.log, me.debug, me.device_key FROM admin me >>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 1=\'186421742\'] at >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 267/ >>> >>> /'}, 'DBIx::Class::Exception' )/ >>> >>> /[25781] 2022-01-11 01:33:53 error bless( {'msg' => >>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st >>> execute failed: ERROR: out of memory/ >>> >>> /DETAIL: Failed on request of size 16 in memory context >>> "MessageContext". [for Statement "SELECT me.job, me.entered, me.started, >>> me.finished, me.device, me.port, me.action, me.subaction, me.status, >>> me.username, me.userip, me.log, me.debug, me.device_key FROM admin me >>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: 1=\'186420514\'] at >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 267/ >>> >>> >>> >>> _______________________________________________ >>> Netdisco mailing list >>> net...@li... >>> https://sourceforge.net/p/netdisco/mailman/netdisco-users/ > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/> >> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/ > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/>> >> >> -- >> Christian Ramseyer, netnea ag >> Network Management. Security. OpenSource. >> https://www.netnea.com <https://www.netnea.com> <https://www.netnea.com > <https://www.netnea.com>> >> Phone: +41 79 644 77 64 >> > > -- > Christian Ramseyer, netnea ag > Network Management. Security. OpenSource. > https://www.netnea.com <https://www.netnea.com> > Phone: +41 79 644 77 64 > -- Christian Ramseyer, netnea ag Network Management. Security. OpenSource. https://www.netnea.com Phone: +41 79 644 77 64 |