|
From: Christian R. <ram...@ne...> - 2022-02-10 07:43:39
|
||In the meantime, we've made a wiki paragraph describing it: https://github.com/netdisco/netdisco/wiki/Database-Tips#unreasonable-database-size-index-bloat in short, either reindex table concurrently $tablename; or reindex database concurrently netdisco; to do the whole db at once. Cheers Christian || On 10.02.22 03:16, Muris wrote: > > I meant to say in my previous email I did a Vacuum Full, which fixed > the issue with size and dB and everything is back to being responsive. > > Christian, can you tell me how to perform the Reindex Concurrently? > What command do you exactly issue? > > I tried to execute but not sure if im doing It right. > > Cheers > > *From: *alcatron <alc...@gm...> > *Date: *Wednesday, 2 February 2022 at 19:51 > *To: *Christian Ramseyer <ram...@ne...>, > "net...@li..." > <net...@li...>, Jethro Binks > <jet...@st...> > *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly > stopped working > > Thanks, I did a autovacuum and I reduced the DB size back down, and > saved 15gig. > > How do you execute reindex concurrently correctly on the db? > > *From: *Christian Ramseyer <ram...@ne...> > *Date: *Monday, 31 January 2022 at 11:44 pm > *To: *alcatron <alc...@gm...>, > net...@li... > <net...@li...>, Jethro Binks > <jet...@st...> > *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly > stopped working > > For us, the autovacuum works fine, so I never do vacuum full. But you > can do it if you're totally out of space and need to reclaim some from > Postgres. > > The only thing that needs additional intervention is the indexes, we do > a REINDEX CONCURRENTLY once a week, as described here: > https://www.postgresql.org/docs/current/routine-reindex.html > > Apparently this "index bloat" should be less of an issue in Postgres 14 > but I haven't gotten around to try it yet. > > > > On 31.01.22 13:38, alcatron wrote: > > Thanks, would I ever have to do a “vacuum full” to the db by any chance > > or should it be automatic process? > > > > Auto Vacuum is set to on > > > > *From: *Christian Ramseyer <ram...@ne...> > > *Date: *Monday, 31 January 2022 at 10:34 pm > > *To: *alcatron <alc...@gm...>, > > net...@li... > > <net...@li...>, Jethro Binks > > <jet...@st...> > > *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly > stopped > > working > > > > > > > > On 31.01.22 12:56, alcatron wrote: > >> Thanks Christian, those commands you mentioned is that just at the > psql > >> command line? > >> > > > > Yes exactly. You can start the command line interface with "netdisco-do > > psql". > > > > > >> For some reason ever since I cleaned this device_skip table the > netdisco > >> postgresql folder has grown dramatically in size by an extra 15gig > >> within 2 weeks. > >> > >> I see this directory taking up the space - > >> /var/lib/pgsql/12/data/base/16386 and lot of other files in there. > >> > >> I had a look at the netdisco tables and I cant see any table that > big in > >> size, so im not really sure why the psql has dramatically keeps > >> increasing in disk size ? > > > > > > You should see what uses the space with the first query from here: > > https://wiki.postgresql.org/wiki/Disk_Usage > > <https://wiki.postgresql.org/wiki/Disk_Usage> > > > > This will include indexes and TOAST tables, the space is probably used > > there instead of the actual table object. > > > > Cheers > > Christian > > > > > > > > > >> > >> Schema | Name | Type | Owner | Size | > >> Description > >> > >> > --------+----------------------------+-------+----------+------------+------------- > >> > >> public | admin | table | netdisco | 173 MB | > >> > >> public | community | table | netdisco | 224 kB | > >> > >> public | dbix_class_schema_versions | table | netdisco | 40 kB | > >> > >> public | device | table | netdisco | 3312 kB | > >> > >> public | device_ip | table | netdisco | 34 MB | > >> > >> public | device_module | table | netdisco | 895 MB | > >> > >> public | device_port | table | netdisco | 1656 MB | > >> > >> public | device_port_log | table | netdisco | 48 kB | > >> > >> public | device_port_power | table | netdisco | 124 MB | > >> > >> public | device_port_properties | table | netdisco | 354 MB | > >> > >> public | device_port_ssid | table | netdisco | 17 MB | > >> > >> public | device_port_vlan | table | netdisco | 1084 MB | > >> > >> public | device_port_wireless | table | netdisco | 6776 kB | > >> > >> public | device_power | table | netdisco | 1760 kB | > >> > >> public | device_skip | table | netdisco | 5544 kB | > >> > >> public | device_vlan | table | netdisco | 67 MB | > >> > >> public | log | table | netdisco | 8192 bytes | > >> > >> public | netmap_positions | table | netdisco | 288 kB | > >> > >> public | node | table | netdisco | 317 MB | > >> > >> public | node_ip | table | netdisco | 2084 MB | > >> > >> public | node_monitor | table | netdisco | 8192 bytes | > >> > >> public | node_nbt | table | netdisco | 4328 kB | > >> > >> public | node_wireless | table | netdisco | 16 MB | > >> > >> public | oui | table | netdisco | 2160 kB | > >> > >> public | process | table | netdisco | 8192 bytes | > >> > >> public | sessions | table | netdisco | 48 kB | > >> > >> public | statistics | table | netdisco | 200 kB | > >> > >> public | subnets | table | netdisco | 1296 kB | > >> > >> public | topology | table | netdisco | 48 kB | > >> > >> public | user_log | table | netdisco | 600 kB | > >> > >> public | users | table | netdisco | 48 kB | > >> > >> *From: *Christian Ramseyer <ram...@ne...> > >> *Date: *Thursday, 20 January 2022 at 12:45 am > >> *To: *alcatron <alc...@gm...>, > >> net...@li... > >> <net...@li...>, Jethro Binks > >> <jet...@st...> > >> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly > stopped > >> working > >> > >> > >> > >> On 19.01.22 14:00, alcatron wrote: > >>> As for picking up on the error, I saw this in the netdisco-backend > log. > >>> I believe the device_skip table was getting so big it was running > out of > >>> memory processing it, the device skip table was like 162MB > >>> > >>> Im sure this will happen again in the next 2-3 months when the > >>> device_skip table builds up. Perhaps its some kind of bug it can only > >>> handle a device_skip table of a certain size? > >> > >> It's weird how it would get that big, as IIRC it keeps only one record > >> per device in your DB at most. Is this including indexes? They might > >> become quite big, since Postgres can create some "bloat" under our > >> insert/delete pattern. > >> > >> device_skip is just used to not poll unreachable devices over and over > >> again, there is no important data in there. So if in doubt, > >> > >> delete from device_skip; > >> vacuum analyze device_skip; > >> reindex table device skip; > >> > >> should allow for a fresh start. > >> > >> There are also the max_deferrals and retry_after options to control the > >> skip behaviour. I don't think it will affect the table size much > though. > >> https://github.com/netdisco/netdisco/wiki/Configuration#workers > > <https://github.com/netdisco/netdisco/wiki/Configuration#workers> > >> <https://github.com/netdisco/netdisco/wiki/Configuration#workers > > <https://github.com/netdisco/netdisco/wiki/Configuration#workers>> > >> > >> If you're getting these issues regularly I'd definitely experiment with > >> the Postgres memory settings a bit, starting at work_mem. > >> > >> Cheers > >> Christian > >> > >> > >>> > >>> Both of these in the netdisco-backend.log were referring to items > in the > >>> “device_skip”, I looked through lots of logged data and found when it > >>> started not working. > >>> > >>> DETAIL: Failed on request of size 284 in memory context > >>> "CacheMemoryContext". [for Statement "SELECT me.backend, me.device, > >>> me.actionset, me.deferrals, me.last_defer FROM device_skip me > WHERE ( ( > >>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\', > >>> 2=\'10.1.1.1\'] at > >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm > line 261 > >>> > >>> '}, 'DBIx::Class::Exception' ) > >>> > >>> [18851] 2022-01-11 01:30:43 error bless( {'msg' => > >>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: > DBD::Pg::st > >>> execute failed: ERROR: out of memory > >>> > >>> DETAIL: Failed on request of size 8344 in memory context > >>> "MessageContext". [for Statement "SELECT me.backend, me.device, > >>> me.actionset, me.deferrals, me.last_defer FROM device_skip me > WHERE ( ( > >>> me.backend = ? AND me.device = ? ) )" with ParamValues: 1=\'server\', > >>> 2=\10.1.1.2\'] at > >>> /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm > line 261 > >>> > >>> '}, 'DBIx::Class::Exception' ) > >>> > >>> *From: *alcatron <alc...@gm...> > >>> *Date: *Wednesday, 19 January 2022 at 10:14 pm > >>> *To: *Christian Ramseyer <ram...@ne...>, > >>> net...@li... > <net...@li...> > >>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly > stopped > >>> working > >>> > >>> Hi Christian, thankyou for the tips. > >>> > >>> I found what the problem is, it was crashing and not going past a > >>> certain object in the “device_skip” table in the database. > >>> > >>> I truncated that field in psql, and let it re-populate and that fixed > >>> the automatic discovery and arpnip/macsuck etc. > >>> > >>> I have found after a while perhaps 2-3 months something happens in > the > >>> “device_skip” table and halts these processes then I need to clear > it to > >>> make it work again. I remember I had this similar issue a few months > >>> back, then I remembered what I did. > >>> > >>> Muris > >>> > >>> *From: *Christian Ramseyer <ram...@ne...> > >>> *Date: *Tuesday, 18 January 2022 at 12:20 pm > >>> *To: *alcatron <alc...@gm...>, > >>> net...@li... > <net...@li...> > >>> *Subject: *Re: [Netdisco] Netdisco auto discovery tasks suddenly > stopped > >>> working > >>> > >>> Hi > >>> > >>> > could not connect to > >>> > server: No such file or directory/ > >>> > >>> This would be very concerning, meaning that Postgres is not running at > >>> all. But since you seem to have the web frontend running that is > >>> probably not the case currently, so I wouldn't worry too much. > Might be > >>> an old log entry. > >>> > >>> > >>> > Failed on request of size 16 in memory context > >>> > "MessageContext". > >>> > >>> That on the other hand might be the issue. Postgres uses all kinds of > >>> memory parameters, if one of them is too small the total GB of RAM > >>> sticks in the server don't matter much. > >>> > >>> I had various issues with huge and clogged up discovery queues > over the > >>> years, as a first measure I'd try to: > >>> > >>> stop netdisco-backend > >>> restart Postgres, connect to the database with "netdisco-do psql" > and in > >>> there run a "delete from admin;". > >>> for good measure, also run "reindex table admin;" > >>> restart netdisco-backend > >>> > >>> This sounds dangerous but admin is in fact just the queue of > actions to > >>> be done, so no important data will be lost. > >>> > >>> Also a "select count(*) from admin" first might be interesting, to see > >>> how many rows are in there. If it's an absurdly high number (millions) > >>> you can run e.g. "create table admin_backup as select * from > admin;" for > >>> analysis later. > >>> > >>> If you're still getting the memory errors afterwards and it still > >>> doesn't work, I'd try to configure the memory parameters with this > >>> assistant, using the "online transaction processing" db type. > >>> https://pgtune.leopard.in.ua/#/about > <https://pgtune.leopard.in.ua/#/about> > >> <https://pgtune.leopard.in.ua/#/about > > <https://pgtune.leopard.in.ua/#/about>> > >> <https://pgtune.leopard.in.ua/#/about > >> <https://pgtune.leopard.in.ua/#/about > > <https://pgtune.leopard.in.ua/#/about>>> > >>> > >>> > >>> Cheers > >>> Christian > >>> > >>> > >>> > >>> On 17.01.22 22:03, alcatron wrote: > >>>> Hi all, just wanting to ask your thoughts on what could be causing > >>>> netdisco to suddenly stop performing auto discovery tasks. > >>>> > >>>> Seems only arpnip is working via scheduled tasks, but > discovery/macsuck > >>>> has halted to auto perform. If I go manually to the device on web > >>>> interface and trigger the auto discovery/arpnip/macsuck it works > fine on > >>>> the device. > >>>> > >>>> Nothing has changed on system, running for a few months now, and > >>>> suddenly the auto discovery is broken partly. > >>>> > >>>> If I go to the backend log I see error like this below. The > server is > >>>> running and operational as I can still perform the manual to get > >>>> discovery etc > >>>> > >>>> The server is not out of memory as it has like 16GB and still plenty > >>>> unused not what the messages are indicating.. > >>>> > >>>> Thanks for any assistance 😊 > >>>> > >>>> /DBIx::Class::Schema::Versioned::_on_connect(): Your DB is currently > >>>> unversioned. Please call upgrade on your schema to sync the DB. at > >>>> /home/netdisco/perl5/lib/perl5/DBICx/Sugar.pm line 121/ > >>>> > >>>> /DBIx::Class::Storage::DBI::catch {...} (): DBI Connection > failed: DBI > >>>> connect('dbname=netdisco','netdisco',...) failed: could not > connect to > >>>> server: No such file or directory/ > >>>> > >>>> / Is the server running locally and accepting/ > >>>> > >>>> / connections on Unix domain socket > >>>> "/var/run/postgresql/.s.PGSQL.5432"? at > >>>> /home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line > 1639. at > >>>> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm > line 50/ > >>>> > >>>> // > >>>> > >>>> /[25756] error bless( {'msg' => > >>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: > DBD::Pg::st > >>>> execute failed: ERROR: out of memory/ > >>>> > >>>> /DETAIL: Failed on request of size 16 in memory context > >>>> "MessageContext". [for Statement "SELECT me.job, me.entered, > me.started, > >>>> me.finished, me.device, me.port, me.action, me.subaction, me.status, > >>>> me.username, me.userip, me.log, me.debug, me.device_key FROM > admin me > >>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: > 1=\'186421742\'] at > >>>> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm > line 267/ > >>>> > >>>> /'}, 'DBIx::Class::Exception' )/ > >>>> > >>>> /[25781] 2022-01-11 01:33:53 error bless( {'msg' => > >>>> 'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: > DBD::Pg::st > >>>> execute failed: ERROR: out of memory/ > >>>> > >>>> /DETAIL: Failed on request of size 16 in memory context > >>>> "MessageContext". [for Statement "SELECT me.job, me.entered, > me.started, > >>>> me.finished, me.device, me.port, me.action, me.subaction, me.status, > >>>> me.username, me.userip, me.log, me.debug, me.device_key FROM > admin me > >>>> WHERE ( me.job = ? ) FOR UPDATE" with ParamValues: > 1=\'186420514\'] at > >>>> > /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm > line 267/ > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Netdisco mailing list > >>>> net...@li... > >>>> https://sourceforge.net/p/netdisco/mailman/netdisco-users/ > > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/> > >> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/ > > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/>> > >>> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/ > >> <https://sourceforge.net/p/netdisco/mailman/netdisco-users/ > > <https://sourceforge.net/p/netdisco/mailman/netdisco-users/>>> > >>> > >>> -- > >>> Christian Ramseyer, netnea ag > >>> Network Management. Security. OpenSource. > >>> https://www.netnea.com <https://www.netnea.com> > <https://www.netnea.com > > <https://www.netnea.com>> <https://www.netnea.com > >> <https://www.netnea.com <https://www.netnea.com>>> > >>> Phone: +41 79 644 77 64 > >>> > >> > >> -- > >> Christian Ramseyer, netnea ag > >> Network Management. Security. OpenSource. > >> https://www.netnea.com <https://www.netnea.com> > <https://www.netnea.com > > <https://www.netnea.com>> > >> Phone: +41 79 644 77 64 > >> > > > > -- > > Christian Ramseyer, netnea ag > > Network Management. Security. OpenSource. > > https://www.netnea.com <https://www.netnea.com> > > Phone: +41 79 644 77 64 > > > > -- > Christian Ramseyer, netnea ag > Network Management. Security. OpenSource. > https://www.netnea.com > Phone: +41 79 644 77 64 > -- Christian Ramseyer, netnea ag Network Management. Security. OpenSource. https://www.netnea.com Phone: +41 79 644 77 64 |