From: <fri...@ad...> - 2001-06-12 00:50:30
Hi Firebird crew,

for the last few days there has been a discussion on the Mers InterBase list about a problem which seems to occur after doing some DB backups: the IBSERVER service (Windows) uses 100% CPU time and slows down the whole server so that DB and file operations are about 100 times slower than usual. This behaviour does not stop until the service is restarted. For a detailed explanation please read on.

Of course, this is not some kind of 'horror bug' destroying data (_if_ it is a bug), but it seriously affects the usability of IB/FB. It completely destroys the 'embed-deploy-relax' situation :-(

I thought it was a problem due to my configuration until I found some other people who also ran into this problem. I know that Rob Schieck, the 'leader of the IB community', has thrown out some very competent people from the Mers list, so I am repeating the full thread for you. On the Mers list we didn't really find an answer, so I wonder what the gurus of this list might say.

The following text is quite long, but not as long as many other mails on this list, and in addition to that, it's easier to understand ;-)

Please excuse me if this is not the right place for this e-mail, but I know that this list is very frequently used by many experts. I hope we will find a solution ...

TIA,
Christian

Christian wrote:
-----------------------------------------------------------------
Hi,

we are using InterBase/x86/Windows NT Version "WI-V6.0.0.627" on an NT Server (PII/400, 256 MB RAM) with about 5-15 clients connected. Since December it ran fine, but two weeks ago the whole server appeared to respond very slowly. Looking closer at it, I saw that "ibserver.exe" had been using 100% of the CPU for about 30 minutes. I kicked out the users, shut down the database and restarted it, and everything was fine. Yesterday I had to reboot the server, and today was the second time IB behaved as described above.
The DB has a size of 10 MB, Forced Writes are on, there are no consistency errors and no errors in the server log (except some "INET/inet_error: read errno = 10054", which I suppose is quite normal).

Of course, this is very annoying because a service behaving like this on a server causes a lot of havoc. So I wonder if someone could give me a hint as to what could be wrong?

TIA,
Christian

Thomas wrote:
-----------------------------------------------------------------
Possibly you don't have enough disk space on the partition where the database file resides or on the partition where InterBase creates temp files.

Christian wrote:
-----------------------------------------------------------------
Thomas,

thanks for your answer. There are about 400 MB of free disk space. This should be enough for a 10 MB database, shouldn't it? But anyway, I will free up about 2 GB of disk space to be safe.

Any other suggestions?

Ded wrote:
-----------------------------------------------------------------
Hi, Christian.

1. When did you last restore the database from backup?
2. As far as I know, IB can be overloaded by:
2.1. Inaccurate select (missed join condition and so on).
2.2. Garbage collection after mass delete.
2.3. Sweep automatically started at an inconvenient time. I usually turn auto-sweep off and run it when no users are working.
3. Work on a slightly corrupted database.

Classic IB servers handle overload much better, but only the SuperServer architecture is available for Windows.

Best regards.

Thomas wrote:
-----------------------------------------------------------------
But on a database with only 10 MB?

Thomas

Ded wrote:
-----------------------------------------------------------------
Really. :) I forget when I last saw one like that.

Best regards.
Tobias wrote:
-----------------------------------------------------------------
can still have thousands of rows with which IB obviously got entangled somehow

Tobias wrote:
-----------------------------------------------------------------
Hi!

Have you tried a backup and restore?

Christian wrote:
-----------------------------------------------------------------
I back up the database regularly, but I have never restored it. Maybe it's time for a restore ...

Christian

Christian answered to Ded:
-----------------------------------------------------------------
Ded,

> 1. When did you last restore the database from backup?

I have never done so since December 2000. Perhaps I should do it.

> 2. As far as I know, IB can be overloaded by:
> 2.1. Inaccurate select (missed join condition and so on).

This did not happen, I'm sure. I have only got very simple queries.

> 2.2. Garbage collection after mass delete.

There were no mass deletes.

> 2.3. Sweep automatically started at an inconvenient time.
> I usually turn auto-sweep off and run it when no users are
> working.

The sweep interval is 20000. But I don't think the sweep problem occurs with such small DBs (10 MB). In addition to that, there are no tables larger than 7,000 records.

> 3. Work on a slightly corrupted database.

The database check tells me everything is OK. Backup works, too. I will check whether a DB restore reveals some error...

> Classic IB servers handle overload much better, but only the SuperServer
> architecture is available for Windows.

Hmmm ...

Thank you for the response. I will try a restore. Perhaps an InterBase guru (like Ann Harrison) is reading this and will bring forward a new, unconsidered point (as she often does) ...

Louis wrote:
-----------------------------------------------------------------
I frequently see 100% utilization when the query plan that IB comes up with isn't quite the best plan. Sometimes the IB-generated plan is wrong due to index statistics being incorrect.
Index statistics can be corrected by restoring a backup or by using

    ALTER INDEX idxName INACTIVE;
    ALTER INDEX idxName ACTIVE;

Usually this kind of thing is present from the beginning, but maybe your data has just grown to the point where a bad plan is causing it to thrash.

Good luck.

Louis Kleiman
SSTMS, Inc.

Ded wrote:
-----------------------------------------------------------------
Hi Louis.

The smallest database I ever dealt with was 170 MB and it was empty, so I can't speak with certainty here, BUT ON 10 MB??? It should be in cache entirely, and any bad query shouldn't overload the server for a significant time, am I wrong?

Best regards.

Thomas wrote:
-----------------------------------------------------------------
Christian,

I've read about InterBase installations running for years without doing a backup/restore. Which data access components do you use? BDE/FIB+/IBX/IBO ...?

Have a look at the Oldest Active (OAT), Oldest Interesting (OIT) and Next Transaction. You can get this information with IBConsole. Are there big gaps between OAT/Next or OIT/Next?

Ded replied:
-----------------------------------------------------------------
Thomas, he has auto-sweep at 20000, and I doubt that any of his users leaves a computer turned on for weeks. Somewhat mysterious...

[Annotation by me: that's right. None of my users keeps the computer turned on for more than two days ...]

Christian replied:
-----------------------------------------------------------------
Hi Thomas,

yesterday I restored the DB and turned off autosweep. Up till now no 100% CPU usage ...

> I've read about InterBase installations running for years without doing a
> backup/restore. Which data access components do you use?
> BDE/FIB+/IBX/IBO ...?

I use IBO, and for 6 months it rocked.

> Have a look at the Oldest Active (OAT), Oldest Interesting (OIT) and Next
> Transaction. You can get this information with IBConsole. Are there big
> gaps between OAT/Next or OIT/Next?

There are generally no big gaps (thanks to IBO). E.g.
today:

    Oldest transaction  5892
    Oldest active       5893
    Oldest snapshot     5892
    Next transaction    5939

Generally, the gap is always smaller than 1000.

Christian

JAC2 wrote:
-----------------------------------------------------------------
New databases that have never been restored with data are likely to have very poor stats on indices, leading to poor query plans.

100% normally comes about through large deletes (but you said this is not possible) or queries that blow the flaky query optimiser into touch.

Q's:

How many users?

Does EVERYONE access the DB through applications that you control the SQL for (e.g. some "power-user" with a copy of Crystal Reports killing the server with home-grown queries)?

What difference has the restore made?

Extract some of the queries that are running that include multi-table joins, execute them in Wisql / Marathon / IBExpert and see what the plans look like.

We have a large (6 GB) DB with lots of users. I track the time it takes to locate records for some of the main tables, log it and display it in a graph. We have over 50,000,000 lines of audit data. We have found that backing up and restoring the DB once in a while really improves performance (due to old stuck transactions being left in the database). Restarts do help, but if the DB has grown from just meta-data, it needs a backup and restore.

Auditing the SQL people execute is a must to solve these problems; I'm sure it will make its way into FB one day. I have seen small DBs hang for 15 minutes with >5 table joins that look like they should work fine.

Fabrice wrote:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hi,

We are currently developing an application which also provides a user interface for backup/restore (Firebird 0.9.4xxx / Win2000). The test DB is smaller (about 1.5 MB) and is rebuilt from scratch 10 times a day (test phase).
However, when launching the backup process (a .bat file doing a gbak -b), IBSERVER.EXE very often (say, once every 10 tries) uses 100% CPU. I then wait for many minutes, nothing happens, and the only thing I can do is stop the IB service.

I have no idea what is happening. For now it is not my main problem, but it will surely become one in a few weeks...

Christian replied:
-----------------------------------------------------------------
Hi, Fabrice,

do you mean that IBSERVER.EXE is at 100% while backing up and the backup never finishes? Or has the backup finished and IBSERVER.EXE still uses 100% CPU? (This is at least how it seems to happen on my server ...)

Christian

Fabrice replied:
-----------------------------------------------------------------
Christian,

Yes, the backup has finished and IBSERVER.EXE still uses 100% CPU. And then there is nothing to do but stop the IB service and restart it.

Fabrice

Ded replied:
-----------------------------------------------------------------
Hi, Fabrice.

All this sounds like a SuperServer bug. I left SS precisely because of problems of this kind, but that was on Linux and half a year ago. Poor Windows users, you can't go to Classic... :)

Best regards.

Martin wrote:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hi,

I recognize that same behaviour from when I was testing our backup/restore routine some time ago. When running backup and restore several times it behaved like Fabrice says: "(say, once every 10 tries) uses 100% CPU". That was with IB 5.6 and small test databases (only a couple of megs).

Can someone confirm whether this is a known bug, and if so, what can be done to prevent it from happening? I'm one of the 'Poor Windows users'... ;-(

Thanks in advance,
Martin

Christian wrote:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Now, with a fresh, restored 9 MB DB without errors and autosweep turned off, the error occurred again after the server was rebooted.

Could it have to do with the order in which the NT services are starting? E.g. IBSERVER.EXE starts too early, before another service it (slightly) depends on is started? And so the server gets confused and CPU usage is at 100%?

This time, it occurred with only two users online, and these users had only executed one query which returns records from a table that consists of about 25 records. So it cannot be a bad plan or something like that ...

Any opinions?

Tobias wrote:
-----------------------------------------------------------------
Hi!

Maybe it would make a difference if IB is not run as a service, but as an application? I guess it's worth a try ...

[to be continued... hopefully ...]
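
For anyone who wants to check the OAT/Next and OIT/Next gaps Thomas asked about without reading the numbers by eye, here is a minimal Python sketch. It assumes the counters are available as plain text in the "label value" form Christian posted; the function names `parse_counters` and `gaps` are made up for this illustration and are not part of any InterBase or Firebird tool.

```python
import re

# Counter labels exactly as they appear in the statistics quoted in this thread.
LABELS = (
    "Oldest transaction",
    "Oldest active",
    "Oldest snapshot",
    "Next transaction",
)


def parse_counters(text):
    """Return {label: int} for every known counter label found in text."""
    counters = {}
    for label in LABELS:
        match = re.search(re.escape(label) + r"\s+(\d+)", text)
        if match:
            counters[label] = int(match.group(1))
    return counters


def gaps(counters):
    """Distance from each 'oldest' counter to the next transaction number."""
    next_tx = counters["Next transaction"]
    return {
        label: next_tx - value
        for label, value in counters.items()
        if label != "Next transaction"
    }


# The figures Christian posted.
sample = (
    "Oldest transaction 5892 Oldest active 5893 "
    "Oldest snapshot 5892 Next transaction 5939"
)

for label, gap in gaps(parse_counters(sample)).items():
    print(f"{label}: {gap} behind next transaction")
```

With the posted figures the gaps come out at 47, 46 and 47, far below the 1000 Christian mentions and the sweep interval of 20000.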