From: Yi Lu <yl...@ra...> - 2011-01-25 18:37:15
|
(My previous post has been pending and I am not sure it is visible to everyone, so I am posting it again. I can provide the database if anyone wants to repeat the test.)

Hello everyone,

We ran into an issue with the 32-bit FB 2.5 Final Release on a 64-bit machine. When the problem occurs, Task Manager shows the fbserver.exe process creating 500~800 threads. firebird.log contains a block of messages like this:

XXXX (Server) Fri Jan 21 18:13:28 2011
Operating system call _beginthreadex failed. Error code 8

This is often followed by an fbserver crash in which fbserver.exe hangs and consumes 0% CPU. Sometimes this message can also be found in firebird.log:

XXXX (Server) Thu Jan 21 12:15:20 2010
unable to allocate memory from operating system

even though Task Manager does not indicate any memory shortage.

We reduced the test case to a multi-threaded command-line test program in which each thread runs a single query. With this tool we tested servers running various OS and FB versions, increased the number of test tool threads and kept a close watch on the number of threads fbserver.exe used. (In all tests mentioned, the page buffer setting of the test DB is 65535, Firebird was running in SuperServer mode and the test programs had about 300 threads in total.)

The test results are as follows:

1. Windows Server 2008 R2 (64-bit), Firebird 2.1, 8G RAM
   No crash. The maximum number of threads fbserver.exe used was 362.

2. Windows Server 2008 R2 (64-bit), Firebird 2.5 32-bit, 8G RAM
   Crashed in 10 min. The maximum number of threads fbserver.exe used was over 500. The error message "call _beginthreadex failed" can be found in firebird.log.

3. Windows Server 2008 R2 (64-bit), Firebird 2.5 64-bit, 8G RAM
   No crash. The maximum number of threads fbserver.exe used was 820.

4. Windows Server 2003 (32-bit) SP1, Firebird 2.5 32-bit, 2G RAM
   No crash. The maximum number of threads fbserver.exe used was 407. The error message "call _beginthreadex failed" can be found in firebird.log.

5. Windows Server 2003 (64-bit) SP1, Firebird, 4G RAM
   Crashed. The maximum number of threads observed was 471. The error message "call _beginthreadex failed" can be found in firebird.log.

It appears that the crash always occurs when FB 2.5 32-bit is running on a 64-bit OS. We also observed that on a 64-bit system, FB 2.5 32-bit tried to create many more threads than FB 2.1 did in the same situation. We suspect this exceeded the OS limit on the number of threads and caused the shortage of memory or other system resources. We notice that a crash is much more likely when the 32-bit fbserver.exe has more than 500 threads.

Has anyone experienced this issue? Any ideas on this problem?

Yi Lu

--
View this message in context: http://firebird.1100200.n4.nabble.com/32-bit-Firebird-attempting-to-create-large-amount-of-thread-on-64-bit-server-tp3236747p3236747.html
Sent from the firebird-devel mailing list archive at Nabble.com.
|
From: Dimitry S. <sd...@ib...> - 2011-01-25 18:47:24
|
25.01.2011 19:37, Yi Lu wrote:
> 32-bit FB 2.5 Final Release on 64-bit machine.

Why? What prevents you from using 64-bit Firebird?

> XXXX (Server) Thu Jan 21 12:15:20 2010
> unable to allocate memory from operating system
>
> in spite that the task manager doesn't indicate any memory shortage.

RTFM: J. Richter on the Windows memory model.

--
SY, SD.
|
From: Yi Lu <yl...@ra...> - 2011-01-25 21:26:05
|
Firebird 2.5 64-bit doesn't have any problem. The creation of lots of threads also occurs on 32-bit Windows Server 2003, but less often. The crash is easiest to observe on Windows Server 2008 R2 and 2003 x64.

As Dmitry Yemanov has mentioned, the real question is why Firebird 2.5 creates so many more threads than Firebird 2.1.

Because of this problem, we have had to revert a few customers who only have a 32-bit OS to Firebird 2.1.
|
From: Dmitry Y. <fir...@ya...> - 2011-01-25 18:58:29
|
25.01.2011 21:37, Yi Lu wrote:
>
> It appears that the crash always occur when FB2.5 32-bit is running on
> 64-bit OS. Also we observed that on 64-bit system, FB2.5 32-bit was trying
> to create a lot more thread than FB2.1 did in the same situation, which we
> suspect exceeded the limit of thread numbers of the OS and caused the
> shortage of memory or other system resources. We notice that it is much more
> likely to crash when 32-bit fbserver.exe has more than 500 threads.

500 threads * 2MB of stack = 1GB. Add the buffer cache etc. and you get pretty close to the address space limit for 32-bit processes. The error "_beginthreadex failed" with error code 8 (ERROR_NOT_ENOUGH_MEMORY) confirms that. And this is pretty much expected.

The question is why v2.5 SS creates many more threads than v2.1.

Dmitry
|
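The arithmetic in this reply can be written out as a small address-space budget. This is only a sketch: the 2 MB stack per thread and the 65535 page buffers come from the thread, while the page size and the 2 GB user address space (the default for a 32-bit Windows process that is not large-address-aware) are assumptions, and the struct and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// All sizes are in bytes. The 2 MB stack figure comes from the message above;
// the page size and the 2 GB user address space are assumptions.
constexpr std::uint64_t MB = 1024ull * 1024ull;

struct AddressSpaceBudget
{
    std::uint64_t userSpace;      // virtual address space available to the process
    std::uint64_t stackPerThread; // address space reserved for each thread stack
    std::uint64_t pageBuffers;    // number of database page buffers
    std::uint64_t pageSize;       // database page size (assumed, not given in the thread)
};

// Address space reserved by thread stacks plus the page cache.
std::uint64_t reservedBytes(const AddressSpaceBudget& b, std::uint64_t threads)
{
    return threads * b.stackPerThread + b.pageBuffers * b.pageSize;
}

// Largest thread count whose stacks still fit next to the page cache.
// Ignores code, heap and fragmentation, so the real ceiling is lower.
std::uint64_t maxThreads(const AddressSpaceBudget& b)
{
    assert(b.stackPerThread > 0);
    const std::uint64_t cache = b.pageBuffers * b.pageSize;
    if (cache >= b.userSpace)
        return 0;
    return (b.userSpace - cache) / b.stackPerThread;
}
```

With `AddressSpaceBudget{2048 * MB, 2 * MB, 65535, 8192}` (an assumed 8 KB page size), `maxThreads` gives 768, and that is before counting the executable, DLLs, heaps and fragmentation. A failure somewhere above 500 threads is consistent with that rough ceiling.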
From: Alex P. <pes...@ma...> - 2011-01-26 07:28:24
|
On 01/25/11 21:58, Dmitry Yemanov wrote:
> 25.01.2011 21:37, Yi Lu wrote:
>> It appears that the crash always occur when FB2.5 32-bit is running on
>> 64-bit OS. Also we observed that on 64-bit system, FB2.5 32-bit was trying
>> to create a lot more thread than FB2.1 did in the same situation, which we
>> suspect exceeded the limit of thread numbers of the OS and caused the
>> shortage of memory or other system resources. We notice that it is much more
>> likely to crash when 32-bit fbserver.exe has more than 500 threads.
> 500 threads * 2MB of stack = 1GB. Add the buffer cache etc and you get
> pretty close to the address space limit for 32-bit processes. Error
> "_beginthreadex failed" with error code 8 (ERROR_NOT_ENOUGH_MEMORY)
> confirms that. And this is pretty much expected.
>
> The question is why v2.5 SS creates much more threads than v2.1.

If people access more than one database, this is more or less obvious.
|
From: Yi Lu <yl...@ra...> - 2011-01-26 15:08:29
|
In our setup, all clients are hitting one database on the server.
|
From: Dimitry S. <sd...@ib...> - 2011-01-26 15:21:16
|
26.01.2011 16:08, Yi Lu wrote:
> In our setting, all clients are hitting one database on the server..

How many of them are there, and which protocol do they use to connect to the server?

--
SY, SD.
|
From: Yi Lu <yl...@ra...> - 2011-01-26 15:33:06
|
3 clients, each with 99 threads, and each client thread sends a query to the server.

Firebird 2.5 64-bit also creates a lot of threads (sometimes as many as 800), but this never leads to a crash or to the "call _beginthreadex failed" error in the log.

When the same load is applied to FB 2.5 32-bit and the number of fbserver.exe threads grows beyond 500, it is very likely to crash.
|
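For readers who want to reproduce a comparable load, the shape of such a client can be sketched as follows. This is not the poster's test tool: `runLoad` and the `runQuery` callback are hypothetical names, and the callback stands in for "attach to the database and execute one query" with whatever client library is in use.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Starts `threadCount` threads, each calling `runQuery` once, and returns
// how many calls completed without throwing. `runQuery` is a hypothetical
// stand-in for "attach to the database and execute one query".
std::size_t runLoad(std::size_t threadCount,
                    const std::function<void(std::size_t)>& runQuery)
{
    assert(static_cast<bool>(runQuery));
    std::atomic<std::size_t> completed{0};
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    for (std::size_t i = 0; i < threadCount; ++i)
    {
        workers.emplace_back([&completed, &runQuery, i] {
            try
            {
                runQuery(i);
                ++completed;
            }
            catch (...)
            {
                // A failed query is simply not counted.
            }
        });
    }

    // Wait for every client thread before reporting the result.
    for (std::thread& t : workers)
        t.join();

    return completed.load();
}
```

Calling `runLoad(99, ...)` from three separate processes would approximate the "3 clients, each with 99 threads" setup described above, while the server's thread count is watched in Task Manager.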
From: Dmitry Y. <fir...@ya...> - 2011-01-26 22:37:38
|
26.01.2011 18:32, Yi Lu wrote:
>
> 3 clients, each with 99 threads and each client thread sending a query to
> server.

So we could expect that no more than 300 queries are running simultaneously, and the total number of server threads should be only a tiny bit higher.

So far it looks like the server cannot reuse the inactive threads found in the pool and thus keeps creating new ones. Looking at Worker::wait(), it seems that if a worker is not woken up during the timeout (1 min), the thread pool may get totally blocked, as both waiting and signaling are protected by the same mutex. I hope Vlad will comment on it.

> Firebird 2.5 64-bit also creates a lot of threads (sometimes as many as
> 800), but never leads to crash or "call _beginthreadex failed" error in the
> log.
>
> When the same load is applied to FB 2.5 32-bit and the number of
> fbserver.exe threads increases to more than 500, it is very likely to crash.

No surprise in either case; it has already been explained.

Dmitry
|
From: Yi Lu <yl...@ra...> - 2011-01-26 23:27:10
|
The crash is understandable given the number of threads created. Our only concern is that there is a clear behavior change between Firebird 2.5 32-bit and Firebird 2.1 32-bit as Firebird 2.1 would not crash under the same circumstances.
|
From: Dimitry S. <sd...@ib...> - 2011-01-27 07:55:51
|
26.01.2011 17:15, Dmitry Yemanov wrote:
> So we could expect that there are no more than 300 simultaneous queries
> running, and total number of server threads should be a tiny bit more.

IIRC, the number of worker threads was limited to 20 (or something like that) at the time of 2.5 RC. Was this limit completely removed?

--
SY, SD.
|
From: Alex P. <pes...@ma...> - 2011-01-27 08:03:00
|
On 01/27/11 10:54, Dimitry Sibiryakov wrote:
> 26.01.2011 17:15, Dmitry Yemanov wrote:
>> So we could expect that there are no more than 300 simultaneous queries
>> running, and total number of server threads should be a tiny bit more.
> IIRC, number of working threads was limited by 20 (or something like that) on 2.5RC
> time. Was this limit completely removed?

Yes, it caused regressions in some cases.
|
From: Dimitry S. <sd...@ib...> - 2011-01-27 08:09:15
|
27.01.2011 9:02, Alex Peshkoff wrote:
> On 01/27/11 10:54, Dimitry Sibiryakov wrote:
>> IIRC, number of working threads was limited by 20 (or something like that) on 2.5RC
>> time. Was this limit completely removed?
>
> Yes, it caused regressions in some cases.

I thought that it had just been increased. For the Super architectures, 200-300 would sound like a reasonable threshold and could probably prevent the crashes.

--
SY, SD.
|
From: Dmitry Y. <fir...@ya...> - 2011-01-27 08:14:35
|
27.01.2011 11:02, Alex Peshkoff wrote:
>> IIRC, number of working threads was limited by 20 (or something like that) on 2.5RC
>> time. Was this limit completely removed?
>
> Yes, it caused regressions in some cases.

And the limit was 127, not 20 ;-)

Dmitry
|
From: Vlad K. <hv...@us...> - 2011-01-27 08:49:12
|
>> 3 clients, each with 99 threads and each client thread sending a query to
>> server.
>
> So we could expect that there are no more than 300 simultaneous queries
> running, and total number of server threads should be a tiny bit more.

At least this is how it should work.

> So far it looks like the server cannot reuse the inactive thread found
> in the pool and thus keep creating new ones. Looking at Worker::wait()
> it seems that if it won't be woken up during the timeout (1 min), then
> the thread pool may get totally blocked, as both waiting and signaling
> are protected by the same mutex. I hope Vlad would comment on it.

Sorry, I fail to see how you reached that conclusion.

    bool Worker::wait(int timeout)
    {
        // here we don't hold any global locks
        // wait 60 sec for a request to work on
        if (m_sem.tryEnter(timeout))
            // here this thread will be reused
            return true;

        // wait timed out, remove thread from idle list and destroy it

        // acquire common mutex to modify common idle list
        Firebird::MutexLockGuard guard(m_mutex);

        // last quick check whether we were awakened
        if (m_sem.tryEnter(0))
            return true;

        // remove thread from idle list
        remove();

        // destroy it
        return false;
    }

    bool Worker::wakeUp()
    {
        // acquire common mutex to check idle list
        Firebird::MutexLockGuard guard(m_mutex);

        if (m_idleWorkers)
        {
            // wake up idle thread
            Worker* idle = m_idleWorkers;
            idle->setState(true);
            idle->m_sem.release();
            return true;
        }

        // no idle threads, create one more if allowed
        return (m_cntAll >= MAX_THREADS);
    }

You see - waiting is not protected by the common m_mutex and deadlock is not possible.

Regards,
Vlad

PS I can't help without a reproducible example.
|
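The locking pattern described in this message (a long timed wait with no shared lock held, then the shared mutex plus one last non-blocking check after a timeout) can be tried outside Firebird with standard C++ primitives. This is a sketch under assumptions, not the server's code: `TimedSemaphore` and `waitForWork` are invented names, and the semaphore is a plain mutex/condition-variable stand-in for `m_sem`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore with a timed acquire, standing in for the
// m_sem member used in the code above (not Firebird's implementation).
class TimedSemaphore
{
public:
    // Waits up to `timeout` for a permit; returns true if one was taken.
    bool tryEnter(std::chrono::milliseconds timeout)
    {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return m_count > 0; }))
            return false;
        --m_count;
        return true;
    }

    // Adds one permit and wakes one waiter.
    void release()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_count;
        }
        m_cv.notify_one();
    }

private:
    std::mutex m_mutex;            // private to this semaphore, not the pool mutex
    std::condition_variable m_cv;
    int m_count = 0;
};

// Idle worker: the long wait holds no pool-wide lock; the pool mutex is
// taken only after a timeout, followed by one last non-blocking check.
// Returns true if the worker should run a request, false if it should exit.
bool waitForWork(TimedSemaphore& sem, std::mutex& poolMutex,
                 std::chrono::milliseconds timeout)
{
    if (sem.tryEnter(timeout))
        return true;

    std::lock_guard<std::mutex> guard(poolMutex);
    if (sem.tryEnter(std::chrono::milliseconds(0)))
        return true; // woken between the timeout and taking the lock

    // A real pool would unlink the worker from its idle list here.
    return false;
}
```

Because the waker only holds the pool mutex for the short time it takes to pick an idle worker and release its semaphore, a sleeping worker never blocks it, which is the point being made about deadlock above.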
From: Dimitry S. <sd...@ib...> - 2011-01-27 09:46:15
|
27.01.2011 9:49, Vlad Khorsun wrote:
> // wait 60 sec for request to work on it

Errr... IMHO, this timeout could be bigger... Were there reasons behind this value?

--
SY, SD.
|
From: Alex P. <pes...@ma...> - 2011-01-27 09:56:20
|
On 01/27/11 12:45, Dimitry Sibiryakov wrote:
> 27.01.2011 9:49, Vlad Khorsun wrote:
>
>> // wait 60 sec for request to work on it
> Errr... IMHO, this timeout could be bigger... Were there reasons behind this value?

That's the original IB timeout for closing an unneeded worker thread.
|
From: Vlad K. <hv...@us...> - 2011-01-27 10:01:39
|
> 27.01.2011 9:49, Vlad Khorsun wrote:
>
>> // wait 60 sec for request to work on it
>
> Errr... IMHO, this timeout could be bigger... Were there reasons behind this value?

This is the timeout for an idle worker thread. The value has not changed since IB6.

What is the reason to change it? Note that a larger value could mean holding unneeded resources for longer. A lower value could lead to frequent thread creation/deletion.

Regards,
Vlad
|
From: Dimitry S. <sd...@ib...> - 2011-01-27 10:29:17
|
27.01.2011 11:01, Vlad Khorsun wrote:
> What the reason to change it ? Note, larger value could lead to longer consume
> of unneeded resources. Lower value could lead to often thread's creation\deletion.

Exactly to make thread creation/deletion rare. A sleeping thread consumes no resources. I would think of a timeout of about 10 minutes...

--
SY, SD.
|
From: Vlad K. <hv...@us...> - 2011-01-27 10:38:43
|
>> What the reason to change it ? Note, larger value could lead to longer consume
>> of unneeded resources. Lower value could lead to often thread's creation\deletion.
>
> Exactly to make threads' creation/deletion seldom. Sleeping thread consumes no resources.

Really? Not even virtual memory for its stack, for example?

> I would think about timeout about 10 minutes...

Why 10 and not 15 or 33 or 42 (c)?

Regards,
Vlad
|
From: Dimitry S. <sd...@ib...> - 2011-01-27 10:45:24
|
27.01.2011 11:38, Vlad Khorsun wrote:
>>> What the reason to change it ? Note, larger value could lead to longer consume
>>> of unneeded resources. Lower value could lead to often thread's creation\deletion.
>>
>> Exactly to make threads' creation/deletion seldom. Sleeping thread consumes no resources.
>
> Really ? Even no virtual memory for stack, for example ?

Yep. Other threads have their own stacks and can't profit from the extra megabyte.

>> I would think about timeout about 10 minutes...
>
> Why 10 and not 15 or 33 or 42 (c) ?

Because of the blue sky. I agree that 15 is better than 10. Maybe 42 is better than 15, I don't know.

--
SY, SD.
|
From: Dmitry Y. <fir...@ya...> - 2011-01-27 10:40:22
|
27.01.2011 13:28, Dimitry Sibiryakov wrote:
> Exactly to make threads' creation/deletion seldom. Sleeping thread consumes no resources.

Doesn't 2MB of stack count, especially when you have hundreds of them?

Dmitry
|
From: Alex P. <pes...@ma...> - 2011-01-27 10:54:27
|
On 01/27/11 13:40, Dmitry Yemanov wrote:
> 27.01.2011 13:28, Dimitry Sibiryakov wrote:
>
>> Exactly to make threads' creation/deletion seldom. Sleeping thread consumes no resources.
> Doesn't 2MB of stack count, especially when you have hundreds of them?

Certainly it's virtual space, but anyway, taking into account that CPUs have become a bit faster since IB6, it's reasonable to think about making the timeout smaller.
|
From: Vlad K. <hv...@us...> - 2011-01-27 11:02:50
|
> Certainly it's virtual space, but anyway - taking into an account that
> CPUs became a bit faster since IB6, it's reasonable to think about
> making timeout smaller.

I'd say the timeout should be tuned on the fly depending on the real load. Anyway, I don't see this as an "issue" that requires an urgent fix.

Regards,
Vlad
|
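One way to read "tuned on the fly" is a timeout that shrinks as more of the pool sits idle. The policy below is purely hypothetical and was not proposed in the thread: the function name, the linear scaling and the 5-second floor are all assumptions, with only the 60-second base taken from the discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical policy: start from a base timeout and shrink it as the
// share of idle workers grows, so a large idle pool drains faster while
// a busy pool keeps its few idle threads around for reuse.
std::chrono::seconds idleTimeout(
    std::size_t idleWorkers, std::size_t totalWorkers,
    std::chrono::seconds base = std::chrono::seconds(60),
    std::chrono::seconds minimum = std::chrono::seconds(5))
{
    assert(idleWorkers <= totalWorkers);
    assert(minimum <= base);
    if (totalWorkers == 0)
        return base;

    // Busy share in [0, 1]: 1 means no worker is idle.
    const double busy = 1.0 - static_cast<double>(idleWorkers) /
                                  static_cast<double>(totalWorkers);
    const std::chrono::seconds scaled(
        static_cast<long long>(static_cast<double>(base.count()) * busy));

    // Never drop below the floor, to avoid constant thread churn.
    return std::max(minimum, scaled);
}
```

With the defaults, a pool of 300 workers of which 150 are idle would wait 30 seconds before retiring a thread, and a fully idle pool would use the 5-second floor. This trades reserved stack address space against the thread creation/deletion cost raised earlier in the thread.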