From: Damyan I. <dm...@de...> - 2011-11-14 19:15:47
|
This came from the test suite of the Perl Firebird driver, DBD::Firebird, but I am able to reproduce it with plain isql, so the problem seems to be somewhere else -- either Firebird or libc. Firebird is 2.5.1.26351 from the Debian package, running on unstable. Embedded server. The rest of DBD::Firebird test suite (not involving parallel connections) runs fine. On other architecture the whole test suite completes without errors. At first I thought that the problem is in atomic-ops, whose substitution for hardware-based atomic operations using semaphores is buggy, but one of the failing architectures is ia64, and looking at the atomic-ops sources it seems there is a native implementation for ia64 :/ What is observed: # session 1 $ FIREBIRD=. FIREBIRD_LOCK=. isql-fb dbd-firebird-test.fdb Database: dbd-firebird-test.fdb SQL> # session 2, from another terminal, running in the same working # directory FIREBIRD=. FIREBIRD_LOCK=. isql-fb dbd-firebird-test.fdb # hangs Attaching gdb to the second session's isql and requesting a backtrace gives the following: Thread 4 (Thread 0x42333450 (LWP 3876)): #0 0x4065bc6c in __new_sem_wait (sem=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60 #1 0x40330444 in Firebird::SignalSafeSemaphore::enter (this=0x400341f8) at ../src/common/classes/semaphore.cpp:95 #2 0x4007d51c in (anonymous namespace)::shutdownThread () at ../src/jrd/why.cpp:933 #3 0x40078f20 in run (this=<synthetic pointer>) at ../src/jrd/ThreadStart.cpp:128 #4 (anonymous namespace)::threadStart (arg=<optimized out>) at ../src/jrd/ThreadStart.cpp:139 #5 0x40653ac0 in start_thread (arg=<optimized out>) at pthread_create.c:306 #6 0x4073602c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:113 from /lib/arm-linux-gnueabi/libc.so.6 #7 0x4073602c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:113 from /lib/arm-linux-gnueabi/libc.so.6 Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 3 (Thread 0x42e03450 (LWP 3877)): #0 0x4065bf1c in sem_timedwait (sem=0x425e7078, abstime=0x42e02d78) at ../nptl/sysdeps/unix/sysv/linux/sem_timedwait.c:88 #1 0x40330520 in Firebird::SignalSafeSemaphore::tryEnter (this=0x425e7078, seconds=<optimized out>, milliseconds=<optimized out>) at ../src/common/classes/semaphore.cpp:132 #2 0x40240948 in Jrd::ConfigStorage::touchThreadFunc (this=0x425e36c8) at ../src/jrd/trace/TraceConfigStorage.cpp:338 #3 0x402409f8 in Jrd::ConfigStorage::touchThread (arg=0x425e36c8) at ../src/jrd/trace/TraceConfigStorage.cpp:321 #4 0x40078f20 in run (this=<synthetic pointer>) at ../src/jrd/ThreadStart.cpp:128 #5 (anonymous namespace)::threadStart (arg=<optimized out>) at ../src/jrd/ThreadStart.cpp:139 #6 0x40653ac0 in start_thread (arg=<optimized out>) at pthread_create.c:306 #7 0x4073602c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:113 from /lib/arm-linux-gnueabi/libc.so.6 #8 0x4073602c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:113 from /lib/arm-linux-gnueabi/libc.so.6 Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 2 (Thread 0x4379c450 (LWP 3878)): #0 __pthread_cond_wait (cond=0x42f9c750, mutex=<optimized out>) at pthread_cond_wait.c:156 #1 0x40193200 in ISC_event_wait (event=0x42f9c730, value=<optimized out>, micro_seconds=0) at ../src/jrd/isc_sync.cpp:1290 #2 0x402bf2ac in Jrd::LockManager::blocking_action_thread (this=0x4002e778) at ../src/lock/lock.cpp:1568 #3 0x402c3304 in Jrd::LockManager::blocking_action_thread ( arg=<optimized out>) at ../src/lock/../lock/lock_proto.h:402 #4 0x40078f20 in run (this=<synthetic pointer>) at ../src/jrd/ThreadStart.cpp:128 #5 (anonymous namespace)::threadStart (arg=<optimized out>) at ../src/jrd/ThreadStart.cpp:139 #6 0x40653ac0 in start_thread (arg=<optimized out>) at pthread_create.c:306 #7 0x4073602c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:113 from /lib/arm-linux-gnueabi/libc.so.6 #8 0x4073602c in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:113 from /lib/arm-linux-gnueabi/libc.so.6 Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 1 (Thread 0x40021100 (LWP 3875)): #0 0x40659afc in __pthread_cond_timedwait (cond=0x42ea07f8, mutex=<optimized out>, abstime=0xbeeb78d8) at pthread_cond_timedwait.c:168 #1 0x401931d8 in ISC_event_wait (event=0x42ea07d8, value=<optimized out>, micro_seconds=1000000) at ../src/jrd/isc_sync.cpp:1270 #2 0x402c1568 in Jrd::LockManager::wait_for_request (this=0x4002e778, tdbb=0x4ec15ec2, request=<optimized out>, lck_wait=<optimized out>) at ../src/lock/lock.cpp:3998 #3 0x402c1fe4 in Jrd::LockManager::grant_or_que (this=0x4002e778, tdbb=<optimized out>, request=0x42ea0830, lock=<optimized out>, lck_wait=-1) at ../src/lock/lock.cpp:2295 #4 0x402c22c8 in Jrd::LockManager::enqueue (this=0x4002e778, tdbb=0xbeeb8328, prior_request=<optimized out>, parent_request=<optimized out>, series=1, value=0x425fa874 "\001\376", length=16, type=4 '\004', ast_routine= 0x40127430 <CCH_down_grade_dbb(void*)>, ast_argument=0x426010c0, data=0, lck_wait=-1, owner_offset=22408) at ../src/lock/lock.cpp:566 #5 0x401ae12c in enqueue (wait=-1, level=4, lock=0x425fa830, tdbb=0xbeeb8328) at ../src/jrd/lck.cpp:906 #6 ENQUEUE (wait=-1, level=4, lock=0x425fa830, tdbb=0xbeeb8328) at ../src/jrd/lck.cpp:144 #7 LCK_lock (tdbb=0xbeeb8328, lock=0x425fa830, level=<optimized out>, wait=<optimized out>) at ../src/jrd/lck.cpp:617 #8 0x40194de4 in init_database_lock (tdbb=0xbeeb8328) at ../src/jrd/jrd.cpp:4923 #9 0x401a51d4 in jrd8_attach_database (user_status=0x6503c, filename=0xbeeb87e0 "dbd-firebird-test.fdb", handle=0xbeeb8938, dpb_length=<optimized out>, dpb=0xbeeb897c "\001M") at ../src/jrd/jrd.cpp:999 #10 0x40089a64 in isc_attach_database (user_status=<optimized out>, file_length=<optimized out>, file_name=<optimized out>, public_handle=0x1, dpb_length=0, dpb=0xbeeb8bf0 "\001\214\353\276܌\353\276\310M\002@i\315i") at ../src/jrd/why.cpp:1488 #11 0x0001cbbc in newdb (dbname=0x62e3c "dbd-firebird-test.fdb", usr=<optimized out>, psw=<optimized out>, numbufs=<optimized out>, sql_role_nm=0x64edc "", start_user_trans=true) at ../temp/std/isql/isql.cpp:7333 #12 0x0002307c in do_isql () at ../temp/std/isql/isql.cpp:4991 #13 0x000263bc in ISQL_main (argc=2, argv=<optimized out>) at ../temp/std/isql/isql.cpp:986 #14 0x40683664 in __libc_start_main (main=<optimized out>, argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>, fini=0x46740 <__libc_csu_fini>, rtld_fini=0xed50 <_start+44>, stack_end=0xbeebe844) at libc-start.c:228 #15 0x0000ed50 in _start () I hope this brings some ideas about what might be wrong. |
From: Dmitry Y. <fir...@ya...> - 2011-11-14 19:42:32
|
14.11.2011 23:15, Damyan Ivanov wrote: > # session 1 > $ FIREBIRD=. FIREBIRD_LOCK=. isql-fb dbd-firebird-test.fdb > Database: dbd-firebird-test.fdb > SQL> > > # session 2, from another terminal, running in the same working > # directory > FIREBIRD=. FIREBIRD_LOCK=. isql-fb dbd-firebird-test.fdb > # hangs Do both terminals run with the same Linux user credentials? Is there anything interesting in firebird.log? > Attaching gdb to the second session's isql and requesting a backtrace > gives the following: The second connection waits for the first one to downgrade its database lock, but its request gets ignored. A lock print output (with the -a switch) could be useful here. Dmitry |
From: Damyan I. <dm...@de...> - 2011-11-15 21:22:49
|
-=| Dmitry Yemanov, 14.11.2011 23:42:53 +0400 |=- > 14.11.2011 23:15, Damyan Ivanov wrote: > > > # session 1 > > $ FIREBIRD=. FIREBIRD_LOCK=. isql-fb dbd-firebird-test.fdb > > Database: dbd-firebird-test.fdb > > SQL> > > > > # session 2, from another terminal, running in the same working > > # directory > > FIREBIRD=. FIREBIRD_LOCK=. isql-fb dbd-firebird-test.fdb > > # hangs > > Do both terminals run with the same Linux user credentials? Yes. The same setup works on several other architectures. > Is there anything interesting in firebird.log? There is no firebird.log file in the current directory. Perhaps I missed an additional environment variable that needs to be set for embedded (the system-wide firebird log directory is not writable by the user). > > Attaching gdb to the second session's isql and requesting a backtrace > > gives the following: > > The second connection waits for the first one to downgrade its database > lock, but its request gets ignored. > > A lock print output (with the -a switch) could be useful here. Here it is. This is after (1) creating the database in the first session and (2) attempting to attach to the freshly created database in another session (same user, hanging). $ FIREBIRD=. FIREBIRD_LOCK=. /usr/sbin/fb_lock_print -d test.fdb LOCK_HEADER BLOCK Version: 17, Active owner: 0, Length: 1048576, Used: 23896 Flags: 0x0001 Enqs: 79, Converts: 13, Rejects: 56, Blocks: 0 Deadlock scans: 0, Deadlocks: 0, Scan interval: 10 Acquires: 237, Acquire blocks: 0, Spin count: 0 Mutex wait: 0.0% Hash slots: 1009, Hash lengths (min/avg/max): 0/ 0/ 3 Remove node: 0, Insert queue: 0, Insert prior: 0 Owners (2): forward: 20872, backward: 23672 Free owners: *empty* Free locks: *empty* Free requests: *empty* Lock Ordering: Enabled OWNER BLOCK 20872 Owner id: 73448235728900, type: 1, pending: 0 Process id: 17101 (Alive), thread id: 1073877248 Flags: 0x10 sgnl Requests (21): forward: 21040, backward: 23440 Blocks (1): forward: 21040, backward: 21040 OWNER BLOCK 23672 Owner id: 73761768341505, type: 1, pending: 23840 Process id: 17174 (Alive), thread id: 1073877248 Flags: 0x24 wait tout Requests (1): forward: 23840, backward: 23840 Blocks: *empty* Event log: ------------- end of output --------------- If at this point the first isql is exited, it hangs and the lock table dump is: LOCK_HEADER BLOCK Version: 17, Active owner: 0, Length: 1048576, Used: 27144 Flags: 0x0001 Enqs: 531, Converts: 16, Rejects: 450, Blocks: 0 Deadlock scans: 0, Deadlocks: 0, Scan interval: 10 Acquires: 1521, Acquire blocks: 1, Spin count: 0 Mutex wait: 0.1% Hash slots: 1009, Hash lengths (min/avg/max): 0/ 0/ 3 Remove node: 0, Insert queue: 0, Insert prior: 0 Owners (1): forward: 23672, backward: 23672 Free owners (1): forward: 20872, backward: 20872 Free locks (2): forward: 21224, backward: 26840 Free requests (1): forward: 26784, backward: 26784 Lock Ordering: Enabled OWNER BLOCK 23672 Owner id: 73761768341505, type: 1, pending: 0 Process id: 17174 (Alive), thread id: 1073877248 Flags: 0x08 wake Requests (47): forward: 23840, backward: 27024 Blocks: *empty* Event log: DEL_OWNER: owner = 20872, lock = 20872, request = 0 ------------- end of output --------------- Hopefully this gives some clues. |
From: Dmitry Y. <fir...@ya...> - 2011-11-16 04:44:27
|
16.11.2011 1:22, Damyan Ivanov wrote: >> Is there anything interesting in firebird.log? > > There is no firebird.log file in the current directory. Perhaps > I missed an additional environment variable that needs to be set for > embedded (the system-wide firebird log directory is not writable by > the user). Setting FIREBIRD should be enough. > Here it is. This is after (1) creating the database in the first > session and (2) attempting to attach to the freshly created database > in another session (same user, hanging). > > OWNER BLOCK 20872 > Owner id: 73448235728900, type: 1, pending: 0 > Process id: 17101 (Alive), thread id: 1073877248 > Flags: 0x10 sgnl > Requests (21): forward: 21040, backward: 23440 > Blocks (1): forward: 21040, backward: 21040 > > OWNER BLOCK 23672 > Owner id: 73761768341505, type: 1, pending: 23840 > Process id: 17174 (Alive), thread id: 1073877248 > Flags: 0x24 wait tout > Requests (1): forward: 23840, backward: 23840 > Blocks: *empty* The blocking process has been signaled (in the shared memory) but did not see that fact. This is likely to mean that the IPC event hasn't been delivered. Dmitry |
From: Alex P. <pes...@ma...> - 2011-11-16 06:29:02
|
On 11/16/11 08:44, Dmitry Yemanov wrote: > 16.11.2011 1:22, Damyan Ivanov wrote: > >>> Is there anything interesting in firebird.log? >> There is no firebird.log file in the current directory. Perhaps >> I missed an additional environment variable that needs to be set for >> embedded (the system-wide firebird log directory is not writable by >> the user). > Setting FIREBIRD should be enough. > In standard install procedure empty, firebird group writable file firebird.log is created. Does .deb package create that file too? Is user, running that script, a member of group firebird? >> Here it is. This is after (1) creating the database in the first >> session and (2) attempting to attach to the freshly created database >> in another session (same user, hanging). >> >> OWNER BLOCK 20872 >> Owner id: 73448235728900, type: 1, pending: 0 >> Process id: 17101 (Alive), thread id: 1073877248 >> Flags: 0x10 sgnl >> Requests (21): forward: 21040, backward: 23440 >> Blocks (1): forward: 21040, backward: 21040 >> >> OWNER BLOCK 23672 >> Owner id: 73761768341505, type: 1, pending: 23840 >> Process id: 17174 (Alive), thread id: 1073877248 >> Flags: 0x24 wait tout >> Requests (1): forward: 23840, backward: 23840 >> Blocks: *empty* > The blocking process has been signaled (in the shared memory) but did > not see that fact. This is likely to mean that the IPC event hasn't been > delivered. Starting with 2.5.1 all IPC-related errors, that can't be for some reason delivered to user, are not ignored but logged. Therefore making firebird.log work appears to be very good idea. |
From: Damyan I. <dm...@de...> - 2011-11-16 20:38:57
|
-=| Alex Peshkoff, 16.11.2011 10:28:52 +0400 |=- > In standard install procedure empty, firebird group writable file > firebird.log is created. Does .deb package create that file too? Yes: $ ls -l /var/log/firebird2.5.log -rw-rw---- 1 firebird firebird 0 Nov 13 18:48 /var/log/firebird2.5.log > Is user, running that script, a member of group firebird? No and I can't make it. I am a mere user on the porter machines :) > > The blocking process has been signaled (in the shared memory) but > > did not see that fact. This is likely to mean that the IPC event > > hasn't been delivered. > > Starting with 2.5.1 all IPC-related errors, that can't be for some > reason delivered to user, are not ignored but logged. Therefore making > firebird.log work appears to be very good idea. I am not sure. Running the connecting isql (the second one) under strace doesn't mention 'log' in the output. It also seems to loop repeating rt_sigprocmask calls, which makes me think there is something wrong with strace. (Same loop on the creating isql and the SQL> prompt doesn't show up after the creation). Anyway, I can't find a way to let fbembed use another log directory, different from the one given to ./configure via --with-fblog. Perhaps I have missed something. In case it helps, the output from 'strace isql-fb test.fdb' is at http://paste.ubuntu.com/740562/ |
From: Dmitry Y. <fir...@ya...> - 2011-11-17 07:44:41
|
17.11.2011 11:26, Alex Peshkoff wrote: > And one more interesting line: > open("./fb_init", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 7 > Why is _SHARED_ file opened in CWD? It should go to /tmp/firebird. Because he explicitly sets FIREBIRD_LOCK to CWD before running ISQL :-) Dmitry |
From: Alex P. <pes...@ma...> - 2011-11-17 09:28:08
|
On 11/17/11 11:37, Dmitry Yemanov wrote: > 17.11.2011 11:26, Alex Peshkoff wrote: > >> And one more interesting line: >> open("./fb_init", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 7 >> Why is _SHARED_ file opened in CWD? It should go to /tmp/firebird. > Because he explicitly sets FIREBIRD_LOCK to CWD before running ISQL :-) And do that isql-s have same or different CWD? |
From: Damyan I. <dm...@de...> - 2011-11-18 19:27:38
|
-=| Alex Peshkoff, 17.11.2011 13:28:00 +0400 |=- > On 11/17/11 11:37, Dmitry Yemanov wrote: > > 17.11.2011 11:26, Alex Peshkoff wrote: > > > >> And one more interesting line: > >> open("./fb_init", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 7 > >> Why is _SHARED_ file opened in CWD? It should go to /tmp/firebird. > > Because he explicitly sets FIREBIRD_LOCK to CWD before running ISQL :-) > > And do that isql-s have same or different CWD? Exactly the same, running as the same user. The same test works on e.g. amd64, so there is something, somewhere that is failing on the problem architectures. Maybe it's libedit interfering with libfbembed. I'll try the same thing using the perl driver and see if strace will give any hints. |
From: Alex P. <pes...@ma...> - 2011-11-17 07:27:05
|
On 11/17/11 00:38, Damyan Ivanov wrote: >> Is user, running that script, a member of group firebird? > No and I can't make it. I am a mere user on the porter machines :) > That's a pity and can be even a reason not to deliver ASTs between processes. On the other hand we have 2 processes started by same user, therefore in theory group should not affect something. Al least I can't reproduce it locally (i.e. 2 isqls started by same non-firebird-group-member user work fine). >>> The blocking process has been signaled (in the shared memory) but >>> did not see that fact. This is likely to mean that the IPC event >>> hasn't been delivered. >> Starting with 2.5.1 all IPC-related errors, that can't be for some >> reason delivered to user, are not ignored but logged. Therefore making >> firebird.log work appears to be very good idea. > I am not sure. Running the connecting isql (the second one) under > strace doesn't mention 'log' in the output. It also seems to loop > repeating rt_sigprocmask calls, which makes me think there is > something wrong with strace. (Same loop on the creating isql and the > SQL> prompt doesn't show up after the creation). > > Anyway, I can't find a way to let fbembed use another log directory, > different from the one given to ./configure via --with-fblog. Perhaps > I have missed something. > > In case it helps, the output from 'strace isql-fb test.fdb' is at > http://paste.ubuntu.com/740562/ The only place from which I can see sigprocmask() called is editline library. But there are some mroe interesting things in trace file. First: futex(0xbefd8804, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 4066b000) = -1 EAGAIN (Resource temporarily unavailable) Appears enough to have problems with delivering signals. And one more interesting line: open("./fb_init", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 7 Why is _SHARED_ file opened in CWD? It should go to /tmp/firebird. |
From: Damyan I. <dm...@de...> - 2011-12-08 18:33:27
|
-=| Damyan Ivanov, 18.11.2011 21:27:26 +0200 |=- > Maybe it's libedit interfering with libfbembed. I'll try the same > thing using the perl driver and see if strace will give any hints. No luck. I wrote a primitive isql-like tool that is able to create databases, connect to them and execute SQL statements. Entering «create database 'test.fdb'» works if the tool is run from the prompt. If run under strace, there is a never-ending storm of rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 and the tool never finishes execution. This makes me think that there is something that interferes with firebird's synchronization routines -- maybe in strace, maybe in the C library or even the kernel. I wonder if anybody has ever used Firebird classic/embedded server on these architectures. |
From: Alex P. <pes...@ma...> - 2011-12-09 10:37:01
|
On 12/08/11 22:33, Damyan Ivanov wrote: > -=| Damyan Ivanov, 18.11.2011 21:27:26 +0200 |=- >> Maybe it's libedit interfering with libfbembed. I'll try the same >> thing using the perl driver and see if strace will give any hints. > No luck. > > I wrote a primitive isql-like tool that is able to create databases, > connect to them and execute SQL statements. > > Entering «create database 'test.fdb'» works if the tool is run from > the prompt. If run under strace, there is a never-ending storm of > rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0 > rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 > and the tool never finishes execution. > > This makes me think that there is something that interferes with > firebird's synchronization routines -- maybe in strace, maybe in the > C library or even the kernel. > > I wonder if anybody has ever used Firebird classic/embedded server on > these architectures. I can install kfreebsd on VM, but only after finishing with FB3-alpha showstoppers. |