From: Josh B. <jo...@si...> - 2012-02-23 23:17:28
Hello,

I'm using a self-built Netatalk 2.2.2 on an otherwise vanilla Ubuntu 10.04 server on an EXT4 filesystem with ACLs (with a boatload of messages in the logs related to ACLs).

Normal traffic for this server is 150-200 simultaneous clients with "live" network homes, using mostly Snow Leopard systems.

I've been experimenting with various combinations here, but haven't found a "sweet spot" yet. I started with the default dbd backend, but switched to cdb after reading that it may be better for network homes. Unfortunately, cdb would produce the ugly CNID warning messages upon login every day or two, so I switched back to dbd.

Now that I'm on dbd, I've had at least two afpd crashes (Signal 11), where the symptoms are *extremely* slow logins (5 minutes) and very poor throughput (~100 KBps). This has occurred with two different shares.

As a trial, I've set the dbpath for one of the shares to store on a different filesystem than the one the homes are on. It's still too soon to see if that's worked.

Between switching backends, I've just been stopping netatalk and recursively removing any .AppleD* data in the shares' paths.

When the crash occurs, it seems like sending a SIGHUP to cnid_metad gets things moving again. A complete restart of afpd is hardly an option when there are 200 users connected across campus.

With the last crash I couldn't test much. The performance issue seemed to be limited to a specific share, but I don't have conclusive results on that.

Log snips are below. My questions are:
Does anyone have any insight on this?
Is the CNID db corruption causing the afpd crash?
Is afpd likely crashing for another reason?
Is moving the dbpath to a different filesystem even practical?
Can the CNID warning messages be silenced for the end users?

Thanks!
Leading up to the crash, the following message starts appearing in the logs:

Feb 23 14:38:40.353861 afpd[7835] {cnid_dbd.c:425} (E:CNID): transmit: Request to dbd daemon (db_dir /media/store/homes/ms/students) timed out.

Here's a snip from the log:

Feb 23 14:38:40.353861 afpd[7835] {cnid_dbd.c:425} (E:CNID): transmit: Request to dbd daemon (db_dir /media/store/homes/ms/students) timed out.
Feb 23 14:38:40.353929 afpd[7835] {fault.c:122} (S:Default): ===============================================================
Feb 23 14:38:40.353945 afpd[7835] {fault.c:123} (S:Default): INTERNAL ERROR: Signal 11 in pid 7835 (2.2.2)
Feb 23 14:38:40.353954 afpd[7835] {fault.c:124} (S:Default): ===============================================================
Feb 23 14:38:40.355162 afpd[7835] {fault.c:96} (S:Default): BACKTRACE: 13 stack frames:
Feb 23 14:38:40.355186 afpd[7835] {fault.c:102} (S:Default): #0 /usr/sbin/afpd(netatalk_panic+0x1f) [0x451a1f]
Feb 23 14:38:40.355197 afpd[7835] {fault.c:102} (S:Default): #1 /usr/sbin/afpd() [0x451b1c]
Feb 23 14:38:40.355206 afpd[7835] {fault.c:102} (S:Default): #2 /lib/libc.so.6(+0x33af0) [0x7f3c32885af0]
Feb 23 14:38:40.355214 afpd[7835] {fault.c:102} (S:Default): #3 /lib/libc.so.6(cfree+0x1d) [0x7f3c328cfe2d]
Feb 23 14:38:40.355223 afpd[7835] {fault.c:102} (S:Default): #4 /usr/sbin/afpd(bdestroy+0x2f) [0x449fcf]
Feb 23 14:38:40.355232 afpd[7835] {fault.c:102} (S:Default): #5 /usr/sbin/afpd(dir_add+0x40f) [0x41dedf]
Feb 23 14:38:40.355240 afpd[7835] {fault.c:102} (S:Default): #6 /usr/sbin/afpd(cname+0x807) [0x420627]
Feb 23 14:38:40.355249 afpd[7835] {fault.c:102} (S:Default): #7 /usr/sbin/afpd(afp_getfildirparams+0xc9) [0x429189]
Feb 23 14:38:40.355257 afpd[7835] {fault.c:102} (S:Default): #8 /usr/sbin/afpd(afp_over_dsi+0x4c9) [0x412109]
Feb 23 14:38:40.355265 afpd[7835] {fault.c:102} (S:Default): #9 /usr/sbin/afpd() [0x410d27]
Feb 23 14:38:40.355274 afpd[7835] {fault.c:102} (S:Default): #10 /usr/sbin/afpd(main+0x7b2) [0x42cc52]
Feb 23 14:38:40.355282 afpd[7835] {fault.c:102} (S:Default): #11 /lib/libc.so.6(__libc_start_main+0xfd) [0x7f3c32870c4d]
Feb 23 14:38:40.355291 afpd[7835] {fault.c:102} (S:Default): #12 /usr/sbin/afpd() [0x410049]
Feb 23 14:38:40.357206 afpd[24212] {main.c:219} (I:AFPDaemon): child[7835]: killed by signal 6

Unrelated, my logs are absolutely filled with these kinds of messages:

Feb 23 16:15:54.847196 afpd[1878] {acls.c:1745} (E:Default): posix_acls_to_uaperms(path, st, ma) failed: No such file or directory
Feb 23 16:15:54.945086 afpd[1878] {ea_sys.c:341} (E:AFPDaemon): sys_set_ea("/media/store/homes/hs/students/labinek/Library/Preferences/com.apple.internetconfigpriv.plist.x0Vqw20", ea:'com.apple.quarantine', size: 46, flags: -|-|-): Operation not supported
message repeated 11 times
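To judge how much of the log is ACL/EA noise compared with CNID timeouts, one option is to tally entries by log level and source location. The following is only a rough sketch: the regular expression is inferred from the log lines quoted above, and the name `tally_sources` is made up for illustration; neither is part of Netatalk.

```python
import re
from collections import Counter

# Pattern inferred from the afpd lines quoted above (an assumption), e.g.:
#   Feb 23 14:38:40.353861 afpd[7835] {cnid_dbd.c:425} (E:CNID): transmit: ...
LOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:.]+\s+(?P<proc>\w+)\[(?P<pid>\d+)\]\s+"
    r"\{(?P<src>[^}]+)\}\s+\((?P<level>\w):(?P<facility>\w+)\):\s*(?P<msg>.*)$"
)


def tally_sources(lines):
    """Count log entries per (level, source file:line).

    Lines that do not match the pattern, such as
    "message repeated 11 times", are skipped rather than counted.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("src"))] += 1
    return counts


# Small demonstration using lines copied from the log excerpt above.
sample = [
    "Feb 23 14:38:40.353861 afpd[7835] {cnid_dbd.c:425} (E:CNID): transmit: "
    "Request to dbd daemon (db_dir /media/store/homes/ms/students) timed out.",
    "Feb 23 16:15:54.847196 afpd[1878] {acls.c:1745} (E:Default): "
    "posix_acls_to_uaperms(path, st, ma) failed: No such file or directory",
    "message repeated 11 times",
]

counts = tally_sources(sample)
for (level, src), n in counts.most_common():
    print(n, level, src)
```

Passing an open file handle for the afpd log instead of `sample` would give the same tally for the whole log. Because repeat markers are skipped, the counts are a lower bound.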
From: Josh B. <jo...@si...> - 2012-03-01 02:53:58
On 02/23/2012 04:17 PM, Josh Beard wrote:
> Hello,
>
> I'm using a self-built Netatalk 2.2.2 on an otherwise vanilla Ubuntu
> 10.04 server on an EXT4 filesystem with ACLs (with a boat load of
> messages in the logs related to acls).
>
> Normal traffic for this server is 150-200 simultaneous clients with
> "live" network homes, using mostly Snow Leopard systems.
>
> I've been experimenting with various combinations here, but haven't
> found a "sweet spot" yet. I started with the default dbd backend, but
> switched to cdb after reading that it may be better for network homes.
> Unfortunately, cdb would produce the ugly CNID warning messages upon
> login every day or two, so I switched back to dbd.
>
> Now that I'm on dbd, I've had at least two afpd crashes (Signal 11),
> where the symptoms are *extremely* slow logins (5 minutes) and very poor
> throughput (~100 KBps). This has occurred with two different shares.
>
> For trial, I've set the dbpath for one of the shares to store on a
> different filesystem than the homes are on. It's still too soon to see
> if that's worked.
>
> Between switching backends, I've just been stopping netatalk and
> recursively removing any .AppleD* data in the shares' paths.
>
> When the crash occurs, it seems like sending a SIGHUP to cnid_metad gets
> things moving again. A complete restart of afpd is hardly an option
> when there's 200 users connected across campus.
>
> With the last crash, I couldn't test too much, but it almost seemed like
> the performance issue was limited to a specific share, but I don't have
> conclusive results on that.
>
> Log snips are below. My questions are:
> Does anyone have any insight on this?
> Is the cnid db corruption causing the afpd crash?
> Is afpd likely crashing for another reason?
> Is moving the dbpath to a different filesystem even practical?
> Can the CNID warning messages be silenced for the end users?
>
> Thanks!
>
> Leading up to the crash, the following message starts appearing in the logs:
> Feb 23 14:38:40.353861 afpd[7835] {cnid_dbd.c:425} (E:CNID): transmit:
> Request to dbd daemon (db_dir /media/store/homes/ms/students) timed out.
>
> Here's a snip from the log:
> <snip>

Just for kicks, this was from earlier today, using dbd and 150 connected users (Xeon E5506 @ 2.13GHz 8c):

load average: 81.67, 66.21, 38.74

Brutal, to say the least. :D