|
From: Chris F. <cf...@fe...> - 2014-07-31 11:58:58
|
Hi Ralph, Thanks for your help. I'm trying to make sense of it all, but here are two key observations I have made so far. First, the OS X 10.9 Finder is continually calling fsgetpath getxattr getattrlist on files and folders all through the folder hierarchy of mounted AFP shares, as reported by running # fs_usage -w Finder on the Mac. As long as the Finder has no windows open to a folder on the server, these calls don't appear to hit the server much. (Is OS X caching them locally?) Your afp.d Dtrace script shows little activity for the afpd with corresponding PID, and prstat shows afpd sleeping for most of the time at 0.0% CPU. However, as soon as I open a Finder window (in column view) to a folder on the server, the afpd process for that Mac goes to 3-4% CPU, as reported by prstat, or almost a full core (of 24). Running your afp.d script shows that the getcwd syscall is taking very long to return occasionally, see below for a snippet from the output. The entire output is here: <https://gist.github.com/ferebee/99016ccd293b3f616c77#file-netatalk-performance-getcwd-is-slow> What does getcwd do besided get the current directory path, and why would it take such a long time? Wondering whether there might be an issue with disk I/O, I observed this behavior before and after starting a zpool scrub, with no substantive difference. (The output below is during the scrub.) The zpool scrub is now proceeding at 940 MB/s with total I/O (per zpool iostat) of 9.000/150 IOPS r/w, so I presume that ZFS should not be a bottleneck here with respect to netatalk. What do you think? BTW, when I try all the above from an OS X 10.6.8 client, everything is fine, the Finder does not lag, and there is no spurious high CPU load on the server while the Mac is idle. The issue applies to clients running 10.8 and 10.9, and also 10.10 public beta. (No 10.7 available to test.) Thanks, Chris ===== afp.d output snippet: ===== # /root/afp.d -p 4702 Ran for 35573331 us (35 s) Sum of time spend ================= ... disk IO: 115787 us (0 s) ... network IO: 128457 us (0 s) Sum of time spend in syscalls ============================= ... open 12941 chdir 15153 openat 24484 lstat 64230 pollsys 118022 getcwd 35153449 Total number of syscalls ======================== ... getcwd 373 syscalls elapsed time average ============================= ... stat 48 pollsys 426 getcwd 94245 getcwd value ------------- Distribution ------------- count 4 | 0 8 | 1 16 |@@@@@ 43 32 |@@@@@@@@@@@@@@ 127 64 |@@@@@@@@@@ 97 128 |@ 6 256 | 1 512 | 0 1024 | 0 2048 | 0 4096 | 0 8192 | 0 16384 | 0 32768 | 0 65536 | 0 131072 | 2 262144 |@@@@@@@@@@ 91 524288 |@ 5 1048576 | 0 ===== Am 29.07.2014 um 17:22 schrieb Ralph Böhme <rb...@ne...>: > Hi Chris > > Am 29.07.2014 um 16:47 schrieb cf...@fe...: >> I'm trying to troubleshoot performance issues on a netatalk server, >> version 3.1.2 running on SmartOS (release joyent_20140613T024634Z). > > I feel your pain! ;) > >> Performance is as expected when copying large files in the finder, >> opening and saving them in applications, or copying them to and from >> the server using rsync. However, browsing directories on the server >> from the Finder is sometimes extremely slow, especially if multiple >> subdirectories containing many files are involved. Most of the files >> are media files, e. g. .jpg, .tiff, .mov and .mp4. >> >> After opening a folder, the window sometimes stays blank for 30 >> seconds or longer (with the spinner running in the bottom right >> corner) before anything appears, even if the folder contains only >> 10-20 subfolders or a few hundred files. >> >> The server is running SmartOS with a large ZFS pool (24 x 4 TB SAS >> mirrored as 2 x 12), 128 GB RAM and dual 6-core XEON E5 CPUs. It is >> connected to a NetGear GS752TXS switch via 10GbE (Intel X520), while >> the workstations are all using 1GbE. >> >> There are about 20-25 active users. Perhaps 15 work regularly with >> large media files (e. g. Final Cut Pro). >> >> As a test, we have configured Samba 3 and connected some clients via >> SMB (cifs://) rather than AFP. > > Beware! If you switch protocol in production without careful evluation whether the metadata you loose is prodcution relevent or not, users may come screaming after you. :) > >> These clients are not seeing the >> slowdowns, so the issue appears to be related to AFP/netatalk in some >> way. >> >> I have attempted to observe the client activity on the server by >> tracking file opens with Dtrace (dtrace-filebyproc.d), and my >> impression is that the Finder is opening many more files than would >> seem necessary to display the immediate folder contents. Perhaps it is >> trying to cache things several levels deep in the folder hierarchy? If >> so, why are the SMB clients not affected? >> >> Thanks for any pointers on how to debug this further. > > yep, dtrace of course! Attached is a custom dtrace script I wrote but which I didn't include in the Netatalk source tarball yet. I uses both syscall tracepoint as well as application specifc tracepoints, in case the latter are enabled at compile time. > > This should show which part of Netatalk and which syscalls consume much time when you see the performance prob. > > Otoh, before tracing the problematic behaviour with the dtrace script, you may take a look with Wireshark as to see what the heck the Finder is actually doing at that point. > > Good luck! > -Ralph > > <afp.d> > > > -- > Ralph Böhme <rb...@ne...> > Netatalk Developer | Support | Services > NetAFP joins SerNet: <http://www.netafp.com/sernet-expands-apple-support-netatalk-and-samba-merge-1230/> > > > |