From: Evgeniy P. <jo...@2k...> - 2005-02-28 05:11:35
|
On Mon, 2005-02-28 at 10:59 +0900, Kaigai Kohei wrote:
> Hello,
>
> Marcelo Tosatti wrote:
> > Yep, the netlink people should be able to help - they know what would be
> > required for not sending messages in case there is no listener registered.
> >
> > Maybe its already possible? I have never used netlink myself.
>
> If we notify the fork/exec/exit events to user space directly as you said,
> I don't think any hacking on netlink is necessary.
> For example, such packets would be sent only when /proc/sys/.../process_grouping
> is set; the user-side daemon sets this value and unsets it when it exits.
> It's not necessary to take it too seriously.

Kernel accounting was already discussed on lkml a week ago - I'm quite
sure Guillaume Thouvenin created exactly that. His module adds a
do_fork() hook and broadcasts various process states over netlink.

Discussion at http://lkml.org/lkml/2005/2/17/87

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski |
From: Guillaume T. <gui...@bu...> - 2005-02-28 07:21:12
|
On Mon, 2005-02-28 at 10:59 +0900, Kaigai Kohei wrote:
> Marcelo Tosatti wrote:
> > Yep, the netlink people should be able to help - they know what would be
> > required for not sending messages in case there is no listener registered.
> >
> > Maybe its already possible? I have never used netlink myself.
>
> If we notify the fork/exec/exit events to user space directly as you said,
> I don't think any hacking on netlink is necessary.
> For example, such packets would be sent only when /proc/sys/.../process_grouping
> is set; the user-side daemon sets this value and unsets it when it exits.
> It's not necessary to take it too seriously.

I wrote a new fork connector patch with a callback to enable/disable
messages depending on whether there is a listener. I will post it this
week. Basically there is a global variable that is manipulated through a
connector callback, so a user-space daemon can toggle it. In the
fork_connector() function you have:

static inline void fork_connector(pid_t parent, pid_t child)
{
	static DEFINE_SPINLOCK(cn_fork_lock);
	static __u32 seq;	/* used to test if a message is lost */

	if (cn_fork_enable) {
		[...]
		cn_netlink_send(msg, CN_IDX_FORK);
	}
}

and in the cn_fork module (drivers/connector/cn_fork.c) the callback is
defined as:

static void cn_fork_callback(void *data)
{
	if (cn_already_initialized)
		cn_fork_enable = cn_fork_enable ? 0 : 1;
}

Ok, the protocol is maybe too "basic", but with this mechanism the
user-space application that uses the fork connector can start and stop
the sending of messages. This implementation needs some improvements
because currently, if two applications are using the fork connector, one
can enable it and the other doesn't know whether it is enabled or not,
but the idea is there I think.

Regards,
Guillaume |
From: Andrew M. <ak...@os...> - 2005-02-28 07:40:53
|
Guillaume Thouvenin <gui...@bu...> wrote: > > Ok the protocol is maybe too "basic" but with this mechanism the user > space application that uses the fork connector can start and stop the > send of messages. This implementation needs somme improvements because > currently, if two application are using the fork connector one can > enable it and the other don't know if it is enable or not, but the idea > is here I think. Yes. But this problem can be solved in userspace, with a little library function and a bit of locking. IOW: use the library to enable/disable the fork connector rather than directly doing syscalls. It has the problem that if a client of that library crashes, the counter gets out of whack, but really, it's not all _that_ important, and to handle this properly in-kernel each client would need an open fd against some object so we can do the close-on-exit thing properly. You'd need to create a separate netlink socket for the purpose. |
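The library Andrew sketches could look something like the following — a reference count behind a lock, so the connector is switched on for the first subscriber and off after the last one leaves. This is only a sketch of the counting logic under stated assumptions: all names here (`cnf_subscribe`, `cnf_unsubscribe`) are hypothetical, and the real netlink enable/disable message is stubbed out with a flag.

```c
#include <pthread.h>

/* Hypothetical listener refcount: the fork connector stays enabled
 * while at least one client is subscribed. */
static pthread_mutex_t cnf_lock = PTHREAD_MUTEX_INITIALIZER;
static int cnf_listeners;
static int cnf_enabled;        /* stands in for the real netlink toggle */

static void cnf_set_enabled(int on) { cnf_enabled = on; }

void cnf_subscribe(void)
{
	pthread_mutex_lock(&cnf_lock);
	if (cnf_listeners++ == 0)      /* first listener: turn it on */
		cnf_set_enabled(1);
	pthread_mutex_unlock(&cnf_lock);
}

void cnf_unsubscribe(void)
{
	pthread_mutex_lock(&cnf_lock);
	if (--cnf_listeners == 0)      /* last listener: turn it off */
		cnf_set_enabled(0);
	pthread_mutex_unlock(&cnf_lock);
}
```

As Andrew notes, a client that crashes never calls `cnf_unsubscribe()`, so the count drifts; making cleanup robust would need a per-client fd in the kernel.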
From: Evgeniy P. <jo...@2k...> - 2005-02-28 07:58:46
|
On Sun, 2005-02-27 at 23:39 -0800, Andrew Morton wrote:
> Guillaume Thouvenin <gui...@bu...> wrote:
> >
> > Ok the protocol is maybe too "basic" but with this mechanism the user
> > space application that uses the fork connector can start and stop the
> > sending of messages. This implementation needs some improvements because
> > currently, if two applications are using the fork connector one can
> > enable it and the other doesn't know if it is enabled or not, but the
> > idea is there I think.
>
> Yes.  But this problem can be solved in userspace, with a little library
> function and a bit of locking.
>
> IOW: use the library to enable/disable the fork connector rather than
> directly doing syscalls.
>
> It has the problem that if a client of that library crashes, the counter
> gets out of whack, but really, it's not all _that_ important, and to handle
> this properly in-kernel each client would need an open fd against some
> object so we can do the close-on-exit thing properly.  You'd need to create
> a separate netlink socket for the purpose.

Why not just extend the protocol a bit? Add a header after cn_msg which
has a get/set field, and that is all. By properly using the seq/ack
fields, userspace can avoid locks.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski |
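Evgeniy's suggestion might look something like the sketch below: a small control header carried in the cn_msg payload, with an op field selecting get or set and the current state echoed back for a get. The struct name, field names, and handler are invented for illustration — they are not from the actual connector patch.

```c
#include <stdint.h>

/* Hypothetical control header carried in cn_msg::data[]. */
enum cn_fork_op { CN_FORK_GET = 0, CN_FORK_SET = 1 };

struct cn_fork_ctl {
	uint32_t op;     /* CN_FORK_GET or CN_FORK_SET */
	uint32_t value;  /* 0 = disabled, 1 = enabled */
};

/* Kernel-side handling sketch: apply a SET, answer a GET. */
static uint32_t cn_fork_enable;

static void cn_fork_ctl_handle(struct cn_fork_ctl *ctl)
{
	if (ctl->op == CN_FORK_SET)
		cn_fork_enable = ctl->value ? 1 : 0;
	else
		ctl->value = cn_fork_enable;  /* echoed back in the reply */
}
```

With the seq/ack fields of cn_msg, a client could match each reply to its request and learn the current state without any userspace locking, which is the point Evgeniy is making.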
From: Kaigai K. <ka...@ak...> - 2005-03-01 13:38:59
|
Hello,

> I tested without user space listeners and the cost is negligible. I will
> test with a user space listener and see the results. I'm going to run
> the test this week after improving the mechanism that switches on/off the
> sending of the message.

I'm also trying to measure process-creation/destruction performance in
the following three environments.

Architecture: i686 / Distribution: Fedora Core 3
* Kernel preemption is DISABLED
* SMP kernel but UP machine / no Hyper-Threading

[1] 2.6.11-rc4-mm1 normal
[2] 2.6.11-rc4-mm1 with PAGG-based Process Accounting Module
[3] 2.6.11-rc4-mm1 with fork-connector notification (enabled)

When the 367th fork() was called after fork-connector notification, the
kernel locked up. (The user-space listener had also been running until
the 366th fork() notification was received.) Does this number have any
sort of meaning? In my second trial, the kernel also locked up after the
366th fork() notification. Currently, I don't know its cause. Has anyone
else encountered this?

# I wanted to say "[2] is faster than [3]" when process grouping is
enabled, but the plan fell through. :(

Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...> |
From: Paul J. <pj...@sg...> - 2005-03-02 04:55:07
|
Just a thought - perhaps you could see if Jay can test the performance scaling of these changes on larger systems (8 to 64 CPUs, give or take, small for SGI, but big for some vendors.) Things like a global lock, for example, might be harmless on smaller systems, but hurt big time on bigger systems. I don't know if you have any such constructs ... perhaps this doesn't matter. At the very least, we need to know that performance and scaling are not significantly impacted, on systems not using accounting, either because it is obvious from the code, or because someone has tested it. And if performance or scaling was impacted when accounting was enabled, then at least we would want to know how much performance was impacted, so that users would know what to expect when they use accounting. > the process-creation/destruction performance on following three environment. I think this is a good choice of what to measure, and where. Thank-you. > kernel was also locked up after 366th-fork() I have no idea what this is -- good luck finding it. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401 |
From: Guillaume T. <gui...@bu...> - 2005-03-01 13:54:21
|
On Tue, 2005-03-01 at 22:38 +0900, Kaigai Kohei wrote:
> > I tested without user space listeners and the cost is negligible. I will
> > test with a user space listener and see the results. I'm going to run
> > the test this week after improving the mechanism that switches on/off the
> > sending of the message.
>
> I'm also trying to measure process-creation/destruction performance in
> the following three environments.
> Architecture: i686 / Distribution: Fedora Core 3
> * Kernel preemption is DISABLED
> * SMP kernel but UP machine / no Hyper-Threading
> [1] 2.6.11-rc4-mm1 normal
> [2] 2.6.11-rc4-mm1 with PAGG-based Process Accounting Module
> [3] 2.6.11-rc4-mm1 with fork-connector notification (enabled)
>
> When the 367th fork() was called after fork-connector notification, the
> kernel locked up. (The user-space listener had also been running until
> the 366th fork() notification was received.)

I don't see this limit on my computer. I'm currently running lmbench
with a new fork connector patch (one that enables/disables the fork
connector) on an SMP computer. I will send the results and the new patch
tomorrow because the test takes a while...

I'm using a small patch provided by Evgeniy that is not included in the
2.6.11-rc4-mm1 tree.

Best regards,
Guillaume

--- orig/connector.c
+++ mod/connector.c
@@ -168,12 +168,11 @@
 	group = NETLINK_CB((skb)).groups;
 	msg = (struct cn_msg *)NLMSG_DATA(nlh);

-	if (msg->len != nlh->nlmsg_len - sizeof(*msg) - sizeof(*nlh)) {
+	if (NLMSG_SPACE(msg->len + sizeof(*msg)) != nlh->nlmsg_len) {
 		printk(KERN_ERR "skb does not have enough length: "
-		       "requested msg->len=%u[%u], nlh->nlmsg_len=%u[%u], skb->len=%u[must be %u].\n",
-		       msg->len, NLMSG_SPACE(msg->len),
-		       nlh->nlmsg_len, nlh->nlmsg_len - sizeof(*nlh),
-		       skb->len, msg->len + sizeof(*msg));
+		       "requested msg->len=%u[%u], nlh->nlmsg_len=%u, skb->len=%u.\n",
+		       msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)),
+		       nlh->nlmsg_len, skb->len);
 		kfree_skb(skb);
 		return -EINVAL;
 	}
|
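The check in the patch above compares the full aligned message size against nlmsg_len. A sketch of the length arithmetic, using simplified versions of the standard netlink macros — the 20-byte cn_msg size below is purely illustrative, not taken from the real header:

```c
/* Simplified netlink length macros, mirroring <linux/netlink.h>. */
#define NLMSG_ALIGNTO 4u
#define NLMSG_ALIGN(len)  (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))
#define NLMSG_HDRLEN      NLMSG_ALIGN(16u)  /* sizeof(struct nlmsghdr) is 16 */
#define NLMSG_LENGTH(len) ((len) + NLMSG_HDRLEN)
#define NLMSG_SPACE(len)  NLMSG_ALIGN(NLMSG_LENGTH(len))

/* Total aligned space a connector message occupies, as the patch
 * computes it: netlink header + cn_msg header + payload, rounded up
 * to the 4-byte netlink alignment. */
static unsigned int cn_expected_nlmsg_len(unsigned int cn_msg_size,
					  unsigned int payload)
{
	return NLMSG_SPACE(payload + cn_msg_size);
}
```

So for a hypothetical 20-byte cn_msg with a 4-byte payload the check expects nlmsg_len to be 16 + 20 + 4 = 40 bytes; the old check compared the raw lengths without the alignment padding, which is what the patch fixes.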
From: Evgeniy P. <jo...@2k...> - 2005-03-01 15:00:32
|
On Tue, 2005-03-01 at 14:53 +0100, Guillaume Thouvenin wrote:
> On Tue, 2005-03-01 at 22:38 +0900, Kaigai Kohei wrote:
> > I'm also trying to measure process-creation/destruction performance in
> > the following three environments.
> > ...
> > When the 367th fork() was called after fork-connector notification, the
> > kernel locked up. (The user-space listener had also been running until
> > the 366th fork() notification was received.)
>
> I don't see this limit on my computer. I'm currently running lmbench
> with a new fork connector patch (one that enables/disables the fork
> connector) on an SMP computer. I will send the results and the new patch
> tomorrow because the test takes a while...
>
> I'm using a small patch provided by Evgeniy that is not included in the
> 2.6.11-rc4-mm1 tree.

The 2.6.11-rc4-mm1 tree does not have the latest connector. Various
fixes were added, and not only that.

I ran the latest patch Guillaume sent to me (with small updates); a fork
bomb with more than 100k forks already passed without any freeze. I do
not have numbers though.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski |
From: Guillaume T. <gui...@bu...> - 2005-03-02 08:58:27
|
On Tue, 2005-03-01 at 22:38 +0900, Kaigai Kohei wrote:
> > I tested without user space listeners and the cost is negligible. I will
> > test with a user space listener and see the results. I'm going to run
> > the test this week after improving the mechanism that switches on/off the
> > sending of the message.
>
> I'm also trying to measure process-creation/destruction performance in
> the following three environments.
> Architecture: i686 / Distribution: Fedora Core 3
> * Kernel preemption is DISABLED
> * SMP kernel but UP machine / no Hyper-Threading
> [1] 2.6.11-rc4-mm1 normal
> [2] 2.6.11-rc4-mm1 with PAGG-based Process Accounting Module
> [3] 2.6.11-rc4-mm1 with fork-connector notification (enabled)
>
> When the 367th fork() was called after fork-connector notification, the
> kernel locked up. (The user-space listener had also been running until
> the 366th fork() notification was received.)

So I ran lmbench with three different kernels with the fork connector
patch I just sent. Results are attached at the end of the mail; the
three lines in each table are:

 o First line is linux-2.6.11-rc4-mm1-cnfork
 o Second line is linux-2.6.11-rc4-mm1
 o Third line is linux-2.6.11-rc4-mm1-cnfork with a user-space
   application. The user space application listened during 15h
   and received 6496 messages.

Each test was run only once.

Best regards,
Guillaume

---
cd results && make summary percent 2>/dev/null | more
make[1]: Entering directory `/home/guill/benchmark/lmbench/lmbench-3.0-a4/results'

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
                 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
account   Linux 2.6.11- i686-pc-linux-gnu       2765    63   128 2.4900    1
account   Linux 2.6.11- i686-pc-linux-gnu       2765    67   128 2.4200    1
account   Linux 2.6.11- i686-pc-linux-gnu       2765    69   128 2.4400    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
account   Linux 2.6.11- 2765 0.17 0.26 3.57 4.19 16.9 0.51 2.31 162. 629. 2415
account   Linux 2.6.11- 2765 0.16 0.26 3.56 4.17 17.6 0.50 2.30 163. 628. 2417
account   Linux 2.6.11- 2765 0.16 0.27 3.67 4.25 17.6 0.51 2.28 176. 664. 2456

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr  intgr  intgr  intgr  intgr
                          bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
account   Linux 2.6.11- 0.1800 0.1700 4.9900   20.8   23.1
account   Linux 2.6.11- 0.1800 0.1700 4.9900   20.8   23.1
account   Linux 2.6.11- 0.1800 0.1700 4.9900   20.8   23.1

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                          add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
account   Linux 2.6.11- 1.7300 2.4800   15.5   15.4
account   Linux 2.6.11- 1.7300 2.4800   15.5   15.6
account   Linux 2.6.11- 1.7400 2.5000   15.7   15.6

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS double double double double
                          add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
account   Linux 2.6.11- 1.7300 2.4800   15.5   15.4
account   Linux 2.6.11- 1.7300 2.4800   15.5   15.6
account   Linux 2.6.11- 1.7400 2.5000   15.7   15.6

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
account   Linux 2.6.11- 5.1300 5.2900 4.9700 3.1700   10.9 6.30000    32.6
account   Linux 2.6.11- 4.9000 5.2100 5.1600 4.4700   20.3 6.48000    27.7
account   Linux 2.6.11- 4.8600 5.3000 4.9200 3.5600   20.5 6.87000    31.5

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
account   Linux 2.6.11- 5.130  14.3 11.9  17.7  23.2  20.3  28.3  40.
account   Linux 2.6.11- 4.900  14.6 12.0  18.5  23.9  20.8  28.6  40.
account   Linux 2.6.11- 4.860  14.8 12.6  18.1  23.9  20.8  27.8  40.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                        Create Delete Create Delete Latency Fault  Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
account   Linux 2.6.11-   18.9   16.1   65.6   33.5   15.4K 0.771 2.22520  16.4
account   Linux 2.6.11-   18.8   16.3   64.2   33.2   15.7K 0.841 2.20690  16.5
account   Linux 2.6.11-   19.2   16.4   65.4   33.5   15.7K 0.782 2.19950  16.4

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                 OS Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
account   Linux 2.6.11- 664. 497. 369. 1468.8 1836.1  596.6  568.4 1819 779.7
account   Linux 2.6.11- 671. 521. 338. 1481.6 1817.2  593.8  568.8 1838 783.0
account   Linux 2.6.11- 667. 543. 372. 1469.4 1816.8  594.2  568.3 1818 783.0

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
account   Linux 2.6.11-  2765 0.7030 6.5710       140.6       246.7
account   Linux 2.6.11-  2765 0.7090 6.6350       142.4       249.5
account   Linux 2.6.11-  2765 0.7110 6.6340       142.5       249.5

make[1]: Leaving directory `/home/guill/benchmark/lmbench/lmbench-3.0-a4/results' |
From: Andrew M. <ak...@os...> - 2005-03-02 09:08:24
|
Guillaume Thouvenin <gui...@bu...> wrote:
>
> So I ran lmbench with three different kernels with the fork connector
> patch I just sent. Results are attached at the end of the mail; the
> three lines in each table are:
>
>  o First line is linux-2.6.11-rc4-mm1-cnfork
>  o Second line is linux-2.6.11-rc4-mm1
>  o Third line is linux-2.6.11-rc4-mm1-cnfork with a user-space
>    application. The user space application listened during 15h
>    and received 6496 messages.
>
> Each test was run only once.
>
> ...
> ------------------------------------------------------------------------------
> Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
>                              call  I/O stat clos TCP  inst hndl proc proc proc
> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
> account   Linux 2.6.11- 2765 0.17 0.26 3.57 4.19 16.9 0.51 2.31 162. 629. 2415
> account   Linux 2.6.11- 2765 0.16 0.26 3.56 4.17 17.6 0.50 2.30 163. 628. 2417
> account   Linux 2.6.11- 2765 0.16 0.27 3.67 4.25 17.6 0.51 2.28 176. 664. 2456

This is the interesting bit, yes? A 5-10% slowdown on fork is expected,
but why was exec slower?

What does "The user space application listened during 15h" mean? |
From: Guillaume T. <gui...@bu...> - 2005-03-02 09:25:55
|
On Wed, 2005-03-02 at 01:06 -0800, Andrew Morton wrote:
> Guillaume Thouvenin <gui...@bu...> wrote:
> >
> > So I ran lmbench with three different kernels with the fork connector
> > patch I just sent. Results are attached at the end of the mail; the
> > three lines in each table are:
> >
> >  o First line is linux-2.6.11-rc4-mm1-cnfork
> >  o Second line is linux-2.6.11-rc4-mm1
> >  o Third line is linux-2.6.11-rc4-mm1-cnfork with a user-space
> >    application. The user space application listened during 15h
> >    and received 6496 messages.
> >
> > Each test was run only once.
> >
> > ...
> > ------------------------------------------------------------------------------
> > Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
> >                              call  I/O stat clos TCP  inst hndl proc proc proc
> > --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
> > account   Linux 2.6.11- 2765 0.17 0.26 3.57 4.19 16.9 0.51 2.31 162. 629. 2415
> > account   Linux 2.6.11- 2765 0.16 0.26 3.56 4.17 17.6 0.50 2.30 163. 628. 2417
> > account   Linux 2.6.11- 2765 0.16 0.27 3.67 4.25 17.6 0.51 2.28 176. 664. 2456
>
> This is the interesting bit, yes? A 5-10% slowdown on fork is expected,
> but why was exec slower?

I can't explain it for the moment. I will run the test more than once to
see if this difference is still there.

> What does "The user space application listened during 15h" mean?

It means that I ran the user-space application before the test and
stopped it 15 hours later (this morning for me). The test ran for
5h30mn. The user-space application increments a counter to show how many
processes were created during a period of time.

I have not used the user-space daemon that manages groups of processes,
because it still uses the old mechanism (a signal sent from do_fork())
and, as I wanted to provide quick results, I used another user-space
application. I attach the test program (get_fork_info.c) that I'm using
at the end of the mail to clearly show what it does. I will run new
tests with the real user-space daemon but it will be ready next week,
sorry for the delay.

Best regards,
Guillaume

---
/*
 * get_fork_info.c
 *
 * This program listens on the netlink interface to retrieve information
 * sent by the kernel when forking. It increments a counter for each
 * fork and, when the user hits CTRL-C, it displays how many forks
 * occurred during the period.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <signal.h>

#include <asm/types.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <linux/netlink.h>
#include <linux/connector.h>

#define CN_FORK_OFF 0
#define CN_FORK_ON  1

#define MESSAGE_SIZE (sizeof(struct nlmsghdr) + \
                      sizeof(struct cn_msg)   + \
                      sizeof(int))

int sock;
unsigned long total_p;
struct timeval test_time;

static inline void switch_cn_fork(int sock, int action)
{
	char buff[128];	/* must be > MESSAGE_SIZE */
	struct nlmsghdr *hdr;
	struct cn_msg *msg;

	/* Clear the buffer */
	memset(buff, '\0', sizeof(buff));

	/* fill the message header */
	hdr = (struct nlmsghdr *) buff;
	hdr->nlmsg_len = MESSAGE_SIZE;
	hdr->nlmsg_type = NLMSG_DONE;
	hdr->nlmsg_flags = 0;
	hdr->nlmsg_seq = 0;
	hdr->nlmsg_pid = getpid();

	/* the message */
	msg = (struct cn_msg *) NLMSG_DATA(hdr);
	msg->id.idx = CN_IDX_FORK;
	msg->id.val = CN_VAL_FORK;
	msg->seq = 0;
	msg->ack = 0;
	msg->len = sizeof(int);
	msg->data[0] = action;

	send(sock, hdr, hdr->nlmsg_len, 0);
}

static void cleanup(int sig)
{
	struct timeval tmp_time;

	(void) sig;
	switch_cn_fork(sock, CN_FORK_OFF);

	tmp_time = test_time;
	gettimeofday(&test_time, NULL);
	printf("%lu processes were created in %li seconds.\n",
	       total_p, test_time.tv_sec - tmp_time.tv_sec);

	close(sock);
	exit(EXIT_SUCCESS);
}

int main()
{
	int err;
	struct sockaddr_nl sa;	/* information for NETLINK interface */

	/*
	 * To be able to quit the application properly we install a
	 * signal handler that catches CTRL-C
	 */
	signal(SIGTERM, cleanup);
	signal(SIGINT, cleanup);

	/*
	 * Create an endpoint for communication. Use the kernel user
	 * interface device (PF_NETLINK) which is a datagram oriented
	 * service (SOCK_DGRAM). The protocol used is the netfilter/iptables
	 * ULOG protocol (NETLINK_NFLOG)
	 */
	sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_NFLOG);
	if (sock == -1) {
		perror("socket");
		return -1;
	}

	sa.nl_family = AF_NETLINK;
	sa.nl_groups = CN_IDX_FORK;
	sa.nl_pid = getpid();

	err = bind(sock, (struct sockaddr *) &sa, sizeof(struct sockaddr_nl));
	if (err == -1) {
		perror("bind");
		close(sock);
		return -1;
	}

	switch_cn_fork(sock, CN_FORK_ON);

	total_p = 0;
	gettimeofday(&test_time, NULL);

	for (;;) {
		char buff[1024];	/* it's large enough */
		struct nlmsghdr *hdr;
		struct cn_msg *msg;
		int len;

		/* Clear the buffer */
		memset(buff, '\0', sizeof(buff));

		/* Listen */
		len = recv(sock, buff, sizeof(buff), 0);
		if (len == -1) {
			perror("recv");
			close(sock);
			return -1;
		}

		/* point to the message header */
		hdr = (struct nlmsghdr *) buff;

		switch (hdr->nlmsg_type) {
		case NLMSG_DONE:
			msg = (struct cn_msg *) NLMSG_DATA(hdr);
			total_p++;
#if 0
			printf("[idx=0x%x seq=%u] %s\n",
			       msg->id.idx, msg->seq, msg->data);
#endif
			break;
		case NLMSG_ERROR:
			printf("NLMSG_ERROR\n");
			/* Fall through */
		default:
			break;
		}
	}

	/*
	 * in fact we never reach this part of the code because there is an
	 * infinite loop above.
	 */
	cleanup(0);
	return 0;
} |
From: Paul J. <pj...@sg...> - 2005-03-02 15:33:43
|
Andrew wrote: > 5-10% slowdown on fork is expected, but > why was exec slower? Thanks for the summary, Andrew. Guillaume (or anyone else tempted to do this) - it's a good idea, when posting 100 lines of data, to summarize with a line or two of words, as Andrew did here. It is far more efficient for one writer to do this, than each of a thousand readers. Hmmm ... so why was exec slower? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401 |
From: Paul J. <pj...@sg...> - 2005-02-24 01:28:28
|
> So, I think such a fork/execve/exit hooks is harmless now. I don't recall seeing any microbenchmarking of the impact on fork/exit of such hooks. You might find such a benchmark in lmbench, or at http://bulk.fefe.de/scalability/. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401 |
From: Jay L. <jl...@sg...> - 2005-02-22 20:11:38
|
Guillaume Thouvenin wrote:
> On Fri, 2005-02-18 at 17:16 -0800, Andrew Morton wrote:
>
>> Jay Lan <jl...@sg...> wrote:
>>
>>> Since the need for Linux system accounting has gone beyond what BSD
>>> accounting provides, I think it is a good idea to create a thin layer
>>> of common code for various accounting packages, such as BSD accounting,
>>> CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
>>> a routine in the common code which would then invoke those accounting
>>> packages that register with acct_common to handle the do_exit situation.
>>
>> This all seems to be heading in the wrong direction. Do we really want to
>> have lots of different system accounting packages all hooking into a
>> generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
>> framework?
>>
>> Can't we get _one_ accounting system in there, get it right, avoid the
>> framework?
>
> Is it possible to just merge the BSD accounting and the CSA accounting
> by adding to the current BSD per-process accounting structure some
> missing fields, like the mm integral provided by the CSA patch?

Hi Guillaume,

All the raw data CSA needs is already stored in the task_struct of the
process.

> ELSA is just a user of the accounting data. We need a hook in the
> do_fork() routine to manage groups of processes, not to do accounting.

I see at least three layers of function in doing system accounting:
data collection, handling of the raw data, and presentation of the data
to users. We have merged the data collection part. :)

Handling of the raw data seems to be done in ELSA by a user-space
daemon, and you are proposing to add a hook at fork time. I am
interested in learning your approach. How does ELSA add per-process
accounting data to your grouping (banks) when a process exits? How do
you save the accounting data you need from the task_struct before it is
disposed of? BSD handles that through the acct_process() hook at
do_exit(). CSA also depends on a hook at do_exit() to merge per-process
data into per-job data. How does ELSA handle this without the need for a
do_exit() hook?

Thanks,
 - jay

> Guillaume |
From: Guillaume T. <gui...@bu...> - 2005-02-23 07:31:28
|
On Tue, 2005-02-22 at 12:11 -0800, Jay Lan wrote:
> How does ELSA add per-process accounting data to your grouping (banks)
> when a process exits? How do you save the accounting data you need from
> the task_struct before it is disposed of? BSD handles that through the
> acct_process() hook at do_exit(). CSA also depends on a hook at
> do_exit() to merge per-process data into per-job data. How does ELSA
> handle this without the need for a do_exit() hook?

There are three parts in ELSA. There is a job daemon that does process
aggregation. It needs a hook in the do_fork() routine to be able to
manage groups of processes. So this part handles process aggregation by
maintaining a complete picture of the process/thread hierarchy.

You can interact with the job daemon with classical IPC and message
operations. Thus we wrote a second part that is the interface between
the user and the job daemon. Through this interface you can add and
remove a process in/from a group, you can stop the job daemon, and you
can dump information to a file about current groups of processes.

This file (that contains information about groups of processes) is used
by ELSA, together with the accounting file provided by the accton(8)
command and BSD accounting, to provide per-group-of-processes
accounting. So the third part of ELSA is a parser and also an analyzer.

The architecture of ELSA is as follows (I hope that the ASCII picture
will be readable):

       KERNEL         |          USER SPACE
                      |
 --------------------    Netlink    ---------------
 | 1. Fork connector |------------>| 2. Job Daemon |
 --------------------               ---------------
                                           ^
                                           | IPC
                                   -----------------
                              ---->| 3. Interface  |
                              |    | (webmin, ...) |---
                              |    -----------------  |
                              |                       |
                     Per-group of              Accounting File
                     processes                 (see accton(8))
                     accounting

You can see how it works on the following web page:
http://elsa.sourceforge.net/sample_session.html

In the session we're using fork_history.ko, which will be replaced by
the fork connector hook.

Best regards,
Guillaume |
From: Andrew M. <ak...@os...> - 2005-02-23 08:52:40
|
Guillaume Thouvenin <gui...@bu...> wrote:
>
> ...
>
> > We really want to avoid doing such stuff in-kernel if at all possible,
> > of course.
> >
> > Is it not possible to implement the fork/exec/exit notifications to
> > userspace so that a daemon can track the process relationships and
> > perform aggregation based upon individual tasks' accounting? That's
> > what one of the accounting systems is proposing doing, I believe.
>
> It's what I'm proposing. The problem is to be alerted when a new process
> is created in order to add it to the correct group of processes if the
> parent belongs to one (or several) groups. The notification can be done
> with the fork connector patch.

Yes, it sounds sane.

The 2.6.8.1 ELSA patch adds quite a bit of kernel code, but from what
you're saying it seems like most of that has become redundant, and all
you now need is the fork notifier. Is that correct?

That 2.6.8.1 ELSA patch looks reasonable to me - it only adds two lines
to generic code and the rest looks pretty straightforward. Are we sure
that this level of functionality is not sufficient for everyone else?

> > (In fact, why do we even need the notifications? /bin/ps can work this
> > stuff out).
>
> Yes it can, but the risk is to lose some forks, no?
> I think that /bin/ps is using the /proc interface. If we're polling
> /proc to catch process creation we may lose some of them. With the
> fork connector we catch all forks, and we can check that by using the
> sequence number (incremented by each fork) of the message.

Oh, I wasn't proposing that it all be done via existing /proc interfaces
- I was just getting my head straight ;) |
From: Guillaume T. <gui...@bu...> - 2005-02-23 09:30:11
|
On Wed, 2005-02-23 at 00:51 -0800, Andrew Morton wrote:
> > It's what I'm proposing. The problem is to be alerted when a new
> > process is created in order to add it to the correct group of
> > processes if the parent belongs to one (or several) groups. The
> > notification can be done with the fork connector patch.
>
> Yes, it sounds sane.
>
> The 2.6.8.1 ELSA patch adds quite a bit of kernel code, but from what
> you're saying it seems like most of that has become redundant, and
> all you now need is the fork notifier. Is that correct?

Yes, that's correct. All I need is the fork connector patch. It needs
more work, like, as you said, sending an on/off message down the
netlink socket. I'm working on this (thank you very much, Andrew, for
your comments).

I will run the benchmarks found at http://bulk.fefe.de/scalability/ to
see how the fork connector impacts the kernel.

All the stuff that was previously done in kernel space and provided by
the 2.6.8.1 ELSA patch has been moved into the ELSA user space daemon
called "jobd".

Best,
Guillaume
|
From: Andrew M. <ak...@os...> - 2005-02-23 09:37:04
|
Guillaume Thouvenin <gui...@bu...> wrote:
>
> I will run the benchmarks found at http://bulk.fefe.de/scalability/
> to see how the fork connector impacts the kernel.

The lmbench fork microbenchmark would suffice.

> All the stuff that was previously done in kernel space and provided
> by the 2.6.8.1 ELSA patch has been moved into the ELSA user space
> daemon called "jobd".

Excellent. Will it work?
|
From: Jay L. <jl...@sg...> - 2005-02-23 19:13:27
|
Guillaume Thouvenin wrote:
> On Tue, 2005-02-22 at 23:20 -0800, Andrew Morton wrote:
>> Kaigai Kohei <ka...@ak...> wrote:
>>> A common agreement on the method of dealing with process
>>> aggregation has not been reached yet, as I understand it. And we
>>> will not be able to integrate each process aggregation model
>>> because of their diversity.
>>>
>>> For example, a process which belongs to JOB-A must not belong to
>>> any other 'JOB-X' in the CSA model. But in the ELSA model, a
>>> process in BANK-B can concurrently belong to BANK-B1, which is a
>>> child of BANK-B.
>>>
>>> And there are other differences: Is a process that does not belong
>>> to any process aggregation permitted or not? Should a process
>>> aggregation be inherited by child processes or not? (There is the
>>> possibility of it not being inherited in a rule-based process
>>> aggregation like CKRM.)
>>>
>>> Each process-aggregation model has its own philosophy and
>>> implementation, so they are hard to integrate. Thus, I think a
>>> common 'fork/exec/exit' event-handling framework is needed to
>>> implement any kind of process aggregation.
>
> I can add "policies". With ELSA, a process belongs to one or several
> groups, and if a process is removed from one group, its children
> still belong to the group. Thus a good idea could be to associate a
> "philosophy" with a group. For example, when a group of processes is
> created it can be tagged as UNIQUE or SHARED. UNIQUE means that a
> process that belongs to it cannot be added to another group, as
> opposed to SHARED. It's not needed inside the kernel.

This makes sense to me. CSA can use the UNIQUE policy to enforce its
"can't escape from the job container" philosophy.

>> We really want to avoid doing such stuff in-kernel if at all
>> possible, of course.
>>
>> Is it not possible to implement the fork/exec/exit notifications to
>> userspace so that a daemon can track the process relationships and
>> perform aggregation based upon individual tasks' accounting? That's
>> what one of the accounting systems is proposing doing, I believe.
>
> It's what I'm proposing. The problem is to be alerted when a new
> process is created in order to add it to the correct group of
> processes if the parent belongs to one (or several) groups. The
> notification can be done with the fork connector patch.

I am not quite comfortable with ELSA requesting a fork hook this way.
How many hooks in the stock kernel are related to accounting? Can
anyone answer this question? I know of acct_process() in exit.c, used
by BSD accounting, and ELSA is requesting a hook in fork. If people
raise the same question again a few years later, how many people will
still remember this ELSA hook? That was the reason I thought a central
piece was a good idea.

I would rather see the fork hook coded in acct.c, invoking a routine
that handles what ELSA needs. If CSA adopts the ELSA daemon's approach,
CSA may also need to use the fork hook. Actually, acct_process() was
modified not long ago to become a wrapper, which then invokes
do_acct_process(), which is completely BSD-specific. The fork hook can
be the same.

- jay

>> (In fact, why do we even need the notifications? /bin/ps can work
>> this stuff out).
>
> Yes it can, but the risk is to lose some forks, no?
> I think that /bin/ps is using the /proc interface. If we're polling
> /proc to catch process creation we may lose some of them. With the
> fork connector we catch all forks, and we can check that by using the
> sequence number (incremented by each fork) of the message.
>
> Guillaume
|
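The UNIQUE/SHARED tagging discussed above can be sketched in a few
lines of userspace C. All names here are hypothetical illustrations,
not actual ELSA or CSA code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group "philosophy" tags: a task already in a UNIQUE
 * group may not join any other group, while SHARED membership places
 * no such restriction. */
enum group_policy { GROUP_SHARED, GROUP_UNIQUE };

#define MAX_GROUPS 8

struct task_groups {
    int n;                                /* current number of memberships */
    enum group_policy policy[MAX_GROUPS]; /* policy of each joined group */
};

static bool task_join_group(struct task_groups *t, enum group_policy p)
{
    if (t->n >= MAX_GROUPS)
        return false;
    /* UNIQUE enforces the "can't escape from the job container" rule:
     * once inside such a group, joining any other group is refused. */
    for (int i = 0; i < t->n; i++)
        if (t->policy[i] == GROUP_UNIQUE)
            return false;
    t->policy[t->n++] = p;
    return true;
}
```

A SHARED-tagged task can accumulate memberships (as in ELSA's bank
hierarchy), whereas after joining a UNIQUE group every further join
fails (as in the CSA job model).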
From: Guillaume T. <gui...@bu...> - 2005-02-24 07:42:36
|
On Wed, 2005-02-23 at 11:11 -0800, Jay Lan wrote:
> Guillaume Thouvenin wrote:
> > It's what I'm proposing. The problem is to be alerted when a new
> > process is created in order to add it to the correct group of
> > processes if the parent belongs to one (or several) groups. The
> > notification can be done with the fork connector patch.
>
> I am not quite comfortable with ELSA requesting a fork hook this way.
> How many hooks in the stock kernel are related to accounting? Can
> anyone answer this question? I know of acct_process() in exit.c, used
> by BSD accounting, and ELSA is requesting a hook in fork. If people
> raise the same question again a few years later, how many people will
> still remember this ELSA hook?

The fork connector is not related to accounting. It's a connector that
makes it possible to send information to a user space application when
a fork occurs in the kernel. This information is used by ELSA, but I
think this hook will also be used by other user space applications,
and IMHO it's not incompatible with a specific hook for accounting
tools if needed.

Guillaume
|
From: Jay L. <jl...@sg...> - 2005-02-24 01:56:44
|
Hi Paul,

I think the microbenchmarking your link provides is irrelevant. Your
link provides benchmarking of doing a fork. However, we are talking
about inserting a callback routine in a fork and/or an exit. The
overhead is a function call plus the time spent in the routine. The
callback routine can be compiled down to "do {} while (0)" if a
certain CONFIG flag is not set.

Thanks,
- jay

Paul Jackson wrote:
>> So, I think such fork/execve/exit hooks are harmless now.
>
> I don't recall seeing any microbenchmarking of the impact on
> fork/exit of such hooks. You might find such a benchmark in lmbench,
> or at http://bulk.fefe.de/scalability/.
|
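Jay's point can be made concrete with a small C sketch. The config
name and callback are made up for illustration; this is not the actual
hook code under discussion:

```c
#include <assert.h>

/* Hypothetical config switch: when CONFIG_FORK_HOOK is not set, the
 * call site compiles down to "do {} while (0)" and the hook costs
 * nothing at all; when set, the cost is one function call plus the
 * time spent in the callback itself. */
#ifdef CONFIG_FORK_HOOK
extern void fork_hook_callback(int pid);
#define fork_hook(pid) do { fork_hook_callback(pid); } while (0)
#else
#define fork_hook(pid) do { } while (0)
#endif

/* Stand-in for the tail of do_fork(). */
static int do_fork_stub(int pid)
{
    /* ... real fork work would happen here ... */
    fork_hook(pid);   /* the only accounting-related addition */
    return pid;
}
```

With the flag unset, the compiler sees an empty statement, so the fork
path is byte-for-byte what it was before the hook existed.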
From: Paul J. <pj...@sg...> - 2005-02-24 02:20:32
|
Jay wrote:
> I think the microbenchmarking your link provides is irrelevant.

In cases such as you describe, where it's just some sort of empty
function call, then yes, I am willing to accept a wave of the hands
and a simple explanation of how it's not significant. I've done the
same myself ;).

What about the case where accounting is enabled, and thus actually has
to do work? How does that compare with just doing the traditional BSD
accounting?

I presume in that case that the benchmarking is no longer irrelevant.
Though if you can make a decent case that it is, I'm willing to
listen.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401
|
From: jamal <ha...@cy...> - 2005-02-28 12:11:53
|
Haven't seen the beginnings of this thread. But whatever you are
trying to do seems to suggest some complexity that you are trying to
work around. What was wrong with just going ahead and always invoking
your netlink_send()? If there is nobody in user space (or the kernel)
listening, it won't go anywhere.

cheers,
jamal

On Mon, 2005-02-28 at 02:39, Andrew Morton wrote:
> Guillaume Thouvenin <gui...@bu...> wrote:
> >
> > Ok, the protocol is maybe too "basic", but with this mechanism the
> > user space application that uses the fork connector can start and
> > stop the sending of messages. This implementation needs some
> > improvements because currently, if two applications are using the
> > fork connector, one can enable it and the other doesn't know
> > whether it is enabled or not, but the idea is there, I think.
>
> Yes. But this problem can be solved in userspace, with a little
> library function and a bit of locking.
>
> IOW: use the library to enable/disable the fork connector rather than
> directly doing syscalls.
>
> It has the problem that if a client of that library crashes, the
> counter gets out of whack, but really, it's not all _that_ important,
> and to handle this properly in-kernel each client would need an open
> fd against some object so we can do the close-on-exit thing properly.
> You'd need to create a separate netlink socket for the purpose.
|
From: Thomas G. <tg...@su...> - 2005-02-28 13:20:50
|
> Haven't seen the beginnings of this thread. But whatever you are
> trying to do seems to suggest some complexity that you are trying to
> work around. What was wrong with just going ahead and always invoking
> your netlink_send()?

I guess parts of the wheel are broken and need to be reinvented ;->

> If there is nobody in user space (or the kernel) listening, it won't
> go anywhere.

Additionally, you may want to extend netlink a bit to check whether
there is a listener before creating the messages. The method to do so
depends on whether you use netlink_send() or netlink_broadcast(). The
latter is more flexible because you can add more groups later on and
the userspace applications can decide which ones they want to listen
to. Both methods handle dying clients perfectly fine; the association
to the netlink socket gets destroyed as soon as the socket is closed.
Therefore you can simply check the mc_list of the netlink protocol you
use to see if there are any listeners registered:

    static inline int netlink_has_listeners(struct sock *sk)
    {
            int ret;

            read_lock(&nl_table_lock);
            ret = list_empty(&nl_table[sk->sk_protocol].mc_list);
            read_unlock(&nl_table_lock);

            return !ret;
    }

This is simplified and ignores the actual group assignments, i.e. you
might want to extend it to check if there are listeners for a certain
group.
|
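The early-exit idea Thomas describes can be mimicked in a small
userspace sketch. The names below are illustrative stand-ins, not the
kernel API; the real check would live next to the netlink code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace analogue of checking mc_list before building a message:
 * if nobody has subscribed, skip formatting the event entirely. */
struct listener { struct listener *next; };

static struct listener *mc_list;   /* stand-in for the per-protocol list */
static int messages_built;         /* counts how many messages we formatted */

static bool has_listeners(void)
{
    return mc_list != NULL;
}

static void broadcast_fork_event(int pid)
{
    char msg[64];

    if (!has_listeners())
        return;   /* cheap early exit: no allocation, no formatting */

    snprintf(msg, sizeof(msg), "fork: pid=%d", pid);
    messages_built++;   /* the kernel would call netlink_broadcast() here */
}
```

The point of the check is that the sender pays almost nothing when no
daemon is subscribed, which addresses the concern about always
emitting fork events.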
From: jamal <ha...@cy...> - 2005-02-28 14:32:11
|
netlink_broadcast(), or a wrapper around it. Why even bother doing the
check with netlink_has_listeners()?

cheers,
jamal

On Mon, 2005-02-28 at 08:20, Thomas Graf wrote:
> > Haven't seen the beginnings of this thread. But whatever you are
> > trying to do seems to suggest some complexity that you are trying
> > to work around. What was wrong with just going ahead and always
> > invoking your netlink_send()?
>
> I guess parts of the wheel are broken and need to be reinvented ;->
>
> > If there is nobody in user space (or the kernel) listening, it
> > won't go anywhere.
>
> Additionally, you may want to extend netlink a bit to check whether
> there is a listener before creating the messages. The method to do so
> depends on whether you use netlink_send() or netlink_broadcast(). The
> latter is more flexible because you can add more groups later on and
> the userspace applications can decide which ones they want to listen
> to. Both methods handle dying clients perfectly fine; the association
> to the netlink socket gets destroyed as soon as the socket is closed.
> Therefore you can simply check the mc_list of the netlink protocol
> you use to see if there are any listeners registered:
>
>     static inline int netlink_has_listeners(struct sock *sk)
>     {
>             int ret;
>
>             read_lock(&nl_table_lock);
>             ret = list_empty(&nl_table[sk->sk_protocol].mc_list);
>             read_unlock(&nl_table_lock);
>
>             return !ret;
>     }
>
> This is simplified and ignores the actual group assignments, i.e.
> you might want to extend it to check if there are listeners for a
> certain group.
|