You can subscribe to this list here.
2005 |
Jan
|
Feb
(61) |
Mar
(153) |
Apr
(39) |
May
(10) |
Jun
(15) |
Jul
(15) |
Aug
(2) |
Sep
|
Oct
(17) |
Nov
(2) |
Dec
(13) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2006 |
Jan
(18) |
Feb
(9) |
Mar
|
Apr
|
May
|
Jun
|
Jul
(7) |
Aug
(1) |
Sep
(2) |
Oct
|
Nov
(1) |
Dec
|
2007 |
Jan
(8) |
Feb
(3) |
Mar
|
Apr
|
May
(2) |
Jun
|
Jul
(2) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2008 |
Jan
|
Feb
|
Mar
|
Apr
(6) |
May
(1) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: SourceForge.net <no...@so...> - 2005-03-18 07:50:22
|
Feature Requests item #1156899, was opened at 2005-03-04 20:30 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646 Category: Tcl-API Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add optional "numbytes" arg to ns_conncptofp Initial Comment: The current ns_conncptofp will copy the content to the file. The entire content. What is missing is to tell to the command to copy only part of the content. The proposed API would be: ns_conncptofp channel ?numbytes? The numbytes can be: ommited (procedure does what it did before) >0 (will copy somany bytes up to contentLength or end-of-data whatever comes first) 0 (will do nothing) -1 (will behave as when argument is ommited) This command is one of those which support the old-compat connId argument. So the actual API is as of today: ns_conncptofp ?connId? channel I would vote to scrap the connId parsing. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-18 08:50 Message: Logged In: YES user_id=95086 Good point. Thanks for looking into this. I will now look into the reqPtr->avail instead of contentLength. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-18 01:39 Message: Logged In: YES user_id=87254 This change introduces a new side-effect: the contentLength is adjusted down by the number of bytes coppied. Does this need to happen? The C API doesn't do this. I think reqPtr->avail needs to be checked rather then connPtr->contentLength. reqPtr->next and reqPtr->avail will be managed by the C API so no need to adjust them here. Also, as the Tcl API may return less than you asked for rather than error out like the C API, should it return the number of bytes copied? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 21:24 Message: Logged In: YES user_id=95086 Added into CVS. ns_conncptofp ?-bytes tocopy? channel The command will silently accept <=0 bytes and revert to copying all that is there. Also, if more than available is requested it will silently copy all that is there. Otherwise, -bytes bytes will be copied to the channel. Channel must be opened for writing otherwise error is thrown. Oh yes, Stephen, I love the arg parsing code This is really a relief. After years any years of parsing... heh.. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 00:30 Message: Logged In: YES user_id=87254 This seems reasonable. Should it look like this though? ns_conncptofp ?-bytes tocopy? channel The reason it is attractive to drop the connid arg is that with two optional args it becomes a pain to parse. I'm also thinking of how ns_sendmail has been extended over time by appending optional args. The C version of this command returns an error if you ask for more bytes than are available. Should the Tcl version silently truncate? Maybe it should return the number of bytes actually copied? Does it make sense to ask for 0 (zero) bytes? Maybe bytes <= 0 should revert to default behaviour which is to copy all that's available? The existing error message "could not copy content (likely client disconnect)" is wrong. This might have been the case in 3.x, but not 4.x. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 04:55:50
|
Feature Requests item #1145957, was opened at 2005-02-22 03:13 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1145957&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Stephen Deasey (sdeasey) Assigned to: Stephen Deasey (sdeasey) Summary: Cookie support for C and Tcl Initial Comment: Atached is a patch which implements a simple cookie API for both C and Tcl. Here's what it looks like (the formating will probably get mangled, sorry): void Ns_ConnSetCookie(Ns_Conn *conn, char *name, char *value, int maxage); void Ns_ConnSetSecureCookie(Ns_Conn *conn, char *name, char *value, int maxage); void Ns_ConnSetCookieEx(Ns_Conn *conn, char *name, char *value, int maxage, char *domain, char *path, int secure); void Ns_ConnDeleteCookie(Ns_Conn *conn, char *name, char *domain, char *path); void Ns_ConnDeleteSecureCookie(Ns_Conn *conn, char *name, char *domain, char *path); char *Ns_ConnGetCookie(Ns_DString *dest, Ns_Conn *conn, char *name); ns_setcookie ?-secure bool? ?-domain d? ?-path p? ?-maxage secs? name data ns_getcookie name ?default? ns_deletecookie ?-secure bool? ?-domain d? ?-path p? name As you can see it only tackles the basics, no comment or version field. Although I've never personaly had to use those features. Cookies are a pre-requisite for the authentication module which exercises the new autht/authz API I haven't written yet... Obviously, the string parsing in the GetCookie code will have to be very carefully looked at before this sees the Internet. But in general, does this look like it's on the right track? ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-18 04:55 Message: Logged In: YES user_id=184124 I uploaded modified patch which looks into all Cookies headers. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 23:25 Message: Logged In: YES user_id=87254 Good call. I've been using a simillar algorithm in Tcl only and it works fine, but I guess that's because I've only been storing session keys etc. which are quite small. I'll have to examine all the cookie headers. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-24 16:05 Message: Logged In: YES user_id=184124 Looks good, the only issue i have, some browser can send more than one Cookie header and you are looking only in the first header. Also, this is new API, no problem adding it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1145957&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 04:33:53
|
Feature Requests item #1159259, was opened at 2005-03-08 18:30 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159259&group_id=130646 Category: Tcl-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Stephen Deasey (sdeasey) Summary: Add nscache module functionality Initial Comment: This is really what needs to be done sine the server itself does support the caches and what lacks is the Tcl access to them. The nscache module (or its equivalent) deserves to be part of the server distribution. ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-18 04:33 Message: Logged In: YES user_id=184124 For a start we can just add tclcache.c into the core and leter improve/rewrite caching if needed. Currenlty i am pretty happy with this module functionality as is. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 19:48 Message: Logged In: YES user_id=184124 Yes, i agree with it ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159259&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 04:31:03
|
Feature Requests item #1151137, was opened at 2005-02-24 16:01 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1151137&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Vlad Seryakov (seryakov) Assigned to: Vlad Seryakov (seryakov) Summary: New driver API and Udp module Initial Comment: Hi guys, Attached is minor driver extensions which do not change existing drivers but add new functionality. There are some cosmetic changes, like moving some fields in the Ns_Sock/Ns_Driver structres so they can be accessed publically and made some private functions public but functionality preserved as before. I included udp driver as an example of new API, and also added ns_sha1 command in the tclmisc.c, it is just one command and it is uses practically everywhere. ------------------------------------------------------------------------- To test udp driver i use new ns_udp command: ossweb:nscp 8> ns_udp send 127.0.0.1 5060 "GET / HTTP/1.0\n\n" HTTP/1.0 200 OK MIME-Version: 1.0 Date: Thu, 24 Feb 2005 05:39:50 GMT Server: NaviServer/4.0.10 Content-Type: text/html; charset=iso-8859-1 Content-Length: 661 Connection: close <HEAD><TITLE>Seryakov's Family Intranet</TITLE></HEAD> ..... ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-18 04:31 Message: Logged In: YES user_id=184124 Do we have any conclusions about this? Soon i will have to develop some application with support for SIP over UDP, i would like to do it on Naviserver and not on my hacked AS. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 16:44 Message: Logged In: YES user_id=184124 And again, i have nothing against it, i said it before, what i am offering is a littl ebit different, for network based protocols driver stuff is kay, i just want to be able to submit conn threads from within the server, not through the main driver. If i have already request data, through shared memory, file or other way, i want to submit wroker thread directly, not by connecting to myself and submitting data using HTTP or other protocol i will have to implement just to queue te request. This is for other applications, not HTTP/Web based, more like messaging/authentication servers and alike. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 14:41 Message: Logged In: YES user_id=184124 Can you provide what you have to modify in driver/nssock, i think that will not be easy and will require too many modifications. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 10:08 Message: Logged In: YES user_id=87254 What do you think of my suggestion that the nssock module and nsd/driver.c could be easily modified to handle udp and unix domain sockets when specified in the config file? A change like that combined with the work you've already done to the binder would mean no development is required to handle all socket types, and would be transparent to the upper layers. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-26 17:10 Message: Logged In: YES user_id=184124 Stephan, I am suggesting a compromise, let's put ParseProc and DriverSockQueu patches, they can coexists easily but will offer two way to add new drivers. Once we have new drivers working we will see what can be modified/adjusted, for now we do not have anything except my several drivers i use in the production and i use my own loop for smtp driver to keep processing and state machine in one place. Again, i have nothing against your method, i used to do similar in my previous versions but now i need to be completely free to implement aditional driver and i think we should have this ability. Zoran, what do you think? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 22:57 Message: Logged In: YES user_id=184124 Try to implement HTTP over UDP as i provided in the patch, you will need to modify core driver again, this is one example. Next, if i want to implement small driver that do not interact with web part of aolserver, no filters, it just need to do one small thing, it should be fast but will handle very many requests. I do not need filters, callbacks, i need my own main loop. I do not understand why you are insisting that all drivers should go through http driver's main loop. If i implement dns server and web interface to it, why they should compete in the same driver's main loop and go through registered filters if they share only backend database structure? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-02-25 22:22 Message: Logged In: YES user_id=87254 You don't have to use registered filters or procs. It's an option for protocol implementations where the initial accept and parsing take place in the driver thread, and processing takes place in a conn thread. Once you submit to the queue however, either a filter, a registered proc, or a registered proxy proc must handle the request. It makes no difference whether you submit explicitly with Ns_DriverSockQueue or implicitly after your Ns_ParseProc completes. How much of the request structure you fill in is up to you, it's all optional. If you choose not to use it there is no overhead. It doesn't have to be a text based protocol for this to make sense. You might fill in the request structure if you wanted to enable people to handle different request types via C or Tcl. Or, you could store binary data in Ns_Cls storage and access it from registered procs etc. In the case where you want to handle connection accept/parse/reponse processing in the same thread, nothing needs to be added. You can do that today with Ns_SockCallback (or by placing all your code in an Ns_ParseProc and returning the response from there) Can you be more specific about the overhead of using an Ns_ParseProc? Exactly what memory is allocated, what code is run that should not..? What specifically can you not do with the Ns_ParseProc interface that you need the Ns_Driver* routines for? I'm looking for concrete examples. As far as I can tell, with Ns_ParseProc you write less code, get more options, and it takes advantage of infrastructure to give you more speed. I must be missing something, but you'll have to explain it to me in more detail because I don't get it. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 20:43 Message: Logged In: YES user_id=184124 Okay, the biggest problem with your way i see that i am enforced to go through http driver, even with my parse proc, i need to know how it works to implement my driver so correct hooks/filters will be called. If i want skip some parts i am not able to do it, http driver works only one way. Another issue is if something will not fit into current driver, additional hook needs to be introduced and core driver needs to be modified again. If my driver fits http-like/text mode paradigm, reusing http driver is the easiest way, but if i need something specific, all extra step to mimic the connection as http request, so all other parts of the http driver will not fail is just unnecesssary extra efforts. And i you mentioned, using Ns_DriverSockQueue is low level function and requires from the developer full attention for bulding the driver, but if this is what i want, why not. I can spawn my own thread if some callbacks are long running Tcl scripts and queue connection from my own loop. If i omly want to received packets, decode them and submit connection, i do not need all http driver infrusturcture, why to enforce using it? So we can have at least 3 ways for supporting multiprotocols: - standard callbacks, completely different threads, no connection pooling (Exists now) - Ns_DriverSockQueue, new thread or callback thread, reusing connection pooling. All filters/traces are reused as well - driver Parse proc in the http driver to reuse drivers thread and connection pooling with all filters/traces All 3 methiods can co-exists and do not interfere with each other, that's my point, and all they add just couple extra functions. I would implement all methods, i have 3 drivers i need to run: smtp, dns, snmp, can add imap in the future ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-02-25 20:31 Message: Logged In: YES user_id=87254 "Each way has its own drawbacks..." That's what I'm asking. What are the drawbacks for each approach we've come up with so far? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 15:23 Message: Logged In: YES user_id=184124 I am not arguing with you about your ParseProc driver extension, it is very usefull and makes driver more flexible. I just want to add couple of new API functions that will allow developer to queue connections from any place, that's it. How drivers will be written and how developer will decide to handle it is up to each particular developer. But nobody will be locked up in only one way of doing it. Each way has its own drawbacks so we have multiple choices and developer will decide for itself how it should be. Both our ways do not mutually exclusive. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-02-25 08:28 Message: Logged In: YES user_id=87254 Ah I see you're right, an extra thread is not created for each incoming request. 'udpThread' is perhaps not the best name for the socket callback though... :-) I still don't see the advantage of using Ns_SockCalback. A single thread is created by the server to handle all registered callbacks, including those from ns_sockcallback. At runtime, while this single thread is running tcl code to handle one callback, your dirver callback cannot run, no new connections will be accepted, etc. Why is it better to use the Ns_SockCallback thread rather than a driver thread? Maybe I'm reading this wrong, but how do you handle the case where the request arrives as more than one packet? Ns_DriverSockRead() is called from the 'udpThread' socket callback. The only return value checked is NS_OK, but couldn't this also be SOCK_MORE? How would you handle things like keepalive? I think you'd have to reimplement that in the Ns_SockCallback thread. Re the proxy stuff, Ns_RegisterRequest() and Ns_RegisterProxyRequest() seem very simillar. With Ns_RegisterRequest(), filters are run and you get the choice of using C or Tcl. Ns_RegisterProxyRequest() offers no advantage that I can see -- you still need a complete Request structure, even if you just ignore it. The comm driver initialization should probably be changed to automatically handle unix domain sockets. You add /foo rather that 127.0.0.1 in the config file and it knows to create the correct type of socket. nssock and nsopenssl wouldn't have to be modified at all. UDP is different. There are two types of protocols: single packet, such as DNS or RADIUS; multi packet, as used in some streaming media and p2p protocols. I think for the single packet case we again might want to modify the driver code to automatically handle it. No read-ahead is necessary, there's only one packet per request, so it's placed in the request buffer and passed on to the next stage, which is parsing. Here, a return code of SOCK_MORE would be illegal. Every multi-packet UDP protocol will require custom framing/sequencing and the developer will have to create a new socket driver. Taking RADIUS as an example, which is a single packet UDP protocol, you'd create a very simple Ns_ParseProc who's only job is to check that the number of bytes specified in the header actually arrived, and return SOCK_READY. A default request structure is created for you so the only other thing you have to do is Ns_RegisterRequest() for the '/' URL. Within your request proc, call Ns_ConnContent() and parse the buffer. Now, you do have a number of other options to make this more flexible. You could parse the request in your Ns_ParseProc and then fill out the request structure. e.g. the different RADIUS message types could be expressed as HTTP verbs. This buys you flexibility. Now you can Ns_RegisterRequest() a different routine for each RADIUS message type, and someone reusing your code can override your default implementation. You can also handle some message types in C, and others in Tcl. You might decide to put some useful information about the RADIUS request in the URL line. Now you get logging for free. You might decide to parse the key/value pairs from the RADIUS request into query vars or headers. Now you don't have to write a bunch of RADIUS specific Tcl commands to access the data, should you want to handle that from Tcl. ns_queryget or ns_conn headers will work. How far you want to go is up to you, the writer of the protocol module. But at the low end all you need is an Ns_ParseProc and usually a single registered request proc, which is a lot less code to write than the Ns_Driver* stuff, I think. I'm sure it's not perfect! So what's wrong with it? Why would it suck to write a RADIUS protocol module as I've just described, to take one example? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 05:51 Message: Logged In: YES user_id=184124 I think we need to collect all solutions and then see where are going, i am holding to port all my old/running modules because i do not know how naviserver will handle foreign protocols. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 05:48 Message: Logged In: YES user_id=184124 yes, using proxy can solve immediate requirements without hacking NsConnProc by adding hooks to call driver specific C functions. If i need my smtp server and main loop is in C, i need somehow call it in the connection thread. Using registered proxy function i can do it now, i do not need filter/traces. This is for completely new modules. I can implement main loop in the module as Tcl command and then call it in the connection filter, it is possible, it will just require many different parts to be in place and still filter should be registered as Tcl proc which will call another driver main loop. sometimes low level stuff makes things easier and simpler:-))) ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 05:45 Message: Logged In: YES user_id=184124 In all my installations i need sha module but it wasn't in the distribution, so i need to download it or repackage aolserver to include nssha1. That is my point, it is just one simple function use more often than something like ns_jpegsize or so. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-02-25 05:42 Message: Logged In: YES user_id=87254 Well I don't know about the wisdom of adding sha1 to the core at this stage :-) I see your point though, encryption and hashing functions are almost universally required for systems/server work. Maybe we need to consider adding a new encryption module to the core distribution. Like nsdb it would export a C API via libnscypher.so (or whatever) as well as the Tcl module nscypher.so. Times have changed and things like the openssl libraries are common on all platforms so it's not the big deal it used to be to add such dependiencies. Such a module should make implementing SSL in nsopenssl easier. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-25 05:41 Message: Logged In: YES user_id=184124 No, i do not spawn new thread, i use callback feature of the server, when socket is ready, server calls provided callback ad that callback just submits the socket to connection pool. if pool is full, Ns_DriverSockQueue will return NS_TIMEOUT, you can retry. Yes, it is low-level, for high level, HTTP driver provides a lot of functionality, it could be extended like you did, but still it is HTTP driver hacked. If i want completely new driver, like RADIUS server, http driver will not help me, i need low level stuff, and it is there already. To reuse resource limiting, i added Ns_DriverSockQueue function, so new conections can be queue instead of creating new threads. I do not have anything against your patch for extending current driver, it is very usefull. i am offering new API for low level drivers. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-02-25 05:36 Message: Logged In: YES user_id=87254 The proxy stuff is too late in the cycle to do much. The request has to be fully parsed by then (for read-ahead). If a proxy function is available, then all filters, the auth phase, registered procs, cleanup procs etc. are bypassed. A lot of that stuff can be very useful for non-HTTP protocols. I don't think handling stuff via a proxy function buys you much. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-02-25 05:32 Message: Logged In: YES user_id=87254 I don't understand what you're trying to acheive here (well, apart from multi-protocol... :-). The newly exposed Ns_Driver* entry points are quite low level, and so the implementor of the new protocol is left to do a lot of the heavy lifting. For example, the way I read it you have to create your own listen socket and register a callback. Every time a new reaquest comes in a thread is spawned to handle it. From that thread you then submit the parsed request to one of the conn threads. Excessive thread creation and message passing between threads is not going to perform well. And it seems you have to write more code than e.g. the example POP3 driver I posted some time ago. You're also not taking advantage of the other facilities that the server offers. What happens if 1000 connections arrive, do you spawn 1000 threads? You could of course code up some limit checks, but this already exists. What if a client sends you a continuous stream of data, 2GB... etc. By using the driver hooks to provide the new protocol parser, you deny yourself the opportunity to use something like the nsopenssl module. This should work just fine for protocols like SMTP, IMAP, POP3 and probably others. Anyway, I think one of the most carefully coded aspects of the server is it's attention to resource usage. That goes for IO, context switching, memory, etc. It's espescialy nice that most of the time you're not even aware that all this work is being done for you. I'd like new protocol drivers to be able to transparently take advantage of that. Could you take a look at my old POP3 demo driver? It's the attachment nspopd-0.3.tar.bz2 over here: http://sourceforge.net/tracker/index.php?func=detail&aid=973010&group_id=3152&atid=353152 It's not obvious that anything interesting is going on, so it's not much to look at. But actually, conn socket read-ahead is happening eficiently in one thread with async IO, the conn threads are treated as a precious resource (heavy-weight Tcl interps) and are allocated at the last minute, there's an easy API in C and Tcl to implement the actual reading of data from the INBOX (could be from the file system, db etc.). You've got a lot of experience writing servers, what do you think is wrong with this model? What can it not do, or what could it do better? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-02-24 21:29 Message: Logged In: YES user_id=184124 Another thing, once we can submit connections from any place, no need to build any drivers, even in C, i can register new proxy proc and set protocol field in my request, so when submitted, connection will run registered proxy proc. for example: in my smtp driver/module, i create driver, register proxy for smtp: protocol, register callback for the socket. Once connection accepted, in my module i submit that connection to the queue with request-protocl set to smtp:. queue.c will call my proxy handler, which is C function. No need to add anything else. This way even standard aolserver can be extended without touching precious http driver thread. Sorry for sarcasm. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1151137&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 04:28:45
|
Feature Requests item #1165562, was opened at 2005-03-17 20:50 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Scrap aolpress handling Initial Comment: In Ns_ConnSetRequiredHeaders() there is a provision to add NaviServer/2.0 to headers. I doubt that anybody will ever need this any more. I'd scrap this entirely. ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-18 04:28 Message: Logged In: YES user_id=184124 There are also places in server.c , nsd.h and nsdconf.h with aolpress. Let's remove them ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-18 00:21 Message: Logged In: YES user_id=87254 There's more aolpress junk in tcl/fastpath.tcl which can be removed at the same time. I think this closed-source client is officially dead... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 04:26:55
|
Feature Requests item #1159471, was opened at 2005-03-09 00:40 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159471&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Vlad Seryakov (seryakov) Assigned to: Vlad Seryakov (seryakov) Summary: Virtual Hosting Initial Comment: Okay, i did some digging and testing, looks working and much simpler. I tried to simplify default AS 4.x virtual hosting and added pageroot virtual hosting, simple way to use different pagetroots on the same server. The change vor default virtual hosting is: no defaultserver anymore, the server who registered virtual hosts is default server, so nssock is loaded in the default server, other than that virtual servers are defined the same way. One thing i left is to chamge ns_info pageroot to use Ns_GetConn() and then if exists use connPtr->pageroot, but this is simple change if you approve current virtual hosting patch. Here is the nsd.tcl config example: ns_section "ns/server/${server}/module/nssock/servers" ns_param test vlad.seryakov.com ns_param test vlad.seryakov.com:80 ns_section "ns/server/${server}/module/nssock/pageroots" ns_param ${home}/html/test vlad.seryakov.com ns_param ${home}/html/test vlad.seryakov.com:80 ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-18 04:26 Message: Logged In: YES user_id=184124 On the other hand if it will be possible to configure/rename "servers/" and "pages/" parts or omit them completely i would be happy with this patch. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 14:48 Message: Logged In: YES user_id=184124 The problem i see with this patch, it will break all my AS installations, because it assumes that every servers is under servers/ direcotory and pageroot is awlays named pages/. I my on-server installs i do not have this complex directory structure, i have html/ under ns_info home and pageroot is set in the nsd.tcl. Also, what is the point of ServerRoot dir, we have PageRoot already, if it is set into absolute path other apps use ns_info pageroot? As i understand, under ServerRoot there is only pages difrectory? Why not to use pageroot directly? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 14:05 Message: Logged In: YES user_id=87254 Oh, forgot to mention. It depends on the Tcl Callbacks patch. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 14:03 Message: Logged In: YES user_id=87254 I've added the patch naviserver-4.0.10-server-root-vhost.patch. It adds the following new routines: Ns_ServerPath() Ns_PagePath() Ns_SetServerRootProc() Ns_ConnLocationAppend() Ns_SetConnLocationProc() NsServerRoot() NsPageRoot() ns_serverpath ?pathSegment ...? ns_pagepath ?pathSegment ...? ns_serverrootproc script ?arg? ns_locationproc script ?arg? And the following new configuration options: ns_section "ns/server/${servername}/vhost" ns_param enabled false ns_param prefix "" ns_param pagedir pages # overides fastpath/pageroot ns_param stripwww true ns_parma stripport true This version of host header based virtual hosting which depends on the existance of the pages directory is a superset of the functionality provided by a static configuration. Applications which call Ns_ConnLocation() will need to be modified to call Ns_ConnLocationAppend() or they will not be vhost aware. I've reinstated the depreciated proc Ns_SetConnLocationProc() and changed it's signature. It dissapeared from ns.h almost 5 years ago, then reappreared ~2.5 years ago. I don't think this will be a problem. In turn, I've depreciated the Ns_SetLocationProc(). It compiles and the server runs. I haven't had time to test it. Does this look acceptable? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-13 00:41 Message: Logged In: YES user_id=184124 Just to consider the possibility, instead of mallocing pageroot/location, by default it can use sockPtr->pageroot/sockPtr->location, then when ns_conn pageroot newpageroot called, it will set connPtr->pageroot with malloced string and ns_conn pageroot will check and return it instead of sockPtr->pageroot. This way, no overhead at conn queue and still new pageroot/location can be set in Tcl ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-13 00:02 Message: Logged In: YES user_id=184124 That is the probem with nssock, it is actually an extender of the driver but still somehow kept independent. nssock itself is useless without driver and only used to bind to more than one address for different servers. it could be moved in the core but the problem will be how to define more than once instance. malloc are overhead indeed, but once copied they can be used independently and canbe set in Tcl by using ns_conn location newLocation or ns_conn pageroot newPageroot. In this case they should be a copies. Just do not make mass virtual hosting the only virtual hosting way, being able to change pageroot in the Tcl/C give developer more flexibility if required. For simple cases, mass virtual hosting is okay. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-12 23:48 Message: Logged In: YES user_id=87254 I'm not keen at all on adding virtual hosting to the nssock driver. There's nothing HTTP specific about the nssock dirver and I'd like to keep it that way. There are a couple of problems with the other proposed solution. The paired functions Ns_SetLocationProc()/Ns_ConnSetLocationProc() etc. seems excessive, and the enforced malloc()ing at runtime of the location and pageroot strings is an unwelcome overhead. Using Tls storage is clever but pretty ugly. I think dstrings are the way to go here. I'm not sure Tls is safe in this implementation. The same dstring is used for location and pageroot strings, so it depends what the caller decides to do with the result and in what order, whether or not one overwrites the other. I would like to explore adding mas virtual hosting into the core. Let me work up a patch, I think I can get to this this weekend... ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-12 21:57 Message: Logged In: YES user_id=184124 Attached is nssock with virtual hosting patch ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-12 20:39 Message: Logged In: YES user_id=184124 Actually, vhost can be combined with nssock, if options given it will enable virtual hosting, if not works as regular sock driver. This way it is always with the core and at the same time independent. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-10 21:38 Message: Logged In: YES user_id=184124 If loaded, vhost module works as Stephan suggested, strips port/host and usesd pageroot if other root is not specified. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-10 21:37 Message: Logged In: YES user_id=184124 There is new patch with Stephans corrections/additions. I think we can provide core module for virtual hosting, i called it nsvhost and we can extend this module to do all sorts of hosting. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-10 18:53 Message: Logged In: YES user_id=184124 And keeping port and www. is sometimes necessary, you can do virtual hosting by port only, IDT does that for example, and many sites work without www. prefix, just stripping them by default may not be appropriate. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-10 18:24 Message: Logged In: YES user_id=184124 >Looks good. However, isn't ns_conn pageroot >unneccessary? For templating engines ns_info pageroot is the only way to figure out the root, UrlToFile works for fastpath only. >Would it be better to have the servers and pageroot config >in one section, to avoid duplication? They are mutually exclusive, that's why i put them in different sections, if full virtual server is set, pageroots are ignored, this is for AOL like virtual hosting with different servers(rare situation though). So in most cases pageroots will be used only, thus only one simple syntax. As for PageRoot and ServerRoot, i think this is a little cofusing. Currently, pageroot returns full path and whoever calls pageroot assumes that it will return full path. Virtual hosting using directories as hostnames is what i am currently using with vhost module and i think it can be included as a standard feature for easy virtualhosting solutions. I do not think this should be the ONLY virtual hosting solution, everybody can write their own modules using SetLocation/SetPageRoot procs or register filter which will set pageroot for each connection. Let me prepare another patch-set with your suggestions included . ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-10 08:00 Message: Logged In: YES user_id=87254 Looks good. However, isn't ns_conn pageroot unneccessary? Would it be better to have the servers and pageroot config in one section, to avoid duplication? ns_section "ns/server/${server}/module/nssock/servers" ns_param example.com exampleserver ns_param foo.example.com "exampleserver ${home}/whatever" Here, the first entry maps the example.com host to the exampleserver and uses it's default pageroot. The second entry supplies a new pageroot. How about mass virtual hosting, i.e. where you don't have to explicitly configure each host header to pageroot mapping, but construct the pageroot from the host heafer at runtime? I've attached the file nsmassvhost.c which implements the above. It uses the hooks Ns_PageRootProc and Ns_LocationProc which would be unneccessary if the functionality was included as standard. It trims the port and any leading 'www.' from the host header. It would be nice to have this for the static mapping also, as at the moment to be robust you often need 4 mappings for each virtual host. It also uses the function Ns_ServerPath(). The idea here is to introduce the concept of the virtual server root as a distinct location in the file system, where the pageroot is a location below that. I want to change this so that the serverroot is dynamic and based on the host header (when configured), and the pageroot is simply the serverroot with "/pages" (or whatever is configured) appended. It would look something like this: /srv/server1/pages /srv/server1/example.com/pages /srv/server1/example.com/cache The first path is the default or non-virtual hosted case. server1 is a server defind in the config file and has it's own private tcl library. The second path is the pageroot of a virtual host. The third is an example of some data which is specific to the example.com virtual host. So, without virtual hosts: Ns_ServerRoot() -> /srv/server1 Ns_PageRoot() -> /srv/server1/pages With virtual hosts (and called in the context of a conn thread): Ns_ServerRoot() -> /srv/server1/example.com Ns_PageRoot() -> /srv/server1/example.com/pages The advantage of this system is that you don't have to restart your server every time you add or remove a virtual host. There is also a convenient location to store data associated with both virtual servers and virtual hosts. Easy to backup, remove, etc. I haven't had time to look at how this would be integrated into what you've got here, maybe at the weekend. Feel free to take a shot at though :-) Does the scheme outlined above make sense to you? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159471&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 00:39:53
|
Feature Requests item #1156899, was opened at 2005-03-04 12:30 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646 Category: Tcl-API Group: None >Status: Open Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add optional "numbytes" arg to ns_conncptofp Initial Comment: The current ns_conncptofp will copy the content to the file. The entire content. What is missing is to tell to the command to copy only part of the content. The proposed API would be: ns_conncptofp channel ?numbytes? The numbytes can be: ommited (procedure does what it did before) >0 (will copy somany bytes up to contentLength or end-of-data whatever comes first) 0 (will do nothing) -1 (will behave as when argument is ommited) This command is one of those which support the old-compat connId argument. So the actual API is as of today: ns_conncptofp ?connId? channel I would vote to scrap the connId parsing. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 17:39 Message: Logged In: YES user_id=87254 This change introduces a new side-effect: the contentLength is adjusted down by the number of bytes coppied. Does this need to happen? The C API doesn't do this. I think reqPtr->avail needs to be checked rather then connPtr->contentLength. reqPtr->next and reqPtr->avail will be managed by the C API so no need to adjust them here. Also, as the Tcl API may return less than you asked for rather than error out like the C API, should it return the number of bytes copied? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 13:24 Message: Logged In: YES user_id=95086 Added into CVS. ns_conncptofp ?-bytes tocopy? channel The command will silently accept <=0 bytes and revert to copying all that is there. Also, if more than available is requested it will silently copy all that is there. Otherwise, -bytes bytes will be copied to the channel. Channel must be opened for writing otherwise error is thrown. Oh yes, Stephen, I love the arg parsing code This is really a relief. After years any years of parsing... heh.. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 16:30 Message: Logged In: YES user_id=87254 This seems reasonable. Should it look like this though? ns_conncptofp ?-bytes tocopy? channel The reason it is attractive to drop the connid arg is that with two optional args it becomes a pain to parse. I'm also thinking of how ns_sendmail has been extended over time by appending optional args. The C version of this command returns an error if you ask for more bytes than are available. Should the Tcl version silently truncate? Maybe it should return the number of bytes actually copied? Does it make sense to ask for 0 (zero) bytes? Maybe bytes <= 0 should revert to default behaviour which is to copy all that's available? The existing error message "could not copy content (likely client disconnect)" is wrong. This might have been the case in 3.x, but not 4.x. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-18 00:21:35
|
Feature Requests item #1165562, was opened at 2005-03-17 13:50 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Scrap aolpress handling Initial Comment: In Ns_ConnSetRequiredHeaders() there is a provision to add NaviServer/2.0 to headers. I doubt that anybody will ever need this any more. I'd scrap this entirely. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 17:21 Message: Logged In: YES user_id=87254 There's more aolpress junk in tcl/fastpath.tcl which can be removed at the same time. I think this closed-source client is officially dead... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 20:50:35
|
Feature Requests item #1165562, was opened at 2005-03-17 21:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Scrap aolpress handling Initial Comment: In Ns_ConnSetRequiredHeaders() there is a provision to add NaviServer/2.0 to headers. I doubt that anybody will ever need this any more. I'd scrap this entirely. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 20:24:32
|
Feature Requests item #1156899, was opened at 2005-03-04 20:30 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646 Category: Tcl-API Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add optional "numbytes" arg to ns_conncptofp Initial Comment: The current ns_conncptofp will copy the content to the file. The entire content. What is missing is to tell to the command to copy only part of the content. The proposed API would be: ns_conncptofp channel ?numbytes? The numbytes can be: ommited (procedure does what it did before) >0 (will copy somany bytes up to contentLength or end-of-data whatever comes first) 0 (will do nothing) -1 (will behave as when argument is ommited) This command is one of those which support the old-compat connId argument. So the actual API is as of today: ns_conncptofp ?connId? channel I would vote to scrap the connId parsing. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 21:24 Message: Logged In: YES user_id=95086 Added into CVS. ns_conncptofp ?-bytes tocopy? channel The command will silently accept <=0 bytes and revert to copying all that is there. Also, if more than available is requested it will silently copy all that is there. Otherwise, -bytes bytes will be copied to the channel. Channel must be opened for writing otherwise error is thrown. Oh yes, Stephen, I love the arg parsing code This is really a relief. After years any years of parsing... heh.. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 00:30 Message: Logged In: YES user_id=87254 This seems reasonable. Should it look like this though? ns_conncptofp ?-bytes tocopy? channel The reason it is attractive to drop the connid arg is that with two optional args it becomes a pain to parse. I'm also thinking of how ns_sendmail has been extended over time by appending optional args. The C version of this command returns an error if you ask for more bytes than are available. Should the Tcl version silently truncate? Maybe it should return the number of bytes actually copied? Does it make sense to ask for 0 (zero) bytes? Maybe bytes <= 0 should revert to default behaviour which is to copy all that's available? The existing error message "could not copy content (likely client disconnect)" is wrong. This might have been the case in 3.x, but not 4.x. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 18:26:23
|
Feature Requests item #1156141, was opened at 2005-03-03 21:02 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add ns_conn channel command Initial Comment: Extend the ns_conn command to allow wrapping an Tcl channel arround it. This channel can be used to talk to the remote client afterwards. The idea is simple. The client uses standard http protocol to open up a connection to the server. Server accepts the connection and obtains the channel from it. The client also gets the channel to the server and from this point on, both can exchange arbitrary data. Example implementation is given as a patch against the current CVS state. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 19:26 Message: Logged In: YES user_id=95086 Added to CVS head ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-17 15:20 Message: Logged In: YES user_id=184124 i do not ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 13:10 Message: Logged In: YES user_id=95086 OK, If you do not mind, I'll add ns_conn channel It will return the Tcl channel you can use for talking to the client. When you've finished, you just "close $sock" and finish the proc. The connection thread is then ready for other users/requests. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 12:41 Message: Logged In: YES user_id=87254 Yes, I get it :-) My only point was that HTTP is not a short lived connection protocol, if you don't want it to be. Lengthy connections with multiple messages seems to be one of the key features here. The *only* thing HTTP can't do is have the server initiate a message to the client. For the kinds of cases you're talking about here with multi-hour long connections, I'd be wanting some kind of 'ping' request from the client every so often just to make sure the server was idle and not broken. The ping becomes a 'what messages do you have for me?' request, the server's response is itself a request to the client. Anyway, I was talking pure nonsense about the keepalive stuff; the client doen't have to be so broken after all. I'm fine with you adding this feature. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 10:03 Message: Logged In: YES user_id=95086 "still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do Stephen, I do not think you understood my point. What this command buys me is to be able to initiate an http connection and reuse the socket-to-socket comm channel for other purposes. For example we make "GET" request to a certain URL and then when the connection gets established, the server, takes the conn socket, wraps a Tcl channel around it and passes this channel to my (arbitraty) Tcl proc. The client, instead of getting the response body, parses only the returned http headers to see that the connection succeeded and then passes this socket to my custom Tcl proc. Now, on the sever and on the client side you have a channel and both sides can use it to establish whatever protocol they want. They have a socket pipe between. The http is just used to handshake the peers, clear? What we do with this? We have two server instances on two different hosts. We open a channel (using the above trick) from one to another and then use this to generate a stream of preformatted packets (our protocol) on one side which does the backup of data from to the other side ultimately to the tape drive(s) attached to that host. We shuffle TB of data this way daily in about 500 installations worldwide, so I know this works fine. Now, see, this has nothing to do with HTTP1.1. or 1.0 alone. And yes, if you look in the implementation, you will see that it is a simple wrapper to the Ns_ConnSock. It simply wraps a Tcl channel around the the connection socket, nothing more and nothing less. And, I would not say that it requires a broken client. It just requires a special client-processing. This is not ment for browser-clients. It has nothing to do with serving pages or any other short lived connections. Once we setup such a connection, we use it for hours and maybe days. We could have achieved this with a simple socket connection from client to a specific port on the server. But we are not able to use any arbitrary port. All we have is port 80 (the web server port). So we must play by the http rules initially when accessing the server. When we passed this http handshake stuff, we are left with a socket-to-socket comm-channel which is transparent and can be used for any purpose whatsoever. Now, is this little bit more clear now? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-05 04:32 Message: Logged In: YES user_id=184124 What is broken about it? It is the Tcl interface to Ns_ConnSock(), just Tcl way, using channel. The merit is to use sync inter-exchange between client and server using established connection. Client is not broken, it is designed for this exchange, it is the tool to build messaging, not general purpose protocol. It make naviserver more flexible platform for implementing different ppas on top of it using Tcl, that's all. Btw, when are you going to commit your protocol extensions? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 01:20 Message: Logged In: YES user_id=87254 This command isn't a good idea because it creates a broken interface which requires a broken client to work, and I still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do it, then it may still have some merit. But Ns_ConnSock() is trivial and gives you all you need to create a Tcl channel. We've already decided to do this the right way, whatever that turns out to be, so why should we simultaneously also do it the wrong way? In fact you seem to agree, noting that this sucks because it's not UDP and the overhead of HTTP is just too much for you. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 16:36 Message: Logged In: YES user_id=184124 And that's why i want to add simple UDP support to the server: for implementing simple messaging between servers in the cluster, using UDP without HTTP overhead will make it simple and faster. Using Tcl channel is intermediate solution for such messaging because it eliminates making multiple HTTP requests to the server but still uses heavy TCP/HTTP overhead. Still i think this command is very usefull and will make server more versatile provided that we will supply usage examples with the server as well. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 16:26 Message: Logged In: YES user_id=95086 "Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again." This is true. It is a regular http request, just for the body part, the peers are engaging in a bidirectional communication and can do whatever they pleased. The request is considered done when one of the peers close the socket. Strictly speaking, I'm using the http body and this trick to implement arbitrary client-server comm channel. This will not work for browsers of course. It has nothing to do with browsers at all in fact. It arose out of the necessity to support alternate protocols given what we had in AS, that is pure http machinery. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 15:38 Message: Logged In: YES user_id=184124 On the closing note, keepalive has nothing to do with this and should not be considered in this case. Even if you connect using HTTP 1.1 and specify keepalive, sever has the right to issue Connection: close header which will disable keepalive. So, it WILL work for all connections. Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again. And pipelining is a bad example here, the point it to send the request and wait for the answer and base on that send another request. Pipelining will allow you just to submit all requests and then later receive replies, it was designed for speeding browser's connection especially retrieving images. It looks to me Stephen you are all for HTTP core and around-the-HTTP-driver solutions only. This will make simple solutions complex because of getting around/workaround HTTP based core in the server. What is wrong with one simpe function which will give the server more options without breaking anything? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 10:45 Message: Logged In: YES user_id=95086 I see. The gotcha's are: o. it's yet another implementation of non-HTTP protocol o. (HTTP 1.0 only, and no Keepalive header) be used to get o. function in the docs might reasonably expect it to work for all HTTP connections That's pretty clear. Well, I think you are right. This is our attempt to work around http-only caps of the server. Agreed. It will be better to get it done properly and we will then internally switch to the new implementation when it's ready. At the moment I have all this in a separate module and this can stay so until the multi-protocol caps are agreed upon and implemented. I will close this RFE then. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 10:36 Message: Logged In: YES user_id=87254 I don't understand why you're not using standard HTTP pipelining. It would seem to be so much easier than having to write custom client and server code. This is a small change, but I'm not excited about it because it's yet another implementation of non-HTTP protocol support, which makes 3 so far... :-( It also seems to require that a deliberately broken client (HTTP 1.0 only, and no Keepalive header) be used to get around the server's io read-ahead. Someone seeing this function in the docs might reasonably expect it to work for all HTTP connections. Also, I don't think this requires core support. Ns_ConnSock() will return the underlying file descriptor for a connection. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 09:36 Message: Logged In: YES user_id=95086 OK. Maybe I should explain little bit more what I'm using this for. Imagine you'd like to make a client-server connection and then exchange arbitrary data on this connection. Imagine something like telnet connection. You telnet from client to server, get a comm-pipe betwen them and then you can shuffle whatever you like across this comm-pipe. What we have is a http server responding only to http requests (as it is now, w/o multiprotocol support). So, what I do is to use http to setup the connection, have a registered procedure on the server grab the socket and install my custom protocol handler on it. The http is only used to establish the communication. All the rest is handled by my custom code on the server and on the client. Once the connection is setup, we have a point-to-point socket communication link between server and client and we can use it for just about anything. All you need for this is a client-side http-library (we have ours written in Tcl) which will give you the socket back once the connection is established) and serve-side proc implementing your protocol of choice. So, the entire server http-machinery stays in place. I do not see any problem there because we have been using this trick for about 4 years in production very successfully. What I do not know is wether there are any gotcha's with that approach (we haven't hit any yet). I also do not know what will happen when we add alternate multiprotocol caps to server. Maybe this will become obsolete. But, as I see this now, the change is trivial and does not hurt anybody. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 00:21 Message: Logged In: YES user_id=87254 HTTP also has (although our server doesn't, but I want to add this) pipelining, which is sending multiple requests at once without first waiting for the reply to previous requests. That sounds like your "i do it when I send multiple http requests". ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 23:45 Message: Logged In: YES user_id=184124 Keepalive works when i close my connection, i.e. when i exited from conn thread, but the point here is to do it from the conn thread, i received request from my other server sending me some info, i send my info back and get ack. Everything inside one connection, before keepalive. And yes, i send messages in my format, anyway i do it when i send multiple http reaqests, i am interested in message body only. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 23:32 Message: Logged In: YES user_id=87254 HTTP already handles sending multiple requests accross one socket with keepalives. It might be more useful to add keepalives to our HTTP client code. In the case of HTTP keepalives, the socket is returned to the driver thread after the first request, still open, and the conn thread goes on to do other useful work. If a new request arrives before the connection times out it is passed to the next free conn thread. If you don't usee HTTP with keepalives you will have to invent some kind of message chunking syntax and some kind of time-out protocol. But you will still end up wasting conn threads as they idle waiting for possible extra messages. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 23:20 Message: Logged In: YES user_id=184124 I could use it in my apps where i have cluster of several servers exchanging http requests between each other, sometimes one session needs 2 or more exchanges. With this channel i can exchange using same connection and i need to send/receive data, not headers, so this can save and improve my operations. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 23:12 Message: Logged In: YES user_id=87254 Usually, by the time your script runs and you call [ns_conn channel], the data from the client will already have been read and buffered. You can access this with [ns_conn content]. Does reading work? What's the objective? Do you want to hadle streaming data, or make reading and writing more Tcl-like? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 23:07 Message: Logged In: YES user_id=184124 Very usefull, commit it ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 17:51:54
|
Feature Requests item #1156875, was opened at 2005-03-04 20:03 Message generated for change (Settings changed) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Watchdog process restarts failed server Initial Comment: We have been using this for quite some time and it proved extremely useful. We doublefork the nsd process and make the first forked instance control the second. The first one (the watchdog) reacts on exit codes and signals caught during the watch and correspondingly restarts the second instance (the worker). Also, we have added the the "-restart" option to the "ns_shutdown" command. This just sends the SIGINT to the worker process. The watchdog is handling this signal and respawns the worker automatically. During operation, the watchdog logs events and their cause into the system log file. This looks like: Feb 28 04:00:05 Develop nsd[19400]: worker: started. Mar 1 04:00:13 Develop nsd[4475]: watchdog: worker 19400 exited (2). Mar 1 04:00:15 Develop nsd[21290]: worker: started. Mar 1 04:00:18 Develop nsd[14705]: watchdog: worker 19399 exited (2). Mar 1 04:00:20 Develop nsd[21300]: worker: started. We have done all the changes with "--enable-watchdog" so anybody who needs this feature will have to compile with this option. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 18:51 Message: Logged In: YES user_id=95086 Added to cvs head. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 12:46 Message: Logged In: YES user_id=87254 Oh, I thought you were going to commit this. Sure, revert the exit() stuff and commit away, or I'll get to it at the wekend. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 12:23 Message: Logged In: YES user_id=95086 Now, what? You go change/remove _exit and commit? Other things are ok for me. Or should I commit? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 12:02 Message: Logged In: YES user_id=87254 Calling _exit() was an attempt to clarify the intent of the code. But I didn't pay attention to the fact that the main function is pluggable, so I guess that's a no-go. 4 hours of retry time before giving up sounds almost like 'inifinite'... Looks like we can get something suitable without adding extra switches, so we can commit this and tweak the timing later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-15 14:54 Message: Logged In: YES user_id=95086 Stephen, why do you call _exit(0) instead of returning control to caller routine? Also.. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. Restart time doubling is limited to 64 seconds (insane, should rather be 60 seconds, heh) so it should not double after reaching 60 secs. After that time we constantly wait 60 secs between restarts. With the server retry of 16 times and we will wait 1, 2, 4, 8, 16, 32, 64 (i.e. total of 127) seconds and after that 8 times 64 secs (= 512 total ) secs and eventually abort. All together, that will be 640 secs = about 10 minutes wait/retry. After that it will exit. I think if we extend retry count from 16 to 256 it will be pretty enough (about 4 hours) where one can fix the failure cause. Otherwise, I see syntactic problems of how to actually specify the retries parameter. Of curse, you can do something like bin/nsd -w -r 16 but the "-r" (restart) would really have no meaning w/o -w and I do not like this very much indeed. I'm really for bumping the retry count higher (127 or 256 tries). ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-15 01:50 Message: Logged In: YES user_id=184124 Last time addition: Can we make syslog function public, Ns_Syslog? Also Tcl interface to it would be usefull as well, below is syslog from my nssys module, just a suggestion, no rush with this: static struct syslog_data { int opened; int facility; int options; char ident[32]; Tcl_HashTable *priorities; Tcl_HashTable *facilities; } syslog_data; /* Syslog initialization */ void SyslogInit() { memset(&syslog_data,0,sizeof(syslog_data)); syslog_data.options = LOG_PID; syslog_data.facility = LOG_USER; if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY))) strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1); syslog_data.facilities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS); AddEntry(syslog_data.facilities,"auth",LOG_AUTH); AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV); AddEntry(syslog_data.facilities,"cron",LOG_CRON); AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON); AddEntry(syslog_data.facilities,"kernel",LOG_KERN); AddEntry(syslog_data.facilities,"lpr",LOG_LPR); AddEntry(syslog_data.facilities,"mail",LOG_MAIL); AddEntry(syslog_data.facilities,"news",LOG_NEWS); AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG); AddEntry(syslog_data.facilities,"user",LOG_USER); AddEntry(syslog_data.facilities,"uucp",LOG_UUCP); AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0); AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1); AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2); AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3); AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4); AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5); AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6); AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7); syslog_data.priorities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS); AddEntry(syslog_data.priorities,"emerg",LOG_EMERG); AddEntry(syslog_data.priorities,"alert",LOG_ALERT); AddEntry(syslog_data.priorities,"crit",LOG_CRIT); AddEntry(syslog_data.priorities,"err",LOG_ERR); AddEntry(syslog_data.priorities,"error",LOG_ERR); AddEntry(syslog_data.priorities,"warning",LOG_WARNING); AddEntry(syslog_data.priorities,"notice",LOG_NOTICE); AddEntry(syslog_data.priorities,"info",LOG_INFO); AddEntry(syslog_data.priorities,"debug",LOG_DEBUG); } /* * sys_log * * Usage: * sys_log ?-facility f -options o -ident i? priority message * * facility - kernel, cron, authpriv, mail, local0, local1, daemon, local2, * news, local3, local4, local5, local6, syslog, local7, auth, uucp, lpr, user * options - list with any of { CONS NDELAY PERROR PID ODELAY NOWAIT } * ident - ident is prepended to every message, and is typically the program name * priority - info, alert, emerg, err, notice, warning, error, crit, debug * * Returns: * nothing */ static int SysLog(ClientData data,Tcl_Interp *interp,int argc,char **argv) { int i,priority = 0; Tcl_HashEntry *entry; if(argc < 1) { Tcl_AppendResult(interp,argv[0]," ?-facility f -options o -ident i? priority message",0); return TCL_ERROR; } for(i = 1;i < argc-1;) { if(!strcmp(argv[i],"-facility")) { entry = Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]); if(!entry) { Tcl_AppendResult(interp,"wrong facility ",argv[i+1],": available facilities: ",NULL); ListHash(interp,syslog_data.facilities); return TCL_ERROR; } syslog_data.facility = (int)Tcl_GetHashValue(entry); closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-options")) { syslog_data.options = 0; if(strstr(argv[i+1],"CONS")) syslog_data.options |= LOG_CONS; if(strstr(argv[i+1],"NDELAY")) syslog_data.options |= LOG_NDELAY; if(strstr(argv[i+1],"PERROR")) syslog_data.options |= LOG_PERROR; if(strstr(argv[i+1],"PID")) syslog_data.options |= LOG_PID; if(strstr(argv[i+1],"ODELAY")) syslog_data.options |= LOG_ODELAY; if(strstr(argv[i+1],"NOWAIT")) syslog_data.options |= LOG_NOWAIT; closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-ident")) { memset(syslog_data.ident,0,sizeof(syslog_data.ident)); strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1); closelog(); syslog_data.opened = 0; i += 2; continue; } break; } if(i < argc) { entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]); if(!entry) { Tcl_AppendResult(interp,"wrong level ",argv[i],": available levels: ",NULL); ListHash(interp,syslog_data.priorities); return TCL_ERROR; } priority = (int)Tcl_GetHashValue(entry); i++; } if(i < argc) { if(!syslog_data.opened) { openlog(syslog_data.ident,syslog_data.options,syslog_data.facility); syslog_data.opened = 1; } syslog(priority,argv[i]); return TCL_OK; } return TCL_OK; } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-15 00:44 Message: Logged In: YES user_id=87254 Oops, looks like I forgot to attach the patch... Here it is. Yes, the server will attempt to restart a number of times before giving up. I think this is as appropriate for config errors as it is for others, then there doesn't need to be a dual-standard fatal error. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 19:12 Message: Logged In: YES user_id=95086 I'd rather wait for Stephens code and see what's done there and then make final changes and commit. It should not take long. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 19:07 Message: Logged In: YES user_id=184124 Then i do not see any reason not committing it, we can have discussions about it for another couple of years, but until we have it and use it, it is just discussions. We aready agreed on having it in the core. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 19:00 Message: Logged In: YES user_id=95086 Stephen, You mind uploading your patch changes so I can peek into it? I think we should soon settle on some solution and move on. Vlad, The patch already does that. After 16 unsuccessfull restarts the watchdog exits. Between every restart we wait 1, 2, 4, 8, 16, 32, .... 16384 seconds and then exit. This would be: 32767 seconds or about 9 hours. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 15:53 Message: Logged In: YES user_id=184124 I would say, repeat configured number of time and then exit. This way all socket related problems will be cleared after couple of restartes, config problems will make server exit . ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 15:41 Message: Logged In: YES user_id=95086 Well, because I could not differentiate between config and later runtime errors, I thought its wiser to abort rather to repeatedly restart broken server. One possibility is to add different Ns_Fatal call like: Ns_FatalEx(int exitcode, char *fmt, ...); Or? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 15:12 Message: Logged In: YES user_id=87254 Are you sure Ns_Fatal should not restart the server? Many of the fatal errors are caused by bad configuration and I can see why you might want the server to exit immediately. But there are also runtime fatal errors, and here I'd expect the server to be restarted. Some config errors are due to external factors like missing directories or file system permissions and it would be nice for the server to come back up as soon as the issue resolved itself. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 10:52 Message: Logged In: YES user_id=95086 Calling watchdog before or after chroot has no real implications I believe. Also, when the server exits with Ns_Fatal, it hsould not be restarted, I will look into your changes today. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 07:27 Message: Logged In: YES user_id=87254 Oh yeah, and I moved the watchdog stuff eaven earlier, before the prebind and chroot. Hmm, now that I think about it it's only the prebind that really needs to happen after the watchdog is started, to ensure the sockets are always in a sane state... ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 07:17 Message: Logged In: YES user_id=87254 Am I reading this right? It looks like if the server calls Ns_Fatal() it will exit with code 1, but 0 and 1 are treated as OK exit codes and the server will not be restarted. I've attached a patch which changes the above, fixes the pid problem in a different way because I completely forgot you already posted below about that small glitch, use Ns_ParseObjv(), adds the -w switch, and a couple of small name changes. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 21:09 Message: Logged In: YES user_id=95086 This is correct. As soon as we agree on some implementation, I will put the -i processing and avoid starting watchdog for inittab starts. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 20:48 Message: Logged In: YES user_id=184124 There still should be possibility to run nsd from inittab, so when -i switch is given, no watchdog should be running, let /sbin/init to handle restarts ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 17:19 Message: Logged In: YES user_id=95086 small glich: After applying the patch, change nsmain.c from: nsconf.pid = serverPid: to nsconf.pid = getpid(); otherwise the pid file will contain the bogus server pid if the watchdog restarted it later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 16:26 Message: Logged In: YES user_id=95086 Another try... Important to note: SIGKILL (signal 9) cannot be handled hence if somebody kills the watchdog with SIGKILL, the server will be left lingering w/o the watchdog. This is important to know. I do not see any possibility how to recover in such cases (i.e. how to stop the server). Apart from that, all objections from Stephen are taken into account. Please try again. A new copy of watchdog.patch file is attached. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 06:20 Message: Logged In: YES user_id=184124 I just found this code in my old server sources, i just chnaged internal name to Ns/NS, used to run pretty stable. ------------------------------------------------------------------- #define NS_EXIT 99 static void NsTerminate(int sig) { printf("NS[%d] signal %d received...",getpid(),sig); // Kill the server with the same signal if(nsPid > 0) kill(nsPid,sig); // Exit in case of fatal signal if(sig != SIGHUP) { printf("NS[%d] terminating...",getpid()); exit(0); } // Reassign signal handler signal(sig,NsTerminate); } void NsWatchdog(int argc, char *argv[], char *envp[]) { int failcount = 0; time_t start; int status; pid_t pid; signal(SIGTTOU,SIG_IGN); signal(SIGTTIN,SIG_IGN); signal(SIGTSTP,SIG_IGN); signal(SIGPIPE,SIG_IGN); signal(SIGQUIT,SIG_IGN); signal(SIGHUP,NsTerminate); signal(SIGINT,NsTerminate); signal(SIGTERM,NsTerminate); // Go background if((pid = fork())) { if(pid < 0) err_logger("warpConfigure: fork: %s",strerror(errno)); exit(0); } setsid(); for(;;) { // Execute the real server nsPid = fork(); // Child, continue as server if(nsPid == 0) { exit(nsMain(argc, argv, ServerInit)); } /* parent, behaves like a guardian */ time(&start); printf("NS[%d] server process started",getpid()); pid = waitpid(-1, &status, 0); if(WIFEXITED(status)) printf("NS[%d] child process exited with status %d",pid,WEXITSTATUS(status)); else if(WIFSIGNALED(status)) printf("NS[%d] child process exited due to signal %d",pid,WTERMSIG(status)); else printf("NS[%d] child process exited", pid); // Special exit code if(WIFEXITED(status)) { if(WEXITSTATUS(status) == NS_EXIT) { printf("NS[%d] child configuration error, exiting",getpid()); exit(0); } else if(WEXITSTATUS(status) == SIGHUP) { } else { if(time(0) - start < 10) failcount++; if(failcount == 10) { printf("Exiting due to repeated, frequent failures"); exit(1); } } } sleep(3); } } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-08 05:23 Message: Logged In: YES user_id=87254 NsWatchdog() is called after the server drops root privs, so both the watchdog and the server run as the defined user. What happens if the server dies and on restart needs to rebind privileged ports? A ps listing shows two running processes, parent and child. If I kill either one, the watchdog dies, the server continues to run. If I kill -9 the parent, the child continues to run. If I kill -9 the child, the server is restarted. Something seems not quite right here... I'm a bit confused about how the code works. For example, NsWatchdog() seems to ignore all of it's arguments? Here's the code which calls it: if (mode == 0) { i = ns_fork(); if (i < 0) { Ns_Fatal("nsmain: fork() failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } setsid(); i = NsWatchdog(argc, argv, initProc); if (i < 0) { Ns_Fatal("nsmain: watchdog failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } nsconf.pid = getpid(); } NsWatchdog() says it returns the worker pid, it also sets the global variable wpid. The code above ignores the returned value, and the global variable, and instead calls getpid()... The variable 'i' is existing code, but still somehow doesn't seem suitable... Could that Ns_Fatal() above be moved into the NsWatchdog() function? I think maybe some comment is needed here. The code structure is very like that just above where ns_fork() is called, but this function will return *multiple* times, right? This is kind of suprising, and an extra twist on the already confusing fork() semantics (or maybe it's just me who gets confused by fork...). A return value of 0 here is a 'request for orderly shutdown', right? How about some more logging to syslog? For example, distinguish between start and restart. Mention when the MAX_SLEEP_PERIOD has been reached, etc. Couple of small things: Can we refer to 'the server', rather than 'the worker'? Worker and Watchdog begin with 'w', and so does the global variable wpid... Maybe serverPid? NsWatchdog is a static function, it doesn't need the Ns prefix. In NsWatchdog(), the variable 'run' should be something like 'nretries', 'nap' should be something more like 'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS. The comment for WaitForWorker() is misslabeled. Should SigHandler() be called WatchdogSigHandler()? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-07 00:03 Message: Logged In: YES user_id=184124 Looks good to me ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 13:06 Message: Logged In: YES user_id=95086 Ah, correction: The restart option sends SIGINT to the worker process which causes the watchdog to restart it. And, the patchfile is now attached! ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 13:03 Message: Logged In: YES user_id=95086 Here is the patch. I have added "-restart" option to "ns_shutdown". It is rather clumsy to parse but should do. We should rewrite this with your args parsing routine. The restart option sends SIGTERM to the worker process which causes the watchdog to restart it. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 00:46 Message: Logged In: YES user_id=87254 I don't think this has to hide behind a config option. It's either a good idea or it's not. Sounds good to me. Is there a patch? I'm wondering about some of the implementation details... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 14:22:31
|
Feature Requests item #1161597, was opened at 2005-03-11 19:30 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Vlad Seryakov (seryakov) Assigned to: Vlad Seryakov (seryakov) Summary: Extend ns_info with info about traces/filters/procs Initial Comment: Attached is patch that extends ns_info with 3 commands: ns_info traces ns_info filters ns_info requestprocs They work the same way as ns_info callbacks, no functionality changed or added, just information commands ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-17 14:22 Message: Logged In: YES user_id=184124 Yes, this is what i did with UrlSpecific walk function, it will call Ns_ArpProc for every node which will append what-ever info to DString. We can live with that for the beginning, anyway, that info will be used in Tcl and should be converted into strings. So, i guess i can add info proc for UrlSpecific trie? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 11:06 Message: Logged In: YES user_id=87254 To get info on the binary callbacks we're just going to have to register proc info for all the default filters, request procs etc. If we cover everything there should be a reasonable number of examples for module writers to follow. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-13 01:47 Message: Logged In: YES user_id=184124 Also, somein docs would be usefull to put info how to get info about some binary callbacks: Welcome to ossweb running at /usr/local/ns/bin/nsd (pid 5524) ossweb:nscp 1> ns_info traces p:0xb7187be0 a:0x80817d8 # gdb (gdb) attach 12345 (gdb) info symbol 0xb7187be0 LogTrace in section .text ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-13 01:44 Message: Logged In: YES user_id=184124 I started with minimal changes, #defines and &syntax can be chnages of course. now that you started TclCallbacks, this thing depends when you commit your changes, it requires Ns_ArgProc and ns_info stuff. Once we have unified API for callbask i will change it to be server-specific as well, it took a while to figure out how it works :-)) ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-13 01:37 Message: Logged In: YES user_id=87254 This is a nice addition. ns_info filters/traces is server specific, but ns_info requestprocs returns info for all servers. Can you make this per-server also, to be consistent? The URL walking procs look a little tricky, I tnink they would benefit from having their own more detailed comments. The comment currently says it calls a function for each node, and the effect depends on the function. But the function signature is Ns_ArgProc so there's really only one thing it can do. Why does it fix the stack size at 512? Can these magic numbers be #define'd. Using a while loop rather than an empty for loop would be a little easier to read. There's a lot of this kind of thing: (&(triePtr->branches))->n Could this be simplified to: triePtr->branches.n ? Otherwise, looks pretty good. I like it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 14:20:28
|
Feature Requests item #1156141, was opened at 2005-03-03 20:02 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 Category: None Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add ns_conn channel command Initial Comment: Extend the ns_conn command to allow wrapping an Tcl channel arround it. This channel can be used to talk to the remote client afterwards. The idea is simple. The client uses standard http protocol to open up a connection to the server. Server accepts the connection and obtains the channel from it. The client also gets the channel to the server and from this point on, both can exchange arbitrary data. Example implementation is given as a patch against the current CVS state. ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-17 14:20 Message: Logged In: YES user_id=184124 i do not ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 12:10 Message: Logged In: YES user_id=95086 OK, If you do not mind, I'll add ns_conn channel It will return the Tcl channel you can use for talking to the client. When you've finished, you just "close $sock" and finish the proc. The connection thread is then ready for other users/requests. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 11:41 Message: Logged In: YES user_id=87254 Yes, I get it :-) My only point was that HTTP is not a short lived connection protocol, if you don't want it to be. Lengthy connections with multiple messages seems to be one of the key features here. The *only* thing HTTP can't do is have the server initiate a message to the client. For the kinds of cases you're talking about here with multi-hour long connections, I'd be wanting some kind of 'ping' request from the client every so often just to make sure the server was idle and not broken. The ping becomes a 'what messages do you have for me?' request, the server's response is itself a request to the client. Anyway, I was talking pure nonsense about the keepalive stuff; the client doen't have to be so broken after all. I'm fine with you adding this feature. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 09:03 Message: Logged In: YES user_id=95086 "still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do Stephen, I do not think you understood my point. What this command buys me is to be able to initiate an http connection and reuse the socket-to-socket comm channel for other purposes. For example we make "GET" request to a certain URL and then when the connection gets established, the server, takes the conn socket, wraps a Tcl channel around it and passes this channel to my (arbitraty) Tcl proc. The client, instead of getting the response body, parses only the returned http headers to see that the connection succeeded and then passes this socket to my custom Tcl proc. Now, on the sever and on the client side you have a channel and both sides can use it to establish whatever protocol they want. They have a socket pipe between. The http is just used to handshake the peers, clear? What we do with this? We have two server instances on two different hosts. We open a channel (using the above trick) from one to another and then use this to generate a stream of preformatted packets (our protocol) on one side which does the backup of data from to the other side ultimately to the tape drive(s) attached to that host. We shuffle TB of data this way daily in about 500 installations worldwide, so I know this works fine. Now, see, this has nothing to do with HTTP1.1. or 1.0 alone. And yes, if you look in the implementation, you will see that it is a simple wrapper to the Ns_ConnSock. It simply wraps a Tcl channel around the the connection socket, nothing more and nothing less. And, I would not say that it requires a broken client. It just requires a special client-processing. This is not ment for browser-clients. It has nothing to do with serving pages or any other short lived connections. Once we setup such a connection, we use it for hours and maybe days. We could have achieved this with a simple socket connection from client to a specific port on the server. But we are not able to use any arbitrary port. All we have is port 80 (the web server port). So we must play by the http rules initially when accessing the server. When we passed this http handshake stuff, we are left with a socket-to-socket comm-channel which is transparent and can be used for any purpose whatsoever. Now, is this little bit more clear now? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-05 03:32 Message: Logged In: YES user_id=184124 What is broken about it? It is the Tcl interface to Ns_ConnSock(), just Tcl way, using channel. The merit is to use sync inter-exchange between client and server using established connection. Client is not broken, it is designed for this exchange, it is the tool to build messaging, not general purpose protocol. It make naviserver more flexible platform for implementing different ppas on top of it using Tcl, that's all. Btw, when are you going to commit your protocol extensions? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 00:20 Message: Logged In: YES user_id=87254 This command isn't a good idea because it creates a broken interface which requires a broken client to work, and I still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do it, then it may still have some merit. But Ns_ConnSock() is trivial and gives you all you need to create a Tcl channel. We've already decided to do this the right way, whatever that turns out to be, so why should we simultaneously also do it the wrong way? In fact you seem to agree, noting that this sucks because it's not UDP and the overhead of HTTP is just too much for you. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 15:36 Message: Logged In: YES user_id=184124 And that's why i want to add simple UDP support to the server: for implementing simple messaging between servers in the cluster, using UDP without HTTP overhead will make it simple and faster. Using Tcl channel is intermediate solution for such messaging because it eliminates making multiple HTTP requests to the server but still uses heavy TCP/HTTP overhead. Still i think this command is very usefull and will make server more versatile provided that we will supply usage examples with the server as well. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 15:26 Message: Logged In: YES user_id=95086 "Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again." This is true. It is a regular http request, just for the body part, the peers are engaging in a bidirectional communication and can do whatever they pleased. The request is considered done when one of the peers close the socket. Strictly speaking, I'm using the http body and this trick to implement arbitrary client-server comm channel. This will not work for browsers of course. It has nothing to do with browsers at all in fact. It arose out of the necessity to support alternate protocols given what we had in AS, that is pure http machinery. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 14:38 Message: Logged In: YES user_id=184124 On the closing note, keepalive has nothing to do with this and should not be considered in this case. Even if you connect using HTTP 1.1 and specify keepalive, sever has the right to issue Connection: close header which will disable keepalive. So, it WILL work for all connections. Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again. And pipelining is a bad example here, the point it to send the request and wait for the answer and base on that send another request. Pipelining will allow you just to submit all requests and then later receive replies, it was designed for speeding browser's connection especially retrieving images. It looks to me Stephen you are all for HTTP core and around-the-HTTP-driver solutions only. This will make simple solutions complex because of getting around/workaround HTTP based core in the server. What is wrong with one simpe function which will give the server more options without breaking anything? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 09:45 Message: Logged In: YES user_id=95086 I see. The gotcha's are: o. it's yet another implementation of non-HTTP protocol o. (HTTP 1.0 only, and no Keepalive header) be used to get o. function in the docs might reasonably expect it to work for all HTTP connections That's pretty clear. Well, I think you are right. This is our attempt to work around http-only caps of the server. Agreed. It will be better to get it done properly and we will then internally switch to the new implementation when it's ready. At the moment I have all this in a separate module and this can stay so until the multi-protocol caps are agreed upon and implemented. I will close this RFE then. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 09:36 Message: Logged In: YES user_id=87254 I don't understand why you're not using standard HTTP pipelining. It would seem to be so much easier than having to write custom client and server code. This is a small change, but I'm not excited about it because it's yet another implementation of non-HTTP protocol support, which makes 3 so far... :-( It also seems to require that a deliberately broken client (HTTP 1.0 only, and no Keepalive header) be used to get around the server's io read-ahead. Someone seeing this function in the docs might reasonably expect it to work for all HTTP connections. Also, I don't think this requires core support. Ns_ConnSock() will return the underlying file descriptor for a connection. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 08:36 Message: Logged In: YES user_id=95086 OK. Maybe I should explain little bit more what I'm using this for. Imagine you'd like to make a client-server connection and then exchange arbitrary data on this connection. Imagine something like telnet connection. You telnet from client to server, get a comm-pipe betwen them and then you can shuffle whatever you like across this comm-pipe. What we have is a http server responding only to http requests (as it is now, w/o multiprotocol support). So, what I do is to use http to setup the connection, have a registered procedure on the server grab the socket and install my custom protocol handler on it. The http is only used to establish the communication. All the rest is handled by my custom code on the server and on the client. Once the connection is setup, we have a point-to-point socket communication link between server and client and we can use it for just about anything. All you need for this is a client-side http-library (we have ours written in Tcl) which will give you the socket back once the connection is established) and serve-side proc implementing your protocol of choice. So, the entire server http-machinery stays in place. I do not see any problem there because we have been using this trick for about 4 years in production very successfully. What I do not know is wether there are any gotcha's with that approach (we haven't hit any yet). I also do not know what will happen when we add alternate multiprotocol caps to server. Maybe this will become obsolete. But, as I see this now, the change is trivial and does not hurt anybody. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 23:21 Message: Logged In: YES user_id=87254 HTTP also has (although our server doesn't, but I want to add this) pipelining, which is sending multiple requests at once without first waiting for the reply to previous requests. That sounds like your "i do it when I send multiple http requests". ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 22:45 Message: Logged In: YES user_id=184124 Keepalive works when i close my connection, i.e. when i exited from conn thread, but the point here is to do it from the conn thread, i received request from my other server sending me some info, i send my info back and get ack. Everything inside one connection, before keepalive. And yes, i send messages in my format, anyway i do it when i send multiple http reaqests, i am interested in message body only. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 22:32 Message: Logged In: YES user_id=87254 HTTP already handles sending multiple requests accross one socket with keepalives. It might be more useful to add keepalives to our HTTP client code. In the case of HTTP keepalives, the socket is returned to the driver thread after the first request, still open, and the conn thread goes on to do other useful work. If a new request arrives before the connection times out it is passed to the next free conn thread. If you don't usee HTTP with keepalives you will have to invent some kind of message chunking syntax and some kind of time-out protocol. But you will still end up wasting conn threads as they idle waiting for possible extra messages. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 22:20 Message: Logged In: YES user_id=184124 I could use it in my apps where i have cluster of several servers exchanging http requests between each other, sometimes one session needs 2 or more exchanges. With this channel i can exchange using same connection and i need to send/receive data, not headers, so this can save and improve my operations. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 22:12 Message: Logged In: YES user_id=87254 Usually, by the time your script runs and you call [ns_conn channel], the data from the client will already have been read and buffered. You can access this with [ns_conn content]. Does reading work? What's the objective? Do you want to hadle streaming data, or make reading and writing more Tcl-like? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 22:07 Message: Logged In: YES user_id=184124 Very usefull, commit it ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 14:16:24
|
Feature Requests item #1162223, was opened at 2005-03-13 00:22 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Stephen Deasey (sdeasey) Assigned to: Stephen Deasey (sdeasey) Summary: Tcl Callbacks Initial Comment: The server offers a dual C/Tcl interface. There are a number of of places where Tcl code needs to be executed by the C core, or the core doesn't know whether a callback is C or Tcl. I've attached the patch naviserver-4.1-tclcallbacks.patch. It extracts the Tcl callback infrastructure from tclsched.c into tclcallbacks.c and makes it a public interface. Here it is: Ns_TclNewCallback() Ns_TclNewCallbackObj() Ns_TclEvalCallback() Ns_TclCallbackProc() Ns_TclFreeCallback() Ns_TclCallbackArgProc() It provides a standard interface for Tcl callbacks throughout the server, handling interp allocation, introspection, evaluation and cleanup. Also, tclcallbacks.c is somewhat analogous to callbacks.c, and so the generic ns_atexit etc. callbacks which were in tclsched.c have been moved to the tclcallbacks file. The first set of code to be converted to the new interface is tclsched.c itself, and I've converted to using Ns_ParseObjv() while I was in there. File size has dropped ~250 lines. One thing I'm wondering about is whether the Ns_TclEvalCallback() routine should take extra args to send to the Tcl proc. There seems to be no standard scheme for ordering the args so there still needs to be a way to manually construct the command, which is why the members of the Ns_TclCallback struct are public. But for this very reason, it might be worth providing a standard way to do this so all future callbacks are consistent. It would be nice in the future to come up with some way of registereing these callback points such that you don't have to create an ns_info subcommand for example manually. It should be possible for loadable modules to register their callback points and have them show up in the right place. ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-17 14:16 Message: Logged In: YES user_id=184124 Yes, let's start slowly, we know the direction we are going, let 's do it step by step starting with simple things. We can have generic Tcl callbacks already and all new callbacks will use it and we will see how flexible this mechanism is. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 11:28 Message: Logged In: YES user_id=87254 Yes, I forgot to mention that Tcl callbacks are a little more general than those for C in callbacks.c. Those take a single arg which is the context you registered with the proc, and return void. Tcl callbacks on the other hand can return a result which at the C level gets appended to a dstring if you pass that in. The members of the Ns_TclCallback structure are public so that you can construct a command and args in any format. This is the question I was asking originally: should there be a default recomended way to do this? So for example Ns_TclEvalCallback() might take varargs cstrings, which if passed would be added to the tcl command to execute: command ?cbarg1? ?cbarg2? ?context? Where cbarg2 and cbarg2 are specific to the type of callback and context is what was registered with the proc by the Tcl programmer. Unfortunately that wouldn't cover all the bases as some of the existing APIs do weird things like place callback specific args after the context or if no context was passed, always append an empty string. For new callbacks, something standard like the above scheme should be used. If you look at the patch for virtual hosting you'll see that two new Tcl callbacks were created and they contain simillar boiler plate. This makes me think that maybe there should be something like an [ns_addcallback serverroot myrootproc mycontext] interface. It's a little more complicated than that because some callbacks are singular, like the serverroot proc, and some are plural, like filters. So maybe the thing to do is get some experience with this simple version and then expand it later... ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-13 00:46 Message: Logged In: YES user_id=184124 It looks good, unified callback interface would be better but Tcl filters. traces, requestprocs uses different arguments/parameters and currently the only way to get info about them to use ns_info interface and Ns_ArgProc interface. Tcl_Callback are for simple proc/args callbacks, filters is special case, see if you can convert them. But i am for this change. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-13 00:30 Message: Logged In: YES user_id=87254 I've attached a patch which converts the tclrequest.c file to use the new callback API as an example. It also uses the Ns_ParseObjv() API, and I've removed support for the legacy conn variable. It removes ~200 lines and all utility functions. The callback API provides default Ns_ArgProc introspection, but this will be much nicer when combined with Vlad's RFE 1161597 :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 12:10:54
|
Feature Requests item #1156141, was opened at 2005-03-03 21:02 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 Category: None Group: None >Status: Open >Resolution: Accepted Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add ns_conn channel command Initial Comment: Extend the ns_conn command to allow wrapping an Tcl channel arround it. This channel can be used to talk to the remote client afterwards. The idea is simple. The client uses standard http protocol to open up a connection to the server. Server accepts the connection and obtains the channel from it. The client also gets the channel to the server and from this point on, both can exchange arbitrary data. Example implementation is given as a patch against the current CVS state. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 13:10 Message: Logged In: YES user_id=95086 OK, If you do not mind, I'll add ns_conn channel It will return the Tcl channel you can use for talking to the client. When you've finished, you just "close $sock" and finish the proc. The connection thread is then ready for other users/requests. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 12:41 Message: Logged In: YES user_id=87254 Yes, I get it :-) My only point was that HTTP is not a short lived connection protocol, if you don't want it to be. Lengthy connections with multiple messages seems to be one of the key features here. The *only* thing HTTP can't do is have the server initiate a message to the client. For the kinds of cases you're talking about here with multi-hour long connections, I'd be wanting some kind of 'ping' request from the client every so often just to make sure the server was idle and not broken. The ping becomes a 'what messages do you have for me?' request, the server's response is itself a request to the client. Anyway, I was talking pure nonsense about the keepalive stuff; the client doen't have to be so broken after all. I'm fine with you adding this feature. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 10:03 Message: Logged In: YES user_id=95086 "still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do Stephen, I do not think you understood my point. What this command buys me is to be able to initiate an http connection and reuse the socket-to-socket comm channel for other purposes. For example we make "GET" request to a certain URL and then when the connection gets established, the server, takes the conn socket, wraps a Tcl channel around it and passes this channel to my (arbitraty) Tcl proc. The client, instead of getting the response body, parses only the returned http headers to see that the connection succeeded and then passes this socket to my custom Tcl proc. Now, on the sever and on the client side you have a channel and both sides can use it to establish whatever protocol they want. They have a socket pipe between. The http is just used to handshake the peers, clear? What we do with this? We have two server instances on two different hosts. We open a channel (using the above trick) from one to another and then use this to generate a stream of preformatted packets (our protocol) on one side which does the backup of data from to the other side ultimately to the tape drive(s) attached to that host. We shuffle TB of data this way daily in about 500 installations worldwide, so I know this works fine. Now, see, this has nothing to do with HTTP1.1. or 1.0 alone. And yes, if you look in the implementation, you will see that it is a simple wrapper to the Ns_ConnSock. It simply wraps a Tcl channel around the the connection socket, nothing more and nothing less. And, I would not say that it requires a broken client. It just requires a special client-processing. This is not ment for browser-clients. It has nothing to do with serving pages or any other short lived connections. Once we setup such a connection, we use it for hours and maybe days. We could have achieved this with a simple socket connection from client to a specific port on the server. But we are not able to use any arbitrary port. All we have is port 80 (the web server port). So we must play by the http rules initially when accessing the server. When we passed this http handshake stuff, we are left with a socket-to-socket comm-channel which is transparent and can be used for any purpose whatsoever. Now, is this little bit more clear now? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-05 04:32 Message: Logged In: YES user_id=184124 What is broken about it? It is the Tcl interface to Ns_ConnSock(), just Tcl way, using channel. The merit is to use sync inter-exchange between client and server using established connection. Client is not broken, it is designed for this exchange, it is the tool to build messaging, not general purpose protocol. It make naviserver more flexible platform for implementing different ppas on top of it using Tcl, that's all. Btw, when are you going to commit your protocol extensions? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 01:20 Message: Logged In: YES user_id=87254 This command isn't a good idea because it creates a broken interface which requires a broken client to work, and I still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do it, then it may still have some merit. But Ns_ConnSock() is trivial and gives you all you need to create a Tcl channel. We've already decided to do this the right way, whatever that turns out to be, so why should we simultaneously also do it the wrong way? In fact you seem to agree, noting that this sucks because it's not UDP and the overhead of HTTP is just too much for you. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 16:36 Message: Logged In: YES user_id=184124 And that's why i want to add simple UDP support to the server: for implementing simple messaging between servers in the cluster, using UDP without HTTP overhead will make it simple and faster. Using Tcl channel is intermediate solution for such messaging because it eliminates making multiple HTTP requests to the server but still uses heavy TCP/HTTP overhead. Still i think this command is very usefull and will make server more versatile provided that we will supply usage examples with the server as well. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 16:26 Message: Logged In: YES user_id=95086 "Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again." This is true. It is a regular http request, just for the body part, the peers are engaging in a bidirectional communication and can do whatever they pleased. The request is considered done when one of the peers close the socket. Strictly speaking, I'm using the http body and this trick to implement arbitrary client-server comm channel. This will not work for browsers of course. It has nothing to do with browsers at all in fact. It arose out of the necessity to support alternate protocols given what we had in AS, that is pure http machinery. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 15:38 Message: Logged In: YES user_id=184124 On the closing note, keepalive has nothing to do with this and should not be considered in this case. Even if you connect using HTTP 1.1 and specify keepalive, sever has the right to issue Connection: close header which will disable keepalive. So, it WILL work for all connections. Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again. And pipelining is a bad example here, the point it to send the request and wait for the answer and base on that send another request. Pipelining will allow you just to submit all requests and then later receive replies, it was designed for speeding browser's connection especially retrieving images. It looks to me Stephen you are all for HTTP core and around-the-HTTP-driver solutions only. This will make simple solutions complex because of getting around/workaround HTTP based core in the server. What is wrong with one simpe function which will give the server more options without breaking anything? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 10:45 Message: Logged In: YES user_id=95086 I see. The gotcha's are: o. it's yet another implementation of non-HTTP protocol o. (HTTP 1.0 only, and no Keepalive header) be used to get o. function in the docs might reasonably expect it to work for all HTTP connections That's pretty clear. Well, I think you are right. This is our attempt to work around http-only caps of the server. Agreed. It will be better to get it done properly and we will then internally switch to the new implementation when it's ready. At the moment I have all this in a separate module and this can stay so until the multi-protocol caps are agreed upon and implemented. I will close this RFE then. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 10:36 Message: Logged In: YES user_id=87254 I don't understand why you're not using standard HTTP pipelining. It would seem to be so much easier than having to write custom client and server code. This is a small change, but I'm not excited about it because it's yet another implementation of non-HTTP protocol support, which makes 3 so far... :-( It also seems to require that a deliberately broken client (HTTP 1.0 only, and no Keepalive header) be used to get around the server's io read-ahead. Someone seeing this function in the docs might reasonably expect it to work for all HTTP connections. Also, I don't think this requires core support. Ns_ConnSock() will return the underlying file descriptor for a connection. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 09:36 Message: Logged In: YES user_id=95086 OK. Maybe I should explain little bit more what I'm using this for. Imagine you'd like to make a client-server connection and then exchange arbitrary data on this connection. Imagine something like telnet connection. You telnet from client to server, get a comm-pipe betwen them and then you can shuffle whatever you like across this comm-pipe. What we have is a http server responding only to http requests (as it is now, w/o multiprotocol support). So, what I do is to use http to setup the connection, have a registered procedure on the server grab the socket and install my custom protocol handler on it. The http is only used to establish the communication. All the rest is handled by my custom code on the server and on the client. Once the connection is setup, we have a point-to-point socket communication link between server and client and we can use it for just about anything. All you need for this is a client-side http-library (we have ours written in Tcl) which will give you the socket back once the connection is established) and serve-side proc implementing your protocol of choice. So, the entire server http-machinery stays in place. I do not see any problem there because we have been using this trick for about 4 years in production very successfully. What I do not know is wether there are any gotcha's with that approach (we haven't hit any yet). I also do not know what will happen when we add alternate multiprotocol caps to server. Maybe this will become obsolete. But, as I see this now, the change is trivial and does not hurt anybody. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 00:21 Message: Logged In: YES user_id=87254 HTTP also has (although our server doesn't, but I want to add this) pipelining, which is sending multiple requests at once without first waiting for the reply to previous requests. That sounds like your "i do it when I send multiple http requests". ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 23:45 Message: Logged In: YES user_id=184124 Keepalive works when i close my connection, i.e. when i exited from conn thread, but the point here is to do it from the conn thread, i received request from my other server sending me some info, i send my info back and get ack. Everything inside one connection, before keepalive. And yes, i send messages in my format, anyway i do it when i send multiple http reaqests, i am interested in message body only. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 23:32 Message: Logged In: YES user_id=87254 HTTP already handles sending multiple requests accross one socket with keepalives. It might be more useful to add keepalives to our HTTP client code. In the case of HTTP keepalives, the socket is returned to the driver thread after the first request, still open, and the conn thread goes on to do other useful work. If a new request arrives before the connection times out it is passed to the next free conn thread. If you don't usee HTTP with keepalives you will have to invent some kind of message chunking syntax and some kind of time-out protocol. But you will still end up wasting conn threads as they idle waiting for possible extra messages. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 23:20 Message: Logged In: YES user_id=184124 I could use it in my apps where i have cluster of several servers exchanging http requests between each other, sometimes one session needs 2 or more exchanges. With this channel i can exchange using same connection and i need to send/receive data, not headers, so this can save and improve my operations. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 23:12 Message: Logged In: YES user_id=87254 Usually, by the time your script runs and you call [ns_conn channel], the data from the client will already have been read and buffered. You can access this with [ns_conn content]. Does reading work? What's the objective? Do you want to hadle streaming data, or make reading and writing more Tcl-like? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 23:07 Message: Logged In: YES user_id=184124 Very usefull, commit it ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 11:46:21
|
Feature Requests item #1156875, was opened at 2005-03-04 12:03 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Watchdog process restarts failed server Initial Comment: We have been using this for quite some time and it proved extremely useful. We doublefork the nsd process and make the first forked instance control the second. The first one (the watchdog) reacts on exit codes and signals caught during the watch and correspondingly restarts the second instance (the worker). Also, we have added the the "-restart" option to the "ns_shutdown" command. This just sends the SIGINT to the worker process. The watchdog is handling this signal and respawns the worker automatically. During operation, the watchdog logs events and their cause into the system log file. This looks like: Feb 28 04:00:05 Develop nsd[19400]: worker: started. Mar 1 04:00:13 Develop nsd[4475]: watchdog: worker 19400 exited (2). Mar 1 04:00:15 Develop nsd[21290]: worker: started. Mar 1 04:00:18 Develop nsd[14705]: watchdog: worker 19399 exited (2). Mar 1 04:00:20 Develop nsd[21300]: worker: started. We have done all the changes with "--enable-watchdog" so anybody who needs this feature will have to compile with this option. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 04:46 Message: Logged In: YES user_id=87254 Oh, I thought you were going to commit this. Sure, revert the exit() stuff and commit away, or I'll get to it at the wekend. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 04:23 Message: Logged In: YES user_id=95086 Now, what? You go change/remove _exit and commit? Other things are ok for me. Or should I commit? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 04:02 Message: Logged In: YES user_id=87254 Calling _exit() was an attempt to clarify the intent of the code. But I didn't pay attention to the fact that the main function is pluggable, so I guess that's a no-go. 4 hours of retry time before giving up sounds almost like 'inifinite'... Looks like we can get something suitable without adding extra switches, so we can commit this and tweak the timing later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-15 06:54 Message: Logged In: YES user_id=95086 Stephen, why do you call _exit(0) instead of returning control to caller routine? Also.. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. Restart time doubling is limited to 64 seconds (insane, should rather be 60 seconds, heh) so it should not double after reaching 60 secs. After that time we constantly wait 60 secs between restarts. With the server retry of 16 times and we will wait 1, 2, 4, 8, 16, 32, 64 (i.e. total of 127) seconds and after that 8 times 64 secs (= 512 total ) secs and eventually abort. All together, that will be 640 secs = about 10 minutes wait/retry. After that it will exit. I think if we extend retry count from 16 to 256 it will be pretty enough (about 4 hours) where one can fix the failure cause. Otherwise, I see syntactic problems of how to actually specify the retries parameter. Of curse, you can do something like bin/nsd -w -r 16 but the "-r" (restart) would really have no meaning w/o -w and I do not like this very much indeed. I'm really for bumping the retry count higher (127 or 256 tries). ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 17:50 Message: Logged In: YES user_id=184124 Last time addition: Can we make syslog function public, Ns_Syslog? Also Tcl interface to it would be usefull as well, below is syslog from my nssys module, just a suggestion, no rush with this: static struct syslog_data { int opened; int facility; int options; char ident[32]; Tcl_HashTable *priorities; Tcl_HashTable *facilities; } syslog_data; /* Syslog initialization */ void SyslogInit() { memset(&syslog_data,0,sizeof(syslog_data)); syslog_data.options = LOG_PID; syslog_data.facility = LOG_USER; if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY))) strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1); syslog_data.facilities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS); AddEntry(syslog_data.facilities,"auth",LOG_AUTH); AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV); AddEntry(syslog_data.facilities,"cron",LOG_CRON); AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON); AddEntry(syslog_data.facilities,"kernel",LOG_KERN); AddEntry(syslog_data.facilities,"lpr",LOG_LPR); AddEntry(syslog_data.facilities,"mail",LOG_MAIL); AddEntry(syslog_data.facilities,"news",LOG_NEWS); AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG); AddEntry(syslog_data.facilities,"user",LOG_USER); AddEntry(syslog_data.facilities,"uucp",LOG_UUCP); AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0); AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1); AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2); AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3); AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4); AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5); AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6); AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7); syslog_data.priorities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS); AddEntry(syslog_data.priorities,"emerg",LOG_EMERG); AddEntry(syslog_data.priorities,"alert",LOG_ALERT); AddEntry(syslog_data.priorities,"crit",LOG_CRIT); AddEntry(syslog_data.priorities,"err",LOG_ERR); AddEntry(syslog_data.priorities,"error",LOG_ERR); AddEntry(syslog_data.priorities,"warning",LOG_WARNING); AddEntry(syslog_data.priorities,"notice",LOG_NOTICE); AddEntry(syslog_data.priorities,"info",LOG_INFO); AddEntry(syslog_data.priorities,"debug",LOG_DEBUG); } /* * sys_log * * Usage: * sys_log ?-facility f -options o -ident i? priority message * * facility - kernel, cron, authpriv, mail, local0, local1, daemon, local2, * news, local3, local4, local5, local6, syslog, local7, auth, uucp, lpr, user * options - list with any of { CONS NDELAY PERROR PID ODELAY NOWAIT } * ident - ident is prepended to every message, and is typically the program name * priority - info, alert, emerg, err, notice, warning, error, crit, debug * * Returns: * nothing */ static int SysLog(ClientData data,Tcl_Interp *interp,int argc,char **argv) { int i,priority = 0; Tcl_HashEntry *entry; if(argc < 1) { Tcl_AppendResult(interp,argv[0]," ?-facility f -options o -ident i? priority message",0); return TCL_ERROR; } for(i = 1;i < argc-1;) { if(!strcmp(argv[i],"-facility")) { entry = Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]); if(!entry) { Tcl_AppendResult(interp,"wrong facility ",argv[i+1],": available facilities: ",NULL); ListHash(interp,syslog_data.facilities); return TCL_ERROR; } syslog_data.facility = (int)Tcl_GetHashValue(entry); closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-options")) { syslog_data.options = 0; if(strstr(argv[i+1],"CONS")) syslog_data.options |= LOG_CONS; if(strstr(argv[i+1],"NDELAY")) syslog_data.options |= LOG_NDELAY; if(strstr(argv[i+1],"PERROR")) syslog_data.options |= LOG_PERROR; if(strstr(argv[i+1],"PID")) syslog_data.options |= LOG_PID; if(strstr(argv[i+1],"ODELAY")) syslog_data.options |= LOG_ODELAY; if(strstr(argv[i+1],"NOWAIT")) syslog_data.options |= LOG_NOWAIT; closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-ident")) { memset(syslog_data.ident,0,sizeof(syslog_data.ident)); strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1); closelog(); syslog_data.opened = 0; i += 2; continue; } break; } if(i < argc) { entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]); if(!entry) { Tcl_AppendResult(interp,"wrong level ",argv[i],": available levels: ",NULL); ListHash(interp,syslog_data.priorities); return TCL_ERROR; } priority = (int)Tcl_GetHashValue(entry); i++; } if(i < argc) { if(!syslog_data.opened) { openlog(syslog_data.ident,syslog_data.options,syslog_data.facility); syslog_data.opened = 1; } syslog(priority,argv[i]); return TCL_OK; } return TCL_OK; } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 16:44 Message: Logged In: YES user_id=87254 Oops, looks like I forgot to attach the patch... Here it is. Yes, the server will attempt to restart a number of times before giving up. I think this is as appropriate for config errors as it is for others, then there doesn't need to be a dual-standard fatal error. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 11:12 Message: Logged In: YES user_id=95086 I'd rather wait for Stephens code and see what's done there and then make final changes and commit. It should not take long. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 11:07 Message: Logged In: YES user_id=184124 Then i do not see any reason not committing it, we can have discussions about it for another couple of years, but until we have it and use it, it is just discussions. We aready agreed on having it in the core. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 11:00 Message: Logged In: YES user_id=95086 Stephen, You mind uploading your patch changes so I can peek into it? I think we should soon settle on some solution and move on. Vlad, The patch already does that. After 16 unsuccessfull restarts the watchdog exits. Between every restart we wait 1, 2, 4, 8, 16, 32, .... 16384 seconds and then exit. This would be: 32767 seconds or about 9 hours. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 07:53 Message: Logged In: YES user_id=184124 I would say, repeat configured number of time and then exit. This way all socket related problems will be cleared after couple of restartes, config problems will make server exit . ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 07:41 Message: Logged In: YES user_id=95086 Well, because I could not differentiate between config and later runtime errors, I thought its wiser to abort rather to repeatedly restart broken server. One possibility is to add different Ns_Fatal call like: Ns_FatalEx(int exitcode, char *fmt, ...); Or? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 07:12 Message: Logged In: YES user_id=87254 Are you sure Ns_Fatal should not restart the server? Many of the fatal errors are caused by bad configuration and I can see why you might want the server to exit immediately. But there are also runtime fatal errors, and here I'd expect the server to be restarted. Some config errors are due to external factors like missing directories or file system permissions and it would be nice for the server to come back up as soon as the issue resolved itself. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 02:52 Message: Logged In: YES user_id=95086 Calling watchdog before or after chroot has no real implications I believe. Also, when the server exits with Ns_Fatal, it hsould not be restarted, I will look into your changes today. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-10 23:27 Message: Logged In: YES user_id=87254 Oh yeah, and I moved the watchdog stuff eaven earlier, before the prebind and chroot. Hmm, now that I think about it it's only the prebind that really needs to happen after the watchdog is started, to ensure the sockets are always in a sane state... ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-10 23:17 Message: Logged In: YES user_id=87254 Am I reading this right? It looks like if the server calls Ns_Fatal() it will exit with code 1, but 0 and 1 are treated as OK exit codes and the server will not be restarted. I've attached a patch which changes the above, fixes the pid problem in a different way because I completely forgot you already posted below about that small glitch, use Ns_ParseObjv(), adds the -w switch, and a couple of small name changes. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 13:09 Message: Logged In: YES user_id=95086 This is correct. As soon as we agree on some implementation, I will put the -i processing and avoid starting watchdog for inittab starts. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 12:48 Message: Logged In: YES user_id=184124 There still should be possibility to run nsd from inittab, so when -i switch is given, no watchdog should be running, let /sbin/init to handle restarts ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 09:19 Message: Logged In: YES user_id=95086 small glich: After applying the patch, change nsmain.c from: nsconf.pid = serverPid: to nsconf.pid = getpid(); otherwise the pid file will contain the bogus server pid if the watchdog restarted it later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 08:26 Message: Logged In: YES user_id=95086 Another try... Important to note: SIGKILL (signal 9) cannot be handled hence if somebody kills the watchdog with SIGKILL, the server will be left lingering w/o the watchdog. This is important to know. I do not see any possibility how to recover in such cases (i.e. how to stop the server). Apart from that, all objections from Stephen are taken into account. Please try again. A new copy of watchdog.patch file is attached. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-07 22:20 Message: Logged In: YES user_id=184124 I just found this code in my old server sources, i just chnaged internal name to Ns/NS, used to run pretty stable. ------------------------------------------------------------------- #define NS_EXIT 99 static void NsTerminate(int sig) { printf("NS[%d] signal %d received...",getpid(),sig); // Kill the server with the same signal if(nsPid > 0) kill(nsPid,sig); // Exit in case of fatal signal if(sig != SIGHUP) { printf("NS[%d] terminating...",getpid()); exit(0); } // Reassign signal handler signal(sig,NsTerminate); } void NsWatchdog(int argc, char *argv[], char *envp[]) { int failcount = 0; time_t start; int status; pid_t pid; signal(SIGTTOU,SIG_IGN); signal(SIGTTIN,SIG_IGN); signal(SIGTSTP,SIG_IGN); signal(SIGPIPE,SIG_IGN); signal(SIGQUIT,SIG_IGN); signal(SIGHUP,NsTerminate); signal(SIGINT,NsTerminate); signal(SIGTERM,NsTerminate); // Go background if((pid = fork())) { if(pid < 0) err_logger("warpConfigure: fork: %s",strerror(errno)); exit(0); } setsid(); for(;;) { // Execute the real server nsPid = fork(); // Child, continue as server if(nsPid == 0) { exit(nsMain(argc, argv, ServerInit)); } /* parent, behaves like a guardian */ time(&start); printf("NS[%d] server process started",getpid()); pid = waitpid(-1, &status, 0); if(WIFEXITED(status)) printf("NS[%d] child process exited with status %d",pid,WEXITSTATUS(status)); else if(WIFSIGNALED(status)) printf("NS[%d] child process exited due to signal %d",pid,WTERMSIG(status)); else printf("NS[%d] child process exited", pid); // Special exit code if(WIFEXITED(status)) { if(WEXITSTATUS(status) == NS_EXIT) { printf("NS[%d] child configuration error, exiting",getpid()); exit(0); } else if(WEXITSTATUS(status) == SIGHUP) { } else { if(time(0) - start < 10) failcount++; if(failcount == 10) { printf("Exiting due to repeated, frequent failures"); exit(1); } } } sleep(3); } } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-07 21:23 Message: Logged In: YES user_id=87254 NsWatchdog() is called after the server drops root privs, so both the watchdog and the server run as the defined user. What happens if the server dies and on restart needs to rebind privileged ports? A ps listing shows two running processes, parent and child. If I kill either one, the watchdog dies, the server continues to run. If I kill -9 the parent, the child continues to run. If I kill -9 the child, the server is restarted. Something seems not quite right here... I'm a bit confused about how the code works. For example, NsWatchdog() seems to ignore all of it's arguments? Here's the code which calls it: if (mode == 0) { i = ns_fork(); if (i < 0) { Ns_Fatal("nsmain: fork() failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } setsid(); i = NsWatchdog(argc, argv, initProc); if (i < 0) { Ns_Fatal("nsmain: watchdog failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } nsconf.pid = getpid(); } NsWatchdog() says it returns the worker pid, it also sets the global variable wpid. The code above ignores the returned value, and the global variable, and instead calls getpid()... The variable 'i' is existing code, but still somehow doesn't seem suitable... Could that Ns_Fatal() above be moved into the NsWatchdog() function? I think maybe some comment is needed here. The code structure is very like that just above where ns_fork() is called, but this function will return *multiple* times, right? This is kind of suprising, and an extra twist on the already confusing fork() semantics (or maybe it's just me who gets confused by fork...). A return value of 0 here is a 'request for orderly shutdown', right? How about some more logging to syslog? For example, distinguish between start and restart. Mention when the MAX_SLEEP_PERIOD has been reached, etc. Couple of small things: Can we refer to 'the server', rather than 'the worker'? Worker and Watchdog begin with 'w', and so does the global variable wpid... Maybe serverPid? NsWatchdog is a static function, it doesn't need the Ns prefix. In NsWatchdog(), the variable 'run' should be something like 'nretries', 'nap' should be something more like 'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS. The comment for WaitForWorker() is misslabeled. Should SigHandler() be called WatchdogSigHandler()? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-06 16:03 Message: Logged In: YES user_id=184124 Looks good to me ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 05:06 Message: Logged In: YES user_id=95086 Ah, correction: The restart option sends SIGINT to the worker process which causes the watchdog to restart it. And, the patchfile is now attached! ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 05:03 Message: Logged In: YES user_id=95086 Here is the patch. I have added "-restart" option to "ns_shutdown". It is rather clumsy to parse but should do. We should rewrite this with your args parsing routine. The restart option sends SIGTERM to the worker process which causes the watchdog to restart it. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 16:46 Message: Logged In: YES user_id=87254 I don't think this has to hide behind a config option. It's either a good idea or it's not. Sounds good to me. Is there a patch? I'm wondering about some of the implementation details... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 11:41:14
|
Feature Requests item #1156141, was opened at 2005-03-03 13:02 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 Category: None Group: None Status: Closed Resolution: Rejected Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Add ns_conn channel command Initial Comment: Extend the ns_conn command to allow wrapping an Tcl channel arround it. This channel can be used to talk to the remote client afterwards. The idea is simple. The client uses standard http protocol to open up a connection to the server. Server accepts the connection and obtains the channel from it. The client also gets the channel to the server and from this point on, both can exchange arbitrary data. Example implementation is given as a patch against the current CVS state. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 04:41 Message: Logged In: YES user_id=87254 Yes, I get it :-) My only point was that HTTP is not a short lived connection protocol, if you don't want it to be. Lengthy connections with multiple messages seems to be one of the key features here. The *only* thing HTTP can't do is have the server initiate a message to the client. For the kinds of cases you're talking about here with multi-hour long connections, I'd be wanting some kind of 'ping' request from the client every so often just to make sure the server was idle and not broken. The ping becomes a 'what messages do you have for me?' request, the server's response is itself a request to the client. Anyway, I was talking pure nonsense about the keepalive stuff; the client doen't have to be so broken after all. I'm fine with you adding this feature. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 02:03 Message: Logged In: YES user_id=95086 "still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do Stephen, I do not think you understood my point. What this command buys me is to be able to initiate an http connection and reuse the socket-to-socket comm channel for other purposes. For example we make "GET" request to a certain URL and then when the connection gets established, the server, takes the conn socket, wraps a Tcl channel around it and passes this channel to my (arbitraty) Tcl proc. The client, instead of getting the response body, parses only the returned http headers to see that the connection succeeded and then passes this socket to my custom Tcl proc. Now, on the sever and on the client side you have a channel and both sides can use it to establish whatever protocol they want. They have a socket pipe between. The http is just used to handshake the peers, clear? What we do with this? We have two server instances on two different hosts. We open a channel (using the above trick) from one to another and then use this to generate a stream of preformatted packets (our protocol) on one side which does the backup of data from to the other side ultimately to the tape drive(s) attached to that host. We shuffle TB of data this way daily in about 500 installations worldwide, so I know this works fine. Now, see, this has nothing to do with HTTP1.1. or 1.0 alone. And yes, if you look in the implementation, you will see that it is a simple wrapper to the Ns_ConnSock. It simply wraps a Tcl channel around the the connection socket, nothing more and nothing less. And, I would not say that it requires a broken client. It just requires a special client-processing. This is not ment for browser-clients. It has nothing to do with serving pages or any other short lived connections. Once we setup such a connection, we use it for hours and maybe days. We could have achieved this with a simple socket connection from client to a specific port on the server. But we are not able to use any arbitrary port. All we have is port 80 (the web server port). So we must play by the http rules initially when accessing the server. When we passed this http handshake stuff, we are left with a socket-to-socket comm-channel which is transparent and can be used for any purpose whatsoever. Now, is this little bit more clear now? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 20:32 Message: Logged In: YES user_id=184124 What is broken about it? It is the Tcl interface to Ns_ConnSock(), just Tcl way, using channel. The merit is to use sync inter-exchange between client and server using established connection. Client is not broken, it is designed for this exchange, it is the tool to build messaging, not general purpose protocol. It make naviserver more flexible platform for implementing different ppas on top of it using Tcl, that's all. Btw, when are you going to commit your protocol extensions? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 17:20 Message: Logged In: YES user_id=87254 This command isn't a good idea because it creates a broken interface which requires a broken client to work, and I still don't understand what it buys you that a modern HTTP 1.1 implemnetation doesn't. If there was no other way to do it, then it may still have some merit. But Ns_ConnSock() is trivial and gives you all you need to create a Tcl channel. We've already decided to do this the right way, whatever that turns out to be, so why should we simultaneously also do it the wrong way? In fact you seem to agree, noting that this sucks because it's not UDP and the overhead of HTTP is just too much for you. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 08:36 Message: Logged In: YES user_id=184124 And that's why i want to add simple UDP support to the server: for implementing simple messaging between servers in the cluster, using UDP without HTTP overhead will make it simple and faster. Using Tcl channel is intermediate solution for such messaging because it eliminates making multiple HTTP requests to the server but still uses heavy TCP/HTTP overhead. Still i think this command is very usefull and will make server more versatile provided that we will supply usage examples with the server as well. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 08:26 Message: Logged In: YES user_id=95086 "Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again." This is true. It is a regular http request, just for the body part, the peers are engaging in a bidirectional communication and can do whatever they pleased. The request is considered done when one of the peers close the socket. Strictly speaking, I'm using the http body and this trick to implement arbitrary client-server comm channel. This will not work for browsers of course. It has nothing to do with browsers at all in fact. It arose out of the necessity to support alternate protocols given what we had in AS, that is pure http machinery. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-04 07:38 Message: Logged In: YES user_id=184124 On the closing note, keepalive has nothing to do with this and should not be considered in this case. Even if you connect using HTTP 1.1 and specify keepalive, sever has the right to issue Connection: close header which will disable keepalive. So, it WILL work for all connections. Second, this has nothing to do with non-HTTP protocol, it is HTTP communication protocol, what's inside the body has nothing to do with it again. And pipelining is a bad example here, the point it to send the request and wait for the answer and base on that send another request. Pipelining will allow you just to submit all requests and then later receive replies, it was designed for speeding browser's connection especially retrieving images. It looks to me Stephen you are all for HTTP core and around-the-HTTP-driver solutions only. This will make simple solutions complex because of getting around/workaround HTTP based core in the server. What is wrong with one simpe function which will give the server more options without breaking anything? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 02:45 Message: Logged In: YES user_id=95086 I see. The gotcha's are: o. it's yet another implementation of non-HTTP protocol o. (HTTP 1.0 only, and no Keepalive header) be used to get o. function in the docs might reasonably expect it to work for all HTTP connections That's pretty clear. Well, I think you are right. This is our attempt to work around http-only caps of the server. Agreed. It will be better to get it done properly and we will then internally switch to the new implementation when it's ready. At the moment I have all this in a separate module and this can stay so until the multi-protocol caps are agreed upon and implemented. I will close this RFE then. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 02:36 Message: Logged In: YES user_id=87254 I don't understand why you're not using standard HTTP pipelining. It would seem to be so much easier than having to write custom client and server code. This is a small change, but I'm not excited about it because it's yet another implementation of non-HTTP protocol support, which makes 3 so far... :-( It also seems to require that a deliberately broken client (HTTP 1.0 only, and no Keepalive header) be used to get around the server's io read-ahead. Someone seeing this function in the docs might reasonably expect it to work for all HTTP connections. Also, I don't think this requires core support. Ns_ConnSock() will return the underlying file descriptor for a connection. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-04 01:36 Message: Logged In: YES user_id=95086 OK. Maybe I should explain little bit more what I'm using this for. Imagine you'd like to make a client-server connection and then exchange arbitrary data on this connection. Imagine something like telnet connection. You telnet from client to server, get a comm-pipe betwen them and then you can shuffle whatever you like across this comm-pipe. What we have is a http server responding only to http requests (as it is now, w/o multiprotocol support). So, what I do is to use http to setup the connection, have a registered procedure on the server grab the socket and install my custom protocol handler on it. The http is only used to establish the communication. All the rest is handled by my custom code on the server and on the client. Once the connection is setup, we have a point-to-point socket communication link between server and client and we can use it for just about anything. All you need for this is a client-side http-library (we have ours written in Tcl) which will give you the socket back once the connection is established) and serve-side proc implementing your protocol of choice. So, the entire server http-machinery stays in place. I do not see any problem there because we have been using this trick for about 4 years in production very successfully. What I do not know is wether there are any gotcha's with that approach (we haven't hit any yet). I also do not know what will happen when we add alternate multiprotocol caps to server. Maybe this will become obsolete. But, as I see this now, the change is trivial and does not hurt anybody. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 16:21 Message: Logged In: YES user_id=87254 HTTP also has (although our server doesn't, but I want to add this) pipelining, which is sending multiple requests at once without first waiting for the reply to previous requests. That sounds like your "i do it when I send multiple http requests". ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 15:45 Message: Logged In: YES user_id=184124 Keepalive works when i close my connection, i.e. when i exited from conn thread, but the point here is to do it from the conn thread, i received request from my other server sending me some info, i send my info back and get ack. Everything inside one connection, before keepalive. And yes, i send messages in my format, anyway i do it when i send multiple http reaqests, i am interested in message body only. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 15:32 Message: Logged In: YES user_id=87254 HTTP already handles sending multiple requests accross one socket with keepalives. It might be more useful to add keepalives to our HTTP client code. In the case of HTTP keepalives, the socket is returned to the driver thread after the first request, still open, and the conn thread goes on to do other useful work. If a new request arrives before the connection times out it is passed to the next free conn thread. If you don't usee HTTP with keepalives you will have to invent some kind of message chunking syntax and some kind of time-out protocol. But you will still end up wasting conn threads as they idle waiting for possible extra messages. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 15:20 Message: Logged In: YES user_id=184124 I could use it in my apps where i have cluster of several servers exchanging http requests between each other, sometimes one session needs 2 or more exchanges. With this channel i can exchange using same connection and i need to send/receive data, not headers, so this can save and improve my operations. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-03 15:12 Message: Logged In: YES user_id=87254 Usually, by the time your script runs and you call [ns_conn channel], the data from the client will already have been read and buffered. You can access this with [ns_conn content]. Does reading work? What's the objective? Do you want to hadle streaming data, or make reading and writing more Tcl-like? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-03 15:07 Message: Logged In: YES user_id=184124 Very usefull, commit it ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 11:28:28
|
Feature Requests item #1162223, was opened at 2005-03-12 17:22 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Stephen Deasey (sdeasey) Assigned to: Stephen Deasey (sdeasey) Summary: Tcl Callbacks Initial Comment: The server offers a dual C/Tcl interface. There are a number of of places where Tcl code needs to be executed by the C core, or the core doesn't know whether a callback is C or Tcl. I've attached the patch naviserver-4.1-tclcallbacks.patch. It extracts the Tcl callback infrastructure from tclsched.c into tclcallbacks.c and makes it a public interface. Here it is: Ns_TclNewCallback() Ns_TclNewCallbackObj() Ns_TclEvalCallback() Ns_TclCallbackProc() Ns_TclFreeCallback() Ns_TclCallbackArgProc() It provides a standard interface for Tcl callbacks throughout the server, handling interp allocation, introspection, evaluation and cleanup. Also, tclcallbacks.c is somewhat analogous to callbacks.c, and so the generic ns_atexit etc. callbacks which were in tclsched.c have been moved to the tclcallbacks file. The first set of code to be converted to the new interface is tclsched.c itself, and I've converted to using Ns_ParseObjv() while I was in there. File size has dropped ~250 lines. One thing I'm wondering about is whether the Ns_TclEvalCallback() routine should take extra args to send to the Tcl proc. There seems to be no standard scheme for ordering the args so there still needs to be a way to manually construct the command, which is why the members of the Ns_TclCallback struct are public. But for this very reason, it might be worth providing a standard way to do this so all future callbacks are consistent. It would be nice in the future to come up with some way of registereing these callback points such that you don't have to create an ns_info subcommand for example manually. It should be possible for loadable modules to register their callback points and have them show up in the right place. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 04:28 Message: Logged In: YES user_id=87254 Yes, I forgot to mention that Tcl callbacks are a little more general than those for C in callbacks.c. Those take a single arg which is the context you registered with the proc, and return void. Tcl callbacks on the other hand can return a result which at the C level gets appended to a dstring if you pass that in. The members of the Ns_TclCallback structure are public so that you can construct a command and args in any format. This is the question I was asking originally: should there be a default recomended way to do this? So for example Ns_TclEvalCallback() might take varargs cstrings, which if passed would be added to the tcl command to execute: command ?cbarg1? ?cbarg2? ?context? Where cbarg2 and cbarg2 are specific to the type of callback and context is what was registered with the proc by the Tcl programmer. Unfortunately that wouldn't cover all the bases as some of the existing APIs do weird things like place callback specific args after the context or if no context was passed, always append an empty string. For new callbacks, something standard like the above scheme should be used. If you look at the patch for virtual hosting you'll see that two new Tcl callbacks were created and they contain simillar boiler plate. This makes me think that maybe there should be something like an [ns_addcallback serverroot myrootproc mycontext] interface. It's a little more complicated than that because some callbacks are singular, like the serverroot proc, and some are plural, like filters. So maybe the thing to do is get some experience with this simple version and then expand it later... ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-12 17:46 Message: Logged In: YES user_id=184124 It looks good, unified callback interface would be better but Tcl filters. traces, requestprocs uses different arguments/parameters and currently the only way to get info about them to use ns_info interface and Ns_ArgProc interface. Tcl_Callback are for simple proc/args callbacks, filters is special case, see if you can convert them. But i am for this change. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-12 17:30 Message: Logged In: YES user_id=87254 I've attached a patch which converts the tclrequest.c file to use the new callback API as an example. It also uses the Ns_ParseObjv() API, and I've removed support for the legacy conn variable. It removes ~200 lines and all utility functions. The callback API provides default Ns_ArgProc introspection, but this will be much nicer when combined with Vlad's RFE 1161597 :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 11:23:33
|
Feature Requests item #1156875, was opened at 2005-03-04 20:03 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Watchdog process restarts failed server Initial Comment: We have been using this for quite some time and it proved extremely useful. We doublefork the nsd process and make the first forked instance control the second. The first one (the watchdog) reacts on exit codes and signals caught during the watch and correspondingly restarts the second instance (the worker). Also, we have added the the "-restart" option to the "ns_shutdown" command. This just sends the SIGINT to the worker process. The watchdog is handling this signal and respawns the worker automatically. During operation, the watchdog logs events and their cause into the system log file. This looks like: Feb 28 04:00:05 Develop nsd[19400]: worker: started. Mar 1 04:00:13 Develop nsd[4475]: watchdog: worker 19400 exited (2). Mar 1 04:00:15 Develop nsd[21290]: worker: started. Mar 1 04:00:18 Develop nsd[14705]: watchdog: worker 19399 exited (2). Mar 1 04:00:20 Develop nsd[21300]: worker: started. We have done all the changes with "--enable-watchdog" so anybody who needs this feature will have to compile with this option. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-17 12:23 Message: Logged In: YES user_id=95086 Now, what? You go change/remove _exit and commit? Other things are ok for me. Or should I commit? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 12:02 Message: Logged In: YES user_id=87254 Calling _exit() was an attempt to clarify the intent of the code. But I didn't pay attention to the fact that the main function is pluggable, so I guess that's a no-go. 4 hours of retry time before giving up sounds almost like 'inifinite'... Looks like we can get something suitable without adding extra switches, so we can commit this and tweak the timing later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-15 14:54 Message: Logged In: YES user_id=95086 Stephen, why do you call _exit(0) instead of returning control to caller routine? Also.. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. Restart time doubling is limited to 64 seconds (insane, should rather be 60 seconds, heh) so it should not double after reaching 60 secs. After that time we constantly wait 60 secs between restarts. With the server retry of 16 times and we will wait 1, 2, 4, 8, 16, 32, 64 (i.e. total of 127) seconds and after that 8 times 64 secs (= 512 total ) secs and eventually abort. All together, that will be 640 secs = about 10 minutes wait/retry. After that it will exit. I think if we extend retry count from 16 to 256 it will be pretty enough (about 4 hours) where one can fix the failure cause. Otherwise, I see syntactic problems of how to actually specify the retries parameter. Of curse, you can do something like bin/nsd -w -r 16 but the "-r" (restart) would really have no meaning w/o -w and I do not like this very much indeed. I'm really for bumping the retry count higher (127 or 256 tries). ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-15 01:50 Message: Logged In: YES user_id=184124 Last time addition: Can we make syslog function public, Ns_Syslog? Also Tcl interface to it would be usefull as well, below is syslog from my nssys module, just a suggestion, no rush with this: static struct syslog_data { int opened; int facility; int options; char ident[32]; Tcl_HashTable *priorities; Tcl_HashTable *facilities; } syslog_data; /* Syslog initialization */ void SyslogInit() { memset(&syslog_data,0,sizeof(syslog_data)); syslog_data.options = LOG_PID; syslog_data.facility = LOG_USER; if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY))) strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1); syslog_data.facilities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS); AddEntry(syslog_data.facilities,"auth",LOG_AUTH); AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV); AddEntry(syslog_data.facilities,"cron",LOG_CRON); AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON); AddEntry(syslog_data.facilities,"kernel",LOG_KERN); AddEntry(syslog_data.facilities,"lpr",LOG_LPR); AddEntry(syslog_data.facilities,"mail",LOG_MAIL); AddEntry(syslog_data.facilities,"news",LOG_NEWS); AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG); AddEntry(syslog_data.facilities,"user",LOG_USER); AddEntry(syslog_data.facilities,"uucp",LOG_UUCP); AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0); AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1); AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2); AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3); AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4); AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5); AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6); AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7); syslog_data.priorities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS); AddEntry(syslog_data.priorities,"emerg",LOG_EMERG); AddEntry(syslog_data.priorities,"alert",LOG_ALERT); AddEntry(syslog_data.priorities,"crit",LOG_CRIT); AddEntry(syslog_data.priorities,"err",LOG_ERR); AddEntry(syslog_data.priorities,"error",LOG_ERR); AddEntry(syslog_data.priorities,"warning",LOG_WARNING); AddEntry(syslog_data.priorities,"notice",LOG_NOTICE); AddEntry(syslog_data.priorities,"info",LOG_INFO); AddEntry(syslog_data.priorities,"debug",LOG_DEBUG); } /* * sys_log * * Usage: * sys_log ?-facility f -options o -ident i? priority message * * facility - kernel, cron, authpriv, mail, local0, local1, daemon, local2, * news, local3, local4, local5, local6, syslog, local7, auth, uucp, lpr, user * options - list with any of { CONS NDELAY PERROR PID ODELAY NOWAIT } * ident - ident is prepended to every message, and is typically the program name * priority - info, alert, emerg, err, notice, warning, error, crit, debug * * Returns: * nothing */ static int SysLog(ClientData data,Tcl_Interp *interp,int argc,char **argv) { int i,priority = 0; Tcl_HashEntry *entry; if(argc < 1) { Tcl_AppendResult(interp,argv[0]," ?-facility f -options o -ident i? priority message",0); return TCL_ERROR; } for(i = 1;i < argc-1;) { if(!strcmp(argv[i],"-facility")) { entry = Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]); if(!entry) { Tcl_AppendResult(interp,"wrong facility ",argv[i+1],": available facilities: ",NULL); ListHash(interp,syslog_data.facilities); return TCL_ERROR; } syslog_data.facility = (int)Tcl_GetHashValue(entry); closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-options")) { syslog_data.options = 0; if(strstr(argv[i+1],"CONS")) syslog_data.options |= LOG_CONS; if(strstr(argv[i+1],"NDELAY")) syslog_data.options |= LOG_NDELAY; if(strstr(argv[i+1],"PERROR")) syslog_data.options |= LOG_PERROR; if(strstr(argv[i+1],"PID")) syslog_data.options |= LOG_PID; if(strstr(argv[i+1],"ODELAY")) syslog_data.options |= LOG_ODELAY; if(strstr(argv[i+1],"NOWAIT")) syslog_data.options |= LOG_NOWAIT; closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-ident")) { memset(syslog_data.ident,0,sizeof(syslog_data.ident)); strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1); closelog(); syslog_data.opened = 0; i += 2; continue; } break; } if(i < argc) { entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]); if(!entry) { Tcl_AppendResult(interp,"wrong level ",argv[i],": available levels: ",NULL); ListHash(interp,syslog_data.priorities); return TCL_ERROR; } priority = (int)Tcl_GetHashValue(entry); i++; } if(i < argc) { if(!syslog_data.opened) { openlog(syslog_data.ident,syslog_data.options,syslog_data.facility); syslog_data.opened = 1; } syslog(priority,argv[i]); return TCL_OK; } return TCL_OK; } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-15 00:44 Message: Logged In: YES user_id=87254 Oops, looks like I forgot to attach the patch... Here it is. Yes, the server will attempt to restart a number of times before giving up. I think this is as appropriate for config errors as it is for others, then there doesn't need to be a dual-standard fatal error. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 19:12 Message: Logged In: YES user_id=95086 I'd rather wait for Stephens code and see what's done there and then make final changes and commit. It should not take long. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 19:07 Message: Logged In: YES user_id=184124 Then i do not see any reason not committing it, we can have discussions about it for another couple of years, but until we have it and use it, it is just discussions. We aready agreed on having it in the core. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 19:00 Message: Logged In: YES user_id=95086 Stephen, You mind uploading your patch changes so I can peek into it? I think we should soon settle on some solution and move on. Vlad, The patch already does that. After 16 unsuccessfull restarts the watchdog exits. Between every restart we wait 1, 2, 4, 8, 16, 32, .... 16384 seconds and then exit. This would be: 32767 seconds or about 9 hours. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 15:53 Message: Logged In: YES user_id=184124 I would say, repeat configured number of time and then exit. This way all socket related problems will be cleared after couple of restartes, config problems will make server exit . ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 15:41 Message: Logged In: YES user_id=95086 Well, because I could not differentiate between config and later runtime errors, I thought its wiser to abort rather to repeatedly restart broken server. One possibility is to add different Ns_Fatal call like: Ns_FatalEx(int exitcode, char *fmt, ...); Or? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 15:12 Message: Logged In: YES user_id=87254 Are you sure Ns_Fatal should not restart the server? Many of the fatal errors are caused by bad configuration and I can see why you might want the server to exit immediately. But there are also runtime fatal errors, and here I'd expect the server to be restarted. Some config errors are due to external factors like missing directories or file system permissions and it would be nice for the server to come back up as soon as the issue resolved itself. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 10:52 Message: Logged In: YES user_id=95086 Calling watchdog before or after chroot has no real implications I believe. Also, when the server exits with Ns_Fatal, it hsould not be restarted, I will look into your changes today. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 07:27 Message: Logged In: YES user_id=87254 Oh yeah, and I moved the watchdog stuff eaven earlier, before the prebind and chroot. Hmm, now that I think about it it's only the prebind that really needs to happen after the watchdog is started, to ensure the sockets are always in a sane state... ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 07:17 Message: Logged In: YES user_id=87254 Am I reading this right? It looks like if the server calls Ns_Fatal() it will exit with code 1, but 0 and 1 are treated as OK exit codes and the server will not be restarted. I've attached a patch which changes the above, fixes the pid problem in a different way because I completely forgot you already posted below about that small glitch, use Ns_ParseObjv(), adds the -w switch, and a couple of small name changes. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 21:09 Message: Logged In: YES user_id=95086 This is correct. As soon as we agree on some implementation, I will put the -i processing and avoid starting watchdog for inittab starts. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 20:48 Message: Logged In: YES user_id=184124 There still should be possibility to run nsd from inittab, so when -i switch is given, no watchdog should be running, let /sbin/init to handle restarts ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 17:19 Message: Logged In: YES user_id=95086 small glich: After applying the patch, change nsmain.c from: nsconf.pid = serverPid: to nsconf.pid = getpid(); otherwise the pid file will contain the bogus server pid if the watchdog restarted it later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 16:26 Message: Logged In: YES user_id=95086 Another try... Important to note: SIGKILL (signal 9) cannot be handled hence if somebody kills the watchdog with SIGKILL, the server will be left lingering w/o the watchdog. This is important to know. I do not see any possibility how to recover in such cases (i.e. how to stop the server). Apart from that, all objections from Stephen are taken into account. Please try again. A new copy of watchdog.patch file is attached. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 06:20 Message: Logged In: YES user_id=184124 I just found this code in my old server sources, i just chnaged internal name to Ns/NS, used to run pretty stable. ------------------------------------------------------------------- #define NS_EXIT 99 static void NsTerminate(int sig) { printf("NS[%d] signal %d received...",getpid(),sig); // Kill the server with the same signal if(nsPid > 0) kill(nsPid,sig); // Exit in case of fatal signal if(sig != SIGHUP) { printf("NS[%d] terminating...",getpid()); exit(0); } // Reassign signal handler signal(sig,NsTerminate); } void NsWatchdog(int argc, char *argv[], char *envp[]) { int failcount = 0; time_t start; int status; pid_t pid; signal(SIGTTOU,SIG_IGN); signal(SIGTTIN,SIG_IGN); signal(SIGTSTP,SIG_IGN); signal(SIGPIPE,SIG_IGN); signal(SIGQUIT,SIG_IGN); signal(SIGHUP,NsTerminate); signal(SIGINT,NsTerminate); signal(SIGTERM,NsTerminate); // Go background if((pid = fork())) { if(pid < 0) err_logger("warpConfigure: fork: %s",strerror(errno)); exit(0); } setsid(); for(;;) { // Execute the real server nsPid = fork(); // Child, continue as server if(nsPid == 0) { exit(nsMain(argc, argv, ServerInit)); } /* parent, behaves like a guardian */ time(&start); printf("NS[%d] server process started",getpid()); pid = waitpid(-1, &status, 0); if(WIFEXITED(status)) printf("NS[%d] child process exited with status %d",pid,WEXITSTATUS(status)); else if(WIFSIGNALED(status)) printf("NS[%d] child process exited due to signal %d",pid,WTERMSIG(status)); else printf("NS[%d] child process exited", pid); // Special exit code if(WIFEXITED(status)) { if(WEXITSTATUS(status) == NS_EXIT) { printf("NS[%d] child configuration error, exiting",getpid()); exit(0); } else if(WEXITSTATUS(status) == SIGHUP) { } else { if(time(0) - start < 10) failcount++; if(failcount == 10) { printf("Exiting due to repeated, frequent failures"); exit(1); } } } sleep(3); } } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-08 05:23 Message: Logged In: YES user_id=87254 NsWatchdog() is called after the server drops root privs, so both the watchdog and the server run as the defined user. What happens if the server dies and on restart needs to rebind privileged ports? A ps listing shows two running processes, parent and child. If I kill either one, the watchdog dies, the server continues to run. If I kill -9 the parent, the child continues to run. If I kill -9 the child, the server is restarted. Something seems not quite right here... I'm a bit confused about how the code works. For example, NsWatchdog() seems to ignore all of it's arguments? Here's the code which calls it: if (mode == 0) { i = ns_fork(); if (i < 0) { Ns_Fatal("nsmain: fork() failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } setsid(); i = NsWatchdog(argc, argv, initProc); if (i < 0) { Ns_Fatal("nsmain: watchdog failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } nsconf.pid = getpid(); } NsWatchdog() says it returns the worker pid, it also sets the global variable wpid. The code above ignores the returned value, and the global variable, and instead calls getpid()... The variable 'i' is existing code, but still somehow doesn't seem suitable... Could that Ns_Fatal() above be moved into the NsWatchdog() function? I think maybe some comment is needed here. The code structure is very like that just above where ns_fork() is called, but this function will return *multiple* times, right? This is kind of suprising, and an extra twist on the already confusing fork() semantics (or maybe it's just me who gets confused by fork...). A return value of 0 here is a 'request for orderly shutdown', right? How about some more logging to syslog? For example, distinguish between start and restart. Mention when the MAX_SLEEP_PERIOD has been reached, etc. Couple of small things: Can we refer to 'the server', rather than 'the worker'? Worker and Watchdog begin with 'w', and so does the global variable wpid... Maybe serverPid? NsWatchdog is a static function, it doesn't need the Ns prefix. In NsWatchdog(), the variable 'run' should be something like 'nretries', 'nap' should be something more like 'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS. The comment for WaitForWorker() is misslabeled. Should SigHandler() be called WatchdogSigHandler()? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-07 00:03 Message: Logged In: YES user_id=184124 Looks good to me ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 13:06 Message: Logged In: YES user_id=95086 Ah, correction: The restart option sends SIGINT to the worker process which causes the watchdog to restart it. And, the patchfile is now attached! ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 13:03 Message: Logged In: YES user_id=95086 Here is the patch. I have added "-restart" option to "ns_shutdown". It is rather clumsy to parse but should do. We should rewrite this with your args parsing routine. The restart option sends SIGTERM to the worker process which causes the watchdog to restart it. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 00:46 Message: Logged In: YES user_id=87254 I don't think this has to hide behind a config option. It's either a good idea or it's not. Sounds good to me. Is there a patch? I'm wondering about some of the implementation details... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 11:06:18
|
Feature Requests item #1161597, was opened at 2005-03-11 12:30 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646 Category: C-API Group: None Status: Open Resolution: None Priority: 5 Submitted By: Vlad Seryakov (seryakov) Assigned to: Vlad Seryakov (seryakov) Summary: Extend ns_info with info about traces/filters/procs Initial Comment: Attached is patch that extends ns_info with 3 commands: ns_info traces ns_info filters ns_info requestprocs They work the same way as ns_info callbacks, no functionality changed or added, just information commands ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 04:06 Message: Logged In: YES user_id=87254 To get info on the binary callbacks we're just going to have to register proc info for all the default filters, request procs etc. If we cover everything there should be a reasonable number of examples for module writers to follow. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-12 18:47 Message: Logged In: YES user_id=184124 Also, somein docs would be usefull to put info how to get info about some binary callbacks: Welcome to ossweb running at /usr/local/ns/bin/nsd (pid 5524) ossweb:nscp 1> ns_info traces p:0xb7187be0 a:0x80817d8 # gdb (gdb) attach 12345 (gdb) info symbol 0xb7187be0 LogTrace in section .text ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-12 18:44 Message: Logged In: YES user_id=184124 I started with minimal changes, #defines and &syntax can be chnages of course. now that you started TclCallbacks, this thing depends when you commit your changes, it requires Ns_ArgProc and ns_info stuff. Once we have unified API for callbask i will change it to be server-specific as well, it took a while to figure out how it works :-)) ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-12 18:37 Message: Logged In: YES user_id=87254 This is a nice addition. ns_info filters/traces is server specific, but ns_info requestprocs returns info for all servers. Can you make this per-server also, to be consistent? The URL walking procs look a little tricky, I tnink they would benefit from having their own more detailed comments. The comment currently says it calls a function for each node, and the effect depends on the function. But the function signature is Ns_ArgProc so there's really only one thing it can do. Why does it fix the stack size at 512? Can these magic numbers be #define'd. Using a while loop rather than an empty for loop would be a little easier to read. There's a lot of this kind of thing: (&(triePtr->branches))->n Could this be simplified to: triePtr->branches.n ? Otherwise, looks pretty good. I like it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 11:02:00
|
Feature Requests item #1156875, was opened at 2005-03-04 12:03 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Watchdog process restarts failed server Initial Comment: We have been using this for quite some time and it proved extremely useful. We doublefork the nsd process and make the first forked instance control the second. The first one (the watchdog) reacts on exit codes and signals caught during the watch and correspondingly restarts the second instance (the worker). Also, we have added the the "-restart" option to the "ns_shutdown" command. This just sends the SIGINT to the worker process. The watchdog is handling this signal and respawns the worker automatically. During operation, the watchdog logs events and their cause into the system log file. This looks like: Feb 28 04:00:05 Develop nsd[19400]: worker: started. Mar 1 04:00:13 Develop nsd[4475]: watchdog: worker 19400 exited (2). Mar 1 04:00:15 Develop nsd[21290]: worker: started. Mar 1 04:00:18 Develop nsd[14705]: watchdog: worker 19399 exited (2). Mar 1 04:00:20 Develop nsd[21300]: worker: started. We have done all the changes with "--enable-watchdog" so anybody who needs this feature will have to compile with this option. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 04:02 Message: Logged In: YES user_id=87254 Calling _exit() was an attempt to clarify the intent of the code. But I didn't pay attention to the fact that the main function is pluggable, so I guess that's a no-go. 4 hours of retry time before giving up sounds almost like 'inifinite'... Looks like we can get something suitable without adding extra switches, so we can commit this and tweak the timing later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-15 06:54 Message: Logged In: YES user_id=95086 Stephen, why do you call _exit(0) instead of returning control to caller routine? Also.. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. Restart time doubling is limited to 64 seconds (insane, should rather be 60 seconds, heh) so it should not double after reaching 60 secs. After that time we constantly wait 60 secs between restarts. With the server retry of 16 times and we will wait 1, 2, 4, 8, 16, 32, 64 (i.e. total of 127) seconds and after that 8 times 64 secs (= 512 total ) secs and eventually abort. All together, that will be 640 secs = about 10 minutes wait/retry. After that it will exit. I think if we extend retry count from 16 to 256 it will be pretty enough (about 4 hours) where one can fix the failure cause. Otherwise, I see syntactic problems of how to actually specify the retries parameter. Of curse, you can do something like bin/nsd -w -r 16 but the "-r" (restart) would really have no meaning w/o -w and I do not like this very much indeed. I'm really for bumping the retry count higher (127 or 256 tries). ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 17:50 Message: Logged In: YES user_id=184124 Last time addition: Can we make syslog function public, Ns_Syslog? Also Tcl interface to it would be usefull as well, below is syslog from my nssys module, just a suggestion, no rush with this: static struct syslog_data { int opened; int facility; int options; char ident[32]; Tcl_HashTable *priorities; Tcl_HashTable *facilities; } syslog_data; /* Syslog initialization */ void SyslogInit() { memset(&syslog_data,0,sizeof(syslog_data)); syslog_data.options = LOG_PID; syslog_data.facility = LOG_USER; if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY))) strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1); syslog_data.facilities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS); AddEntry(syslog_data.facilities,"auth",LOG_AUTH); AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV); AddEntry(syslog_data.facilities,"cron",LOG_CRON); AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON); AddEntry(syslog_data.facilities,"kernel",LOG_KERN); AddEntry(syslog_data.facilities,"lpr",LOG_LPR); AddEntry(syslog_data.facilities,"mail",LOG_MAIL); AddEntry(syslog_data.facilities,"news",LOG_NEWS); AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG); AddEntry(syslog_data.facilities,"user",LOG_USER); AddEntry(syslog_data.facilities,"uucp",LOG_UUCP); AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0); AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1); AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2); AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3); AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4); AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5); AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6); AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7); syslog_data.priorities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS); AddEntry(syslog_data.priorities,"emerg",LOG_EMERG); AddEntry(syslog_data.priorities,"alert",LOG_ALERT); AddEntry(syslog_data.priorities,"crit",LOG_CRIT); AddEntry(syslog_data.priorities,"err",LOG_ERR); AddEntry(syslog_data.priorities,"error",LOG_ERR); AddEntry(syslog_data.priorities,"warning",LOG_WARNING); AddEntry(syslog_data.priorities,"notice",LOG_NOTICE); AddEntry(syslog_data.priorities,"info",LOG_INFO); AddEntry(syslog_data.priorities,"debug",LOG_DEBUG); } /* * sys_log * * Usage: * sys_log ?-facility f -options o -ident i? priority message * * facility - kernel, cron, authpriv, mail, local0, local1, daemon, local2, * news, local3, local4, local5, local6, syslog, local7, auth, uucp, lpr, user * options - list with any of { CONS NDELAY PERROR PID ODELAY NOWAIT } * ident - ident is prepended to every message, and is typically the program name * priority - info, alert, emerg, err, notice, warning, error, crit, debug * * Returns: * nothing */ static int SysLog(ClientData data,Tcl_Interp *interp,int argc,char **argv) { int i,priority = 0; Tcl_HashEntry *entry; if(argc < 1) { Tcl_AppendResult(interp,argv[0]," ?-facility f -options o -ident i? priority message",0); return TCL_ERROR; } for(i = 1;i < argc-1;) { if(!strcmp(argv[i],"-facility")) { entry = Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]); if(!entry) { Tcl_AppendResult(interp,"wrong facility ",argv[i+1],": available facilities: ",NULL); ListHash(interp,syslog_data.facilities); return TCL_ERROR; } syslog_data.facility = (int)Tcl_GetHashValue(entry); closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-options")) { syslog_data.options = 0; if(strstr(argv[i+1],"CONS")) syslog_data.options |= LOG_CONS; if(strstr(argv[i+1],"NDELAY")) syslog_data.options |= LOG_NDELAY; if(strstr(argv[i+1],"PERROR")) syslog_data.options |= LOG_PERROR; if(strstr(argv[i+1],"PID")) syslog_data.options |= LOG_PID; if(strstr(argv[i+1],"ODELAY")) syslog_data.options |= LOG_ODELAY; if(strstr(argv[i+1],"NOWAIT")) syslog_data.options |= LOG_NOWAIT; closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-ident")) { memset(syslog_data.ident,0,sizeof(syslog_data.ident)); strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1); closelog(); syslog_data.opened = 0; i += 2; continue; } break; } if(i < argc) { entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]); if(!entry) { Tcl_AppendResult(interp,"wrong level ",argv[i],": available levels: ",NULL); ListHash(interp,syslog_data.priorities); return TCL_ERROR; } priority = (int)Tcl_GetHashValue(entry); i++; } if(i < argc) { if(!syslog_data.opened) { openlog(syslog_data.ident,syslog_data.options,syslog_data.facility); syslog_data.opened = 1; } syslog(priority,argv[i]); return TCL_OK; } return TCL_OK; } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 16:44 Message: Logged In: YES user_id=87254 Oops, looks like I forgot to attach the patch... Here it is. Yes, the server will attempt to restart a number of times before giving up. I think this is as appropriate for config errors as it is for others, then there doesn't need to be a dual-standard fatal error. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 11:12 Message: Logged In: YES user_id=95086 I'd rather wait for Stephens code and see what's done there and then make final changes and commit. It should not take long. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 11:07 Message: Logged In: YES user_id=184124 Then i do not see any reason not committing it, we can have discussions about it for another couple of years, but until we have it and use it, it is just discussions. We aready agreed on having it in the core. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 11:00 Message: Logged In: YES user_id=95086 Stephen, You mind uploading your patch changes so I can peek into it? I think we should soon settle on some solution and move on. Vlad, The patch already does that. After 16 unsuccessfull restarts the watchdog exits. Between every restart we wait 1, 2, 4, 8, 16, 32, .... 16384 seconds and then exit. This would be: 32767 seconds or about 9 hours. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 07:53 Message: Logged In: YES user_id=184124 I would say, repeat configured number of time and then exit. This way all socket related problems will be cleared after couple of restartes, config problems will make server exit . ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 07:41 Message: Logged In: YES user_id=95086 Well, because I could not differentiate between config and later runtime errors, I thought its wiser to abort rather to repeatedly restart broken server. One possibility is to add different Ns_Fatal call like: Ns_FatalEx(int exitcode, char *fmt, ...); Or? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 07:12 Message: Logged In: YES user_id=87254 Are you sure Ns_Fatal should not restart the server? Many of the fatal errors are caused by bad configuration and I can see why you might want the server to exit immediately. But there are also runtime fatal errors, and here I'd expect the server to be restarted. Some config errors are due to external factors like missing directories or file system permissions and it would be nice for the server to come back up as soon as the issue resolved itself. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 02:52 Message: Logged In: YES user_id=95086 Calling watchdog before or after chroot has no real implications I believe. Also, when the server exits with Ns_Fatal, it hsould not be restarted, I will look into your changes today. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-10 23:27 Message: Logged In: YES user_id=87254 Oh yeah, and I moved the watchdog stuff eaven earlier, before the prebind and chroot. Hmm, now that I think about it it's only the prebind that really needs to happen after the watchdog is started, to ensure the sockets are always in a sane state... ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-10 23:17 Message: Logged In: YES user_id=87254 Am I reading this right? It looks like if the server calls Ns_Fatal() it will exit with code 1, but 0 and 1 are treated as OK exit codes and the server will not be restarted. I've attached a patch which changes the above, fixes the pid problem in a different way because I completely forgot you already posted below about that small glitch, use Ns_ParseObjv(), adds the -w switch, and a couple of small name changes. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 13:09 Message: Logged In: YES user_id=95086 This is correct. As soon as we agree on some implementation, I will put the -i processing and avoid starting watchdog for inittab starts. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 12:48 Message: Logged In: YES user_id=184124 There still should be possibility to run nsd from inittab, so when -i switch is given, no watchdog should be running, let /sbin/init to handle restarts ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 09:19 Message: Logged In: YES user_id=95086 small glich: After applying the patch, change nsmain.c from: nsconf.pid = serverPid: to nsconf.pid = getpid(); otherwise the pid file will contain the bogus server pid if the watchdog restarted it later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 08:26 Message: Logged In: YES user_id=95086 Another try... Important to note: SIGKILL (signal 9) cannot be handled hence if somebody kills the watchdog with SIGKILL, the server will be left lingering w/o the watchdog. This is important to know. I do not see any possibility how to recover in such cases (i.e. how to stop the server). Apart from that, all objections from Stephen are taken into account. Please try again. A new copy of watchdog.patch file is attached. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-07 22:20 Message: Logged In: YES user_id=184124 I just found this code in my old server sources, i just chnaged internal name to Ns/NS, used to run pretty stable. ------------------------------------------------------------------- #define NS_EXIT 99 static void NsTerminate(int sig) { printf("NS[%d] signal %d received...",getpid(),sig); // Kill the server with the same signal if(nsPid > 0) kill(nsPid,sig); // Exit in case of fatal signal if(sig != SIGHUP) { printf("NS[%d] terminating...",getpid()); exit(0); } // Reassign signal handler signal(sig,NsTerminate); } void NsWatchdog(int argc, char *argv[], char *envp[]) { int failcount = 0; time_t start; int status; pid_t pid; signal(SIGTTOU,SIG_IGN); signal(SIGTTIN,SIG_IGN); signal(SIGTSTP,SIG_IGN); signal(SIGPIPE,SIG_IGN); signal(SIGQUIT,SIG_IGN); signal(SIGHUP,NsTerminate); signal(SIGINT,NsTerminate); signal(SIGTERM,NsTerminate); // Go background if((pid = fork())) { if(pid < 0) err_logger("warpConfigure: fork: %s",strerror(errno)); exit(0); } setsid(); for(;;) { // Execute the real server nsPid = fork(); // Child, continue as server if(nsPid == 0) { exit(nsMain(argc, argv, ServerInit)); } /* parent, behaves like a guardian */ time(&start); printf("NS[%d] server process started",getpid()); pid = waitpid(-1, &status, 0); if(WIFEXITED(status)) printf("NS[%d] child process exited with status %d",pid,WEXITSTATUS(status)); else if(WIFSIGNALED(status)) printf("NS[%d] child process exited due to signal %d",pid,WTERMSIG(status)); else printf("NS[%d] child process exited", pid); // Special exit code if(WIFEXITED(status)) { if(WEXITSTATUS(status) == NS_EXIT) { printf("NS[%d] child configuration error, exiting",getpid()); exit(0); } else if(WEXITSTATUS(status) == SIGHUP) { } else { if(time(0) - start < 10) failcount++; if(failcount == 10) { printf("Exiting due to repeated, frequent failures"); exit(1); } } } sleep(3); } } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-07 21:23 Message: Logged In: YES user_id=87254 NsWatchdog() is called after the server drops root privs, so both the watchdog and the server run as the defined user. What happens if the server dies and on restart needs to rebind privileged ports? A ps listing shows two running processes, parent and child. If I kill either one, the watchdog dies, the server continues to run. If I kill -9 the parent, the child continues to run. If I kill -9 the child, the server is restarted. Something seems not quite right here... I'm a bit confused about how the code works. For example, NsWatchdog() seems to ignore all of it's arguments? Here's the code which calls it: if (mode == 0) { i = ns_fork(); if (i < 0) { Ns_Fatal("nsmain: fork() failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } setsid(); i = NsWatchdog(argc, argv, initProc); if (i < 0) { Ns_Fatal("nsmain: watchdog failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } nsconf.pid = getpid(); } NsWatchdog() says it returns the worker pid, it also sets the global variable wpid. The code above ignores the returned value, and the global variable, and instead calls getpid()... The variable 'i' is existing code, but still somehow doesn't seem suitable... Could that Ns_Fatal() above be moved into the NsWatchdog() function? I think maybe some comment is needed here. The code structure is very like that just above where ns_fork() is called, but this function will return *multiple* times, right? This is kind of suprising, and an extra twist on the already confusing fork() semantics (or maybe it's just me who gets confused by fork...). A return value of 0 here is a 'request for orderly shutdown', right? How about some more logging to syslog? For example, distinguish between start and restart. Mention when the MAX_SLEEP_PERIOD has been reached, etc. Couple of small things: Can we refer to 'the server', rather than 'the worker'? Worker and Watchdog begin with 'w', and so does the global variable wpid... Maybe serverPid? NsWatchdog is a static function, it doesn't need the Ns prefix. In NsWatchdog(), the variable 'run' should be something like 'nretries', 'nap' should be something more like 'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS. The comment for WaitForWorker() is misslabeled. Should SigHandler() be called WatchdogSigHandler()? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-06 16:03 Message: Logged In: YES user_id=184124 Looks good to me ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 05:06 Message: Logged In: YES user_id=95086 Ah, correction: The restart option sends SIGINT to the worker process which causes the watchdog to restart it. And, the patchfile is now attached! ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 05:03 Message: Logged In: YES user_id=95086 Here is the patch. I have added "-restart" option to "ns_shutdown". It is rather clumsy to parse but should do. We should rewrite this with your args parsing routine. The restart option sends SIGTERM to the worker process which causes the watchdog to restart it. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 16:46 Message: Logged In: YES user_id=87254 I don't think this has to hide behind a config option. It's either a good idea or it's not. Sounds good to me. Is there a patch? I'm wondering about some of the implementation details... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-17 10:53:36
|
Feature Requests item #1159307, was opened at 2005-03-08 12:41 Message generated for change (Comment added) made by sdeasey You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159307&group_id=130646 Category: None Group: None Status: Closed Resolution: None Priority: 5 Submitted By: Vlad Seryakov (seryakov) Assigned to: Vlad Seryakov (seryakov) Summary: Move nsext, nspd from core to modules directory Initial Comment: Last time i used those modules was when i used sybase driver, actually those things were created for sybase drivers only. Except that, all other DB modules uses native access and even for sybase FreeTSD exists with working module(i use it for SQL Server access). No point keeping this. ---------------------------------------------------------------------- >Comment By: Stephen Deasey (sdeasey) Date: 2005-03-17 03:53 Message: Logged In: YES user_id=87254 Oh, I forgot that cvs doesn't let you delete old directories. No big deal, SF removed them for us: http://sourceforge.net/tracker/index.php?func=detail&aid=1164303&group_id=1&atid=200001 ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-13 10:17 Message: Logged In: YES user_id=184124 moved ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-12 18:41 Message: Logged In: YES user_id=87254 You can file a SourceForge request ticket to get them to move stuff about, but I don't think this is neccessary. We have no history to preserve. Just remove these module from their current location and import them fresh into a modules directory. So, /cvsroot/naviserver/naviserver is the current location of the server, and /cvsroot/naviserver/modules/nsext will be the location of the nsext module. Keeping all non-core modules in their own directory will make it easier to check out all at once, generate a website, etc. I think the Makefile will need touching up to build out of core. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-12 17:48 Message: Logged In: YES user_id=184124 We do not have contrib or modules directory, but yes, we move it from the core. I am not sure how to do this with CVS. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-12 17:32 Message: Logged In: YES user_id=87254 If by "get rid of" you mean extract into a non-core module, I'm all for this. Sounds good to me. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 15:51 Message: Logged In: YES user_id=95086 I have written a Tcl threading extension and part of it are thread shared arrays (aka nsv's) with a bind option to external key/value databases (like gdbm). So we actually store all things in thread shared arrays which are bound to a persistent gdbm databases on the filesystem. It's all transparent to the application. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 13:21 Message: Logged In: YES user_id=184124 Not related to nsext, just out of curiosity, how do you keep any state especially between multiple servers? ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 13:13 Message: Logged In: YES user_id=95086 We do not use any database modules at all hence I do not really care. I do think that any important database vendor has thread-safe libraries nowdays so the task of writing an in-process driver should not be a problem any more. Besides, AFAIK, all popular databases have already a corresponding in-process driver implementation. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159307&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-15 13:54:11
|
Feature Requests item #1156875, was opened at 2005-03-04 20:03 Message generated for change (Comment added) made by vasiljevic You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Watchdog process restarts failed server Initial Comment: We have been using this for quite some time and it proved extremely useful. We doublefork the nsd process and make the first forked instance control the second. The first one (the watchdog) reacts on exit codes and signals caught during the watch and correspondingly restarts the second instance (the worker). Also, we have added the the "-restart" option to the "ns_shutdown" command. This just sends the SIGINT to the worker process. The watchdog is handling this signal and respawns the worker automatically. During operation, the watchdog logs events and their cause into the system log file. This looks like: Feb 28 04:00:05 Develop nsd[19400]: worker: started. Mar 1 04:00:13 Develop nsd[4475]: watchdog: worker 19400 exited (2). Mar 1 04:00:15 Develop nsd[21290]: worker: started. Mar 1 04:00:18 Develop nsd[14705]: watchdog: worker 19399 exited (2). Mar 1 04:00:20 Develop nsd[21300]: worker: started. We have done all the changes with "--enable-watchdog" so anybody who needs this feature will have to compile with this option. ---------------------------------------------------------------------- >Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-15 14:54 Message: Logged In: YES user_id=95086 Stephen, why do you call _exit(0) instead of returning control to caller routine? Also.. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. Restart time doubling is limited to 64 seconds (insane, should rather be 60 seconds, heh) so it should not double after reaching 60 secs. After that time we constantly wait 60 secs between restarts. With the server retry of 16 times and we will wait 1, 2, 4, 8, 16, 32, 64 (i.e. total of 127) seconds and after that 8 times 64 secs (= 512 total ) secs and eventually abort. All together, that will be 640 secs = about 10 minutes wait/retry. After that it will exit. I think if we extend retry count from 16 to 256 it will be pretty enough (about 4 hours) where one can fix the failure cause. Otherwise, I see syntactic problems of how to actually specify the retries parameter. Of curse, you can do something like bin/nsd -w -r 16 but the "-r" (restart) would really have no meaning w/o -w and I do not like this very much indeed. I'm really for bumping the retry count higher (127 or 256 tries). ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-15 01:50 Message: Logged In: YES user_id=184124 Last time addition: Can we make syslog function public, Ns_Syslog? Also Tcl interface to it would be usefull as well, below is syslog from my nssys module, just a suggestion, no rush with this: static struct syslog_data { int opened; int facility; int options; char ident[32]; Tcl_HashTable *priorities; Tcl_HashTable *facilities; } syslog_data; /* Syslog initialization */ void SyslogInit() { memset(&syslog_data,0,sizeof(syslog_data)); syslog_data.options = LOG_PID; syslog_data.facility = LOG_USER; if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY))) strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1); syslog_data.facilities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS); AddEntry(syslog_data.facilities,"auth",LOG_AUTH); AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV); AddEntry(syslog_data.facilities,"cron",LOG_CRON); AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON); AddEntry(syslog_data.facilities,"kernel",LOG_KERN); AddEntry(syslog_data.facilities,"lpr",LOG_LPR); AddEntry(syslog_data.facilities,"mail",LOG_MAIL); AddEntry(syslog_data.facilities,"news",LOG_NEWS); AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG); AddEntry(syslog_data.facilities,"user",LOG_USER); AddEntry(syslog_data.facilities,"uucp",LOG_UUCP); AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0); AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1); AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2); AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3); AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4); AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5); AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6); AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7); syslog_data.priorities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS); AddEntry(syslog_data.priorities,"emerg",LOG_EMERG); AddEntry(syslog_data.priorities,"alert",LOG_ALERT); AddEntry(syslog_data.priorities,"crit",LOG_CRIT); AddEntry(syslog_data.priorities,"err",LOG_ERR); AddEntry(syslog_data.priorities,"error",LOG_ERR); AddEntry(syslog_data.priorities,"warning",LOG_WARNING); AddEntry(syslog_data.priorities,"notice",LOG_NOTICE); AddEntry(syslog_data.priorities,"info",LOG_INFO); AddEntry(syslog_data.priorities,"debug",LOG_DEBUG); } /* * sys_log * * Usage: * sys_log ?-facility f -options o -ident i? priority message * * facility - kernel, cron, authpriv, mail, local0, local1, daemon, local2, * news, local3, local4, local5, local6, syslog, local7, auth, uucp, lpr, user * options - list with any of { CONS NDELAY PERROR PID ODELAY NOWAIT } * ident - ident is prepended to every message, and is typically the program name * priority - info, alert, emerg, err, notice, warning, error, crit, debug * * Returns: * nothing */ static int SysLog(ClientData data,Tcl_Interp *interp,int argc,char **argv) { int i,priority = 0; Tcl_HashEntry *entry; if(argc < 1) { Tcl_AppendResult(interp,argv[0]," ?-facility f -options o -ident i? priority message",0); return TCL_ERROR; } for(i = 1;i < argc-1;) { if(!strcmp(argv[i],"-facility")) { entry = Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]); if(!entry) { Tcl_AppendResult(interp,"wrong facility ",argv[i+1],": available facilities: ",NULL); ListHash(interp,syslog_data.facilities); return TCL_ERROR; } syslog_data.facility = (int)Tcl_GetHashValue(entry); closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-options")) { syslog_data.options = 0; if(strstr(argv[i+1],"CONS")) syslog_data.options |= LOG_CONS; if(strstr(argv[i+1],"NDELAY")) syslog_data.options |= LOG_NDELAY; if(strstr(argv[i+1],"PERROR")) syslog_data.options |= LOG_PERROR; if(strstr(argv[i+1],"PID")) syslog_data.options |= LOG_PID; if(strstr(argv[i+1],"ODELAY")) syslog_data.options |= LOG_ODELAY; if(strstr(argv[i+1],"NOWAIT")) syslog_data.options |= LOG_NOWAIT; closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-ident")) { memset(syslog_data.ident,0,sizeof(syslog_data.ident)); strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1); closelog(); syslog_data.opened = 0; i += 2; continue; } break; } if(i < argc) { entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]); if(!entry) { Tcl_AppendResult(interp,"wrong level ",argv[i],": available levels: ",NULL); ListHash(interp,syslog_data.priorities); return TCL_ERROR; } priority = (int)Tcl_GetHashValue(entry); i++; } if(i < argc) { if(!syslog_data.opened) { openlog(syslog_data.ident,syslog_data.options,syslog_data.facility); syslog_data.opened = 1; } syslog(priority,argv[i]); return TCL_OK; } return TCL_OK; } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-15 00:44 Message: Logged In: YES user_id=87254 Oops, looks like I forgot to attach the patch... Here it is. Yes, the server will attempt to restart a number of times before giving up. I think this is as appropriate for config errors as it is for others, then there doesn't need to be a dual-standard fatal error. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 19:12 Message: Logged In: YES user_id=95086 I'd rather wait for Stephens code and see what's done there and then make final changes and commit. It should not take long. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 19:07 Message: Logged In: YES user_id=184124 Then i do not see any reason not committing it, we can have discussions about it for another couple of years, but until we have it and use it, it is just discussions. We aready agreed on having it in the core. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 19:00 Message: Logged In: YES user_id=95086 Stephen, You mind uploading your patch changes so I can peek into it? I think we should soon settle on some solution and move on. Vlad, The patch already does that. After 16 unsuccessfull restarts the watchdog exits. Between every restart we wait 1, 2, 4, 8, 16, 32, .... 16384 seconds and then exit. This would be: 32767 seconds or about 9 hours. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 15:53 Message: Logged In: YES user_id=184124 I would say, repeat configured number of time and then exit. This way all socket related problems will be cleared after couple of restartes, config problems will make server exit . ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 15:41 Message: Logged In: YES user_id=95086 Well, because I could not differentiate between config and later runtime errors, I thought its wiser to abort rather to repeatedly restart broken server. One possibility is to add different Ns_Fatal call like: Ns_FatalEx(int exitcode, char *fmt, ...); Or? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 15:12 Message: Logged In: YES user_id=87254 Are you sure Ns_Fatal should not restart the server? Many of the fatal errors are caused by bad configuration and I can see why you might want the server to exit immediately. But there are also runtime fatal errors, and here I'd expect the server to be restarted. Some config errors are due to external factors like missing directories or file system permissions and it would be nice for the server to come back up as soon as the issue resolved itself. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 10:52 Message: Logged In: YES user_id=95086 Calling watchdog before or after chroot has no real implications I believe. Also, when the server exits with Ns_Fatal, it hsould not be restarted, I will look into your changes today. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 07:27 Message: Logged In: YES user_id=87254 Oh yeah, and I moved the watchdog stuff eaven earlier, before the prebind and chroot. Hmm, now that I think about it it's only the prebind that really needs to happen after the watchdog is started, to ensure the sockets are always in a sane state... ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 07:17 Message: Logged In: YES user_id=87254 Am I reading this right? It looks like if the server calls Ns_Fatal() it will exit with code 1, but 0 and 1 are treated as OK exit codes and the server will not be restarted. I've attached a patch which changes the above, fixes the pid problem in a different way because I completely forgot you already posted below about that small glitch, use Ns_ParseObjv(), adds the -w switch, and a couple of small name changes. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 21:09 Message: Logged In: YES user_id=95086 This is correct. As soon as we agree on some implementation, I will put the -i processing and avoid starting watchdog for inittab starts. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 20:48 Message: Logged In: YES user_id=184124 There still should be possibility to run nsd from inittab, so when -i switch is given, no watchdog should be running, let /sbin/init to handle restarts ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 17:19 Message: Logged In: YES user_id=95086 small glich: After applying the patch, change nsmain.c from: nsconf.pid = serverPid: to nsconf.pid = getpid(); otherwise the pid file will contain the bogus server pid if the watchdog restarted it later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 16:26 Message: Logged In: YES user_id=95086 Another try... Important to note: SIGKILL (signal 9) cannot be handled hence if somebody kills the watchdog with SIGKILL, the server will be left lingering w/o the watchdog. This is important to know. I do not see any possibility how to recover in such cases (i.e. how to stop the server). Apart from that, all objections from Stephen are taken into account. Please try again. A new copy of watchdog.patch file is attached. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 06:20 Message: Logged In: YES user_id=184124 I just found this code in my old server sources, i just chnaged internal name to Ns/NS, used to run pretty stable. ------------------------------------------------------------------- #define NS_EXIT 99 static void NsTerminate(int sig) { printf("NS[%d] signal %d received...",getpid(),sig); // Kill the server with the same signal if(nsPid > 0) kill(nsPid,sig); // Exit in case of fatal signal if(sig != SIGHUP) { printf("NS[%d] terminating...",getpid()); exit(0); } // Reassign signal handler signal(sig,NsTerminate); } void NsWatchdog(int argc, char *argv[], char *envp[]) { int failcount = 0; time_t start; int status; pid_t pid; signal(SIGTTOU,SIG_IGN); signal(SIGTTIN,SIG_IGN); signal(SIGTSTP,SIG_IGN); signal(SIGPIPE,SIG_IGN); signal(SIGQUIT,SIG_IGN); signal(SIGHUP,NsTerminate); signal(SIGINT,NsTerminate); signal(SIGTERM,NsTerminate); // Go background if((pid = fork())) { if(pid < 0) err_logger("warpConfigure: fork: %s",strerror(errno)); exit(0); } setsid(); for(;;) { // Execute the real server nsPid = fork(); // Child, continue as server if(nsPid == 0) { exit(nsMain(argc, argv, ServerInit)); } /* parent, behaves like a guardian */ time(&start); printf("NS[%d] server process started",getpid()); pid = waitpid(-1, &status, 0); if(WIFEXITED(status)) printf("NS[%d] child process exited with status %d",pid,WEXITSTATUS(status)); else if(WIFSIGNALED(status)) printf("NS[%d] child process exited due to signal %d",pid,WTERMSIG(status)); else printf("NS[%d] child process exited", pid); // Special exit code if(WIFEXITED(status)) { if(WEXITSTATUS(status) == NS_EXIT) { printf("NS[%d] child configuration error, exiting",getpid()); exit(0); } else if(WEXITSTATUS(status) == SIGHUP) { } else { if(time(0) - start < 10) failcount++; if(failcount == 10) { printf("Exiting due to repeated, frequent failures"); exit(1); } } } sleep(3); } } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-08 05:23 Message: Logged In: YES user_id=87254 NsWatchdog() is called after the server drops root privs, so both the watchdog and the server run as the defined user. What happens if the server dies and on restart needs to rebind privileged ports? A ps listing shows two running processes, parent and child. If I kill either one, the watchdog dies, the server continues to run. If I kill -9 the parent, the child continues to run. If I kill -9 the child, the server is restarted. Something seems not quite right here... I'm a bit confused about how the code works. For example, NsWatchdog() seems to ignore all of it's arguments? Here's the code which calls it: if (mode == 0) { i = ns_fork(); if (i < 0) { Ns_Fatal("nsmain: fork() failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } setsid(); i = NsWatchdog(argc, argv, initProc); if (i < 0) { Ns_Fatal("nsmain: watchdog failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } nsconf.pid = getpid(); } NsWatchdog() says it returns the worker pid, it also sets the global variable wpid. The code above ignores the returned value, and the global variable, and instead calls getpid()... The variable 'i' is existing code, but still somehow doesn't seem suitable... Could that Ns_Fatal() above be moved into the NsWatchdog() function? I think maybe some comment is needed here. The code structure is very like that just above where ns_fork() is called, but this function will return *multiple* times, right? This is kind of suprising, and an extra twist on the already confusing fork() semantics (or maybe it's just me who gets confused by fork...). A return value of 0 here is a 'request for orderly shutdown', right? How about some more logging to syslog? For example, distinguish between start and restart. Mention when the MAX_SLEEP_PERIOD has been reached, etc. Couple of small things: Can we refer to 'the server', rather than 'the worker'? Worker and Watchdog begin with 'w', and so does the global variable wpid... Maybe serverPid? NsWatchdog is a static function, it doesn't need the Ns prefix. In NsWatchdog(), the variable 'run' should be something like 'nretries', 'nap' should be something more like 'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS. The comment for WaitForWorker() is misslabeled. Should SigHandler() be called WatchdogSigHandler()? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-07 00:03 Message: Logged In: YES user_id=184124 Looks good to me ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 13:06 Message: Logged In: YES user_id=95086 Ah, correction: The restart option sends SIGINT to the worker process which causes the watchdog to restart it. And, the patchfile is now attached! ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 13:03 Message: Logged In: YES user_id=95086 Here is the patch. I have added "-restart" option to "ns_shutdown". It is rather clumsy to parse but should do. We should rewrite this with your args parsing routine. The restart option sends SIGTERM to the worker process which causes the watchdog to restart it. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-05 00:46 Message: Logged In: YES user_id=87254 I don't think this has to hide behind a config option. It's either a good idea or it's not. Sounds good to me. Is there a patch? I'm wondering about some of the implementation details... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 |
From: SourceForge.net <no...@so...> - 2005-03-15 00:50:39
|
Feature Requests item #1156875, was opened at 2005-03-04 19:03 Message generated for change (Comment added) made by seryakov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Zoran Vasiljevic (vasiljevic) Assigned to: Zoran Vasiljevic (vasiljevic) Summary: Watchdog process restarts failed server Initial Comment: We have been using this for quite some time and it proved extremely useful. We doublefork the nsd process and make the first forked instance control the second. The first one (the watchdog) reacts on exit codes and signals caught during the watch and correspondingly restarts the second instance (the worker). Also, we have added the the "-restart" option to the "ns_shutdown" command. This just sends the SIGINT to the worker process. The watchdog is handling this signal and respawns the worker automatically. During operation, the watchdog logs events and their cause into the system log file. This looks like: Feb 28 04:00:05 Develop nsd[19400]: worker: started. Mar 1 04:00:13 Develop nsd[4475]: watchdog: worker 19400 exited (2). Mar 1 04:00:15 Develop nsd[21290]: worker: started. Mar 1 04:00:18 Develop nsd[14705]: watchdog: worker 19399 exited (2). Mar 1 04:00:20 Develop nsd[21300]: worker: started. We have done all the changes with "--enable-watchdog" so anybody who needs this feature will have to compile with this option. ---------------------------------------------------------------------- >Comment By: Vlad Seryakov (seryakov) Date: 2005-03-15 00:50 Message: Logged In: YES user_id=184124 Last time addition: Can we make syslog function public, Ns_Syslog? Also Tcl interface to it would be usefull as well, below is syslog from my nssys module, just a suggestion, no rush with this: static struct syslog_data { int opened; int facility; int options; char ident[32]; Tcl_HashTable *priorities; Tcl_HashTable *facilities; } syslog_data; /* Syslog initialization */ void SyslogInit() { memset(&syslog_data,0,sizeof(syslog_data)); syslog_data.options = LOG_PID; syslog_data.facility = LOG_USER; if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY))) strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1); syslog_data.facilities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS); AddEntry(syslog_data.facilities,"auth",LOG_AUTH); AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV); AddEntry(syslog_data.facilities,"cron",LOG_CRON); AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON); AddEntry(syslog_data.facilities,"kernel",LOG_KERN); AddEntry(syslog_data.facilities,"lpr",LOG_LPR); AddEntry(syslog_data.facilities,"mail",LOG_MAIL); AddEntry(syslog_data.facilities,"news",LOG_NEWS); AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG); AddEntry(syslog_data.facilities,"user",LOG_USER); AddEntry(syslog_data.facilities,"uucp",LOG_UUCP); AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0); AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1); AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2); AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3); AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4); AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5); AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6); AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7); syslog_data.priorities = (Tcl_HashTable *)Tcl_Alloc(sizeof(Tcl_HashTable)); Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS); AddEntry(syslog_data.priorities,"emerg",LOG_EMERG); AddEntry(syslog_data.priorities,"alert",LOG_ALERT); AddEntry(syslog_data.priorities,"crit",LOG_CRIT); AddEntry(syslog_data.priorities,"err",LOG_ERR); AddEntry(syslog_data.priorities,"error",LOG_ERR); AddEntry(syslog_data.priorities,"warning",LOG_WARNING); AddEntry(syslog_data.priorities,"notice",LOG_NOTICE); AddEntry(syslog_data.priorities,"info",LOG_INFO); AddEntry(syslog_data.priorities,"debug",LOG_DEBUG); } /* * sys_log * * Usage: * sys_log ?-facility f -options o -ident i? priority message * * facility - kernel, cron, authpriv, mail, local0, local1, daemon, local2, * news, local3, local4, local5, local6, syslog, local7, auth, uucp, lpr, user * options - list with any of { CONS NDELAY PERROR PID ODELAY NOWAIT } * ident - ident is prepended to every message, and is typically the program name * priority - info, alert, emerg, err, notice, warning, error, crit, debug * * Returns: * nothing */ static int SysLog(ClientData data,Tcl_Interp *interp,int argc,char **argv) { int i,priority = 0; Tcl_HashEntry *entry; if(argc < 1) { Tcl_AppendResult(interp,argv[0]," ?-facility f -options o -ident i? priority message",0); return TCL_ERROR; } for(i = 1;i < argc-1;) { if(!strcmp(argv[i],"-facility")) { entry = Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]); if(!entry) { Tcl_AppendResult(interp,"wrong facility ",argv[i+1],": available facilities: ",NULL); ListHash(interp,syslog_data.facilities); return TCL_ERROR; } syslog_data.facility = (int)Tcl_GetHashValue(entry); closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-options")) { syslog_data.options = 0; if(strstr(argv[i+1],"CONS")) syslog_data.options |= LOG_CONS; if(strstr(argv[i+1],"NDELAY")) syslog_data.options |= LOG_NDELAY; if(strstr(argv[i+1],"PERROR")) syslog_data.options |= LOG_PERROR; if(strstr(argv[i+1],"PID")) syslog_data.options |= LOG_PID; if(strstr(argv[i+1],"ODELAY")) syslog_data.options |= LOG_ODELAY; if(strstr(argv[i+1],"NOWAIT")) syslog_data.options |= LOG_NOWAIT; closelog(); syslog_data.opened = 0; i += 2; continue; } if(!strcmp(argv[i],"-ident")) { memset(syslog_data.ident,0,sizeof(syslog_data.ident)); strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1); closelog(); syslog_data.opened = 0; i += 2; continue; } break; } if(i < argc) { entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]); if(!entry) { Tcl_AppendResult(interp,"wrong level ",argv[i],": available levels: ",NULL); ListHash(interp,syslog_data.priorities); return TCL_ERROR; } priority = (int)Tcl_GetHashValue(entry); i++; } if(i < argc) { if(!syslog_data.opened) { openlog(syslog_data.ident,syslog_data.options,syslog_data.facility); syslog_data.opened = 1; } syslog(priority,argv[i]); return TCL_OK; } return TCL_OK; } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 23:44 Message: Logged In: YES user_id=87254 Oops, looks like I forgot to attach the patch... Here it is. Yes, the server will attempt to restart a number of times before giving up. I think this is as appropriate for config errors as it is for others, then there doesn't need to be a dual-standard fatal error. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 18:12 Message: Logged In: YES user_id=95086 I'd rather wait for Stephens code and see what's done there and then make final changes and commit. It should not take long. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 18:07 Message: Logged In: YES user_id=184124 Then i do not see any reason not committing it, we can have discussions about it for another couple of years, but until we have it and use it, it is just discussions. We aready agreed on having it in the core. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 18:00 Message: Logged In: YES user_id=95086 Stephen, You mind uploading your patch changes so I can peek into it? I think we should soon settle on some solution and move on. Vlad, The patch already does that. After 16 unsuccessfull restarts the watchdog exits. Between every restart we wait 1, 2, 4, 8, 16, 32, .... 16384 seconds and then exit. This would be: 32767 seconds or about 9 hours. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-14 14:53 Message: Logged In: YES user_id=184124 I would say, repeat configured number of time and then exit. This way all socket related problems will be cleared after couple of restartes, config problems will make server exit . ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 14:41 Message: Logged In: YES user_id=95086 Well, because I could not differentiate between config and later runtime errors, I thought its wiser to abort rather to repeatedly restart broken server. One possibility is to add different Ns_Fatal call like: Ns_FatalEx(int exitcode, char *fmt, ...); Or? ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-14 14:12 Message: Logged In: YES user_id=87254 Are you sure Ns_Fatal should not restart the server? Many of the fatal errors are caused by bad configuration and I can see why you might want the server to exit immediately. But there are also runtime fatal errors, and here I'd expect the server to be restarted. Some config errors are due to external factors like missing directories or file system permissions and it would be nice for the server to come back up as soon as the issue resolved itself. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-14 09:52 Message: Logged In: YES user_id=95086 Calling watchdog before or after chroot has no real implications I believe. Also, when the server exits with Ns_Fatal, it hsould not be restarted, I will look into your changes today. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 06:27 Message: Logged In: YES user_id=87254 Oh yeah, and I moved the watchdog stuff eaven earlier, before the prebind and chroot. Hmm, now that I think about it it's only the prebind that really needs to happen after the watchdog is started, to ensure the sockets are always in a sane state... ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-11 06:17 Message: Logged In: YES user_id=87254 Am I reading this right? It looks like if the server calls Ns_Fatal() it will exit with code 1, but 0 and 1 are treated as OK exit codes and the server will not be restarted. I've attached a patch which changes the above, fixes the pid problem in a different way because I completely forgot you already posted below about that small glitch, use Ns_ParseObjv(), adds the -w switch, and a couple of small name changes. I don't want to go overboard with command line switches, but an option to specify not to give up trying to restart the server would be nice. If you have that, you also need to turn off the restart timeout doubling at a certain point. I don't know what the cleanest way to do that is. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 20:09 Message: Logged In: YES user_id=95086 This is correct. As soon as we agree on some implementation, I will put the -i processing and avoid starting watchdog for inittab starts. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 19:48 Message: Logged In: YES user_id=184124 There still should be possibility to run nsd from inittab, so when -i switch is given, no watchdog should be running, let /sbin/init to handle restarts ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 16:19 Message: Logged In: YES user_id=95086 small glich: After applying the patch, change nsmain.c from: nsconf.pid = serverPid: to nsconf.pid = getpid(); otherwise the pid file will contain the bogus server pid if the watchdog restarted it later. ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-08 15:26 Message: Logged In: YES user_id=95086 Another try... Important to note: SIGKILL (signal 9) cannot be handled hence if somebody kills the watchdog with SIGKILL, the server will be left lingering w/o the watchdog. This is important to know. I do not see any possibility how to recover in such cases (i.e. how to stop the server). Apart from that, all objections from Stephen are taken into account. Please try again. A new copy of watchdog.patch file is attached. ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-08 05:20 Message: Logged In: YES user_id=184124 I just found this code in my old server sources, i just chnaged internal name to Ns/NS, used to run pretty stable. ------------------------------------------------------------------- #define NS_EXIT 99 static void NsTerminate(int sig) { printf("NS[%d] signal %d received...",getpid(),sig); // Kill the server with the same signal if(nsPid > 0) kill(nsPid,sig); // Exit in case of fatal signal if(sig != SIGHUP) { printf("NS[%d] terminating...",getpid()); exit(0); } // Reassign signal handler signal(sig,NsTerminate); } void NsWatchdog(int argc, char *argv[], char *envp[]) { int failcount = 0; time_t start; int status; pid_t pid; signal(SIGTTOU,SIG_IGN); signal(SIGTTIN,SIG_IGN); signal(SIGTSTP,SIG_IGN); signal(SIGPIPE,SIG_IGN); signal(SIGQUIT,SIG_IGN); signal(SIGHUP,NsTerminate); signal(SIGINT,NsTerminate); signal(SIGTERM,NsTerminate); // Go background if((pid = fork())) { if(pid < 0) err_logger("warpConfigure: fork: %s",strerror(errno)); exit(0); } setsid(); for(;;) { // Execute the real server nsPid = fork(); // Child, continue as server if(nsPid == 0) { exit(nsMain(argc, argv, ServerInit)); } /* parent, behaves like a guardian */ time(&start); printf("NS[%d] server process started",getpid()); pid = waitpid(-1, &status, 0); if(WIFEXITED(status)) printf("NS[%d] child process exited with status %d",pid,WEXITSTATUS(status)); else if(WIFSIGNALED(status)) printf("NS[%d] child process exited due to signal %d",pid,WTERMSIG(status)); else printf("NS[%d] child process exited", pid); // Special exit code if(WIFEXITED(status)) { if(WEXITSTATUS(status) == NS_EXIT) { printf("NS[%d] child configuration error, exiting",getpid()); exit(0); } else if(WEXITSTATUS(status) == SIGHUP) { } else { if(time(0) - start < 10) failcount++; if(failcount == 10) { printf("Exiting due to repeated, frequent failures"); exit(1); } } } sleep(3); } } ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-08 04:23 Message: Logged In: YES user_id=87254 NsWatchdog() is called after the server drops root privs, so both the watchdog and the server run as the defined user. What happens if the server dies and on restart needs to rebind privileged ports? A ps listing shows two running processes, parent and child. If I kill either one, the watchdog dies, the server continues to run. If I kill -9 the parent, the child continues to run. If I kill -9 the child, the server is restarted. Something seems not quite right here... I'm a bit confused about how the code works. For example, NsWatchdog() seems to ignore all of it's arguments? Here's the code which calls it: if (mode == 0) { i = ns_fork(); if (i < 0) { Ns_Fatal("nsmain: fork() failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } setsid(); i = NsWatchdog(argc, argv, initProc); if (i < 0) { Ns_Fatal("nsmain: watchdog failed: '%s'", strerror(errno)); } if (i > 0) { return 0; } nsconf.pid = getpid(); } NsWatchdog() says it returns the worker pid, it also sets the global variable wpid. The code above ignores the returned value, and the global variable, and instead calls getpid()... The variable 'i' is existing code, but still somehow doesn't seem suitable... Could that Ns_Fatal() above be moved into the NsWatchdog() function? I think maybe some comment is needed here. The code structure is very like that just above where ns_fork() is called, but this function will return *multiple* times, right? This is kind of suprising, and an extra twist on the already confusing fork() semantics (or maybe it's just me who gets confused by fork...). A return value of 0 here is a 'request for orderly shutdown', right? How about some more logging to syslog? For example, distinguish between start and restart. Mention when the MAX_SLEEP_PERIOD has been reached, etc. Couple of small things: Can we refer to 'the server', rather than 'the worker'? Worker and Watchdog begin with 'w', and so does the global variable wpid... Maybe serverPid? NsWatchdog is a static function, it doesn't need the Ns prefix. In NsWatchdog(), the variable 'run' should be something like 'nretries', 'nap' should be something more like 'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS. The comment for WaitForWorker() is misslabeled. Should SigHandler() be called WatchdogSigHandler()? ---------------------------------------------------------------------- Comment By: Vlad Seryakov (seryakov) Date: 2005-03-06 23:03 Message: Logged In: YES user_id=184124 Looks good to me ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 12:06 Message: Logged In: YES user_id=95086 Ah, correction: The restart option sends SIGINT to the worker process which causes the watchdog to restart it. And, the patchfile is now attached! ---------------------------------------------------------------------- Comment By: Zoran Vasiljevic (vasiljevic) Date: 2005-03-05 12:03 Message: Logged In: YES user_id=95086 Here is the patch. I have added "-restart" option to "ns_shutdown". It is rather clumsy to parse but should do. We should rewrite this with your args parsing routine. The restart option sends SIGTERM to the worker process which causes the watchdog to restart it. ---------------------------------------------------------------------- Comment By: Stephen Deasey (sdeasey) Date: 2005-03-04 23:46 Message: Logged In: YES user_id=87254 I don't think this has to hide behind a config option. It's either a good idea or it's not. Sounds good to me. Is there a patch? I'm wondering about some of the implementation details... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646 |