naviserver-tracker Mailing List for NaviServer (Page 9)

NaviServer, a high performance web server written in C and Tcl

Brought to you by: eide, gneumann, itannir, sdeasey, and 2 others

naviserver-tracker — SourceForge Tracker Notifications

You can subscribe to this list here.

2005	_Jan	_Feb (61)	_Mar (153)	_Apr (39)	_May (10)	_Jun (15)	_Jul (15)	_Aug (2)	_Sep	_Oct (17)	_Nov (2)	_Dec (13)
2006	_Jan (18)	_Feb (9)	_Mar	_Apr	_May	_Jun	_Jul (7)	_Aug (1)	_Sep (2)	_Oct	_Nov (1)	_Dec
2007	_Jan (8)	_Feb (3)	_Mar	_Apr	_May (2)	_Jun	_Jul (2)	_Aug	_Sep	_Oct	_Nov	_Dec
2008	_Jan	_Feb	_Mar	_Apr (6)	_May (1)	_Jun	_Jul	_Aug	_Sep	_Oct	_Nov	_Dec

Flat | Threaded

<< < 1 .. 7 8 9 10 11 .. 16 > >> (Page 9 of 16)

[ naviserver-Feature Requests-1156899 ] Add optional "numbytes" arg to ns_conncptofp

From: SourceForge.net <no...@so...> - 2005-03-18 07:50:22

Feature Requests item #1156899, was opened at 2005-03-04 20:30
Message generated for change (Comment added) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646

Category: Tcl-API
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add optional "numbytes" arg to ns_conncptofp

Initial Comment:
The current ns_conncptofp will copy the content to the 
file. The entire content. What is missing is to tell to the 
command to copy only part of the content. 
The proposed API would be: 
 
 ns_conncptofp channel ?numbytes? 
 
The numbytes can be: 
 ommited (procedure does what it did before) 
  >0 (will copy somany bytes up to contentLength or  
  end-of-data whatever comes first) 
 0 (will do nothing) 
 -1 (will behave as when argument is ommited)  
 
This command is one of those which support the 
old-compat connId argument. So the actual API is 
as of today: 
 
  ns_conncptofp ?connId? channel 
 
I would vote to scrap the connId parsing. 

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-18 08:50

Message:
Logged In: YES 
user_id=95086

Good point. Thanks for looking into this. I will now look into 
the reqPtr->avail instead of contentLength. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-18 01:39

Message:
Logged In: YES 
user_id=87254


This change introduces a new side-effect:  the contentLength
is adjusted down by the number of bytes coppied.  Does this
need to happen?  The C API doesn't do this.

I think reqPtr->avail needs to be checked rather then
connPtr->contentLength.  reqPtr->next and reqPtr->avail will
be managed by the C API so no need to adjust them here. 

Also, as the Tcl API may return less than you asked for
rather than error out like the C API, should it return the
number of bytes copied?



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 21:24

Message:
Logged In: YES 
user_id=95086

Added into CVS.

    ns_conncptofp ?-bytes tocopy? channel
 
The command will silently accept <=0 bytes and revert to
copying all that is there. Also, if more than available is
requested
it will silently copy all that is there. Otherwise, -bytes
bytes will 
be copied to the channel. Channel must be opened for writing
otherwise error is thrown.

Oh yes, Stephen, I love the arg parsing code This is really
a relief. After years any years of parsing... heh..


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 00:30

Message:
Logged In: YES 
user_id=87254


This seems reasonable.

Should it look like this though?
    ns_conncptofp ?-bytes tocopy? channel

The reason it is attractive to drop the connid arg is that
with two optional args it becomes a pain to parse.  I'm also
thinking of how ns_sendmail has been extended over time by
appending optional args.

The C version of this command returns an error if you ask
for more bytes than are available.  Should the Tcl version
silently truncate?  Maybe it should return the number of
bytes actually copied?

Does it make sense to ask for 0 (zero) bytes?  Maybe bytes
<= 0 should revert to default behaviour which is to copy all
that's available?

The existing error message "could not copy content (likely
client disconnect)" is wrong.  This might have been the case
in 3.x, but not 4.x.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646

[ naviserver-Feature Requests-1145957 ] Cookie support for C and Tcl

From: SourceForge.net <no...@so...> - 2005-03-18 04:55:50

Feature Requests item #1145957, was opened at 2005-02-22 03:13
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1145957&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Stephen Deasey (sdeasey)
Assigned to: Stephen Deasey (sdeasey)
Summary: Cookie support for C and Tcl

Initial Comment:

Atached is a patch which implements a simple cookie API
for both C and Tcl.  Here's what it looks like (the
formating will probably get mangled, sorry):

void Ns_ConnSetCookie(Ns_Conn *conn,  char *name, char
*value, int maxage);
void Ns_ConnSetSecureCookie(Ns_Conn *conn,  char *name,
char *value, int maxage);
void Ns_ConnSetCookieEx(Ns_Conn *conn,  char *name,
char *value, int maxage, char *domain, char *path, int
secure);
void Ns_ConnDeleteCookie(Ns_Conn *conn, char *name,
char *domain, char *path);
void Ns_ConnDeleteSecureCookie(Ns_Conn *conn, char
*name, char *domain, char *path);
char *Ns_ConnGetCookie(Ns_DString *dest, Ns_Conn *conn,
char *name);

ns_setcookie ?-secure bool? ?-domain d? ?-path p?
?-maxage secs? name data
ns_getcookie name ?default?
ns_deletecookie ?-secure bool? ?-domain d? ?-path p? name


As you can see it only tackles the basics, no comment
or version field.  Although I've never personaly had to
use those features.  

Cookies are a pre-requisite for the authentication
module which exercises the new autht/authz API I
haven't written yet...

Obviously, the string parsing in the GetCookie code
will have to be very carefully looked at before this
sees the Internet.  But in general, does this look like
it's on the right track?



----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-18 04:55

Message:
Logged In: YES 
user_id=184124

I uploaded modified patch which looks into all Cookies headers.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 23:25

Message:
Logged In: YES 
user_id=87254


Good call.  I've been using a simillar algorithm in Tcl only
and it works fine, but I guess that's because I've only been
storing session keys etc. which are quite small.  I'll have
to examine all the cookie headers.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-24 16:05

Message:
Logged In: YES 
user_id=184124

Looks good, the only issue i have, some browser can send
more than one Cookie header and you are looking only in the
first header. Also, this is new API, no problem adding it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1145957&group_id=130646

[ naviserver-Feature Requests-1159259 ] Add nscache module functionality

From: SourceForge.net <no...@so...> - 2005-03-18 04:33:53

Feature Requests item #1159259, was opened at 2005-03-08 18:30
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159259&group_id=130646

Category: Tcl-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Stephen Deasey (sdeasey)
Summary: Add nscache module functionality

Initial Comment:
This is really what needs to be done sine the server itself
does support the caches and what lacks is the Tcl access
to them. The nscache module (or its equivalent) deserves
to be part of the server distribution. 

----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-18 04:33

Message:
Logged In: YES 
user_id=184124

For a start we can just add tclcache.c into the core and
leter improve/rewrite caching if needed. Currenlty i am
pretty happy with this module functionality as is.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 19:48

Message:
Logged In: YES 
user_id=184124

Yes, i agree with it

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159259&group_id=130646

[ naviserver-Feature Requests-1151137 ] New driver API and Udp module

From: SourceForge.net <no...@so...> - 2005-03-18 04:31:03

Feature Requests item #1151137, was opened at 2005-02-24 16:01
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1151137&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Vlad Seryakov (seryakov)
Assigned to: Vlad Seryakov (seryakov)
Summary: New driver API and Udp module

Initial Comment:
Hi guys,

Attached is minor driver extensions which do not change
existing drivers  but add new functionality. There are
some cosmetic changes, like moving some fields in the
Ns_Sock/Ns_Driver structres so they can be accessed
publically and made some private functions public but
functionality preserved as before.

I included udp driver as an example of new API, and
also added ns_sha1 command in the tclmisc.c, it is just
one command and it is uses practically everywhere.

-------------------------------------------------------------------------
To test udp driver i use new ns_udp command:

ossweb:nscp 8> ns_udp send 127.0.0.1 5060 "GET /
HTTP/1.0\n\n"

HTTP/1.0 200 OK
MIME-Version: 1.0
Date: Thu, 24 Feb 2005 05:39:50 GMT
Server: NaviServer/4.0.10
Content-Type: text/html; charset=iso-8859-1
Content-Length: 661
Connection: close

<HEAD><TITLE>Seryakov's Family Intranet</TITLE></HEAD>
.....


----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-18 04:31

Message:
Logged In: YES 
user_id=184124

Do we have any conclusions about this? Soon i will have to
develop some application with support for SIP over UDP, i
would like to do it on Naviserver and not on my hacked AS.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 16:44

Message:
Logged In: YES 
user_id=184124

And again, i have nothing against it, i said it before, what
i am offering is a littl ebit different, for network based
protocols driver stuff is kay, i just want to be able to
submit conn threads from within the server, not through the
main driver.
If i have already request data, through shared memory, file
or other way, i want to submit wroker thread directly, not
by connecting to myself and submitting data using HTTP or
other protocol i will have to implement just to queue te
request. This is for other applications, not HTTP/Web based,
more like messaging/authentication servers and alike.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 14:41

Message:
Logged In: YES 
user_id=184124

Can you provide what you have to modify in driver/nssock, i
think that will not be easy and will require too many
modifications.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 10:08

Message:
Logged In: YES 
user_id=87254


What do you think of my suggestion that the nssock module
and nsd/driver.c could be easily modified to handle udp and
unix domain sockets when specified in the config file?

A change like that combined with the work you've already
done to the binder would mean no development is required to
handle all socket types, and would be transparent to the
upper layers.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-26 17:10

Message:
Logged In: YES 
user_id=184124

Stephan,
I am suggesting a compromise, let's put ParseProc and
DriverSockQueu patches, they can coexists easily but will
offer two way to add new drivers. Once we have new drivers
working we will see what can be modified/adjusted, for now
we do not have anything except my several drivers i use in
the production and i use my own loop for smtp driver to keep
processing and state machine in one place.
Again, i have nothing against your method, i used to do
similar in my previous versions but now i need to be
completely free to implement aditional driver and i think we
should have this ability.

Zoran, what do you think?

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 22:57

Message:
Logged In: YES 
user_id=184124

Try to implement HTTP over UDP as i provided in the patch,
you will need to modify core driver again, this is one example.

Next, if i want to implement small driver that do not
interact with web part of aolserver, no filters, it just
need to do one small thing, it should be fast but will
handle very many requests. I do not need filters, callbacks,
i need my own main loop. 
I do not understand why you are insisting that all drivers
should go through http driver's main loop. If i implement
dns server and web interface to it, why they should compete
in the same driver's main loop and go through registered
filters if they share only backend database structure?

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-02-25 22:22

Message:
Logged In: YES 
user_id=87254


You don't have to use registered filters or procs.  It's an
option for protocol implementations where the initial accept
and parsing take place in the driver thread, and processing
takes place in a conn thread.

Once you submit to the queue however, either a filter, a
registered proc, or a registered proxy proc must handle the
request.  It makes no difference whether you submit
explicitly with Ns_DriverSockQueue or implicitly after your
Ns_ParseProc completes.

How much of the request structure you fill in is up to you,
it's all optional.  If you choose not to use it there is no
overhead.  It doesn't have to be a text based protocol for
this to make sense.  You might fill in the request structure
if you wanted to enable people to handle different request
types via C or Tcl.  Or, you could store binary data in
Ns_Cls storage and access it from registered procs etc.

In the case where you want to handle connection
accept/parse/reponse processing in the same thread, nothing
needs to be added.  You can do that today with
Ns_SockCallback (or by placing all your code in an
Ns_ParseProc and returning the response from there)

Can you be more specific about the overhead of using an
Ns_ParseProc?  Exactly what memory is allocated, what code
is run that should not..?

What specifically can you not do with the Ns_ParseProc
interface that you need the Ns_Driver* routines for?  I'm
looking for concrete examples.

As far as I can tell, with Ns_ParseProc you write less code,
get more options, and it takes advantage of infrastructure
to give you more speed.  I must be missing something, but
you'll have to explain it to me in more detail because I
don't get it.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 20:43

Message:
Logged In: YES 
user_id=184124

Okay, the biggest problem with your way i see that i am
enforced to go through http driver, even with my parse proc,
i need to know how it works to implement my driver so
correct hooks/filters will be called. If i want skip some
parts i am not able to do it, http driver works only one way. 
Another issue is if something will not fit into current
driver, additional hook needs to be introduced and core
driver needs to be modified again.

If my driver fits http-like/text mode paradigm, reusing http
driver is the easiest way, but if i need something specific,
all extra step to mimic the connection as http request, so 
all other parts of the http driver will not fail is just
unnecesssary extra efforts.

And i you mentioned, using Ns_DriverSockQueue is low level
function and requires from the developer full attention for
bulding the driver, but if this is what i want, why not. I
can spawn my own thread if some callbacks are long running
Tcl scripts and queue connection from my own loop. If i omly
want to received packets, decode them and submit connection,
i do not need all http driver infrusturcture, why to enforce
using it?

So we can have at least 3 ways for supporting multiprotocols:
- standard callbacks, completely different threads, no
connection pooling (Exists now)

- Ns_DriverSockQueue, new thread or callback thread, reusing
connection pooling. All filters/traces are reused as well

- driver Parse proc in the http driver to reuse drivers
thread and connection pooling with all filters/traces

All 3 methiods can co-exists and do not interfere with each
other, that's my point, and all they add just couple extra
functions. I would implement all methods, i have 3 drivers i
need to run: smtp, dns, snmp, can add imap in the future

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-02-25 20:31

Message:
Logged In: YES 
user_id=87254


"Each way has its own drawbacks..."

That's what I'm asking.  What are the drawbacks for each
approach we've come up with so far?

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 15:23

Message:
Logged In: YES 
user_id=184124

I am not arguing with you about your ParseProc driver
extension, it is very usefull and makes driver more flexible.

I just want to add couple of new API functions that will allow
developer to queue connections from any place, that's it.
How drivers will be written and how developer will decide to
handle it is up to each particular developer. But nobody
will be locked up in only one way of doing it.

Each way has its own drawbacks so we have multiple choices
and developer will decide for itself how it should be.

Both our ways do not mutually exclusive.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-02-25 08:28

Message:
Logged In: YES 
user_id=87254


Ah I see you're right, an extra thread is not created for
each incoming request.  'udpThread' is perhaps not the best
name for the socket callback though... :-)

I still don't see the advantage of using Ns_SockCalback.  A
single thread is created by the server to handle all
registered callbacks, including those from ns_sockcallback.
 At runtime, while this single thread is running tcl code to
handle one callback, your dirver callback cannot run, no new
connections will be accepted, etc.  Why is it better to use
the Ns_SockCallback thread rather than a driver thread?

Maybe I'm reading this wrong, but how do you handle the case
where the request arrives as more than one packet? 
Ns_DriverSockRead() is called from the 'udpThread' socket
callback.  The only return value checked is NS_OK, but
couldn't this also be SOCK_MORE?

How would you handle things like keepalive?  I think you'd
have to reimplement that in the Ns_SockCallback thread.

Re the proxy stuff, Ns_RegisterRequest() and
Ns_RegisterProxyRequest() seem very simillar.  With
Ns_RegisterRequest(), filters are run and you get the choice
of using C or Tcl.  Ns_RegisterProxyRequest() offers no
advantage that I can see -- you still need a complete
Request structure, even if you just ignore it.


The comm driver initialization should probably be changed to
automatically handle unix domain sockets.  You add /foo
rather that 127.0.0.1 in the config file and it knows to
create the correct type of socket.  nssock and nsopenssl
wouldn't have to be modified at all.

UDP is different.  There are two types of protocols: single
packet, such as DNS or RADIUS; multi packet, as used in some
streaming media and p2p protocols.

I think for the single packet case we again might want to
modify the driver code to automatically handle it.  No
read-ahead is necessary, there's only one packet per
request, so it's placed in the request buffer and passed on
to the next stage, which is parsing.  Here, a return code of
SOCK_MORE would be illegal.

Every multi-packet UDP protocol will require custom
framing/sequencing and the developer will have to create a
new socket driver.

Taking RADIUS as an example, which is a single packet UDP
protocol, you'd create a very simple Ns_ParseProc who's only
job is to check that the number of bytes specified in the
header actually arrived, and return SOCK_READY.  A default
request structure is created for you so the only other thing
you have to do is Ns_RegisterRequest() for the '/' URL. 
Within your request proc, call Ns_ConnContent() and parse
the buffer.

Now, you do have a number of other options to make this more
flexible.  You could parse the request in your Ns_ParseProc
and then fill out the request structure.  e.g. the different
RADIUS message types could be expressed as HTTP verbs.  This
buys you flexibility.  Now you can Ns_RegisterRequest() a
different routine for each RADIUS message type, and someone
reusing your code can override your default implementation.
 You can also handle some message types in C, and others in Tcl.

You might decide to put some useful information about the
RADIUS request in the URL line.  Now you get logging for free.

You might decide to parse the key/value pairs from the
RADIUS request into query vars or headers.  Now you don't
have to write a bunch of RADIUS specific Tcl commands to
access the data, should you want to handle that from Tcl. 
ns_queryget or ns_conn headers  will work.

How far you want to go is up to you, the writer of the
protocol module.  But at the low end all you need is an
Ns_ParseProc and usually a single registered request proc,
which is a lot less code to write than the Ns_Driver* stuff,
I think.

I'm sure it's not perfect!  So what's wrong with it? Why
would it suck to write a RADIUS protocol module as I've just
described, to take one example?





----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 05:51

Message:
Logged In: YES 
user_id=184124

I think we need to collect all solutions and then see where
are going, i am holding to port all my old/running modules
because i do not know how naviserver will handle foreign
protocols.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 05:48

Message:
Logged In: YES 
user_id=184124

yes, using proxy can solve immediate requirements without
hacking NsConnProc by adding hooks to call driver specific C
functions.

If i need my smtp server and main loop is in C, i need
somehow call it
in the connection thread. Using registered proxy function i
can do it now, i do not need filter/traces. This is for
completely new modules.

I can implement main loop in the module as Tcl command and
then call it in the connection filter, it is possible, it
will just require many different parts to be in place and
still filter should be registered as Tcl proc which will
call another driver main loop.

sometimes low level stuff makes things easier and simpler:-)))

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 05:45

Message:
Logged In: YES 
user_id=184124

In all my installations i need sha module but it wasn't in
the distribution, so i need to download it or repackage
aolserver to include nssha1. That is my point, it is just
one simple function use more often than something like
ns_jpegsize or so.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-02-25 05:42

Message:
Logged In: YES 
user_id=87254


Well I don't know about the wisdom of adding sha1 to the
core at this stage  :-)

I see your point though, encryption and hashing functions
are almost universally required for systems/server work. 
Maybe we need to consider adding a new encryption module to
the core distribution.  Like nsdb it would export a C API
via libnscypher.so (or whatever) as well as the Tcl module
nscypher.so.

Times have changed and things like the openssl libraries are
common on all platforms so it's not the big deal it used to
be to add such dependiencies.

Such a module should make implementing SSL in nsopenssl easier.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-25 05:41

Message:
Logged In: YES 
user_id=184124

No, i do not spawn new thread, i use callback feature of the
server, when socket is ready, server calls provided callback
ad that callback just submits the socket to connection pool.
if pool is full, Ns_DriverSockQueue will return NS_TIMEOUT,
you can retry.
Yes, it is low-level, for high level, HTTP driver provides a
lot of functionality, it could be extended like you did, but
still it is HTTP driver hacked. If i want completely new
driver, like RADIUS server, http driver will not help me, i
need low level stuff, and it is there already.

To reuse resource limiting, i added Ns_DriverSockQueue
function, so
new conections can be queue instead of creating new threads.

I do not have anything against your patch for extending
current driver, it is very usefull. i am offering new API
for low level drivers.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-02-25 05:36

Message:
Logged In: YES 
user_id=87254


The proxy stuff is too late in the cycle to do much.  The
request has to be fully parsed by then (for read-ahead).

If a proxy function is available, then all filters, the auth
phase, registered procs, cleanup procs etc. are bypassed.  A
lot of that stuff can be very useful for non-HTTP protocols.

I don't think handling stuff via a proxy function buys you much.



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-02-25 05:32

Message:
Logged In: YES 
user_id=87254


I don't understand what you're trying to acheive here (well,
apart from multi-protocol... :-).  The newly exposed
Ns_Driver* entry points are quite low level, and so the
implementor of the new protocol is left to do a lot of the
heavy lifting.

For example, the way I read it you have to create your own
listen socket and register a callback.  Every time a new
reaquest comes in a thread is spawned to handle it.  From
that thread you then submit the parsed request to one of the
conn threads.

Excessive thread creation and message passing between
threads is not going to perform well.  And it seems you have
to write more code than e.g. the example POP3 driver I
posted some time ago.

You're also not taking advantage of the other facilities
that the server offers.  What happens if 1000 connections
arrive, do you spawn 1000 threads?  You could of course code
up some limit checks, but this already exists.  What if a
client sends you a continuous stream of data, 2GB...  etc.

By using the driver hooks to provide the new protocol
parser, you deny yourself the opportunity to use something
like the nsopenssl module.  This should work just fine for
protocols like SMTP, IMAP, POP3 and probably others.


Anyway, I think one of the most carefully coded aspects of
the server is it's attention to resource usage.  That goes
for IO, context switching, memory, etc.  It's espescialy
nice that most of the time you're not even aware that all
this work is being done for you.  I'd like new protocol
drivers to be able to transparently take advantage of that.

Could you take a look at my old POP3 demo driver?  It's the
attachment nspopd-0.3.tar.bz2 over here:

http://sourceforge.net/tracker/index.php?func=detail&aid=973010&group_id=3152&atid=353152

It's not obvious that anything interesting is going on, so
it's not much to look at.  But actually, conn socket
read-ahead is happening eficiently in one thread with async
IO, the conn threads are treated as a precious resource
(heavy-weight Tcl interps) and are allocated at the last
minute, there's an easy API in C and Tcl to implement the
actual reading of data from the INBOX (could be from the
file system, db etc.).

You've got a lot of experience writing servers, what do you
think is wrong with this model?  What can it not do, or what
could it do better?



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-02-24 21:29

Message:
Logged In: YES 
user_id=184124

Another thing, once we can submit connections from any
place, no need to build any drivers, even in C, i can
register new proxy proc and set protocol field in my
request, so when submitted, connection will run registered
proxy proc.
for example: 
 in my smtp driver/module, i create driver, register proxy
for smtp: protocol, register callback for the socket. Once
connection accepted, in my module i submit that connection
to the queue with request-protocl set to smtp:.
queue.c will call my proxy handler, which is C function.

No need to add anything else. This way even standard
aolserver can be extended without touching precious http
driver thread. Sorry for sarcasm.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1151137&group_id=130646

[ naviserver-Feature Requests-1165562 ] Scrap aolpress handling

From: SourceForge.net <no...@so...> - 2005-03-18 04:28:45

Feature Requests item #1165562, was opened at 2005-03-17 20:50
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Scrap aolpress handling

Initial Comment:
In Ns_ConnSetRequiredHeaders() there is a provision to add
NaviServer/2.0 to headers. I doubt that anybody will ever 
need this any more. I'd scrap this entirely.

----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-18 04:28

Message:
Logged In: YES 
user_id=184124

There are also places in server.c , nsd.h and nsdconf.h with
aolpress. Let's remove them

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-18 00:21

Message:
Logged In: YES 
user_id=87254


There's more aolpress junk in tcl/fastpath.tcl which can be
removed at the same time.

I think this closed-source client is officially dead...


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646

[ naviserver-Feature Requests-1159471 ] Virtual Hosting

From: SourceForge.net <no...@so...> - 2005-03-18 04:26:55

Feature Requests item #1159471, was opened at 2005-03-09 00:40
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159471&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Vlad Seryakov (seryakov)
Assigned to: Vlad Seryakov (seryakov)
Summary: Virtual Hosting

Initial Comment:
Okay, i did some digging and testing, looks working and
much simpler. I tried to simplify default AS 4.x
virtual hosting and added pageroot virtual hosting,
simple way to use different pagetroots on the same server.

The change vor default virtual hosting is: no
defaultserver anymore, the server who registered
virtual hosts is default server, so nssock is loaded in
the default server, other than that virtual servers are
defined the same way.

One thing i left is to chamge ns_info pageroot to use
Ns_GetConn() and then if exists use connPtr->pageroot,
but this is simple change if you approve current
virtual hosting patch.

Here is the nsd.tcl config example:

ns_section      "ns/server/${server}/module/nssock/servers"
ns_param        test                    vlad.seryakov.com
ns_param        test                   
vlad.seryakov.com:80

ns_section     
"ns/server/${server}/module/nssock/pageroots"
ns_param        ${home}/html/test       vlad.seryakov.com
ns_param        ${home}/html/test      
vlad.seryakov.com:80



----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-18 04:26

Message:
Logged In: YES 
user_id=184124

On the other hand if it will be possible to configure/rename
"servers/" and "pages/" parts or omit them completely i
would be happy with this patch.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 14:48

Message:
Logged In: YES 
user_id=184124

The problem i see with this patch, it will break all my AS
installations, because it assumes that every servers is under
servers/ direcotory and pageroot is awlays named pages/.

I my on-server installs i do not have this complex directory
structure, i have html/ under ns_info home and pageroot is
set in the nsd.tcl.

Also, what is the point of ServerRoot dir, we have PageRoot
already, if it is set into absolute path other apps use
ns_info pageroot? As i understand, under ServerRoot there is
only pages difrectory? Why not to use pageroot directly?



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 14:05

Message:
Logged In: YES 
user_id=87254


Oh, forgot to mention.  It depends on the Tcl Callbacks patch.



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 14:03

Message:
Logged In: YES 
user_id=87254


I've added the patch naviserver-4.0.10-server-root-vhost.patch.

It adds the following new routines:

  Ns_ServerPath()
  Ns_PagePath()
  Ns_SetServerRootProc()
  Ns_ConnLocationAppend()
  Ns_SetConnLocationProc()

  NsServerRoot()
  NsPageRoot()

  ns_serverpath ?pathSegment ...?
  ns_pagepath ?pathSegment ...?
  ns_serverrootproc script ?arg?
  ns_locationproc script ?arg?

And the following new configuration options:

  ns_section "ns/server/${servername}/vhost"
  ns_param   enabled false
  ns_param   prefix ""
  ns_param   pagedir pages # overides fastpath/pageroot
  ns_param   stripwww true
  ns_parma   stripport true

This version of host header based virtual hosting which
depends on the existance of the pages directory is a
superset of the functionality provided by a static
configuration.

Applications which call Ns_ConnLocation() will need to be
modified to call Ns_ConnLocationAppend() or they will not be
vhost aware.

I've reinstated the depreciated proc
Ns_SetConnLocationProc() and changed it's signature.  It
dissapeared from ns.h almost 5 years ago, then reappreared
~2.5 years ago.  I don't think this will be a problem.  In
turn, I've depreciated the Ns_SetLocationProc().

It compiles and the server runs.  I haven't had time to test
it.  Does this look acceptable?



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-13 00:41

Message:
Logged In: YES 
user_id=184124

Just to consider the possibility, instead of mallocing
pageroot/location, by default it can use
sockPtr->pageroot/sockPtr->location, then when ns_conn
pageroot newpageroot called, it will set connPtr->pageroot
with malloced string and ns_conn pageroot will check and
return it instead of sockPtr->pageroot. This way, no
overhead at conn queue and still new pageroot/location can
be set in Tcl

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-13 00:02

Message:
Logged In: YES 
user_id=184124

That is the probem with nssock, it is actually an extender
of the driver but still somehow kept independent. nssock
itself is useless without driver and only used to bind to
more than one address for different servers. it could be
moved in the core but the problem will be how to define more
than once instance.
malloc are overhead indeed, but once copied they can be used
independently  and canbe set in Tcl by using ns_conn
location newLocation or
ns_conn pageroot newPageroot. In this case they should be a
copies.

Just do not make mass virtual hosting the only virtual
hosting way, being able to change pageroot in the Tcl/C give
developer more flexibility if required. For simple cases,
mass virtual hosting is okay.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-12 23:48

Message:
Logged In: YES 
user_id=87254


I'm not keen at all on adding virtual hosting to the nssock
driver.  There's nothing HTTP specific about the nssock
dirver and I'd like to keep it that way.

There are a couple of problems with the other proposed
solution.  The paired functions
Ns_SetLocationProc()/Ns_ConnSetLocationProc() etc. seems
excessive, and the enforced malloc()ing at runtime of the
location and pageroot strings is an unwelcome overhead.

Using Tls storage is clever but pretty ugly.  I think
dstrings are the way to go here.  I'm not sure Tls is safe
in this implementation.  The same dstring is used for
location and pageroot strings, so it depends what the caller
decides to do with the result and in what order, whether or
not one overwrites the other.

I would like to explore adding mas virtual hosting into the
core.  Let me work up a patch, I think I can get to this
this weekend...



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-12 21:57

Message:
Logged In: YES 
user_id=184124

Attached is nssock with virtual hosting patch

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-12 20:39

Message:
Logged In: YES 
user_id=184124

Actually, vhost can be combined with nssock, if options
given it will enable virtual hosting, if not works as
regular sock driver.
This way it is always with the core and at the same time
independent.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-10 21:38

Message:
Logged In: YES 
user_id=184124

If loaded, vhost module works as Stephan suggested, strips
port/host and usesd pageroot if other root is not specified.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-10 21:37

Message:
Logged In: YES 
user_id=184124

There is new patch with Stephans corrections/additions.
I think we can provide core module for virtual hosting, i
called it nsvhost and we can extend this module to do all
sorts of hosting.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-10 18:53

Message:
Logged In: YES 
user_id=184124

And keeping port and www. is sometimes necessary, you can do
virtual hosting by port only, IDT does that for example,
and many sites work without www. prefix, just stripping them
by default may not be appropriate.

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-10 18:24

Message:
Logged In: YES 
user_id=184124

>Looks good.  However, isn't ns_conn pageroot >unneccessary?

For templating engines ns_info pageroot is the only way to
figure out the root, UrlToFile works for fastpath only.

>Would it be better to have the servers and pageroot config
>in one section, to avoid duplication?

They are mutually exclusive, that's why i put them in
different sections, if full virtual server is set, pageroots
are ignored, this is for AOL like virtual hosting with
different servers(rare situation though). So in most cases
pageroots will be used only, thus only one simple syntax.

As for PageRoot and ServerRoot, i think this is a little
cofusing. Currently, pageroot returns full path and whoever
calls pageroot assumes that it will return full path. 

Virtual hosting using directories as hostnames is what i am
currently using with vhost module and i think it can be
included as a standard feature for easy virtualhosting
solutions. 
I do not think this should be the ONLY virtual hosting
solution, everybody can write their own modules using
SetLocation/SetPageRoot procs or register filter which will
set pageroot for each connection.

Let me prepare another patch-set with your suggestions
included .

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-10 08:00

Message:
Logged In: YES 
user_id=87254


Looks good.  However, isn't ns_conn pageroot unneccessary?

Would it be better to have the servers and pageroot config
in one section, to avoid duplication?

ns_section "ns/server/${server}/module/nssock/servers"
ns_param example.com        exampleserver
ns_param foo.example.com   "exampleserver ${home}/whatever"

Here, the first entry maps the example.com host to the
exampleserver and uses it's default pageroot.  The second
entry supplies a new pageroot.


How about mass virtual hosting, i.e. where you don't have to
explicitly configure each host header to pageroot mapping,
but construct the pageroot from the host heafer at runtime?

I've attached the file nsmassvhost.c which implements the
above.  It uses the hooks Ns_PageRootProc and
Ns_LocationProc which would be unneccessary if the
functionality was included as standard.

It trims the port and any leading 'www.' from the host
header.  It would be nice to have this for the static
mapping also, as at the moment to be robust you often need 4
mappings for each virtual host.

It also uses the function Ns_ServerPath().  The idea here is
to introduce the concept of the virtual server root as a
distinct location in the file system, where the pageroot is
a location below that.

I want to change this so that the serverroot is dynamic and
based on the host header (when configured), and the pageroot
is simply the serverroot with "/pages" (or whatever is
configured) appended.

It would look something like this:

/srv/server1/pages
/srv/server1/example.com/pages
/srv/server1/example.com/cache

The first path is the default or non-virtual hosted case. 
server1 is a server defind in the config file and has it's
own private tcl library.  The second path is the pageroot of
a virtual host.  The third is an example of some data which
is specific to the example.com virtual host.

So, without virtual hosts:
  Ns_ServerRoot() -> /srv/server1
  Ns_PageRoot() -> /srv/server1/pages

With virtual hosts (and called in the context of a conn thread):
  Ns_ServerRoot() -> /srv/server1/example.com
  Ns_PageRoot() -> /srv/server1/example.com/pages

The advantage of this system is that you don't have to
restart your server every time you add or remove a virtual
host.  There is also a convenient location to store data
associated with both virtual servers and virtual hosts. 
Easy to backup, remove, etc.

I haven't had time to look at how this would be integrated
into what you've got here, maybe at the weekend.  Feel free
to take a shot at though :-)

Does the scheme outlined above make sense to you?




----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159471&group_id=130646

[ naviserver-Feature Requests-1156899 ] Add optional "numbytes" arg to ns_conncptofp

From: SourceForge.net <no...@so...> - 2005-03-18 00:39:53

Feature Requests item #1156899, was opened at 2005-03-04 12:30
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646

Category: Tcl-API
Group: None
>Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add optional "numbytes" arg to ns_conncptofp

Initial Comment:
The current ns_conncptofp will copy the content to the 
file. The entire content. What is missing is to tell to the 
command to copy only part of the content. 
The proposed API would be: 
 
 ns_conncptofp channel ?numbytes? 
 
The numbytes can be: 
 ommited (procedure does what it did before) 
  >0 (will copy somany bytes up to contentLength or  
  end-of-data whatever comes first) 
 0 (will do nothing) 
 -1 (will behave as when argument is ommited)  
 
This command is one of those which support the 
old-compat connId argument. So the actual API is 
as of today: 
 
  ns_conncptofp ?connId? channel 
 
I would vote to scrap the connId parsing. 

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 17:39

Message:
Logged In: YES 
user_id=87254


This change introduces a new side-effect:  the contentLength
is adjusted down by the number of bytes coppied.  Does this
need to happen?  The C API doesn't do this.

I think reqPtr->avail needs to be checked rather then
connPtr->contentLength.  reqPtr->next and reqPtr->avail will
be managed by the C API so no need to adjust them here. 

Also, as the Tcl API may return less than you asked for
rather than error out like the C API, should it return the
number of bytes copied?



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 13:24

Message:
Logged In: YES 
user_id=95086

Added into CVS.

    ns_conncptofp ?-bytes tocopy? channel
 
The command will silently accept <=0 bytes and revert to
copying all that is there. Also, if more than available is
requested
it will silently copy all that is there. Otherwise, -bytes
bytes will 
be copied to the channel. Channel must be opened for writing
otherwise error is thrown.

Oh yes, Stephen, I love the arg parsing code This is really
a relief. After years any years of parsing... heh..


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 16:30

Message:
Logged In: YES 
user_id=87254


This seems reasonable.

Should it look like this though?
    ns_conncptofp ?-bytes tocopy? channel

The reason it is attractive to drop the connid arg is that
with two optional args it becomes a pain to parse.  I'm also
thinking of how ns_sendmail has been extended over time by
appending optional args.

The C version of this command returns an error if you ask
for more bytes than are available.  Should the Tcl version
silently truncate?  Maybe it should return the number of
bytes actually copied?

Does it make sense to ask for 0 (zero) bytes?  Maybe bytes
<= 0 should revert to default behaviour which is to copy all
that's available?

The existing error message "could not copy content (likely
client disconnect)" is wrong.  This might have been the case
in 3.x, but not 4.x.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646

[ naviserver-Feature Requests-1165562 ] Scrap aolpress handling

From: SourceForge.net <no...@so...> - 2005-03-18 00:21:35

Feature Requests item #1165562, was opened at 2005-03-17 13:50
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Scrap aolpress handling

Initial Comment:
In Ns_ConnSetRequiredHeaders() there is a provision to add
NaviServer/2.0 to headers. I doubt that anybody will ever 
need this any more. I'd scrap this entirely.

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 17:21

Message:
Logged In: YES 
user_id=87254


There's more aolpress junk in tcl/fastpath.tcl which can be
removed at the same time.

I think this closed-source client is officially dead...


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646

[ naviserver-Feature Requests-1165562 ] Scrap aolpress handling

From: SourceForge.net <no...@so...> - 2005-03-17 20:50:35

Feature Requests item #1165562, was opened at 2005-03-17 21:50
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Scrap aolpress handling

Initial Comment:
In Ns_ConnSetRequiredHeaders() there is a provision to add
NaviServer/2.0 to headers. I doubt that anybody will ever 
need this any more. I'd scrap this entirely.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1165562&group_id=130646

[ naviserver-Feature Requests-1156899 ] Add optional "numbytes" arg to ns_conncptofp

From: SourceForge.net <no...@so...> - 2005-03-17 20:24:32

Feature Requests item #1156899, was opened at 2005-03-04 20:30
Message generated for change (Comment added) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646

Category: Tcl-API
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add optional "numbytes" arg to ns_conncptofp

Initial Comment:
The current ns_conncptofp will copy the content to the 
file. The entire content. What is missing is to tell to the 
command to copy only part of the content. 
The proposed API would be: 
 
 ns_conncptofp channel ?numbytes? 
 
The numbytes can be: 
 ommited (procedure does what it did before) 
  >0 (will copy somany bytes up to contentLength or  
  end-of-data whatever comes first) 
 0 (will do nothing) 
 -1 (will behave as when argument is ommited)  
 
This command is one of those which support the 
old-compat connId argument. So the actual API is 
as of today: 
 
  ns_conncptofp ?connId? channel 
 
I would vote to scrap the connId parsing. 

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 21:24

Message:
Logged In: YES 
user_id=95086

Added into CVS.

    ns_conncptofp ?-bytes tocopy? channel
 
The command will silently accept <=0 bytes and revert to
copying all that is there. Also, if more than available is
requested
it will silently copy all that is there. Otherwise, -bytes
bytes will 
be copied to the channel. Channel must be opened for writing
otherwise error is thrown.

Oh yes, Stephen, I love the arg parsing code This is really
a relief. After years any years of parsing... heh..


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 00:30

Message:
Logged In: YES 
user_id=87254


This seems reasonable.

Should it look like this though?
    ns_conncptofp ?-bytes tocopy? channel

The reason it is attractive to drop the connid arg is that
with two optional args it becomes a pain to parse.  I'm also
thinking of how ns_sendmail has been extended over time by
appending optional args.

The C version of this command returns an error if you ask
for more bytes than are available.  Should the Tcl version
silently truncate?  Maybe it should return the number of
bytes actually copied?

Does it make sense to ask for 0 (zero) bytes?  Maybe bytes
<= 0 should revert to default behaviour which is to copy all
that's available?

The existing error message "could not copy content (likely
client disconnect)" is wrong.  This might have been the case
in 3.x, but not 4.x.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156899&group_id=130646

[ naviserver-Feature Requests-1156141 ] Add ns_conn channel command

From: SourceForge.net <no...@so...> - 2005-03-17 18:26:23

Feature Requests item #1156141, was opened at 2005-03-03 21:02
Message generated for change (Comment added) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

Category: None
Group: None
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add ns_conn channel command

Initial Comment:
Extend the ns_conn command to allow wrapping an Tcl 
channel arround it. This channel can be used to talk to
the remote client afterwards.

The idea is simple. The client uses standard http protocol
to open up a connection to the server. Server accepts the
connection and obtains the channel from it. The client 
also gets the channel to the server and from this point
on, 
both can exchange arbitrary data.

Example implementation is given as a patch against
the current CVS state.

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 19:26

Message:
Logged In: YES 
user_id=95086

Added to CVS head

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-17 15:20

Message:
Logged In: YES 
user_id=184124

i do not

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 13:10

Message:
Logged In: YES 
user_id=95086

OK, If you do not mind, I'll add  
 
 ns_conn channel 
 
It will return the Tcl channel you can use for 
talking to the client. When you've finished, 
you just "close $sock" and finish the proc.  
The connection thread is then ready for other 
users/requests. 
 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 12:41

Message:
Logged In: YES 
user_id=87254


Yes, I get it  :-)

My only point was that HTTP is not a short lived connection
protocol, if you don't want it to be.  Lengthy connections
with multiple messages seems to be one of the key features here.

The *only* thing HTTP can't do is have the server initiate a
message to the client.  For the kinds of cases you're
talking about here with multi-hour long connections, I'd be
wanting some kind of 'ping' request from the client every so
often just to make sure the server was idle and not broken.
 The ping becomes a 'what messages do you have for me?'
request, the server's response is itself a request to the
client.

Anyway, I was talking pure nonsense about the keepalive
stuff; the client doen't have to be so broken after all. 
I'm fine with you adding this feature.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 10:03

Message:
Logged In: YES 
user_id=95086

   "still don't understand what it buys you that a modern HTTP 
   1.1 implemnetation doesn't.  If there was no other way to do 
 
Stephen, I do not think you understood my point. What this 
command buys me is to be able to initiate an http connection 
and reuse the socket-to-socket comm channel for other 
purposes. 
 
For example we make "GET" request to a certain URL and  
then when the connection gets established, the server, 
takes the conn socket,  wraps a Tcl channel around it and  
passes this channel to my (arbitraty) Tcl proc. 
The client, instead of getting the response body, parses  
only the returned http headers to see that the connection 
succeeded and then passes this socket to my custom 
Tcl proc. Now, on the sever and on the client side you have  
a channel and both sides can use it to establish whatever  
protocol they want. They have a socket pipe between. 
The http is just used to handshake the peers, clear? 
 
What we do with this? We have two server instances on two 
different hosts. We open a channel (using the above trick) 
from one to another and then use this to generate a  
stream of preformatted packets (our protocol) on one side 
which does the backup of data from to the other side 
ultimately to the tape drive(s) attached to that host.  
We shuffle TB of data this way daily in about 500  
installations worldwide, so I  know this works fine.  
 
Now, see, this has nothing to do with HTTP1.1. or 1.0 alone.  
And yes, if you look in the implementation, you will see that  
it is a simple wrapper to the Ns_ConnSock. It simply wraps  
a Tcl channel around the the connection socket, nothing  
more and nothing less. 
 
And, I would not say that it requires a broken client.  It just  
requires a special client-processing. This is not ment for 
browser-clients. It has nothing to do with serving pages 
or any other short lived connections. Once we setup such 
a connection, we use it for hours and maybe days. 
 
We could have achieved this with a simple socket connection 
from client to a specific port  on the server. But we are not 
able to use any arbitrary port. All we have is port 80 (the web 
server port). So we must play by the http rules initially when 
accessing the server. When we passed this http handshake 
stuff, we are left with a socket-to-socket comm-channel  
which is transparent and can be used for any purpose 
whatsoever. 
 
Now, is this little bit more clear now? 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-05 04:32

Message:
Logged In: YES 
user_id=184124

What is broken about it? It is the Tcl interface to
Ns_ConnSock(), just Tcl way, using channel. The merit is to
use sync inter-exchange between client and server using
established connection. Client is not broken, it is designed
for this exchange, it is the tool to build messaging, not
general purpose protocol. It make naviserver more flexible
platform for implementing different ppas on top of it using
Tcl, that's all.
Btw, when are you going to commit your protocol extensions?

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 01:20

Message:
Logged In: YES 
user_id=87254


This command isn't a good idea because it creates a broken
interface which requires a broken client to work, and I
still don't understand what it buys you that a modern HTTP
1.1 implemnetation doesn't.  If there was no other way to do
it, then it may still have some merit.  But Ns_ConnSock() is
trivial and gives you all you need to create a Tcl channel.

We've already decided to do this the right way, whatever
that turns out to be, so why should we simultaneously also
do it the wrong way?

In fact you seem to agree, noting that this sucks because
it's not UDP and the overhead of HTTP is just too much for you.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 16:36

Message:
Logged In: YES 
user_id=184124

And that's why i want to add simple UDP support to the
server: for implementing simple messaging between servers in
the cluster, using UDP without HTTP overhead will make it
simple and faster. Using Tcl channel is intermediate
solution for such messaging because it eliminates making
multiple HTTP requests to the server but still uses heavy
TCP/HTTP overhead. Still i think this command is very
usefull and will make server more versatile provided that we
will supply usage examples with the server as well.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 16:26

Message:
Logged In: YES 
user_id=95086

  "Second, this has nothing to do with non-HTTP protocol, it is 
  HTTP communication protocol, what's inside the body has 
  nothing to do with it again." 
 
This is true. It is a regular http request, just for the body 
part, the peers are engaging in a bidirectional communication 
and can do whatever they pleased.  The request is considered 
done when one of the peers close the socket. 
 
Strictly speaking, I'm using the http body and this trick 
to implement arbitrary client-server comm channel. This will 
not work for browsers of course. It has nothing to do with 
browsers at all in fact. It arose out of the necessity to  
support alternate protocols given what we had in AS, 
that is pure http machinery. 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 15:38

Message:
Logged In: YES 
user_id=184124

On the closing note, keepalive has nothing to do with this
and should not be considered in this case. Even if you
connect using HTTP 1.1 and specify keepalive, sever has the
right to issue Connection: close header which will disable
keepalive.
So, it WILL work for all connections.
Second, this has nothing to do with non-HTTP protocol, it is
HTTP communication protocol, what's inside the body has
nothing to do with it again.
And pipelining is a bad example here, the point it to send
the request and wait for the answer and base on that send
another request. Pipelining will allow you just to submit
all requests and then later receive replies, it was designed
for speeding browser's connection especially retrieving images.

It looks to me Stephen you are all for HTTP core and
around-the-HTTP-driver solutions only. This will make simple
solutions complex because of getting around/workaround HTTP
based core in the server.

What is wrong with one simpe function which will give the
server more options without breaking anything?

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 10:45

Message:
Logged In: YES 
user_id=95086

I see. The gotcha's are: 
 
  o. it's yet another implementation of non-HTTP protocol 
  o. (HTTP 1.0 only, and no Keepalive header) be used to get 
  o. function in the docs might reasonably expect it to work for 
     all HTTP connections 
 
That's pretty clear. Well, I think you are right. This is our  
attempt to work around http-only caps of the server.  
Agreed. It will be better to get it done properly and we will 
then internally switch to the new implementation when 
it's ready. At the moment I have all this in a separate module 
and this can stay so until the multi-protocol caps are agreed 
upon and implemented. 
 
I will close this RFE then. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 10:36

Message:
Logged In: YES 
user_id=87254


I don't understand why you're not using standard HTTP
pipelining.  It would seem to be so much easier than having
to write custom client and server code.

This is a small change, but I'm not excited about it because
it's yet another implementation of non-HTTP protocol
support, which makes 3 so far...  :-(

It also seems to require that a deliberately broken client
(HTTP 1.0 only, and no Keepalive header) be used to get
around the server's io read-ahead.  Someone seeing this
function in the docs might reasonably expect it to work for
all HTTP connections.

Also, I don't think this requires core support. 
Ns_ConnSock() will return the underlying file descriptor for
a connection.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 09:36

Message:
Logged In: YES 
user_id=95086

OK. Maybe I should explain little bit more what I'm using this 
for. 
Imagine you'd like to make a client-server connection and 
then exchange arbitrary data on this connection. Imagine 
something like telnet connection. You telnet from client to 
server, get a comm-pipe betwen them and then you can 
shuffle whatever you like across this comm-pipe.  
What we have is a http server responding only to http  
requests (as it is now, w/o multiprotocol support). So, what 
I do is to use http to setup the connection, have a registered 
procedure on the server grab the socket and install my custom 
protocol handler on it. The http is only used to establish the 
communication. All the rest is handled by my custom code 
on the server and on the client.  
Once the connection is setup, we have a point-to-point  
socket communication link between server and client and we 
can use it for just about anything. All you need for this is a  
client-side http-library (we have ours written in Tcl) which  
will give you the socket back once the connection is  
established) and serve-side proc implementing your protocol 
of choice. So, the entire server http-machinery stays in place. 
I do not see any problem there because we have been using 
this trick for about 4 years in production very successfully. 
What I do not know is wether there are any gotcha's with that 
approach (we haven't hit any yet). 
I also do not know what will happen when we add alternate 
multiprotocol caps to server. Maybe this will become obsolete. 
But, as I see this now, the change is trivial and does not hurt 
anybody. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 00:21

Message:
Logged In: YES 
user_id=87254


HTTP also has (although our server doesn't, but I want to
add this) pipelining, which is sending multiple requests at
once without first waiting for the reply to previous requests.

That sounds like your "i do it when I send multiple http
requests".



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 23:45

Message:
Logged In: YES 
user_id=184124

Keepalive works when i close my connection, i.e. when i
exited from conn thread, but the point here is to do it from
the conn thread, i received request from my other server
sending me some info, i send my info back and get ack.
Everything inside one connection, before keepalive.
And yes, i send messages in my format, anyway i do it when i
send multiple http reaqests, i am interested in message body
only.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 23:32

Message:
Logged In: YES 
user_id=87254


HTTP already handles sending multiple requests accross one
socket with keepalives.  It might be more useful to add
keepalives to our HTTP client code.

In the case of HTTP keepalives, the socket is returned to
the driver thread after the first request, still open, and
the conn thread goes on to do other useful work.  If a new
request arrives before the connection times out it is passed
to the next free conn thread.

If you don't usee HTTP with keepalives you will have to
invent some kind of message chunking syntax and some kind of
time-out protocol.  But you will still end up wasting conn
threads as they idle waiting for possible extra messages.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 23:20

Message:
Logged In: YES 
user_id=184124

I could use it in my apps where i have cluster of several
servers exchanging http requests between each other,
sometimes one session needs 2 or more exchanges.
With this channel i can exchange using same connection and 
i need to send/receive data, not headers, so this can save
and improve my operations.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 23:12

Message:
Logged In: YES 
user_id=87254


Usually, by the time your script runs and you call [ns_conn
channel], the data from the client will already have been
read and buffered.  You can access this with [ns_conn content].

Does reading work?

What's the objective?  Do you want to hadle streaming data,
or make reading and writing more Tcl-like?


----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 23:07

Message:
Logged In: YES 
user_id=184124

Very usefull, commit it

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

[ naviserver-Feature Requests-1156875 ] Watchdog process restarts failed server

From: SourceForge.net <no...@so...> - 2005-03-17 17:51:54

Feature Requests item #1156875, was opened at 2005-03-04 20:03
Message generated for change (Settings changed) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Watchdog process restarts failed server

Initial Comment:
We have been using this for quite some time and it 
proved extremely useful. 
We doublefork the nsd process and make the first 
forked instance control the second. The first one 
(the watchdog) reacts on exit codes and signals 
caught during the watch and correspondingly restarts 
the second instance (the worker). 
 
Also, we have added the the "-restart" option to 
the "ns_shutdown" command. This just sends the 
SIGINT to the worker process. The watchdog is 
handling this signal and respawns the worker 
automatically. 
 
During operation, the watchdog logs events and their 
cause  into the system log file. This looks like: 
Feb 28 04:00:05 Develop nsd[19400]: worker: started. 
Mar  1 04:00:13 Develop nsd[4475]: watchdog: worker 
19400 exited (2). 
Mar  1 04:00:15 Develop nsd[21290]: worker: started. 
Mar  1 04:00:18 Develop nsd[14705]: watchdog: worker 
19399 exited (2). 
Mar  1 04:00:20 Develop nsd[21300]: worker: started. 
 
We have done all the changes with "--enable-watchdog" 
so anybody who needs this feature will have to compile 
with this option.  
 

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 18:51

Message:
Logged In: YES 
user_id=95086

Added to cvs head. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 12:46

Message:
Logged In: YES 
user_id=87254


Oh, I thought you were going to commit this.  Sure, revert
the exit() stuff and commit away, or I'll get to it at the
wekend.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 12:23

Message:
Logged In: YES 
user_id=95086

Now, what? You go change/remove _exit and commit? 
Other things are ok for me. 
Or should I commit? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 12:02

Message:
Logged In: YES 
user_id=87254


Calling _exit()  was an attempt to clarify the intent of the
code.  But I didn't pay attention to the fact that the main
function is pluggable, so I guess that's a no-go.

4 hours of retry time before giving up sounds almost like
'inifinite'...  Looks like we can get something suitable
without adding extra switches, so we can commit this and
tweak the timing later.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-15 14:54

Message:
Logged In: YES 
user_id=95086

Stephen, 
why do you call _exit(0) instead of returning 
control to caller routine?  
 
Also.. 
  I don't want to go overboard with command line switches, but 
  an option to specify not to give up trying to restart the 
  server would be nice.  If you have that, you also need to 
  turn off the restart timeout doubling at a certain point.  I 
  don't know what the cleanest way to do that is. 
 
Restart time doubling is limited to 64 seconds (insane, should 
rather be 60 seconds, heh) so it should not double after 
reaching 60 secs. After that time we constantly wait 60 secs 
between restarts. 
With the server retry of 16 times and we will wait 1, 2, 4, 8,  
16, 32,  64 (i.e. total of 127) seconds and after that 8 times 64  
secs (= 512 total ) secs and eventually abort.  
All together, that will be 640 secs = about 10 minutes 
wait/retry. After that it will exit. I think if we extend retry count 
from 16 to 256 it will be pretty enough (about 4 hours) where 
one can fix the failure cause. Otherwise, I see syntactic 
problems of how to actually specify the retries parameter. 
Of curse, you can do something like 
  bin/nsd -w -r 16 
 
but the "-r"  (restart) would really have no meaning w/o -w 
and I do not like this very much indeed. I'm really for bumping 
the retry count higher (127 or 256 tries). 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-15 01:50

Message:
Logged In: YES 
user_id=184124

Last time addition:
Can we make syslog function public, Ns_Syslog?
Also Tcl interface to it would be usefull as well, below is
syslog from my nssys module, just a suggestion, no rush with
this:

static struct syslog_data {
   int opened;
   int facility;
   int options;
   char ident[32];
   Tcl_HashTable *priorities;
   Tcl_HashTable *facilities;
} syslog_data;

/* Syslog initialization */
void SyslogInit()
{
   memset(&syslog_data,0,sizeof(syslog_data));
   syslog_data.options = LOG_PID;
   syslog_data.facility = LOG_USER;
   if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY)))
    
strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1);
   syslog_data.facilities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS);
   AddEntry(syslog_data.facilities,"auth",LOG_AUTH);
   AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV);
   AddEntry(syslog_data.facilities,"cron",LOG_CRON);
   AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON);
   AddEntry(syslog_data.facilities,"kernel",LOG_KERN);
   AddEntry(syslog_data.facilities,"lpr",LOG_LPR);
   AddEntry(syslog_data.facilities,"mail",LOG_MAIL);
   AddEntry(syslog_data.facilities,"news",LOG_NEWS);
   AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG);
   AddEntry(syslog_data.facilities,"user",LOG_USER);
   AddEntry(syslog_data.facilities,"uucp",LOG_UUCP);
   AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0);
   AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1);
   AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2);
   AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3);
   AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4);
   AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5);
   AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6);
   AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7);
   syslog_data.priorities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS);
   AddEntry(syslog_data.priorities,"emerg",LOG_EMERG);
   AddEntry(syslog_data.priorities,"alert",LOG_ALERT);
   AddEntry(syslog_data.priorities,"crit",LOG_CRIT);
   AddEntry(syslog_data.priorities,"err",LOG_ERR);
   AddEntry(syslog_data.priorities,"error",LOG_ERR);
   AddEntry(syslog_data.priorities,"warning",LOG_WARNING);
   AddEntry(syslog_data.priorities,"notice",LOG_NOTICE);
   AddEntry(syslog_data.priorities,"info",LOG_INFO);
   AddEntry(syslog_data.priorities,"debug",LOG_DEBUG);
}

/*
 * sys_log
 *
 * Usage:
 *   sys_log ?-facility f -options o -ident i? priority message
 *
 *  facility    - kernel, cron, authpriv, mail, local0,
local1, daemon, local2,
 *                news, local3, local4, local5, local6,
syslog, local7, auth, uucp, lpr, user
 *  options     - list with any of { CONS NDELAY PERROR PID
ODELAY NOWAIT }
 *  ident       - ident is prepended to every message, and
is typically the program name
 *  priority    - info, alert, emerg, err, notice, warning,
error, crit, debug
 *
 * Returns:
 *   nothing
 */
static int SysLog(ClientData data,Tcl_Interp *interp,int
argc,char **argv)
{
    int i,priority = 0;
    Tcl_HashEntry *entry;
    
    if(argc < 1) {
      Tcl_AppendResult(interp,argv[0]," ?-facility f
-options o -ident i? priority message",0);
      return TCL_ERROR;
    }
    for(i = 1;i < argc-1;) {
      if(!strcmp(argv[i],"-facility")) {
        entry =
Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]);
        if(!entry) {
          Tcl_AppendResult(interp,"wrong facility
",argv[i+1],": available facilities: ",NULL);
          ListHash(interp,syslog_data.facilities);
          return TCL_ERROR;
        }
        syslog_data.facility = (int)Tcl_GetHashValue(entry);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-options")) {
        syslog_data.options = 0;
        if(strstr(argv[i+1],"CONS")) syslog_data.options |=
LOG_CONS;
        if(strstr(argv[i+1],"NDELAY")) syslog_data.options
|= LOG_NDELAY;
        if(strstr(argv[i+1],"PERROR")) syslog_data.options
|= LOG_PERROR;
        if(strstr(argv[i+1],"PID")) syslog_data.options |=
LOG_PID;
        if(strstr(argv[i+1],"ODELAY")) syslog_data.options
|= LOG_ODELAY;
        if(strstr(argv[i+1],"NOWAIT")) syslog_data.options
|= LOG_NOWAIT;
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-ident")) {
        memset(syslog_data.ident,0,sizeof(syslog_data.ident));
       
strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      break;
    }
    if(i < argc) {
      entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]);
      if(!entry) {
        Tcl_AppendResult(interp,"wrong level ",argv[i],":
available levels: ",NULL);
        ListHash(interp,syslog_data.priorities);
        return TCL_ERROR;
      }
      priority = (int)Tcl_GetHashValue(entry);
      i++;
    }
    if(i < argc) {
      if(!syslog_data.opened) {
       
openlog(syslog_data.ident,syslog_data.options,syslog_data.facility);
        syslog_data.opened = 1;
      }
      syslog(priority,argv[i]);
      return TCL_OK;
    }
    return TCL_OK;
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-15 00:44

Message:
Logged In: YES 
user_id=87254


Oops, looks like I forgot to attach the patch...  Here it is.

Yes, the server will attempt to restart a number of times
before giving up.  I think this is as appropriate for config
errors as it is for others, then there doesn't need to be a
dual-standard fatal error.


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 19:12

Message:
Logged In: YES 
user_id=95086

I'd rather wait for Stephens code and see what's done there 
and then make final changes and commit. It should not 
take long. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 19:07

Message:
Logged In: YES 
user_id=184124

Then i do not see any reason not committing it, we can have
discussions about it for another couple of years, but until
we have it and use it, it is just discussions. We aready
agreed on having it in the core.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 19:00

Message:
Logged In: YES 
user_id=95086

Stephen, 
You mind uploading your patch changes so I can peek into it? 
I think we should soon settle on some solution and move on. 
 
Vlad, 
The patch already does that. After 16 unsuccessfull restarts 
the watchdog exits. Between every restart we wait 1, 2, 4, 8,  
16, 32, .... 16384 seconds and then exit. This would be: 32767 
seconds or about 9 hours.  
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 15:53

Message:
Logged In: YES 
user_id=184124

I would say, repeat configured number of time and then exit.
This way all socket related problems will be cleared after
couple of restartes, config problems will make server exit .

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 15:41

Message:
Logged In: YES 
user_id=95086

Well, because I could not differentiate between config 
and later runtime errors, I thought its wiser to abort 
rather to repeatedly restart broken server. 
One possibility is to add different Ns_Fatal call like: 
 
  Ns_FatalEx(int exitcode, char *fmt, ...); 
 
Or? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 15:12

Message:
Logged In: YES 
user_id=87254


Are you sure Ns_Fatal should not restart the server?  Many
of the fatal errors are caused by bad configuration and I
can see why you might want the server to exit immediately. 
But there are also runtime fatal errors, and here I'd expect
the server to be restarted.

Some config errors are due to external factors like missing
directories or file system permissions and it would be nice
for the server to come back up as soon as the issue resolved
itself.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 10:52

Message:
Logged In: YES 
user_id=95086

Calling watchdog before or after chroot has no real 
implications I believe. 
Also, when the server exits with Ns_Fatal, it hsould not be 
restarted, I will look into your changes today. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 07:27

Message:
Logged In: YES 
user_id=87254


Oh yeah, and I moved the watchdog stuff eaven earlier,
before the prebind and chroot.  Hmm, now that I think about
it it's only the prebind that really needs to happen after
the watchdog is started, to ensure the sockets are always in
a sane state...



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 07:17

Message:
Logged In: YES 
user_id=87254


Am I reading this right?  It looks like if the server calls
Ns_Fatal() it will exit with code 1, but  0 and 1 are
treated as OK exit codes and the server will not be restarted.

I've attached a patch which changes the above, fixes the pid
problem in a different way because I completely forgot you
already posted below about that small glitch, use
Ns_ParseObjv(), adds the -w switch, and a couple of small
name changes.

I don't want to go overboard with command line switches, but
an option to specify not to give up trying to restart the
server would be nice.  If you have that, you also need to
turn off the restart timeout doubling at a certain point.  I
don't know what the cleanest way to do that is.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 21:09

Message:
Logged In: YES 
user_id=95086

This is correct. As soon as we agree on some 
implementation, I will put the -i processing 
and avoid starting watchdog for inittab starts. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 20:48

Message:
Logged In: YES 
user_id=184124


There still should be possibility to run nsd from inittab,
so when -i switch is given, no watchdog should be running,
let /sbin/init to handle restarts

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 17:19

Message:
Logged In: YES 
user_id=95086

small glich: 
After applying the patch, change nsmain.c 
from: 
    nsconf.pid = serverPid: 
to 
    nsconf.pid = getpid(); 
 
otherwise the pid file will contain the bogus 
server pid if the watchdog restarted it later. 
   

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 16:26

Message:
Logged In: YES 
user_id=95086

Another try... 
Important to note: SIGKILL (signal 9) cannot be handled 
hence if somebody kills the watchdog with SIGKILL, the 
server will be left lingering w/o the watchdog. This is  
important to know. I do not see any possibility how to 
recover in such cases (i.e. how to stop the server). 
 
Apart from that, all objections from Stephen are taken 
into account. Please try again. A new copy of watchdog.patch 
file is attached. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 06:20

Message:
Logged In: YES 
user_id=184124

I just found this code in my old server sources, i just
chnaged internal name to Ns/NS, used to run pretty stable.

-------------------------------------------------------------------

#define NS_EXIT 99

static void NsTerminate(int sig)
{
    printf("NS[%d] signal %d received...",getpid(),sig);
    // Kill the server with the same signal
    if(nsPid > 0) kill(nsPid,sig);

    // Exit in case of fatal signal
    if(sig != SIGHUP) {
      printf("NS[%d] terminating...",getpid());
      exit(0);
    }
    // Reassign signal handler
    signal(sig,NsTerminate);
}

void NsWatchdog(int argc, char *argv[], char *envp[])
{
    int failcount = 0;
    time_t start;
    int status;
    pid_t pid;

    signal(SIGTTOU,SIG_IGN);
    signal(SIGTTIN,SIG_IGN);
    signal(SIGTSTP,SIG_IGN);
    signal(SIGPIPE,SIG_IGN);
    signal(SIGQUIT,SIG_IGN);
    signal(SIGHUP,NsTerminate);
    signal(SIGINT,NsTerminate);
    signal(SIGTERM,NsTerminate);

    // Go background
    if((pid = fork())) {
      if(pid < 0) err_logger("warpConfigure: fork:
%s",strerror(errno));
       exit(0);
    }
    setsid();
       
    for(;;) {
               
        // Execute the real server
        nsPid = fork();
        // Child, continue as server
        if(nsPid == 0) {
          exit(nsMain(argc, argv, ServerInit));
        }
        
        /* parent, behaves like a guardian */
        time(&start);
        printf("NS[%d] server process started",getpid());
        pid = waitpid(-1, &status, 0);
        if(WIFEXITED(status))
          printf("NS[%d] child process exited with status
%d",pid,WEXITSTATUS(status));
        else
        if(WIFSIGNALED(status))
          printf("NS[%d] child process exited due to signal
%d",pid,WTERMSIG(status));
        else
          printf("NS[%d] child process exited", pid);

        // Special exit code
        if(WIFEXITED(status)) {
          if(WEXITSTATUS(status) == NS_EXIT) {
            printf("NS[%d] child configuration error,
exiting",getpid());
            exit(0);
          } else
          if(WEXITSTATUS(status) == SIGHUP) {
          } else {
            if(time(0) - start < 10) failcount++;
            if(failcount == 10) {
              printf("Exiting due to repeated, frequent
failures");
              exit(1);
            }
          }
        }
        sleep(3);
    }
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-08 05:23

Message:
Logged In: YES 
user_id=87254


NsWatchdog() is called after the server drops root privs, so
both the watchdog and the server run as the defined user. 
What happens if the server dies and on restart needs to
rebind privileged ports?

A ps listing shows two running processes, parent and child.
  If I kill either one, the watchdog dies, the server
continues to run.
  If I kill -9 the parent, the child continues to run.
  If I kill -9 the child, the server is restarted.
Something seems not quite right here...

I'm a bit confused about how the code works.  For example,
NsWatchdog() seems to ignore all of it's arguments?  Here's
the code which calls it:

if (mode == 0) {
        i = ns_fork();
        if (i < 0) {
            Ns_Fatal("nsmain: fork() failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        setsid();
        i = NsWatchdog(argc, argv, initProc);
        if (i < 0) {
            Ns_Fatal("nsmain: watchdog failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        nsconf.pid = getpid();
    }

NsWatchdog() says it returns the worker pid, it also sets
the global variable wpid.  The code above ignores the
returned value, and the global variable, and instead calls
getpid()...  The variable 'i' is existing code, but still
somehow doesn't seem suitable...

Could that Ns_Fatal() above be moved into the NsWatchdog()
function?  I think maybe some comment is needed here.  The
code structure is very like that just above where ns_fork()
is called, but this function will return *multiple* times,
right?  This is kind of suprising, and an extra twist on the
already confusing fork() semantics (or maybe it's just me
who gets confused by fork...).  A return value of 0 here is
a 'request for orderly shutdown', right?

How about some more logging to syslog?  For example,
distinguish between start and restart.  Mention when the
MAX_SLEEP_PERIOD has been reached, etc.


Couple of small things:

Can we refer to 'the server', rather than 'the worker'? 
Worker and Watchdog begin with 'w', and so does the global
variable wpid...  Maybe serverPid?

NsWatchdog is a static function, it doesn't need the Ns prefix.

In NsWatchdog(), the variable 'run' should be something like
'nretries', 'nap' should be something more like
'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS.

The comment for WaitForWorker() is misslabeled.

Should SigHandler() be called WatchdogSigHandler()?






----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-07 00:03

Message:
Logged In: YES 
user_id=184124

Looks good to me

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 13:06

Message:
Logged In: YES 
user_id=95086

Ah, correction: 
The restart option sends SIGINT to the worker process 
which causes the watchdog to restart it.  
And, the patchfile is now attached! 

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 13:03

Message:
Logged In: YES 
user_id=95086

Here is the patch. 
I have added "-restart" option to "ns_shutdown". It is rather
clumsy to parse but should do. We should rewrite this with
your args parsing routine. 
The restart option sends SIGTERM to the worker process
which causes the watchdog to restart it. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 00:46

Message:
Logged In: YES 
user_id=87254


I don't think this has to hide behind a config option.  It's
either a good idea or it's not.  Sounds good to me.  Is
there a patch?  I'm wondering about some of the
implementation details...



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

[ naviserver-Feature Requests-1161597 ] Extend ns_info with info about traces/filters/procs

From: SourceForge.net <no...@so...> - 2005-03-17 14:22:31

Feature Requests item #1161597, was opened at 2005-03-11 19:30
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Vlad Seryakov (seryakov)
Assigned to: Vlad Seryakov (seryakov)
Summary: Extend ns_info with info about traces/filters/procs

Initial Comment:
Attached is patch that extends ns_info with 3 commands:

ns_info traces
ns_info filters
ns_info requestprocs

They work the same way as ns_info callbacks, no
functionality changed or added, just information commands

----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-17 14:22

Message:
Logged In: YES 
user_id=184124

Yes, this is what i did with UrlSpecific walk function, it
will call Ns_ArpProc for every node which will append
what-ever info to DString. We can live with that for the
beginning, anyway, that info will be used in Tcl and should
be converted into strings. 
So, i guess i can add info proc for UrlSpecific trie?

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 11:06

Message:
Logged In: YES 
user_id=87254


To get info on the binary callbacks we're just going to have
to register proc info for all the default filters, request
procs etc.  If we cover everything there should be a
reasonable number of examples for module writers to follow.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-13 01:47

Message:
Logged In: YES 
user_id=184124

Also, somein docs would be usefull to put info how to get
info about some binary callbacks:

Welcome to ossweb running at /usr/local/ns/bin/nsd (pid 5524)
ossweb:nscp 1> ns_info traces
p:0xb7187be0 a:0x80817d8

# gdb
(gdb) attach 12345

(gdb) info symbol 0xb7187be0
LogTrace in section .text





----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-13 01:44

Message:
Logged In: YES 
user_id=184124

I started with minimal changes, #defines and &syntax can be
chnages of course. now that you started TclCallbacks, this
thing depends when you commit your changes, it requires
Ns_ArgProc and ns_info stuff.
Once we have unified API for callbask i will change it to be
server-specific as well, it took a while to figure out how
it works :-))

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-13 01:37

Message:
Logged In: YES 
user_id=87254


This is a nice addition.  

ns_info filters/traces is server specific, but ns_info
requestprocs returns info for all servers.  Can you make
this per-server also, to be consistent?

The URL walking procs look a little tricky, I tnink they
would benefit from having their own more detailed comments.

The comment currently says it calls a function for each
node, and the effect depends on the function.  But the
function signature is Ns_ArgProc so there's really only one
thing it can do.

Why does it fix the stack size at 512?  Can these magic
numbers be #define'd.

Using a while loop rather than an empty for loop would be a
little easier to read.

There's a lot of this kind of thing:
    (&(triePtr->branches))->n
Could this be simplified to:
    triePtr->branches.n ?


Otherwise, looks pretty good.  I like it.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646

[ naviserver-Feature Requests-1156141 ] Add ns_conn channel command

From: SourceForge.net <no...@so...> - 2005-03-17 14:20:28

Feature Requests item #1156141, was opened at 2005-03-03 20:02
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

Category: None
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add ns_conn channel command

Initial Comment:
Extend the ns_conn command to allow wrapping an Tcl 
channel arround it. This channel can be used to talk to
the remote client afterwards.

The idea is simple. The client uses standard http protocol
to open up a connection to the server. Server accepts the
connection and obtains the channel from it. The client 
also gets the channel to the server and from this point
on, 
both can exchange arbitrary data.

Example implementation is given as a patch against
the current CVS state.

----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-17 14:20

Message:
Logged In: YES 
user_id=184124

i do not

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 12:10

Message:
Logged In: YES 
user_id=95086

OK, If you do not mind, I'll add  
 
 ns_conn channel 
 
It will return the Tcl channel you can use for 
talking to the client. When you've finished, 
you just "close $sock" and finish the proc.  
The connection thread is then ready for other 
users/requests. 
 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 11:41

Message:
Logged In: YES 
user_id=87254


Yes, I get it  :-)

My only point was that HTTP is not a short lived connection
protocol, if you don't want it to be.  Lengthy connections
with multiple messages seems to be one of the key features here.

The *only* thing HTTP can't do is have the server initiate a
message to the client.  For the kinds of cases you're
talking about here with multi-hour long connections, I'd be
wanting some kind of 'ping' request from the client every so
often just to make sure the server was idle and not broken.
 The ping becomes a 'what messages do you have for me?'
request, the server's response is itself a request to the
client.

Anyway, I was talking pure nonsense about the keepalive
stuff; the client doen't have to be so broken after all. 
I'm fine with you adding this feature.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 09:03

Message:
Logged In: YES 
user_id=95086

   "still don't understand what it buys you that a modern HTTP 
   1.1 implemnetation doesn't.  If there was no other way to do 
 
Stephen, I do not think you understood my point. What this 
command buys me is to be able to initiate an http connection 
and reuse the socket-to-socket comm channel for other 
purposes. 
 
For example we make "GET" request to a certain URL and  
then when the connection gets established, the server, 
takes the conn socket,  wraps a Tcl channel around it and  
passes this channel to my (arbitraty) Tcl proc. 
The client, instead of getting the response body, parses  
only the returned http headers to see that the connection 
succeeded and then passes this socket to my custom 
Tcl proc. Now, on the sever and on the client side you have  
a channel and both sides can use it to establish whatever  
protocol they want. They have a socket pipe between. 
The http is just used to handshake the peers, clear? 
 
What we do with this? We have two server instances on two 
different hosts. We open a channel (using the above trick) 
from one to another and then use this to generate a  
stream of preformatted packets (our protocol) on one side 
which does the backup of data from to the other side 
ultimately to the tape drive(s) attached to that host.  
We shuffle TB of data this way daily in about 500  
installations worldwide, so I  know this works fine.  
 
Now, see, this has nothing to do with HTTP1.1. or 1.0 alone.  
And yes, if you look in the implementation, you will see that  
it is a simple wrapper to the Ns_ConnSock. It simply wraps  
a Tcl channel around the the connection socket, nothing  
more and nothing less. 
 
And, I would not say that it requires a broken client.  It just  
requires a special client-processing. This is not ment for 
browser-clients. It has nothing to do with serving pages 
or any other short lived connections. Once we setup such 
a connection, we use it for hours and maybe days. 
 
We could have achieved this with a simple socket connection 
from client to a specific port  on the server. But we are not 
able to use any arbitrary port. All we have is port 80 (the web 
server port). So we must play by the http rules initially when 
accessing the server. When we passed this http handshake 
stuff, we are left with a socket-to-socket comm-channel  
which is transparent and can be used for any purpose 
whatsoever. 
 
Now, is this little bit more clear now? 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-05 03:32

Message:
Logged In: YES 
user_id=184124

What is broken about it? It is the Tcl interface to
Ns_ConnSock(), just Tcl way, using channel. The merit is to
use sync inter-exchange between client and server using
established connection. Client is not broken, it is designed
for this exchange, it is the tool to build messaging, not
general purpose protocol. It make naviserver more flexible
platform for implementing different ppas on top of it using
Tcl, that's all.
Btw, when are you going to commit your protocol extensions?

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 00:20

Message:
Logged In: YES 
user_id=87254


This command isn't a good idea because it creates a broken
interface which requires a broken client to work, and I
still don't understand what it buys you that a modern HTTP
1.1 implemnetation doesn't.  If there was no other way to do
it, then it may still have some merit.  But Ns_ConnSock() is
trivial and gives you all you need to create a Tcl channel.

We've already decided to do this the right way, whatever
that turns out to be, so why should we simultaneously also
do it the wrong way?

In fact you seem to agree, noting that this sucks because
it's not UDP and the overhead of HTTP is just too much for you.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 15:36

Message:
Logged In: YES 
user_id=184124

And that's why i want to add simple UDP support to the
server: for implementing simple messaging between servers in
the cluster, using UDP without HTTP overhead will make it
simple and faster. Using Tcl channel is intermediate
solution for such messaging because it eliminates making
multiple HTTP requests to the server but still uses heavy
TCP/HTTP overhead. Still i think this command is very
usefull and will make server more versatile provided that we
will supply usage examples with the server as well.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 15:26

Message:
Logged In: YES 
user_id=95086

  "Second, this has nothing to do with non-HTTP protocol, it is 
  HTTP communication protocol, what's inside the body has 
  nothing to do with it again." 
 
This is true. It is a regular http request, just for the body 
part, the peers are engaging in a bidirectional communication 
and can do whatever they pleased.  The request is considered 
done when one of the peers close the socket. 
 
Strictly speaking, I'm using the http body and this trick 
to implement arbitrary client-server comm channel. This will 
not work for browsers of course. It has nothing to do with 
browsers at all in fact. It arose out of the necessity to  
support alternate protocols given what we had in AS, 
that is pure http machinery. 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 14:38

Message:
Logged In: YES 
user_id=184124

On the closing note, keepalive has nothing to do with this
and should not be considered in this case. Even if you
connect using HTTP 1.1 and specify keepalive, sever has the
right to issue Connection: close header which will disable
keepalive.
So, it WILL work for all connections.
Second, this has nothing to do with non-HTTP protocol, it is
HTTP communication protocol, what's inside the body has
nothing to do with it again.
And pipelining is a bad example here, the point it to send
the request and wait for the answer and base on that send
another request. Pipelining will allow you just to submit
all requests and then later receive replies, it was designed
for speeding browser's connection especially retrieving images.

It looks to me Stephen you are all for HTTP core and
around-the-HTTP-driver solutions only. This will make simple
solutions complex because of getting around/workaround HTTP
based core in the server.

What is wrong with one simpe function which will give the
server more options without breaking anything?

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 09:45

Message:
Logged In: YES 
user_id=95086

I see. The gotcha's are: 
 
  o. it's yet another implementation of non-HTTP protocol 
  o. (HTTP 1.0 only, and no Keepalive header) be used to get 
  o. function in the docs might reasonably expect it to work for 
     all HTTP connections 
 
That's pretty clear. Well, I think you are right. This is our  
attempt to work around http-only caps of the server.  
Agreed. It will be better to get it done properly and we will 
then internally switch to the new implementation when 
it's ready. At the moment I have all this in a separate module 
and this can stay so until the multi-protocol caps are agreed 
upon and implemented. 
 
I will close this RFE then. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 09:36

Message:
Logged In: YES 
user_id=87254


I don't understand why you're not using standard HTTP
pipelining.  It would seem to be so much easier than having
to write custom client and server code.

This is a small change, but I'm not excited about it because
it's yet another implementation of non-HTTP protocol
support, which makes 3 so far...  :-(

It also seems to require that a deliberately broken client
(HTTP 1.0 only, and no Keepalive header) be used to get
around the server's io read-ahead.  Someone seeing this
function in the docs might reasonably expect it to work for
all HTTP connections.

Also, I don't think this requires core support. 
Ns_ConnSock() will return the underlying file descriptor for
a connection.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 08:36

Message:
Logged In: YES 
user_id=95086

OK. Maybe I should explain little bit more what I'm using this 
for. 
Imagine you'd like to make a client-server connection and 
then exchange arbitrary data on this connection. Imagine 
something like telnet connection. You telnet from client to 
server, get a comm-pipe betwen them and then you can 
shuffle whatever you like across this comm-pipe.  
What we have is a http server responding only to http  
requests (as it is now, w/o multiprotocol support). So, what 
I do is to use http to setup the connection, have a registered 
procedure on the server grab the socket and install my custom 
protocol handler on it. The http is only used to establish the 
communication. All the rest is handled by my custom code 
on the server and on the client.  
Once the connection is setup, we have a point-to-point  
socket communication link between server and client and we 
can use it for just about anything. All you need for this is a  
client-side http-library (we have ours written in Tcl) which  
will give you the socket back once the connection is  
established) and serve-side proc implementing your protocol 
of choice. So, the entire server http-machinery stays in place. 
I do not see any problem there because we have been using 
this trick for about 4 years in production very successfully. 
What I do not know is wether there are any gotcha's with that 
approach (we haven't hit any yet). 
I also do not know what will happen when we add alternate 
multiprotocol caps to server. Maybe this will become obsolete. 
But, as I see this now, the change is trivial and does not hurt 
anybody. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 23:21

Message:
Logged In: YES 
user_id=87254


HTTP also has (although our server doesn't, but I want to
add this) pipelining, which is sending multiple requests at
once without first waiting for the reply to previous requests.

That sounds like your "i do it when I send multiple http
requests".



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 22:45

Message:
Logged In: YES 
user_id=184124

Keepalive works when i close my connection, i.e. when i
exited from conn thread, but the point here is to do it from
the conn thread, i received request from my other server
sending me some info, i send my info back and get ack.
Everything inside one connection, before keepalive.
And yes, i send messages in my format, anyway i do it when i
send multiple http reaqests, i am interested in message body
only.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 22:32

Message:
Logged In: YES 
user_id=87254


HTTP already handles sending multiple requests accross one
socket with keepalives.  It might be more useful to add
keepalives to our HTTP client code.

In the case of HTTP keepalives, the socket is returned to
the driver thread after the first request, still open, and
the conn thread goes on to do other useful work.  If a new
request arrives before the connection times out it is passed
to the next free conn thread.

If you don't usee HTTP with keepalives you will have to
invent some kind of message chunking syntax and some kind of
time-out protocol.  But you will still end up wasting conn
threads as they idle waiting for possible extra messages.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 22:20

Message:
Logged In: YES 
user_id=184124

I could use it in my apps where i have cluster of several
servers exchanging http requests between each other,
sometimes one session needs 2 or more exchanges.
With this channel i can exchange using same connection and 
i need to send/receive data, not headers, so this can save
and improve my operations.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 22:12

Message:
Logged In: YES 
user_id=87254


Usually, by the time your script runs and you call [ns_conn
channel], the data from the client will already have been
read and buffered.  You can access this with [ns_conn content].

Does reading work?

What's the objective?  Do you want to hadle streaming data,
or make reading and writing more Tcl-like?


----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 22:07

Message:
Logged In: YES 
user_id=184124

Very usefull, commit it

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

[ naviserver-Feature Requests-1162223 ] Tcl Callbacks

From: SourceForge.net <no...@so...> - 2005-03-17 14:16:24

Feature Requests item #1162223, was opened at 2005-03-13 00:22
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Stephen Deasey (sdeasey)
Assigned to: Stephen Deasey (sdeasey)
Summary: Tcl Callbacks

Initial Comment:

The server offers a dual C/Tcl interface.  There are a
number of of places where Tcl code needs to be executed
by the C core, or the core doesn't know whether a
callback is C or Tcl.

I've attached the patch
naviserver-4.1-tclcallbacks.patch.  It extracts the Tcl
callback infrastructure from tclsched.c into
tclcallbacks.c and makes it a public interface.  Here
it is:

Ns_TclNewCallback()
Ns_TclNewCallbackObj()
Ns_TclEvalCallback()
Ns_TclCallbackProc()
Ns_TclFreeCallback()
Ns_TclCallbackArgProc()

It provides a standard interface for Tcl callbacks
throughout the server, handling interp allocation,
introspection, evaluation and cleanup.

Also, tclcallbacks.c is somewhat analogous to
callbacks.c, and so the generic ns_atexit etc.
callbacks which were in tclsched.c have been moved to
the tclcallbacks file.

The first set of code to be converted to the new
interface is tclsched.c itself, and I've converted to
using Ns_ParseObjv() while I was in there.  File size
has dropped ~250 lines.


One thing I'm wondering about is whether the
Ns_TclEvalCallback() routine should take extra args to
send to the Tcl proc.  There seems to be no standard
scheme for ordering the args so there still needs to be
a way to manually construct the command, which is why
the members of the Ns_TclCallback struct are public. 
But for this very reason, it might be worth providing a
standard way to do this so all future callbacks are
consistent.


It would be nice in the future to come up with some way
of registereing these callback points such that you
don't have to create an ns_info subcommand for example
manually.  It should be possible for loadable modules
to register their callback points and have them show up
in the right place.



----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-17 14:16

Message:
Logged In: YES 
user_id=184124

Yes, let's start slowly, we know the direction we are going,
let 's do it step by step starting with simple things. We
can have generic Tcl callbacks already and all new callbacks
will use it and we will see how flexible this mechanism is.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 11:28

Message:
Logged In: YES 
user_id=87254


Yes, I forgot to mention that Tcl callbacks are a little
more general than those for C in callbacks.c.  Those take a
single arg which is the context you registered with the
proc, and return void.   Tcl callbacks on the other hand can
return a result which at the C level gets appended to a
dstring if you pass that in.

The members of the Ns_TclCallback structure are public so
that you can construct a command and args in any format. 
This is the question I was asking originally: should there
be a default recomended way to do this?

So for example Ns_TclEvalCallback() might take varargs
cstrings, which if passed would be added to the tcl command
to execute:

  command ?cbarg1? ?cbarg2? ?context?

Where cbarg2 and cbarg2 are specific to the type of callback
and context is what was registered with the proc by the Tcl
programmer.

Unfortunately that wouldn't cover all the bases as some of
the existing APIs do weird things like place callback
specific args after the context or if no context was passed,
always append an empty string.  For new callbacks, something
standard like the above scheme should be used.


If you look at the patch for virtual hosting you'll see that
two new Tcl callbacks were created and they contain simillar
boiler plate.  This makes me think that maybe there should
be something like an [ns_addcallback serverroot myrootproc
mycontext] interface.

It's a little more complicated than that because some
callbacks are singular, like the serverroot proc, and some
are plural, like filters.  So maybe the thing to do is get
some experience with this simple version and then expand it
later...





----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-13 00:46

Message:
Logged In: YES 
user_id=184124

It looks good, unified callback interface would be better
but Tcl filters. traces, requestprocs uses different
arguments/parameters and currently the only way to get info
about them to use ns_info interface and Ns_ArgProc
interface. Tcl_Callback are for simple proc/args callbacks,
filters is special case, see if you can convert them.
But i am for this change.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-13 00:30

Message:
Logged In: YES 
user_id=87254


I've attached a patch which converts the tclrequest.c file
to use the new callback API as an example.  It also uses the
Ns_ParseObjv() API, and I've removed support for the legacy
conn variable.

It removes ~200 lines and all utility functions.  The
callback API provides default Ns_ArgProc introspection, but
this will be much nicer when combined with Vlad's RFE
1161597  :-)



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646

[ naviserver-Feature Requests-1156141 ] Add ns_conn channel command

From: SourceForge.net <no...@so...> - 2005-03-17 12:10:54

Feature Requests item #1156141, was opened at 2005-03-03 21:02
Message generated for change (Comment added) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

Category: None
Group: None
>Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add ns_conn channel command

Initial Comment:
Extend the ns_conn command to allow wrapping an Tcl 
channel arround it. This channel can be used to talk to
the remote client afterwards.

The idea is simple. The client uses standard http protocol
to open up a connection to the server. Server accepts the
connection and obtains the channel from it. The client 
also gets the channel to the server and from this point
on, 
both can exchange arbitrary data.

Example implementation is given as a patch against
the current CVS state.

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 13:10

Message:
Logged In: YES 
user_id=95086

OK, If you do not mind, I'll add  
 
 ns_conn channel 
 
It will return the Tcl channel you can use for 
talking to the client. When you've finished, 
you just "close $sock" and finish the proc.  
The connection thread is then ready for other 
users/requests. 
 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 12:41

Message:
Logged In: YES 
user_id=87254


Yes, I get it  :-)

My only point was that HTTP is not a short lived connection
protocol, if you don't want it to be.  Lengthy connections
with multiple messages seems to be one of the key features here.

The *only* thing HTTP can't do is have the server initiate a
message to the client.  For the kinds of cases you're
talking about here with multi-hour long connections, I'd be
wanting some kind of 'ping' request from the client every so
often just to make sure the server was idle and not broken.
 The ping becomes a 'what messages do you have for me?'
request, the server's response is itself a request to the
client.

Anyway, I was talking pure nonsense about the keepalive
stuff; the client doen't have to be so broken after all. 
I'm fine with you adding this feature.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 10:03

Message:
Logged In: YES 
user_id=95086

   "still don't understand what it buys you that a modern HTTP 
   1.1 implemnetation doesn't.  If there was no other way to do 
 
Stephen, I do not think you understood my point. What this 
command buys me is to be able to initiate an http connection 
and reuse the socket-to-socket comm channel for other 
purposes. 
 
For example we make "GET" request to a certain URL and  
then when the connection gets established, the server, 
takes the conn socket,  wraps a Tcl channel around it and  
passes this channel to my (arbitraty) Tcl proc. 
The client, instead of getting the response body, parses  
only the returned http headers to see that the connection 
succeeded and then passes this socket to my custom 
Tcl proc. Now, on the sever and on the client side you have  
a channel and both sides can use it to establish whatever  
protocol they want. They have a socket pipe between. 
The http is just used to handshake the peers, clear? 
 
What we do with this? We have two server instances on two 
different hosts. We open a channel (using the above trick) 
from one to another and then use this to generate a  
stream of preformatted packets (our protocol) on one side 
which does the backup of data from to the other side 
ultimately to the tape drive(s) attached to that host.  
We shuffle TB of data this way daily in about 500  
installations worldwide, so I  know this works fine.  
 
Now, see, this has nothing to do with HTTP1.1. or 1.0 alone.  
And yes, if you look in the implementation, you will see that  
it is a simple wrapper to the Ns_ConnSock. It simply wraps  
a Tcl channel around the the connection socket, nothing  
more and nothing less. 
 
And, I would not say that it requires a broken client.  It just  
requires a special client-processing. This is not ment for 
browser-clients. It has nothing to do with serving pages 
or any other short lived connections. Once we setup such 
a connection, we use it for hours and maybe days. 
 
We could have achieved this with a simple socket connection 
from client to a specific port  on the server. But we are not 
able to use any arbitrary port. All we have is port 80 (the web 
server port). So we must play by the http rules initially when 
accessing the server. When we passed this http handshake 
stuff, we are left with a socket-to-socket comm-channel  
which is transparent and can be used for any purpose 
whatsoever. 
 
Now, is this little bit more clear now? 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-05 04:32

Message:
Logged In: YES 
user_id=184124

What is broken about it? It is the Tcl interface to
Ns_ConnSock(), just Tcl way, using channel. The merit is to
use sync inter-exchange between client and server using
established connection. Client is not broken, it is designed
for this exchange, it is the tool to build messaging, not
general purpose protocol. It make naviserver more flexible
platform for implementing different ppas on top of it using
Tcl, that's all.
Btw, when are you going to commit your protocol extensions?

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 01:20

Message:
Logged In: YES 
user_id=87254


This command isn't a good idea because it creates a broken
interface which requires a broken client to work, and I
still don't understand what it buys you that a modern HTTP
1.1 implemnetation doesn't.  If there was no other way to do
it, then it may still have some merit.  But Ns_ConnSock() is
trivial and gives you all you need to create a Tcl channel.

We've already decided to do this the right way, whatever
that turns out to be, so why should we simultaneously also
do it the wrong way?

In fact you seem to agree, noting that this sucks because
it's not UDP and the overhead of HTTP is just too much for you.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 16:36

Message:
Logged In: YES 
user_id=184124

And that's why i want to add simple UDP support to the
server: for implementing simple messaging between servers in
the cluster, using UDP without HTTP overhead will make it
simple and faster. Using Tcl channel is intermediate
solution for such messaging because it eliminates making
multiple HTTP requests to the server but still uses heavy
TCP/HTTP overhead. Still i think this command is very
usefull and will make server more versatile provided that we
will supply usage examples with the server as well.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 16:26

Message:
Logged In: YES 
user_id=95086

  "Second, this has nothing to do with non-HTTP protocol, it is 
  HTTP communication protocol, what's inside the body has 
  nothing to do with it again." 
 
This is true. It is a regular http request, just for the body 
part, the peers are engaging in a bidirectional communication 
and can do whatever they pleased.  The request is considered 
done when one of the peers close the socket. 
 
Strictly speaking, I'm using the http body and this trick 
to implement arbitrary client-server comm channel. This will 
not work for browsers of course. It has nothing to do with 
browsers at all in fact. It arose out of the necessity to  
support alternate protocols given what we had in AS, 
that is pure http machinery. 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 15:38

Message:
Logged In: YES 
user_id=184124

On the closing note, keepalive has nothing to do with this
and should not be considered in this case. Even if you
connect using HTTP 1.1 and specify keepalive, sever has the
right to issue Connection: close header which will disable
keepalive.
So, it WILL work for all connections.
Second, this has nothing to do with non-HTTP protocol, it is
HTTP communication protocol, what's inside the body has
nothing to do with it again.
And pipelining is a bad example here, the point it to send
the request and wait for the answer and base on that send
another request. Pipelining will allow you just to submit
all requests and then later receive replies, it was designed
for speeding browser's connection especially retrieving images.

It looks to me Stephen you are all for HTTP core and
around-the-HTTP-driver solutions only. This will make simple
solutions complex because of getting around/workaround HTTP
based core in the server.

What is wrong with one simpe function which will give the
server more options without breaking anything?

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 10:45

Message:
Logged In: YES 
user_id=95086

I see. The gotcha's are: 
 
  o. it's yet another implementation of non-HTTP protocol 
  o. (HTTP 1.0 only, and no Keepalive header) be used to get 
  o. function in the docs might reasonably expect it to work for 
     all HTTP connections 
 
That's pretty clear. Well, I think you are right. This is our  
attempt to work around http-only caps of the server.  
Agreed. It will be better to get it done properly and we will 
then internally switch to the new implementation when 
it's ready. At the moment I have all this in a separate module 
and this can stay so until the multi-protocol caps are agreed 
upon and implemented. 
 
I will close this RFE then. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 10:36

Message:
Logged In: YES 
user_id=87254


I don't understand why you're not using standard HTTP
pipelining.  It would seem to be so much easier than having
to write custom client and server code.

This is a small change, but I'm not excited about it because
it's yet another implementation of non-HTTP protocol
support, which makes 3 so far...  :-(

It also seems to require that a deliberately broken client
(HTTP 1.0 only, and no Keepalive header) be used to get
around the server's io read-ahead.  Someone seeing this
function in the docs might reasonably expect it to work for
all HTTP connections.

Also, I don't think this requires core support. 
Ns_ConnSock() will return the underlying file descriptor for
a connection.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 09:36

Message:
Logged In: YES 
user_id=95086

OK. Maybe I should explain little bit more what I'm using this 
for. 
Imagine you'd like to make a client-server connection and 
then exchange arbitrary data on this connection. Imagine 
something like telnet connection. You telnet from client to 
server, get a comm-pipe betwen them and then you can 
shuffle whatever you like across this comm-pipe.  
What we have is a http server responding only to http  
requests (as it is now, w/o multiprotocol support). So, what 
I do is to use http to setup the connection, have a registered 
procedure on the server grab the socket and install my custom 
protocol handler on it. The http is only used to establish the 
communication. All the rest is handled by my custom code 
on the server and on the client.  
Once the connection is setup, we have a point-to-point  
socket communication link between server and client and we 
can use it for just about anything. All you need for this is a  
client-side http-library (we have ours written in Tcl) which  
will give you the socket back once the connection is  
established) and serve-side proc implementing your protocol 
of choice. So, the entire server http-machinery stays in place. 
I do not see any problem there because we have been using 
this trick for about 4 years in production very successfully. 
What I do not know is wether there are any gotcha's with that 
approach (we haven't hit any yet). 
I also do not know what will happen when we add alternate 
multiprotocol caps to server. Maybe this will become obsolete. 
But, as I see this now, the change is trivial and does not hurt 
anybody. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 00:21

Message:
Logged In: YES 
user_id=87254


HTTP also has (although our server doesn't, but I want to
add this) pipelining, which is sending multiple requests at
once without first waiting for the reply to previous requests.

That sounds like your "i do it when I send multiple http
requests".



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 23:45

Message:
Logged In: YES 
user_id=184124

Keepalive works when i close my connection, i.e. when i
exited from conn thread, but the point here is to do it from
the conn thread, i received request from my other server
sending me some info, i send my info back and get ack.
Everything inside one connection, before keepalive.
And yes, i send messages in my format, anyway i do it when i
send multiple http reaqests, i am interested in message body
only.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 23:32

Message:
Logged In: YES 
user_id=87254


HTTP already handles sending multiple requests accross one
socket with keepalives.  It might be more useful to add
keepalives to our HTTP client code.

In the case of HTTP keepalives, the socket is returned to
the driver thread after the first request, still open, and
the conn thread goes on to do other useful work.  If a new
request arrives before the connection times out it is passed
to the next free conn thread.

If you don't usee HTTP with keepalives you will have to
invent some kind of message chunking syntax and some kind of
time-out protocol.  But you will still end up wasting conn
threads as they idle waiting for possible extra messages.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 23:20

Message:
Logged In: YES 
user_id=184124

I could use it in my apps where i have cluster of several
servers exchanging http requests between each other,
sometimes one session needs 2 or more exchanges.
With this channel i can exchange using same connection and 
i need to send/receive data, not headers, so this can save
and improve my operations.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 23:12

Message:
Logged In: YES 
user_id=87254


Usually, by the time your script runs and you call [ns_conn
channel], the data from the client will already have been
read and buffered.  You can access this with [ns_conn content].

Does reading work?

What's the objective?  Do you want to hadle streaming data,
or make reading and writing more Tcl-like?


----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 23:07

Message:
Logged In: YES 
user_id=184124

Very usefull, commit it

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

[ naviserver-Feature Requests-1156875 ] Watchdog process restarts failed server

From: SourceForge.net <no...@so...> - 2005-03-17 11:46:21

Feature Requests item #1156875, was opened at 2005-03-04 12:03
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Watchdog process restarts failed server

Initial Comment:
We have been using this for quite some time and it 
proved extremely useful. 
We doublefork the nsd process and make the first 
forked instance control the second. The first one 
(the watchdog) reacts on exit codes and signals 
caught during the watch and correspondingly restarts 
the second instance (the worker). 
 
Also, we have added the the "-restart" option to 
the "ns_shutdown" command. This just sends the 
SIGINT to the worker process. The watchdog is 
handling this signal and respawns the worker 
automatically. 
 
During operation, the watchdog logs events and their 
cause  into the system log file. This looks like: 
Feb 28 04:00:05 Develop nsd[19400]: worker: started. 
Mar  1 04:00:13 Develop nsd[4475]: watchdog: worker 
19400 exited (2). 
Mar  1 04:00:15 Develop nsd[21290]: worker: started. 
Mar  1 04:00:18 Develop nsd[14705]: watchdog: worker 
19399 exited (2). 
Mar  1 04:00:20 Develop nsd[21300]: worker: started. 
 
We have done all the changes with "--enable-watchdog" 
so anybody who needs this feature will have to compile 
with this option.  
 

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 04:46

Message:
Logged In: YES 
user_id=87254


Oh, I thought you were going to commit this.  Sure, revert
the exit() stuff and commit away, or I'll get to it at the
wekend.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 04:23

Message:
Logged In: YES 
user_id=95086

Now, what? You go change/remove _exit and commit? 
Other things are ok for me. 
Or should I commit? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 04:02

Message:
Logged In: YES 
user_id=87254


Calling _exit()  was an attempt to clarify the intent of the
code.  But I didn't pay attention to the fact that the main
function is pluggable, so I guess that's a no-go.

4 hours of retry time before giving up sounds almost like
'inifinite'...  Looks like we can get something suitable
without adding extra switches, so we can commit this and
tweak the timing later.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-15 06:54

Message:
Logged In: YES 
user_id=95086

Stephen, 
why do you call _exit(0) instead of returning 
control to caller routine?  
 
Also.. 
  I don't want to go overboard with command line switches, but 
  an option to specify not to give up trying to restart the 
  server would be nice.  If you have that, you also need to 
  turn off the restart timeout doubling at a certain point.  I 
  don't know what the cleanest way to do that is. 
 
Restart time doubling is limited to 64 seconds (insane, should 
rather be 60 seconds, heh) so it should not double after 
reaching 60 secs. After that time we constantly wait 60 secs 
between restarts. 
With the server retry of 16 times and we will wait 1, 2, 4, 8,  
16, 32,  64 (i.e. total of 127) seconds and after that 8 times 64  
secs (= 512 total ) secs and eventually abort.  
All together, that will be 640 secs = about 10 minutes 
wait/retry. After that it will exit. I think if we extend retry count 
from 16 to 256 it will be pretty enough (about 4 hours) where 
one can fix the failure cause. Otherwise, I see syntactic 
problems of how to actually specify the retries parameter. 
Of curse, you can do something like 
  bin/nsd -w -r 16 
 
but the "-r"  (restart) would really have no meaning w/o -w 
and I do not like this very much indeed. I'm really for bumping 
the retry count higher (127 or 256 tries). 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 17:50

Message:
Logged In: YES 
user_id=184124

Last time addition:
Can we make syslog function public, Ns_Syslog?
Also Tcl interface to it would be usefull as well, below is
syslog from my nssys module, just a suggestion, no rush with
this:

static struct syslog_data {
   int opened;
   int facility;
   int options;
   char ident[32];
   Tcl_HashTable *priorities;
   Tcl_HashTable *facilities;
} syslog_data;

/* Syslog initialization */
void SyslogInit()
{
   memset(&syslog_data,0,sizeof(syslog_data));
   syslog_data.options = LOG_PID;
   syslog_data.facility = LOG_USER;
   if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY)))
    
strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1);
   syslog_data.facilities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS);
   AddEntry(syslog_data.facilities,"auth",LOG_AUTH);
   AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV);
   AddEntry(syslog_data.facilities,"cron",LOG_CRON);
   AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON);
   AddEntry(syslog_data.facilities,"kernel",LOG_KERN);
   AddEntry(syslog_data.facilities,"lpr",LOG_LPR);
   AddEntry(syslog_data.facilities,"mail",LOG_MAIL);
   AddEntry(syslog_data.facilities,"news",LOG_NEWS);
   AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG);
   AddEntry(syslog_data.facilities,"user",LOG_USER);
   AddEntry(syslog_data.facilities,"uucp",LOG_UUCP);
   AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0);
   AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1);
   AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2);
   AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3);
   AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4);
   AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5);
   AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6);
   AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7);
   syslog_data.priorities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS);
   AddEntry(syslog_data.priorities,"emerg",LOG_EMERG);
   AddEntry(syslog_data.priorities,"alert",LOG_ALERT);
   AddEntry(syslog_data.priorities,"crit",LOG_CRIT);
   AddEntry(syslog_data.priorities,"err",LOG_ERR);
   AddEntry(syslog_data.priorities,"error",LOG_ERR);
   AddEntry(syslog_data.priorities,"warning",LOG_WARNING);
   AddEntry(syslog_data.priorities,"notice",LOG_NOTICE);
   AddEntry(syslog_data.priorities,"info",LOG_INFO);
   AddEntry(syslog_data.priorities,"debug",LOG_DEBUG);
}

/*
 * sys_log
 *
 * Usage:
 *   sys_log ?-facility f -options o -ident i? priority message
 *
 *  facility    - kernel, cron, authpriv, mail, local0,
local1, daemon, local2,
 *                news, local3, local4, local5, local6,
syslog, local7, auth, uucp, lpr, user
 *  options     - list with any of { CONS NDELAY PERROR PID
ODELAY NOWAIT }
 *  ident       - ident is prepended to every message, and
is typically the program name
 *  priority    - info, alert, emerg, err, notice, warning,
error, crit, debug
 *
 * Returns:
 *   nothing
 */
static int SysLog(ClientData data,Tcl_Interp *interp,int
argc,char **argv)
{
    int i,priority = 0;
    Tcl_HashEntry *entry;
    
    if(argc < 1) {
      Tcl_AppendResult(interp,argv[0]," ?-facility f
-options o -ident i? priority message",0);
      return TCL_ERROR;
    }
    for(i = 1;i < argc-1;) {
      if(!strcmp(argv[i],"-facility")) {
        entry =
Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]);
        if(!entry) {
          Tcl_AppendResult(interp,"wrong facility
",argv[i+1],": available facilities: ",NULL);
          ListHash(interp,syslog_data.facilities);
          return TCL_ERROR;
        }
        syslog_data.facility = (int)Tcl_GetHashValue(entry);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-options")) {
        syslog_data.options = 0;
        if(strstr(argv[i+1],"CONS")) syslog_data.options |=
LOG_CONS;
        if(strstr(argv[i+1],"NDELAY")) syslog_data.options
|= LOG_NDELAY;
        if(strstr(argv[i+1],"PERROR")) syslog_data.options
|= LOG_PERROR;
        if(strstr(argv[i+1],"PID")) syslog_data.options |=
LOG_PID;
        if(strstr(argv[i+1],"ODELAY")) syslog_data.options
|= LOG_ODELAY;
        if(strstr(argv[i+1],"NOWAIT")) syslog_data.options
|= LOG_NOWAIT;
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-ident")) {
        memset(syslog_data.ident,0,sizeof(syslog_data.ident));
       
strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      break;
    }
    if(i < argc) {
      entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]);
      if(!entry) {
        Tcl_AppendResult(interp,"wrong level ",argv[i],":
available levels: ",NULL);
        ListHash(interp,syslog_data.priorities);
        return TCL_ERROR;
      }
      priority = (int)Tcl_GetHashValue(entry);
      i++;
    }
    if(i < argc) {
      if(!syslog_data.opened) {
       
openlog(syslog_data.ident,syslog_data.options,syslog_data.facility);
        syslog_data.opened = 1;
      }
      syslog(priority,argv[i]);
      return TCL_OK;
    }
    return TCL_OK;
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 16:44

Message:
Logged In: YES 
user_id=87254


Oops, looks like I forgot to attach the patch...  Here it is.

Yes, the server will attempt to restart a number of times
before giving up.  I think this is as appropriate for config
errors as it is for others, then there doesn't need to be a
dual-standard fatal error.


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 11:12

Message:
Logged In: YES 
user_id=95086

I'd rather wait for Stephens code and see what's done there 
and then make final changes and commit. It should not 
take long. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 11:07

Message:
Logged In: YES 
user_id=184124

Then i do not see any reason not committing it, we can have
discussions about it for another couple of years, but until
we have it and use it, it is just discussions. We aready
agreed on having it in the core.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 11:00

Message:
Logged In: YES 
user_id=95086

Stephen, 
You mind uploading your patch changes so I can peek into it? 
I think we should soon settle on some solution and move on. 
 
Vlad, 
The patch already does that. After 16 unsuccessfull restarts 
the watchdog exits. Between every restart we wait 1, 2, 4, 8,  
16, 32, .... 16384 seconds and then exit. This would be: 32767 
seconds or about 9 hours.  
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 07:53

Message:
Logged In: YES 
user_id=184124

I would say, repeat configured number of time and then exit.
This way all socket related problems will be cleared after
couple of restartes, config problems will make server exit .

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 07:41

Message:
Logged In: YES 
user_id=95086

Well, because I could not differentiate between config 
and later runtime errors, I thought its wiser to abort 
rather to repeatedly restart broken server. 
One possibility is to add different Ns_Fatal call like: 
 
  Ns_FatalEx(int exitcode, char *fmt, ...); 
 
Or? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 07:12

Message:
Logged In: YES 
user_id=87254


Are you sure Ns_Fatal should not restart the server?  Many
of the fatal errors are caused by bad configuration and I
can see why you might want the server to exit immediately. 
But there are also runtime fatal errors, and here I'd expect
the server to be restarted.

Some config errors are due to external factors like missing
directories or file system permissions and it would be nice
for the server to come back up as soon as the issue resolved
itself.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 02:52

Message:
Logged In: YES 
user_id=95086

Calling watchdog before or after chroot has no real 
implications I believe. 
Also, when the server exits with Ns_Fatal, it hsould not be 
restarted, I will look into your changes today. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-10 23:27

Message:
Logged In: YES 
user_id=87254


Oh yeah, and I moved the watchdog stuff eaven earlier,
before the prebind and chroot.  Hmm, now that I think about
it it's only the prebind that really needs to happen after
the watchdog is started, to ensure the sockets are always in
a sane state...



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-10 23:17

Message:
Logged In: YES 
user_id=87254


Am I reading this right?  It looks like if the server calls
Ns_Fatal() it will exit with code 1, but  0 and 1 are
treated as OK exit codes and the server will not be restarted.

I've attached a patch which changes the above, fixes the pid
problem in a different way because I completely forgot you
already posted below about that small glitch, use
Ns_ParseObjv(), adds the -w switch, and a couple of small
name changes.

I don't want to go overboard with command line switches, but
an option to specify not to give up trying to restart the
server would be nice.  If you have that, you also need to
turn off the restart timeout doubling at a certain point.  I
don't know what the cleanest way to do that is.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 13:09

Message:
Logged In: YES 
user_id=95086

This is correct. As soon as we agree on some 
implementation, I will put the -i processing 
and avoid starting watchdog for inittab starts. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 12:48

Message:
Logged In: YES 
user_id=184124


There still should be possibility to run nsd from inittab,
so when -i switch is given, no watchdog should be running,
let /sbin/init to handle restarts

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 09:19

Message:
Logged In: YES 
user_id=95086

small glich: 
After applying the patch, change nsmain.c 
from: 
    nsconf.pid = serverPid: 
to 
    nsconf.pid = getpid(); 
 
otherwise the pid file will contain the bogus 
server pid if the watchdog restarted it later. 
   

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 08:26

Message:
Logged In: YES 
user_id=95086

Another try... 
Important to note: SIGKILL (signal 9) cannot be handled 
hence if somebody kills the watchdog with SIGKILL, the 
server will be left lingering w/o the watchdog. This is  
important to know. I do not see any possibility how to 
recover in such cases (i.e. how to stop the server). 
 
Apart from that, all objections from Stephen are taken 
into account. Please try again. A new copy of watchdog.patch 
file is attached. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-07 22:20

Message:
Logged In: YES 
user_id=184124

I just found this code in my old server sources, i just
chnaged internal name to Ns/NS, used to run pretty stable.

-------------------------------------------------------------------

#define NS_EXIT 99

static void NsTerminate(int sig)
{
    printf("NS[%d] signal %d received...",getpid(),sig);
    // Kill the server with the same signal
    if(nsPid > 0) kill(nsPid,sig);

    // Exit in case of fatal signal
    if(sig != SIGHUP) {
      printf("NS[%d] terminating...",getpid());
      exit(0);
    }
    // Reassign signal handler
    signal(sig,NsTerminate);
}

void NsWatchdog(int argc, char *argv[], char *envp[])
{
    int failcount = 0;
    time_t start;
    int status;
    pid_t pid;

    signal(SIGTTOU,SIG_IGN);
    signal(SIGTTIN,SIG_IGN);
    signal(SIGTSTP,SIG_IGN);
    signal(SIGPIPE,SIG_IGN);
    signal(SIGQUIT,SIG_IGN);
    signal(SIGHUP,NsTerminate);
    signal(SIGINT,NsTerminate);
    signal(SIGTERM,NsTerminate);

    // Go background
    if((pid = fork())) {
      if(pid < 0) err_logger("warpConfigure: fork:
%s",strerror(errno));
       exit(0);
    }
    setsid();
       
    for(;;) {
               
        // Execute the real server
        nsPid = fork();
        // Child, continue as server
        if(nsPid == 0) {
          exit(nsMain(argc, argv, ServerInit));
        }
        
        /* parent, behaves like a guardian */
        time(&start);
        printf("NS[%d] server process started",getpid());
        pid = waitpid(-1, &status, 0);
        if(WIFEXITED(status))
          printf("NS[%d] child process exited with status
%d",pid,WEXITSTATUS(status));
        else
        if(WIFSIGNALED(status))
          printf("NS[%d] child process exited due to signal
%d",pid,WTERMSIG(status));
        else
          printf("NS[%d] child process exited", pid);

        // Special exit code
        if(WIFEXITED(status)) {
          if(WEXITSTATUS(status) == NS_EXIT) {
            printf("NS[%d] child configuration error,
exiting",getpid());
            exit(0);
          } else
          if(WEXITSTATUS(status) == SIGHUP) {
          } else {
            if(time(0) - start < 10) failcount++;
            if(failcount == 10) {
              printf("Exiting due to repeated, frequent
failures");
              exit(1);
            }
          }
        }
        sleep(3);
    }
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-07 21:23

Message:
Logged In: YES 
user_id=87254


NsWatchdog() is called after the server drops root privs, so
both the watchdog and the server run as the defined user. 
What happens if the server dies and on restart needs to
rebind privileged ports?

A ps listing shows two running processes, parent and child.
  If I kill either one, the watchdog dies, the server
continues to run.
  If I kill -9 the parent, the child continues to run.
  If I kill -9 the child, the server is restarted.
Something seems not quite right here...

I'm a bit confused about how the code works.  For example,
NsWatchdog() seems to ignore all of it's arguments?  Here's
the code which calls it:

if (mode == 0) {
        i = ns_fork();
        if (i < 0) {
            Ns_Fatal("nsmain: fork() failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        setsid();
        i = NsWatchdog(argc, argv, initProc);
        if (i < 0) {
            Ns_Fatal("nsmain: watchdog failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        nsconf.pid = getpid();
    }

NsWatchdog() says it returns the worker pid, it also sets
the global variable wpid.  The code above ignores the
returned value, and the global variable, and instead calls
getpid()...  The variable 'i' is existing code, but still
somehow doesn't seem suitable...

Could that Ns_Fatal() above be moved into the NsWatchdog()
function?  I think maybe some comment is needed here.  The
code structure is very like that just above where ns_fork()
is called, but this function will return *multiple* times,
right?  This is kind of suprising, and an extra twist on the
already confusing fork() semantics (or maybe it's just me
who gets confused by fork...).  A return value of 0 here is
a 'request for orderly shutdown', right?

How about some more logging to syslog?  For example,
distinguish between start and restart.  Mention when the
MAX_SLEEP_PERIOD has been reached, etc.


Couple of small things:

Can we refer to 'the server', rather than 'the worker'? 
Worker and Watchdog begin with 'w', and so does the global
variable wpid...  Maybe serverPid?

NsWatchdog is a static function, it doesn't need the Ns prefix.

In NsWatchdog(), the variable 'run' should be something like
'nretries', 'nap' should be something more like
'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS.

The comment for WaitForWorker() is misslabeled.

Should SigHandler() be called WatchdogSigHandler()?






----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-06 16:03

Message:
Logged In: YES 
user_id=184124

Looks good to me

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 05:06

Message:
Logged In: YES 
user_id=95086

Ah, correction: 
The restart option sends SIGINT to the worker process 
which causes the watchdog to restart it.  
And, the patchfile is now attached! 

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 05:03

Message:
Logged In: YES 
user_id=95086

Here is the patch. 
I have added "-restart" option to "ns_shutdown". It is rather
clumsy to parse but should do. We should rewrite this with
your args parsing routine. 
The restart option sends SIGTERM to the worker process
which causes the watchdog to restart it. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 16:46

Message:
Logged In: YES 
user_id=87254


I don't think this has to hide behind a config option.  It's
either a good idea or it's not.  Sounds good to me.  Is
there a patch?  I'm wondering about some of the
implementation details...



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

[ naviserver-Feature Requests-1156141 ] Add ns_conn channel command

From: SourceForge.net <no...@so...> - 2005-03-17 11:41:14

Feature Requests item #1156141, was opened at 2005-03-03 13:02
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

Category: None
Group: None
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Add ns_conn channel command

Initial Comment:
Extend the ns_conn command to allow wrapping an Tcl 
channel arround it. This channel can be used to talk to
the remote client afterwards.

The idea is simple. The client uses standard http protocol
to open up a connection to the server. Server accepts the
connection and obtains the channel from it. The client 
also gets the channel to the server and from this point
on, 
both can exchange arbitrary data.

Example implementation is given as a patch against
the current CVS state.

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 04:41

Message:
Logged In: YES 
user_id=87254


Yes, I get it  :-)

My only point was that HTTP is not a short lived connection
protocol, if you don't want it to be.  Lengthy connections
with multiple messages seems to be one of the key features here.

The *only* thing HTTP can't do is have the server initiate a
message to the client.  For the kinds of cases you're
talking about here with multi-hour long connections, I'd be
wanting some kind of 'ping' request from the client every so
often just to make sure the server was idle and not broken.
 The ping becomes a 'what messages do you have for me?'
request, the server's response is itself a request to the
client.

Anyway, I was talking pure nonsense about the keepalive
stuff; the client doen't have to be so broken after all. 
I'm fine with you adding this feature.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 02:03

Message:
Logged In: YES 
user_id=95086

   "still don't understand what it buys you that a modern HTTP 
   1.1 implemnetation doesn't.  If there was no other way to do 
 
Stephen, I do not think you understood my point. What this 
command buys me is to be able to initiate an http connection 
and reuse the socket-to-socket comm channel for other 
purposes. 
 
For example we make "GET" request to a certain URL and  
then when the connection gets established, the server, 
takes the conn socket,  wraps a Tcl channel around it and  
passes this channel to my (arbitraty) Tcl proc. 
The client, instead of getting the response body, parses  
only the returned http headers to see that the connection 
succeeded and then passes this socket to my custom 
Tcl proc. Now, on the sever and on the client side you have  
a channel and both sides can use it to establish whatever  
protocol they want. They have a socket pipe between. 
The http is just used to handshake the peers, clear? 
 
What we do with this? We have two server instances on two 
different hosts. We open a channel (using the above trick) 
from one to another and then use this to generate a  
stream of preformatted packets (our protocol) on one side 
which does the backup of data from to the other side 
ultimately to the tape drive(s) attached to that host.  
We shuffle TB of data this way daily in about 500  
installations worldwide, so I  know this works fine.  
 
Now, see, this has nothing to do with HTTP1.1. or 1.0 alone.  
And yes, if you look in the implementation, you will see that  
it is a simple wrapper to the Ns_ConnSock. It simply wraps  
a Tcl channel around the the connection socket, nothing  
more and nothing less. 
 
And, I would not say that it requires a broken client.  It just  
requires a special client-processing. This is not ment for 
browser-clients. It has nothing to do with serving pages 
or any other short lived connections. Once we setup such 
a connection, we use it for hours and maybe days. 
 
We could have achieved this with a simple socket connection 
from client to a specific port  on the server. But we are not 
able to use any arbitrary port. All we have is port 80 (the web 
server port). So we must play by the http rules initially when 
accessing the server. When we passed this http handshake 
stuff, we are left with a socket-to-socket comm-channel  
which is transparent and can be used for any purpose 
whatsoever. 
 
Now, is this little bit more clear now? 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 20:32

Message:
Logged In: YES 
user_id=184124

What is broken about it? It is the Tcl interface to
Ns_ConnSock(), just Tcl way, using channel. The merit is to
use sync inter-exchange between client and server using
established connection. Client is not broken, it is designed
for this exchange, it is the tool to build messaging, not
general purpose protocol. It make naviserver more flexible
platform for implementing different ppas on top of it using
Tcl, that's all.
Btw, when are you going to commit your protocol extensions?

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 17:20

Message:
Logged In: YES 
user_id=87254


This command isn't a good idea because it creates a broken
interface which requires a broken client to work, and I
still don't understand what it buys you that a modern HTTP
1.1 implemnetation doesn't.  If there was no other way to do
it, then it may still have some merit.  But Ns_ConnSock() is
trivial and gives you all you need to create a Tcl channel.

We've already decided to do this the right way, whatever
that turns out to be, so why should we simultaneously also
do it the wrong way?

In fact you seem to agree, noting that this sucks because
it's not UDP and the overhead of HTTP is just too much for you.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 08:36

Message:
Logged In: YES 
user_id=184124

And that's why i want to add simple UDP support to the
server: for implementing simple messaging between servers in
the cluster, using UDP without HTTP overhead will make it
simple and faster. Using Tcl channel is intermediate
solution for such messaging because it eliminates making
multiple HTTP requests to the server but still uses heavy
TCP/HTTP overhead. Still i think this command is very
usefull and will make server more versatile provided that we
will supply usage examples with the server as well.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 08:26

Message:
Logged In: YES 
user_id=95086

  "Second, this has nothing to do with non-HTTP protocol, it is 
  HTTP communication protocol, what's inside the body has 
  nothing to do with it again." 
 
This is true. It is a regular http request, just for the body 
part, the peers are engaging in a bidirectional communication 
and can do whatever they pleased.  The request is considered 
done when one of the peers close the socket. 
 
Strictly speaking, I'm using the http body and this trick 
to implement arbitrary client-server comm channel. This will 
not work for browsers of course. It has nothing to do with 
browsers at all in fact. It arose out of the necessity to  
support alternate protocols given what we had in AS, 
that is pure http machinery. 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-04 07:38

Message:
Logged In: YES 
user_id=184124

On the closing note, keepalive has nothing to do with this
and should not be considered in this case. Even if you
connect using HTTP 1.1 and specify keepalive, sever has the
right to issue Connection: close header which will disable
keepalive.
So, it WILL work for all connections.
Second, this has nothing to do with non-HTTP protocol, it is
HTTP communication protocol, what's inside the body has
nothing to do with it again.
And pipelining is a bad example here, the point it to send
the request and wait for the answer and base on that send
another request. Pipelining will allow you just to submit
all requests and then later receive replies, it was designed
for speeding browser's connection especially retrieving images.

It looks to me Stephen you are all for HTTP core and
around-the-HTTP-driver solutions only. This will make simple
solutions complex because of getting around/workaround HTTP
based core in the server.

What is wrong with one simpe function which will give the
server more options without breaking anything?

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 02:45

Message:
Logged In: YES 
user_id=95086

I see. The gotcha's are: 
 
  o. it's yet another implementation of non-HTTP protocol 
  o. (HTTP 1.0 only, and no Keepalive header) be used to get 
  o. function in the docs might reasonably expect it to work for 
     all HTTP connections 
 
That's pretty clear. Well, I think you are right. This is our  
attempt to work around http-only caps of the server.  
Agreed. It will be better to get it done properly and we will 
then internally switch to the new implementation when 
it's ready. At the moment I have all this in a separate module 
and this can stay so until the multi-protocol caps are agreed 
upon and implemented. 
 
I will close this RFE then. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 02:36

Message:
Logged In: YES 
user_id=87254


I don't understand why you're not using standard HTTP
pipelining.  It would seem to be so much easier than having
to write custom client and server code.

This is a small change, but I'm not excited about it because
it's yet another implementation of non-HTTP protocol
support, which makes 3 so far...  :-(

It also seems to require that a deliberately broken client
(HTTP 1.0 only, and no Keepalive header) be used to get
around the server's io read-ahead.  Someone seeing this
function in the docs might reasonably expect it to work for
all HTTP connections.

Also, I don't think this requires core support. 
Ns_ConnSock() will return the underlying file descriptor for
a connection.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-04 01:36

Message:
Logged In: YES 
user_id=95086

OK. Maybe I should explain little bit more what I'm using this 
for. 
Imagine you'd like to make a client-server connection and 
then exchange arbitrary data on this connection. Imagine 
something like telnet connection. You telnet from client to 
server, get a comm-pipe betwen them and then you can 
shuffle whatever you like across this comm-pipe.  
What we have is a http server responding only to http  
requests (as it is now, w/o multiprotocol support). So, what 
I do is to use http to setup the connection, have a registered 
procedure on the server grab the socket and install my custom 
protocol handler on it. The http is only used to establish the 
communication. All the rest is handled by my custom code 
on the server and on the client.  
Once the connection is setup, we have a point-to-point  
socket communication link between server and client and we 
can use it for just about anything. All you need for this is a  
client-side http-library (we have ours written in Tcl) which  
will give you the socket back once the connection is  
established) and serve-side proc implementing your protocol 
of choice. So, the entire server http-machinery stays in place. 
I do not see any problem there because we have been using 
this trick for about 4 years in production very successfully. 
What I do not know is wether there are any gotcha's with that 
approach (we haven't hit any yet). 
I also do not know what will happen when we add alternate 
multiprotocol caps to server. Maybe this will become obsolete. 
But, as I see this now, the change is trivial and does not hurt 
anybody. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 16:21

Message:
Logged In: YES 
user_id=87254


HTTP also has (although our server doesn't, but I want to
add this) pipelining, which is sending multiple requests at
once without first waiting for the reply to previous requests.

That sounds like your "i do it when I send multiple http
requests".



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 15:45

Message:
Logged In: YES 
user_id=184124

Keepalive works when i close my connection, i.e. when i
exited from conn thread, but the point here is to do it from
the conn thread, i received request from my other server
sending me some info, i send my info back and get ack.
Everything inside one connection, before keepalive.
And yes, i send messages in my format, anyway i do it when i
send multiple http reaqests, i am interested in message body
only.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 15:32

Message:
Logged In: YES 
user_id=87254


HTTP already handles sending multiple requests accross one
socket with keepalives.  It might be more useful to add
keepalives to our HTTP client code.

In the case of HTTP keepalives, the socket is returned to
the driver thread after the first request, still open, and
the conn thread goes on to do other useful work.  If a new
request arrives before the connection times out it is passed
to the next free conn thread.

If you don't usee HTTP with keepalives you will have to
invent some kind of message chunking syntax and some kind of
time-out protocol.  But you will still end up wasting conn
threads as they idle waiting for possible extra messages.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 15:20

Message:
Logged In: YES 
user_id=184124

I could use it in my apps where i have cluster of several
servers exchanging http requests between each other,
sometimes one session needs 2 or more exchanges.
With this channel i can exchange using same connection and 
i need to send/receive data, not headers, so this can save
and improve my operations.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-03 15:12

Message:
Logged In: YES 
user_id=87254


Usually, by the time your script runs and you call [ns_conn
channel], the data from the client will already have been
read and buffered.  You can access this with [ns_conn content].

Does reading work?

What's the objective?  Do you want to hadle streaming data,
or make reading and writing more Tcl-like?


----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-03 15:07

Message:
Logged In: YES 
user_id=184124

Very usefull, commit it

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156141&group_id=130646

[ naviserver-Feature Requests-1162223 ] Tcl Callbacks

From: SourceForge.net <no...@so...> - 2005-03-17 11:28:28

Feature Requests item #1162223, was opened at 2005-03-12 17:22
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Stephen Deasey (sdeasey)
Assigned to: Stephen Deasey (sdeasey)
Summary: Tcl Callbacks

Initial Comment:

The server offers a dual C/Tcl interface.  There are a
number of of places where Tcl code needs to be executed
by the C core, or the core doesn't know whether a
callback is C or Tcl.

I've attached the patch
naviserver-4.1-tclcallbacks.patch.  It extracts the Tcl
callback infrastructure from tclsched.c into
tclcallbacks.c and makes it a public interface.  Here
it is:

Ns_TclNewCallback()
Ns_TclNewCallbackObj()
Ns_TclEvalCallback()
Ns_TclCallbackProc()
Ns_TclFreeCallback()
Ns_TclCallbackArgProc()

It provides a standard interface for Tcl callbacks
throughout the server, handling interp allocation,
introspection, evaluation and cleanup.

Also, tclcallbacks.c is somewhat analogous to
callbacks.c, and so the generic ns_atexit etc.
callbacks which were in tclsched.c have been moved to
the tclcallbacks file.

The first set of code to be converted to the new
interface is tclsched.c itself, and I've converted to
using Ns_ParseObjv() while I was in there.  File size
has dropped ~250 lines.


One thing I'm wondering about is whether the
Ns_TclEvalCallback() routine should take extra args to
send to the Tcl proc.  There seems to be no standard
scheme for ordering the args so there still needs to be
a way to manually construct the command, which is why
the members of the Ns_TclCallback struct are public. 
But for this very reason, it might be worth providing a
standard way to do this so all future callbacks are
consistent.


It would be nice in the future to come up with some way
of registereing these callback points such that you
don't have to create an ns_info subcommand for example
manually.  It should be possible for loadable modules
to register their callback points and have them show up
in the right place.



----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 04:28

Message:
Logged In: YES 
user_id=87254


Yes, I forgot to mention that Tcl callbacks are a little
more general than those for C in callbacks.c.  Those take a
single arg which is the context you registered with the
proc, and return void.   Tcl callbacks on the other hand can
return a result which at the C level gets appended to a
dstring if you pass that in.

The members of the Ns_TclCallback structure are public so
that you can construct a command and args in any format. 
This is the question I was asking originally: should there
be a default recomended way to do this?

So for example Ns_TclEvalCallback() might take varargs
cstrings, which if passed would be added to the tcl command
to execute:

  command ?cbarg1? ?cbarg2? ?context?

Where cbarg2 and cbarg2 are specific to the type of callback
and context is what was registered with the proc by the Tcl
programmer.

Unfortunately that wouldn't cover all the bases as some of
the existing APIs do weird things like place callback
specific args after the context or if no context was passed,
always append an empty string.  For new callbacks, something
standard like the above scheme should be used.


If you look at the patch for virtual hosting you'll see that
two new Tcl callbacks were created and they contain simillar
boiler plate.  This makes me think that maybe there should
be something like an [ns_addcallback serverroot myrootproc
mycontext] interface.

It's a little more complicated than that because some
callbacks are singular, like the serverroot proc, and some
are plural, like filters.  So maybe the thing to do is get
some experience with this simple version and then expand it
later...





----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-12 17:46

Message:
Logged In: YES 
user_id=184124

It looks good, unified callback interface would be better
but Tcl filters. traces, requestprocs uses different
arguments/parameters and currently the only way to get info
about them to use ns_info interface and Ns_ArgProc
interface. Tcl_Callback are for simple proc/args callbacks,
filters is special case, see if you can convert them.
But i am for this change.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-12 17:30

Message:
Logged In: YES 
user_id=87254


I've attached a patch which converts the tclrequest.c file
to use the new callback API as an example.  It also uses the
Ns_ParseObjv() API, and I've removed support for the legacy
conn variable.

It removes ~200 lines and all utility functions.  The
callback API provides default Ns_ArgProc introspection, but
this will be much nicer when combined with Vlad's RFE
1161597  :-)



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1162223&group_id=130646

[ naviserver-Feature Requests-1156875 ] Watchdog process restarts failed server

From: SourceForge.net <no...@so...> - 2005-03-17 11:23:33

Feature Requests item #1156875, was opened at 2005-03-04 20:03
Message generated for change (Comment added) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Watchdog process restarts failed server

Initial Comment:
We have been using this for quite some time and it 
proved extremely useful. 
We doublefork the nsd process and make the first 
forked instance control the second. The first one 
(the watchdog) reacts on exit codes and signals 
caught during the watch and correspondingly restarts 
the second instance (the worker). 
 
Also, we have added the the "-restart" option to 
the "ns_shutdown" command. This just sends the 
SIGINT to the worker process. The watchdog is 
handling this signal and respawns the worker 
automatically. 
 
During operation, the watchdog logs events and their 
cause  into the system log file. This looks like: 
Feb 28 04:00:05 Develop nsd[19400]: worker: started. 
Mar  1 04:00:13 Develop nsd[4475]: watchdog: worker 
19400 exited (2). 
Mar  1 04:00:15 Develop nsd[21290]: worker: started. 
Mar  1 04:00:18 Develop nsd[14705]: watchdog: worker 
19399 exited (2). 
Mar  1 04:00:20 Develop nsd[21300]: worker: started. 
 
We have done all the changes with "--enable-watchdog" 
so anybody who needs this feature will have to compile 
with this option.  
 

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-17 12:23

Message:
Logged In: YES 
user_id=95086

Now, what? You go change/remove _exit and commit? 
Other things are ok for me. 
Or should I commit? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 12:02

Message:
Logged In: YES 
user_id=87254


Calling _exit()  was an attempt to clarify the intent of the
code.  But I didn't pay attention to the fact that the main
function is pluggable, so I guess that's a no-go.

4 hours of retry time before giving up sounds almost like
'inifinite'...  Looks like we can get something suitable
without adding extra switches, so we can commit this and
tweak the timing later.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-15 14:54

Message:
Logged In: YES 
user_id=95086

Stephen, 
why do you call _exit(0) instead of returning 
control to caller routine?  
 
Also.. 
  I don't want to go overboard with command line switches, but 
  an option to specify not to give up trying to restart the 
  server would be nice.  If you have that, you also need to 
  turn off the restart timeout doubling at a certain point.  I 
  don't know what the cleanest way to do that is. 
 
Restart time doubling is limited to 64 seconds (insane, should 
rather be 60 seconds, heh) so it should not double after 
reaching 60 secs. After that time we constantly wait 60 secs 
between restarts. 
With the server retry of 16 times and we will wait 1, 2, 4, 8,  
16, 32,  64 (i.e. total of 127) seconds and after that 8 times 64  
secs (= 512 total ) secs and eventually abort.  
All together, that will be 640 secs = about 10 minutes 
wait/retry. After that it will exit. I think if we extend retry count 
from 16 to 256 it will be pretty enough (about 4 hours) where 
one can fix the failure cause. Otherwise, I see syntactic 
problems of how to actually specify the retries parameter. 
Of curse, you can do something like 
  bin/nsd -w -r 16 
 
but the "-r"  (restart) would really have no meaning w/o -w 
and I do not like this very much indeed. I'm really for bumping 
the retry count higher (127 or 256 tries). 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-15 01:50

Message:
Logged In: YES 
user_id=184124

Last time addition:
Can we make syslog function public, Ns_Syslog?
Also Tcl interface to it would be usefull as well, below is
syslog from my nssys module, just a suggestion, no rush with
this:

static struct syslog_data {
   int opened;
   int facility;
   int options;
   char ident[32];
   Tcl_HashTable *priorities;
   Tcl_HashTable *facilities;
} syslog_data;

/* Syslog initialization */
void SyslogInit()
{
   memset(&syslog_data,0,sizeof(syslog_data));
   syslog_data.options = LOG_PID;
   syslog_data.facility = LOG_USER;
   if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY)))
    
strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1);
   syslog_data.facilities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS);
   AddEntry(syslog_data.facilities,"auth",LOG_AUTH);
   AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV);
   AddEntry(syslog_data.facilities,"cron",LOG_CRON);
   AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON);
   AddEntry(syslog_data.facilities,"kernel",LOG_KERN);
   AddEntry(syslog_data.facilities,"lpr",LOG_LPR);
   AddEntry(syslog_data.facilities,"mail",LOG_MAIL);
   AddEntry(syslog_data.facilities,"news",LOG_NEWS);
   AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG);
   AddEntry(syslog_data.facilities,"user",LOG_USER);
   AddEntry(syslog_data.facilities,"uucp",LOG_UUCP);
   AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0);
   AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1);
   AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2);
   AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3);
   AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4);
   AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5);
   AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6);
   AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7);
   syslog_data.priorities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS);
   AddEntry(syslog_data.priorities,"emerg",LOG_EMERG);
   AddEntry(syslog_data.priorities,"alert",LOG_ALERT);
   AddEntry(syslog_data.priorities,"crit",LOG_CRIT);
   AddEntry(syslog_data.priorities,"err",LOG_ERR);
   AddEntry(syslog_data.priorities,"error",LOG_ERR);
   AddEntry(syslog_data.priorities,"warning",LOG_WARNING);
   AddEntry(syslog_data.priorities,"notice",LOG_NOTICE);
   AddEntry(syslog_data.priorities,"info",LOG_INFO);
   AddEntry(syslog_data.priorities,"debug",LOG_DEBUG);
}

/*
 * sys_log
 *
 * Usage:
 *   sys_log ?-facility f -options o -ident i? priority message
 *
 *  facility    - kernel, cron, authpriv, mail, local0,
local1, daemon, local2,
 *                news, local3, local4, local5, local6,
syslog, local7, auth, uucp, lpr, user
 *  options     - list with any of { CONS NDELAY PERROR PID
ODELAY NOWAIT }
 *  ident       - ident is prepended to every message, and
is typically the program name
 *  priority    - info, alert, emerg, err, notice, warning,
error, crit, debug
 *
 * Returns:
 *   nothing
 */
static int SysLog(ClientData data,Tcl_Interp *interp,int
argc,char **argv)
{
    int i,priority = 0;
    Tcl_HashEntry *entry;
    
    if(argc < 1) {
      Tcl_AppendResult(interp,argv[0]," ?-facility f
-options o -ident i? priority message",0);
      return TCL_ERROR;
    }
    for(i = 1;i < argc-1;) {
      if(!strcmp(argv[i],"-facility")) {
        entry =
Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]);
        if(!entry) {
          Tcl_AppendResult(interp,"wrong facility
",argv[i+1],": available facilities: ",NULL);
          ListHash(interp,syslog_data.facilities);
          return TCL_ERROR;
        }
        syslog_data.facility = (int)Tcl_GetHashValue(entry);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-options")) {
        syslog_data.options = 0;
        if(strstr(argv[i+1],"CONS")) syslog_data.options |=
LOG_CONS;
        if(strstr(argv[i+1],"NDELAY")) syslog_data.options
|= LOG_NDELAY;
        if(strstr(argv[i+1],"PERROR")) syslog_data.options
|= LOG_PERROR;
        if(strstr(argv[i+1],"PID")) syslog_data.options |=
LOG_PID;
        if(strstr(argv[i+1],"ODELAY")) syslog_data.options
|= LOG_ODELAY;
        if(strstr(argv[i+1],"NOWAIT")) syslog_data.options
|= LOG_NOWAIT;
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-ident")) {
        memset(syslog_data.ident,0,sizeof(syslog_data.ident));
       
strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      break;
    }
    if(i < argc) {
      entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]);
      if(!entry) {
        Tcl_AppendResult(interp,"wrong level ",argv[i],":
available levels: ",NULL);
        ListHash(interp,syslog_data.priorities);
        return TCL_ERROR;
      }
      priority = (int)Tcl_GetHashValue(entry);
      i++;
    }
    if(i < argc) {
      if(!syslog_data.opened) {
       
openlog(syslog_data.ident,syslog_data.options,syslog_data.facility);
        syslog_data.opened = 1;
      }
      syslog(priority,argv[i]);
      return TCL_OK;
    }
    return TCL_OK;
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-15 00:44

Message:
Logged In: YES 
user_id=87254


Oops, looks like I forgot to attach the patch...  Here it is.

Yes, the server will attempt to restart a number of times
before giving up.  I think this is as appropriate for config
errors as it is for others, then there doesn't need to be a
dual-standard fatal error.


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 19:12

Message:
Logged In: YES 
user_id=95086

I'd rather wait for Stephens code and see what's done there 
and then make final changes and commit. It should not 
take long. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 19:07

Message:
Logged In: YES 
user_id=184124

Then i do not see any reason not committing it, we can have
discussions about it for another couple of years, but until
we have it and use it, it is just discussions. We aready
agreed on having it in the core.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 19:00

Message:
Logged In: YES 
user_id=95086

Stephen, 
You mind uploading your patch changes so I can peek into it? 
I think we should soon settle on some solution and move on. 
 
Vlad, 
The patch already does that. After 16 unsuccessfull restarts 
the watchdog exits. Between every restart we wait 1, 2, 4, 8,  
16, 32, .... 16384 seconds and then exit. This would be: 32767 
seconds or about 9 hours.  
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 15:53

Message:
Logged In: YES 
user_id=184124

I would say, repeat configured number of time and then exit.
This way all socket related problems will be cleared after
couple of restartes, config problems will make server exit .

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 15:41

Message:
Logged In: YES 
user_id=95086

Well, because I could not differentiate between config 
and later runtime errors, I thought its wiser to abort 
rather to repeatedly restart broken server. 
One possibility is to add different Ns_Fatal call like: 
 
  Ns_FatalEx(int exitcode, char *fmt, ...); 
 
Or? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 15:12

Message:
Logged In: YES 
user_id=87254


Are you sure Ns_Fatal should not restart the server?  Many
of the fatal errors are caused by bad configuration and I
can see why you might want the server to exit immediately. 
But there are also runtime fatal errors, and here I'd expect
the server to be restarted.

Some config errors are due to external factors like missing
directories or file system permissions and it would be nice
for the server to come back up as soon as the issue resolved
itself.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 10:52

Message:
Logged In: YES 
user_id=95086

Calling watchdog before or after chroot has no real 
implications I believe. 
Also, when the server exits with Ns_Fatal, it hsould not be 
restarted, I will look into your changes today. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 07:27

Message:
Logged In: YES 
user_id=87254


Oh yeah, and I moved the watchdog stuff eaven earlier,
before the prebind and chroot.  Hmm, now that I think about
it it's only the prebind that really needs to happen after
the watchdog is started, to ensure the sockets are always in
a sane state...



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 07:17

Message:
Logged In: YES 
user_id=87254


Am I reading this right?  It looks like if the server calls
Ns_Fatal() it will exit with code 1, but  0 and 1 are
treated as OK exit codes and the server will not be restarted.

I've attached a patch which changes the above, fixes the pid
problem in a different way because I completely forgot you
already posted below about that small glitch, use
Ns_ParseObjv(), adds the -w switch, and a couple of small
name changes.

I don't want to go overboard with command line switches, but
an option to specify not to give up trying to restart the
server would be nice.  If you have that, you also need to
turn off the restart timeout doubling at a certain point.  I
don't know what the cleanest way to do that is.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 21:09

Message:
Logged In: YES 
user_id=95086

This is correct. As soon as we agree on some 
implementation, I will put the -i processing 
and avoid starting watchdog for inittab starts. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 20:48

Message:
Logged In: YES 
user_id=184124


There still should be possibility to run nsd from inittab,
so when -i switch is given, no watchdog should be running,
let /sbin/init to handle restarts

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 17:19

Message:
Logged In: YES 
user_id=95086

small glich: 
After applying the patch, change nsmain.c 
from: 
    nsconf.pid = serverPid: 
to 
    nsconf.pid = getpid(); 
 
otherwise the pid file will contain the bogus 
server pid if the watchdog restarted it later. 
   

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 16:26

Message:
Logged In: YES 
user_id=95086

Another try... 
Important to note: SIGKILL (signal 9) cannot be handled 
hence if somebody kills the watchdog with SIGKILL, the 
server will be left lingering w/o the watchdog. This is  
important to know. I do not see any possibility how to 
recover in such cases (i.e. how to stop the server). 
 
Apart from that, all objections from Stephen are taken 
into account. Please try again. A new copy of watchdog.patch 
file is attached. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 06:20

Message:
Logged In: YES 
user_id=184124

I just found this code in my old server sources, i just
chnaged internal name to Ns/NS, used to run pretty stable.

-------------------------------------------------------------------

#define NS_EXIT 99

static void NsTerminate(int sig)
{
    printf("NS[%d] signal %d received...",getpid(),sig);
    // Kill the server with the same signal
    if(nsPid > 0) kill(nsPid,sig);

    // Exit in case of fatal signal
    if(sig != SIGHUP) {
      printf("NS[%d] terminating...",getpid());
      exit(0);
    }
    // Reassign signal handler
    signal(sig,NsTerminate);
}

void NsWatchdog(int argc, char *argv[], char *envp[])
{
    int failcount = 0;
    time_t start;
    int status;
    pid_t pid;

    signal(SIGTTOU,SIG_IGN);
    signal(SIGTTIN,SIG_IGN);
    signal(SIGTSTP,SIG_IGN);
    signal(SIGPIPE,SIG_IGN);
    signal(SIGQUIT,SIG_IGN);
    signal(SIGHUP,NsTerminate);
    signal(SIGINT,NsTerminate);
    signal(SIGTERM,NsTerminate);

    // Go background
    if((pid = fork())) {
      if(pid < 0) err_logger("warpConfigure: fork:
%s",strerror(errno));
       exit(0);
    }
    setsid();
       
    for(;;) {
               
        // Execute the real server
        nsPid = fork();
        // Child, continue as server
        if(nsPid == 0) {
          exit(nsMain(argc, argv, ServerInit));
        }
        
        /* parent, behaves like a guardian */
        time(&start);
        printf("NS[%d] server process started",getpid());
        pid = waitpid(-1, &status, 0);
        if(WIFEXITED(status))
          printf("NS[%d] child process exited with status
%d",pid,WEXITSTATUS(status));
        else
        if(WIFSIGNALED(status))
          printf("NS[%d] child process exited due to signal
%d",pid,WTERMSIG(status));
        else
          printf("NS[%d] child process exited", pid);

        // Special exit code
        if(WIFEXITED(status)) {
          if(WEXITSTATUS(status) == NS_EXIT) {
            printf("NS[%d] child configuration error,
exiting",getpid());
            exit(0);
          } else
          if(WEXITSTATUS(status) == SIGHUP) {
          } else {
            if(time(0) - start < 10) failcount++;
            if(failcount == 10) {
              printf("Exiting due to repeated, frequent
failures");
              exit(1);
            }
          }
        }
        sleep(3);
    }
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-08 05:23

Message:
Logged In: YES 
user_id=87254


NsWatchdog() is called after the server drops root privs, so
both the watchdog and the server run as the defined user. 
What happens if the server dies and on restart needs to
rebind privileged ports?

A ps listing shows two running processes, parent and child.
  If I kill either one, the watchdog dies, the server
continues to run.
  If I kill -9 the parent, the child continues to run.
  If I kill -9 the child, the server is restarted.
Something seems not quite right here...

I'm a bit confused about how the code works.  For example,
NsWatchdog() seems to ignore all of it's arguments?  Here's
the code which calls it:

if (mode == 0) {
        i = ns_fork();
        if (i < 0) {
            Ns_Fatal("nsmain: fork() failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        setsid();
        i = NsWatchdog(argc, argv, initProc);
        if (i < 0) {
            Ns_Fatal("nsmain: watchdog failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        nsconf.pid = getpid();
    }

NsWatchdog() says it returns the worker pid, it also sets
the global variable wpid.  The code above ignores the
returned value, and the global variable, and instead calls
getpid()...  The variable 'i' is existing code, but still
somehow doesn't seem suitable...

Could that Ns_Fatal() above be moved into the NsWatchdog()
function?  I think maybe some comment is needed here.  The
code structure is very like that just above where ns_fork()
is called, but this function will return *multiple* times,
right?  This is kind of suprising, and an extra twist on the
already confusing fork() semantics (or maybe it's just me
who gets confused by fork...).  A return value of 0 here is
a 'request for orderly shutdown', right?

How about some more logging to syslog?  For example,
distinguish between start and restart.  Mention when the
MAX_SLEEP_PERIOD has been reached, etc.


Couple of small things:

Can we refer to 'the server', rather than 'the worker'? 
Worker and Watchdog begin with 'w', and so does the global
variable wpid...  Maybe serverPid?

NsWatchdog is a static function, it doesn't need the Ns prefix.

In NsWatchdog(), the variable 'run' should be something like
'nretries', 'nap' should be something more like
'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS.

The comment for WaitForWorker() is misslabeled.

Should SigHandler() be called WatchdogSigHandler()?






----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-07 00:03

Message:
Logged In: YES 
user_id=184124

Looks good to me

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 13:06

Message:
Logged In: YES 
user_id=95086

Ah, correction: 
The restart option sends SIGINT to the worker process 
which causes the watchdog to restart it.  
And, the patchfile is now attached! 

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 13:03

Message:
Logged In: YES 
user_id=95086

Here is the patch. 
I have added "-restart" option to "ns_shutdown". It is rather
clumsy to parse but should do. We should rewrite this with
your args parsing routine. 
The restart option sends SIGTERM to the worker process
which causes the watchdog to restart it. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 00:46

Message:
Logged In: YES 
user_id=87254


I don't think this has to hide behind a config option.  It's
either a good idea or it's not.  Sounds good to me.  Is
there a patch?  I'm wondering about some of the
implementation details...



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

[ naviserver-Feature Requests-1161597 ] Extend ns_info with info about traces/filters/procs

From: SourceForge.net <no...@so...> - 2005-03-17 11:06:18

Feature Requests item #1161597, was opened at 2005-03-11 12:30
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646

Category: C-API
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Vlad Seryakov (seryakov)
Assigned to: Vlad Seryakov (seryakov)
Summary: Extend ns_info with info about traces/filters/procs

Initial Comment:
Attached is patch that extends ns_info with 3 commands:

ns_info traces
ns_info filters
ns_info requestprocs

They work the same way as ns_info callbacks, no
functionality changed or added, just information commands

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 04:06

Message:
Logged In: YES 
user_id=87254


To get info on the binary callbacks we're just going to have
to register proc info for all the default filters, request
procs etc.  If we cover everything there should be a
reasonable number of examples for module writers to follow.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-12 18:47

Message:
Logged In: YES 
user_id=184124

Also, somein docs would be usefull to put info how to get
info about some binary callbacks:

Welcome to ossweb running at /usr/local/ns/bin/nsd (pid 5524)
ossweb:nscp 1> ns_info traces
p:0xb7187be0 a:0x80817d8

# gdb
(gdb) attach 12345

(gdb) info symbol 0xb7187be0
LogTrace in section .text





----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-12 18:44

Message:
Logged In: YES 
user_id=184124

I started with minimal changes, #defines and &syntax can be
chnages of course. now that you started TclCallbacks, this
thing depends when you commit your changes, it requires
Ns_ArgProc and ns_info stuff.
Once we have unified API for callbask i will change it to be
server-specific as well, it took a while to figure out how
it works :-))

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-12 18:37

Message:
Logged In: YES 
user_id=87254


This is a nice addition.  

ns_info filters/traces is server specific, but ns_info
requestprocs returns info for all servers.  Can you make
this per-server also, to be consistent?

The URL walking procs look a little tricky, I tnink they
would benefit from having their own more detailed comments.

The comment currently says it calls a function for each
node, and the effect depends on the function.  But the
function signature is Ns_ArgProc so there's really only one
thing it can do.

Why does it fix the stack size at 512?  Can these magic
numbers be #define'd.

Using a while loop rather than an empty for loop would be a
little easier to read.

There's a lot of this kind of thing:
    (&(triePtr->branches))->n
Could this be simplified to:
    triePtr->branches.n ?


Otherwise, looks pretty good.  I like it.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1161597&group_id=130646

[ naviserver-Feature Requests-1156875 ] Watchdog process restarts failed server

From: SourceForge.net <no...@so...> - 2005-03-17 11:02:00

Feature Requests item #1156875, was opened at 2005-03-04 12:03
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Watchdog process restarts failed server

Initial Comment:
We have been using this for quite some time and it 
proved extremely useful. 
We doublefork the nsd process and make the first 
forked instance control the second. The first one 
(the watchdog) reacts on exit codes and signals 
caught during the watch and correspondingly restarts 
the second instance (the worker). 
 
Also, we have added the the "-restart" option to 
the "ns_shutdown" command. This just sends the 
SIGINT to the worker process. The watchdog is 
handling this signal and respawns the worker 
automatically. 
 
During operation, the watchdog logs events and their 
cause  into the system log file. This looks like: 
Feb 28 04:00:05 Develop nsd[19400]: worker: started. 
Mar  1 04:00:13 Develop nsd[4475]: watchdog: worker 
19400 exited (2). 
Mar  1 04:00:15 Develop nsd[21290]: worker: started. 
Mar  1 04:00:18 Develop nsd[14705]: watchdog: worker 
19399 exited (2). 
Mar  1 04:00:20 Develop nsd[21300]: worker: started. 
 
We have done all the changes with "--enable-watchdog" 
so anybody who needs this feature will have to compile 
with this option.  
 

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 04:02

Message:
Logged In: YES 
user_id=87254


Calling _exit()  was an attempt to clarify the intent of the
code.  But I didn't pay attention to the fact that the main
function is pluggable, so I guess that's a no-go.

4 hours of retry time before giving up sounds almost like
'inifinite'...  Looks like we can get something suitable
without adding extra switches, so we can commit this and
tweak the timing later.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-15 06:54

Message:
Logged In: YES 
user_id=95086

Stephen, 
why do you call _exit(0) instead of returning 
control to caller routine?  
 
Also.. 
  I don't want to go overboard with command line switches, but 
  an option to specify not to give up trying to restart the 
  server would be nice.  If you have that, you also need to 
  turn off the restart timeout doubling at a certain point.  I 
  don't know what the cleanest way to do that is. 
 
Restart time doubling is limited to 64 seconds (insane, should 
rather be 60 seconds, heh) so it should not double after 
reaching 60 secs. After that time we constantly wait 60 secs 
between restarts. 
With the server retry of 16 times and we will wait 1, 2, 4, 8,  
16, 32,  64 (i.e. total of 127) seconds and after that 8 times 64  
secs (= 512 total ) secs and eventually abort.  
All together, that will be 640 secs = about 10 minutes 
wait/retry. After that it will exit. I think if we extend retry count 
from 16 to 256 it will be pretty enough (about 4 hours) where 
one can fix the failure cause. Otherwise, I see syntactic 
problems of how to actually specify the retries parameter. 
Of curse, you can do something like 
  bin/nsd -w -r 16 
 
but the "-r"  (restart) would really have no meaning w/o -w 
and I do not like this very much indeed. I'm really for bumping 
the retry count higher (127 or 256 tries). 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 17:50

Message:
Logged In: YES 
user_id=184124

Last time addition:
Can we make syslog function public, Ns_Syslog?
Also Tcl interface to it would be usefull as well, below is
syslog from my nssys module, just a suggestion, no rush with
this:

static struct syslog_data {
   int opened;
   int facility;
   int options;
   char ident[32];
   Tcl_HashTable *priorities;
   Tcl_HashTable *facilities;
} syslog_data;

/* Syslog initialization */
void SyslogInit()
{
   memset(&syslog_data,0,sizeof(syslog_data));
   syslog_data.options = LOG_PID;
   syslog_data.facility = LOG_USER;
   if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY)))
    
strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1);
   syslog_data.facilities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS);
   AddEntry(syslog_data.facilities,"auth",LOG_AUTH);
   AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV);
   AddEntry(syslog_data.facilities,"cron",LOG_CRON);
   AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON);
   AddEntry(syslog_data.facilities,"kernel",LOG_KERN);
   AddEntry(syslog_data.facilities,"lpr",LOG_LPR);
   AddEntry(syslog_data.facilities,"mail",LOG_MAIL);
   AddEntry(syslog_data.facilities,"news",LOG_NEWS);
   AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG);
   AddEntry(syslog_data.facilities,"user",LOG_USER);
   AddEntry(syslog_data.facilities,"uucp",LOG_UUCP);
   AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0);
   AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1);
   AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2);
   AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3);
   AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4);
   AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5);
   AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6);
   AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7);
   syslog_data.priorities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS);
   AddEntry(syslog_data.priorities,"emerg",LOG_EMERG);
   AddEntry(syslog_data.priorities,"alert",LOG_ALERT);
   AddEntry(syslog_data.priorities,"crit",LOG_CRIT);
   AddEntry(syslog_data.priorities,"err",LOG_ERR);
   AddEntry(syslog_data.priorities,"error",LOG_ERR);
   AddEntry(syslog_data.priorities,"warning",LOG_WARNING);
   AddEntry(syslog_data.priorities,"notice",LOG_NOTICE);
   AddEntry(syslog_data.priorities,"info",LOG_INFO);
   AddEntry(syslog_data.priorities,"debug",LOG_DEBUG);
}

/*
 * sys_log
 *
 * Usage:
 *   sys_log ?-facility f -options o -ident i? priority message
 *
 *  facility    - kernel, cron, authpriv, mail, local0,
local1, daemon, local2,
 *                news, local3, local4, local5, local6,
syslog, local7, auth, uucp, lpr, user
 *  options     - list with any of { CONS NDELAY PERROR PID
ODELAY NOWAIT }
 *  ident       - ident is prepended to every message, and
is typically the program name
 *  priority    - info, alert, emerg, err, notice, warning,
error, crit, debug
 *
 * Returns:
 *   nothing
 */
static int SysLog(ClientData data,Tcl_Interp *interp,int
argc,char **argv)
{
    int i,priority = 0;
    Tcl_HashEntry *entry;
    
    if(argc < 1) {
      Tcl_AppendResult(interp,argv[0]," ?-facility f
-options o -ident i? priority message",0);
      return TCL_ERROR;
    }
    for(i = 1;i < argc-1;) {
      if(!strcmp(argv[i],"-facility")) {
        entry =
Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]);
        if(!entry) {
          Tcl_AppendResult(interp,"wrong facility
",argv[i+1],": available facilities: ",NULL);
          ListHash(interp,syslog_data.facilities);
          return TCL_ERROR;
        }
        syslog_data.facility = (int)Tcl_GetHashValue(entry);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-options")) {
        syslog_data.options = 0;
        if(strstr(argv[i+1],"CONS")) syslog_data.options |=
LOG_CONS;
        if(strstr(argv[i+1],"NDELAY")) syslog_data.options
|= LOG_NDELAY;
        if(strstr(argv[i+1],"PERROR")) syslog_data.options
|= LOG_PERROR;
        if(strstr(argv[i+1],"PID")) syslog_data.options |=
LOG_PID;
        if(strstr(argv[i+1],"ODELAY")) syslog_data.options
|= LOG_ODELAY;
        if(strstr(argv[i+1],"NOWAIT")) syslog_data.options
|= LOG_NOWAIT;
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-ident")) {
        memset(syslog_data.ident,0,sizeof(syslog_data.ident));
       
strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      break;
    }
    if(i < argc) {
      entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]);
      if(!entry) {
        Tcl_AppendResult(interp,"wrong level ",argv[i],":
available levels: ",NULL);
        ListHash(interp,syslog_data.priorities);
        return TCL_ERROR;
      }
      priority = (int)Tcl_GetHashValue(entry);
      i++;
    }
    if(i < argc) {
      if(!syslog_data.opened) {
       
openlog(syslog_data.ident,syslog_data.options,syslog_data.facility);
        syslog_data.opened = 1;
      }
      syslog(priority,argv[i]);
      return TCL_OK;
    }
    return TCL_OK;
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 16:44

Message:
Logged In: YES 
user_id=87254


Oops, looks like I forgot to attach the patch...  Here it is.

Yes, the server will attempt to restart a number of times
before giving up.  I think this is as appropriate for config
errors as it is for others, then there doesn't need to be a
dual-standard fatal error.


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 11:12

Message:
Logged In: YES 
user_id=95086

I'd rather wait for Stephens code and see what's done there 
and then make final changes and commit. It should not 
take long. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 11:07

Message:
Logged In: YES 
user_id=184124

Then i do not see any reason not committing it, we can have
discussions about it for another couple of years, but until
we have it and use it, it is just discussions. We aready
agreed on having it in the core.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 11:00

Message:
Logged In: YES 
user_id=95086

Stephen, 
You mind uploading your patch changes so I can peek into it? 
I think we should soon settle on some solution and move on. 
 
Vlad, 
The patch already does that. After 16 unsuccessfull restarts 
the watchdog exits. Between every restart we wait 1, 2, 4, 8,  
16, 32, .... 16384 seconds and then exit. This would be: 32767 
seconds or about 9 hours.  
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 07:53

Message:
Logged In: YES 
user_id=184124

I would say, repeat configured number of time and then exit.
This way all socket related problems will be cleared after
couple of restartes, config problems will make server exit .

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 07:41

Message:
Logged In: YES 
user_id=95086

Well, because I could not differentiate between config 
and later runtime errors, I thought its wiser to abort 
rather to repeatedly restart broken server. 
One possibility is to add different Ns_Fatal call like: 
 
  Ns_FatalEx(int exitcode, char *fmt, ...); 
 
Or? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 07:12

Message:
Logged In: YES 
user_id=87254


Are you sure Ns_Fatal should not restart the server?  Many
of the fatal errors are caused by bad configuration and I
can see why you might want the server to exit immediately. 
But there are also runtime fatal errors, and here I'd expect
the server to be restarted.

Some config errors are due to external factors like missing
directories or file system permissions and it would be nice
for the server to come back up as soon as the issue resolved
itself.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 02:52

Message:
Logged In: YES 
user_id=95086

Calling watchdog before or after chroot has no real 
implications I believe. 
Also, when the server exits with Ns_Fatal, it hsould not be 
restarted, I will look into your changes today. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-10 23:27

Message:
Logged In: YES 
user_id=87254


Oh yeah, and I moved the watchdog stuff eaven earlier,
before the prebind and chroot.  Hmm, now that I think about
it it's only the prebind that really needs to happen after
the watchdog is started, to ensure the sockets are always in
a sane state...



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-10 23:17

Message:
Logged In: YES 
user_id=87254


Am I reading this right?  It looks like if the server calls
Ns_Fatal() it will exit with code 1, but  0 and 1 are
treated as OK exit codes and the server will not be restarted.

I've attached a patch which changes the above, fixes the pid
problem in a different way because I completely forgot you
already posted below about that small glitch, use
Ns_ParseObjv(), adds the -w switch, and a couple of small
name changes.

I don't want to go overboard with command line switches, but
an option to specify not to give up trying to restart the
server would be nice.  If you have that, you also need to
turn off the restart timeout doubling at a certain point.  I
don't know what the cleanest way to do that is.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 13:09

Message:
Logged In: YES 
user_id=95086

This is correct. As soon as we agree on some 
implementation, I will put the -i processing 
and avoid starting watchdog for inittab starts. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 12:48

Message:
Logged In: YES 
user_id=184124


There still should be possibility to run nsd from inittab,
so when -i switch is given, no watchdog should be running,
let /sbin/init to handle restarts

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 09:19

Message:
Logged In: YES 
user_id=95086

small glich: 
After applying the patch, change nsmain.c 
from: 
    nsconf.pid = serverPid: 
to 
    nsconf.pid = getpid(); 
 
otherwise the pid file will contain the bogus 
server pid if the watchdog restarted it later. 
   

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 08:26

Message:
Logged In: YES 
user_id=95086

Another try... 
Important to note: SIGKILL (signal 9) cannot be handled 
hence if somebody kills the watchdog with SIGKILL, the 
server will be left lingering w/o the watchdog. This is  
important to know. I do not see any possibility how to 
recover in such cases (i.e. how to stop the server). 
 
Apart from that, all objections from Stephen are taken 
into account. Please try again. A new copy of watchdog.patch 
file is attached. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-07 22:20

Message:
Logged In: YES 
user_id=184124

I just found this code in my old server sources, i just
chnaged internal name to Ns/NS, used to run pretty stable.

-------------------------------------------------------------------

#define NS_EXIT 99

static void NsTerminate(int sig)
{
    printf("NS[%d] signal %d received...",getpid(),sig);
    // Kill the server with the same signal
    if(nsPid > 0) kill(nsPid,sig);

    // Exit in case of fatal signal
    if(sig != SIGHUP) {
      printf("NS[%d] terminating...",getpid());
      exit(0);
    }
    // Reassign signal handler
    signal(sig,NsTerminate);
}

void NsWatchdog(int argc, char *argv[], char *envp[])
{
    int failcount = 0;
    time_t start;
    int status;
    pid_t pid;

    signal(SIGTTOU,SIG_IGN);
    signal(SIGTTIN,SIG_IGN);
    signal(SIGTSTP,SIG_IGN);
    signal(SIGPIPE,SIG_IGN);
    signal(SIGQUIT,SIG_IGN);
    signal(SIGHUP,NsTerminate);
    signal(SIGINT,NsTerminate);
    signal(SIGTERM,NsTerminate);

    // Go background
    if((pid = fork())) {
      if(pid < 0) err_logger("warpConfigure: fork:
%s",strerror(errno));
       exit(0);
    }
    setsid();
       
    for(;;) {
               
        // Execute the real server
        nsPid = fork();
        // Child, continue as server
        if(nsPid == 0) {
          exit(nsMain(argc, argv, ServerInit));
        }
        
        /* parent, behaves like a guardian */
        time(&start);
        printf("NS[%d] server process started",getpid());
        pid = waitpid(-1, &status, 0);
        if(WIFEXITED(status))
          printf("NS[%d] child process exited with status
%d",pid,WEXITSTATUS(status));
        else
        if(WIFSIGNALED(status))
          printf("NS[%d] child process exited due to signal
%d",pid,WTERMSIG(status));
        else
          printf("NS[%d] child process exited", pid);

        // Special exit code
        if(WIFEXITED(status)) {
          if(WEXITSTATUS(status) == NS_EXIT) {
            printf("NS[%d] child configuration error,
exiting",getpid());
            exit(0);
          } else
          if(WEXITSTATUS(status) == SIGHUP) {
          } else {
            if(time(0) - start < 10) failcount++;
            if(failcount == 10) {
              printf("Exiting due to repeated, frequent
failures");
              exit(1);
            }
          }
        }
        sleep(3);
    }
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-07 21:23

Message:
Logged In: YES 
user_id=87254


NsWatchdog() is called after the server drops root privs, so
both the watchdog and the server run as the defined user. 
What happens if the server dies and on restart needs to
rebind privileged ports?

A ps listing shows two running processes, parent and child.
  If I kill either one, the watchdog dies, the server
continues to run.
  If I kill -9 the parent, the child continues to run.
  If I kill -9 the child, the server is restarted.
Something seems not quite right here...

I'm a bit confused about how the code works.  For example,
NsWatchdog() seems to ignore all of it's arguments?  Here's
the code which calls it:

if (mode == 0) {
        i = ns_fork();
        if (i < 0) {
            Ns_Fatal("nsmain: fork() failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        setsid();
        i = NsWatchdog(argc, argv, initProc);
        if (i < 0) {
            Ns_Fatal("nsmain: watchdog failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        nsconf.pid = getpid();
    }

NsWatchdog() says it returns the worker pid, it also sets
the global variable wpid.  The code above ignores the
returned value, and the global variable, and instead calls
getpid()...  The variable 'i' is existing code, but still
somehow doesn't seem suitable...

Could that Ns_Fatal() above be moved into the NsWatchdog()
function?  I think maybe some comment is needed here.  The
code structure is very like that just above where ns_fork()
is called, but this function will return *multiple* times,
right?  This is kind of suprising, and an extra twist on the
already confusing fork() semantics (or maybe it's just me
who gets confused by fork...).  A return value of 0 here is
a 'request for orderly shutdown', right?

How about some more logging to syslog?  For example,
distinguish between start and restart.  Mention when the
MAX_SLEEP_PERIOD has been reached, etc.


Couple of small things:

Can we refer to 'the server', rather than 'the worker'? 
Worker and Watchdog begin with 'w', and so does the global
variable wpid...  Maybe serverPid?

NsWatchdog is a static function, it doesn't need the Ns prefix.

In NsWatchdog(), the variable 'run' should be something like
'nretries', 'nap' should be something more like
'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS.

The comment for WaitForWorker() is misslabeled.

Should SigHandler() be called WatchdogSigHandler()?






----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-06 16:03

Message:
Logged In: YES 
user_id=184124

Looks good to me

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 05:06

Message:
Logged In: YES 
user_id=95086

Ah, correction: 
The restart option sends SIGINT to the worker process 
which causes the watchdog to restart it.  
And, the patchfile is now attached! 

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 05:03

Message:
Logged In: YES 
user_id=95086

Here is the patch. 
I have added "-restart" option to "ns_shutdown". It is rather
clumsy to parse but should do. We should rewrite this with
your args parsing routine. 
The restart option sends SIGTERM to the worker process
which causes the watchdog to restart it. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 16:46

Message:
Logged In: YES 
user_id=87254


I don't think this has to hide behind a config option.  It's
either a good idea or it's not.  Sounds good to me.  Is
there a patch?  I'm wondering about some of the
implementation details...



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

[ naviserver-Feature Requests-1159307 ] Move nsext, nspd from core to modules directory

From: SourceForge.net <no...@so...> - 2005-03-17 10:53:36

Feature Requests item #1159307, was opened at 2005-03-08 12:41
Message generated for change (Comment added) made by sdeasey
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159307&group_id=130646

Category: None
Group: None
Status: Closed
Resolution: None
Priority: 5
Submitted By: Vlad Seryakov (seryakov)
Assigned to: Vlad Seryakov (seryakov)
Summary: Move nsext, nspd from core to modules directory

Initial Comment:
Last time i used those modules was when i used sybase
driver, actually those things were created for sybase
drivers only. Except that, all other DB modules uses
native access and even for sybase FreeTSD exists with
working module(i use it for SQL Server access).

No point keeping this.

----------------------------------------------------------------------

>Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-17 03:53

Message:
Logged In: YES 
user_id=87254


Oh, I forgot that cvs doesn't let you delete old
directories.  No big deal, SF removed them for us:

http://sourceforge.net/tracker/index.php?func=detail&aid=1164303&group_id=1&atid=200001




----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-13 10:17

Message:
Logged In: YES 
user_id=184124

moved

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-12 18:41

Message:
Logged In: YES 
user_id=87254


You can file a SourceForge request ticket to get them to
move stuff about, but I don't think this is neccessary.  We
have no history to preserve.

Just remove these module from their current location and
import them fresh into a modules directory.  So,
/cvsroot/naviserver/naviserver is the current location of
the server, and /cvsroot/naviserver/modules/nsext will be
the location of the nsext module.

Keeping all non-core modules in their own directory will
make it easier to check out all at once, generate a website,
etc.

I think the Makefile will need touching up to build out of core.



----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-12 17:48

Message:
Logged In: YES 
user_id=184124

We do not have contrib or modules directory, but yes, we
move it from the core. I am not sure how to do this with CVS.

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-12 17:32

Message:
Logged In: YES 
user_id=87254


If by "get rid of" you mean extract into a non-core module,
I'm all for this.  Sounds good to me.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 15:51

Message:
Logged In: YES 
user_id=95086

I have written a Tcl threading extension and part of it  
are thread shared arrays (aka nsv's) with a bind  
option to external key/value databases (like gdbm). 
So we actually store all things in thread shared arrays 
which are bound to a persistent gdbm databases on the 
filesystem. It's all transparent to the application. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 13:21

Message:
Logged In: YES 
user_id=184124

Not related to nsext, just out of curiosity, how do you keep
any state especially between multiple servers?


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 13:13

Message:
Logged In: YES 
user_id=95086

We do not use any database modules at all 
hence I do not really care.  
I do think that any important database vendor 
has thread-safe libraries nowdays so the task 
of writing an in-process driver should not be 
a problem any more. 
Besides, AFAIK, all popular databases have already 
a corresponding in-process driver implementation. 
 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1159307&group_id=130646

[ naviserver-Feature Requests-1156875 ] Watchdog process restarts failed server

From: SourceForge.net <no...@so...> - 2005-03-15 13:54:11

Feature Requests item #1156875, was opened at 2005-03-04 20:03
Message generated for change (Comment added) made by vasiljevic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Watchdog process restarts failed server

Initial Comment:
We have been using this for quite some time and it 
proved extremely useful. 
We doublefork the nsd process and make the first 
forked instance control the second. The first one 
(the watchdog) reacts on exit codes and signals 
caught during the watch and correspondingly restarts 
the second instance (the worker). 
 
Also, we have added the the "-restart" option to 
the "ns_shutdown" command. This just sends the 
SIGINT to the worker process. The watchdog is 
handling this signal and respawns the worker 
automatically. 
 
During operation, the watchdog logs events and their 
cause  into the system log file. This looks like: 
Feb 28 04:00:05 Develop nsd[19400]: worker: started. 
Mar  1 04:00:13 Develop nsd[4475]: watchdog: worker 
19400 exited (2). 
Mar  1 04:00:15 Develop nsd[21290]: worker: started. 
Mar  1 04:00:18 Develop nsd[14705]: watchdog: worker 
19399 exited (2). 
Mar  1 04:00:20 Develop nsd[21300]: worker: started. 
 
We have done all the changes with "--enable-watchdog" 
so anybody who needs this feature will have to compile 
with this option.  
 

----------------------------------------------------------------------

>Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-15 14:54

Message:
Logged In: YES 
user_id=95086

Stephen, 
why do you call _exit(0) instead of returning 
control to caller routine?  
 
Also.. 
  I don't want to go overboard with command line switches, but 
  an option to specify not to give up trying to restart the 
  server would be nice.  If you have that, you also need to 
  turn off the restart timeout doubling at a certain point.  I 
  don't know what the cleanest way to do that is. 
 
Restart time doubling is limited to 64 seconds (insane, should 
rather be 60 seconds, heh) so it should not double after 
reaching 60 secs. After that time we constantly wait 60 secs 
between restarts. 
With the server retry of 16 times and we will wait 1, 2, 4, 8,  
16, 32,  64 (i.e. total of 127) seconds and after that 8 times 64  
secs (= 512 total ) secs and eventually abort.  
All together, that will be 640 secs = about 10 minutes 
wait/retry. After that it will exit. I think if we extend retry count 
from 16 to 256 it will be pretty enough (about 4 hours) where 
one can fix the failure cause. Otherwise, I see syntactic 
problems of how to actually specify the retries parameter. 
Of curse, you can do something like 
  bin/nsd -w -r 16 
 
but the "-r"  (restart) would really have no meaning w/o -w 
and I do not like this very much indeed. I'm really for bumping 
the retry count higher (127 or 256 tries). 
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-15 01:50

Message:
Logged In: YES 
user_id=184124

Last time addition:
Can we make syslog function public, Ns_Syslog?
Also Tcl interface to it would be usefull as well, below is
syslog from my nssys module, just a suggestion, no rush with
this:

static struct syslog_data {
   int opened;
   int facility;
   int options;
   char ident[32];
   Tcl_HashTable *priorities;
   Tcl_HashTable *facilities;
} syslog_data;

/* Syslog initialization */
void SyslogInit()
{
   memset(&syslog_data,0,sizeof(syslog_data));
   syslog_data.options = LOG_PID;
   syslog_data.facility = LOG_USER;
   if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY)))
    
strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1);
   syslog_data.facilities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS);
   AddEntry(syslog_data.facilities,"auth",LOG_AUTH);
   AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV);
   AddEntry(syslog_data.facilities,"cron",LOG_CRON);
   AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON);
   AddEntry(syslog_data.facilities,"kernel",LOG_KERN);
   AddEntry(syslog_data.facilities,"lpr",LOG_LPR);
   AddEntry(syslog_data.facilities,"mail",LOG_MAIL);
   AddEntry(syslog_data.facilities,"news",LOG_NEWS);
   AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG);
   AddEntry(syslog_data.facilities,"user",LOG_USER);
   AddEntry(syslog_data.facilities,"uucp",LOG_UUCP);
   AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0);
   AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1);
   AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2);
   AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3);
   AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4);
   AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5);
   AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6);
   AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7);
   syslog_data.priorities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS);
   AddEntry(syslog_data.priorities,"emerg",LOG_EMERG);
   AddEntry(syslog_data.priorities,"alert",LOG_ALERT);
   AddEntry(syslog_data.priorities,"crit",LOG_CRIT);
   AddEntry(syslog_data.priorities,"err",LOG_ERR);
   AddEntry(syslog_data.priorities,"error",LOG_ERR);
   AddEntry(syslog_data.priorities,"warning",LOG_WARNING);
   AddEntry(syslog_data.priorities,"notice",LOG_NOTICE);
   AddEntry(syslog_data.priorities,"info",LOG_INFO);
   AddEntry(syslog_data.priorities,"debug",LOG_DEBUG);
}

/*
 * sys_log
 *
 * Usage:
 *   sys_log ?-facility f -options o -ident i? priority message
 *
 *  facility    - kernel, cron, authpriv, mail, local0,
local1, daemon, local2,
 *                news, local3, local4, local5, local6,
syslog, local7, auth, uucp, lpr, user
 *  options     - list with any of { CONS NDELAY PERROR PID
ODELAY NOWAIT }
 *  ident       - ident is prepended to every message, and
is typically the program name
 *  priority    - info, alert, emerg, err, notice, warning,
error, crit, debug
 *
 * Returns:
 *   nothing
 */
static int SysLog(ClientData data,Tcl_Interp *interp,int
argc,char **argv)
{
    int i,priority = 0;
    Tcl_HashEntry *entry;
    
    if(argc < 1) {
      Tcl_AppendResult(interp,argv[0]," ?-facility f
-options o -ident i? priority message",0);
      return TCL_ERROR;
    }
    for(i = 1;i < argc-1;) {
      if(!strcmp(argv[i],"-facility")) {
        entry =
Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]);
        if(!entry) {
          Tcl_AppendResult(interp,"wrong facility
",argv[i+1],": available facilities: ",NULL);
          ListHash(interp,syslog_data.facilities);
          return TCL_ERROR;
        }
        syslog_data.facility = (int)Tcl_GetHashValue(entry);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-options")) {
        syslog_data.options = 0;
        if(strstr(argv[i+1],"CONS")) syslog_data.options |=
LOG_CONS;
        if(strstr(argv[i+1],"NDELAY")) syslog_data.options
|= LOG_NDELAY;
        if(strstr(argv[i+1],"PERROR")) syslog_data.options
|= LOG_PERROR;
        if(strstr(argv[i+1],"PID")) syslog_data.options |=
LOG_PID;
        if(strstr(argv[i+1],"ODELAY")) syslog_data.options
|= LOG_ODELAY;
        if(strstr(argv[i+1],"NOWAIT")) syslog_data.options
|= LOG_NOWAIT;
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-ident")) {
        memset(syslog_data.ident,0,sizeof(syslog_data.ident));
       
strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      break;
    }
    if(i < argc) {
      entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]);
      if(!entry) {
        Tcl_AppendResult(interp,"wrong level ",argv[i],":
available levels: ",NULL);
        ListHash(interp,syslog_data.priorities);
        return TCL_ERROR;
      }
      priority = (int)Tcl_GetHashValue(entry);
      i++;
    }
    if(i < argc) {
      if(!syslog_data.opened) {
       
openlog(syslog_data.ident,syslog_data.options,syslog_data.facility);
        syslog_data.opened = 1;
      }
      syslog(priority,argv[i]);
      return TCL_OK;
    }
    return TCL_OK;
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-15 00:44

Message:
Logged In: YES 
user_id=87254


Oops, looks like I forgot to attach the patch...  Here it is.

Yes, the server will attempt to restart a number of times
before giving up.  I think this is as appropriate for config
errors as it is for others, then there doesn't need to be a
dual-standard fatal error.


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 19:12

Message:
Logged In: YES 
user_id=95086

I'd rather wait for Stephens code and see what's done there 
and then make final changes and commit. It should not 
take long. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 19:07

Message:
Logged In: YES 
user_id=184124

Then i do not see any reason not committing it, we can have
discussions about it for another couple of years, but until
we have it and use it, it is just discussions. We aready
agreed on having it in the core.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 19:00

Message:
Logged In: YES 
user_id=95086

Stephen, 
You mind uploading your patch changes so I can peek into it? 
I think we should soon settle on some solution and move on. 
 
Vlad, 
The patch already does that. After 16 unsuccessfull restarts 
the watchdog exits. Between every restart we wait 1, 2, 4, 8,  
16, 32, .... 16384 seconds and then exit. This would be: 32767 
seconds or about 9 hours.  
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 15:53

Message:
Logged In: YES 
user_id=184124

I would say, repeat configured number of time and then exit.
This way all socket related problems will be cleared after
couple of restartes, config problems will make server exit .

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 15:41

Message:
Logged In: YES 
user_id=95086

Well, because I could not differentiate between config 
and later runtime errors, I thought its wiser to abort 
rather to repeatedly restart broken server. 
One possibility is to add different Ns_Fatal call like: 
 
  Ns_FatalEx(int exitcode, char *fmt, ...); 
 
Or? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 15:12

Message:
Logged In: YES 
user_id=87254


Are you sure Ns_Fatal should not restart the server?  Many
of the fatal errors are caused by bad configuration and I
can see why you might want the server to exit immediately. 
But there are also runtime fatal errors, and here I'd expect
the server to be restarted.

Some config errors are due to external factors like missing
directories or file system permissions and it would be nice
for the server to come back up as soon as the issue resolved
itself.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 10:52

Message:
Logged In: YES 
user_id=95086

Calling watchdog before or after chroot has no real 
implications I believe. 
Also, when the server exits with Ns_Fatal, it hsould not be 
restarted, I will look into your changes today. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 07:27

Message:
Logged In: YES 
user_id=87254


Oh yeah, and I moved the watchdog stuff eaven earlier,
before the prebind and chroot.  Hmm, now that I think about
it it's only the prebind that really needs to happen after
the watchdog is started, to ensure the sockets are always in
a sane state...



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 07:17

Message:
Logged In: YES 
user_id=87254


Am I reading this right?  It looks like if the server calls
Ns_Fatal() it will exit with code 1, but  0 and 1 are
treated as OK exit codes and the server will not be restarted.

I've attached a patch which changes the above, fixes the pid
problem in a different way because I completely forgot you
already posted below about that small glitch, use
Ns_ParseObjv(), adds the -w switch, and a couple of small
name changes.

I don't want to go overboard with command line switches, but
an option to specify not to give up trying to restart the
server would be nice.  If you have that, you also need to
turn off the restart timeout doubling at a certain point.  I
don't know what the cleanest way to do that is.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 21:09

Message:
Logged In: YES 
user_id=95086

This is correct. As soon as we agree on some 
implementation, I will put the -i processing 
and avoid starting watchdog for inittab starts. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 20:48

Message:
Logged In: YES 
user_id=184124


There still should be possibility to run nsd from inittab,
so when -i switch is given, no watchdog should be running,
let /sbin/init to handle restarts

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 17:19

Message:
Logged In: YES 
user_id=95086

small glich: 
After applying the patch, change nsmain.c 
from: 
    nsconf.pid = serverPid: 
to 
    nsconf.pid = getpid(); 
 
otherwise the pid file will contain the bogus 
server pid if the watchdog restarted it later. 
   

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 16:26

Message:
Logged In: YES 
user_id=95086

Another try... 
Important to note: SIGKILL (signal 9) cannot be handled 
hence if somebody kills the watchdog with SIGKILL, the 
server will be left lingering w/o the watchdog. This is  
important to know. I do not see any possibility how to 
recover in such cases (i.e. how to stop the server). 
 
Apart from that, all objections from Stephen are taken 
into account. Please try again. A new copy of watchdog.patch 
file is attached. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 06:20

Message:
Logged In: YES 
user_id=184124

I just found this code in my old server sources, i just
chnaged internal name to Ns/NS, used to run pretty stable.

-------------------------------------------------------------------

#define NS_EXIT 99

static void NsTerminate(int sig)
{
    printf("NS[%d] signal %d received...",getpid(),sig);
    // Kill the server with the same signal
    if(nsPid > 0) kill(nsPid,sig);

    // Exit in case of fatal signal
    if(sig != SIGHUP) {
      printf("NS[%d] terminating...",getpid());
      exit(0);
    }
    // Reassign signal handler
    signal(sig,NsTerminate);
}

void NsWatchdog(int argc, char *argv[], char *envp[])
{
    int failcount = 0;
    time_t start;
    int status;
    pid_t pid;

    signal(SIGTTOU,SIG_IGN);
    signal(SIGTTIN,SIG_IGN);
    signal(SIGTSTP,SIG_IGN);
    signal(SIGPIPE,SIG_IGN);
    signal(SIGQUIT,SIG_IGN);
    signal(SIGHUP,NsTerminate);
    signal(SIGINT,NsTerminate);
    signal(SIGTERM,NsTerminate);

    // Go background
    if((pid = fork())) {
      if(pid < 0) err_logger("warpConfigure: fork:
%s",strerror(errno));
       exit(0);
    }
    setsid();
       
    for(;;) {
               
        // Execute the real server
        nsPid = fork();
        // Child, continue as server
        if(nsPid == 0) {
          exit(nsMain(argc, argv, ServerInit));
        }
        
        /* parent, behaves like a guardian */
        time(&start);
        printf("NS[%d] server process started",getpid());
        pid = waitpid(-1, &status, 0);
        if(WIFEXITED(status))
          printf("NS[%d] child process exited with status
%d",pid,WEXITSTATUS(status));
        else
        if(WIFSIGNALED(status))
          printf("NS[%d] child process exited due to signal
%d",pid,WTERMSIG(status));
        else
          printf("NS[%d] child process exited", pid);

        // Special exit code
        if(WIFEXITED(status)) {
          if(WEXITSTATUS(status) == NS_EXIT) {
            printf("NS[%d] child configuration error,
exiting",getpid());
            exit(0);
          } else
          if(WEXITSTATUS(status) == SIGHUP) {
          } else {
            if(time(0) - start < 10) failcount++;
            if(failcount == 10) {
              printf("Exiting due to repeated, frequent
failures");
              exit(1);
            }
          }
        }
        sleep(3);
    }
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-08 05:23

Message:
Logged In: YES 
user_id=87254


NsWatchdog() is called after the server drops root privs, so
both the watchdog and the server run as the defined user. 
What happens if the server dies and on restart needs to
rebind privileged ports?

A ps listing shows two running processes, parent and child.
  If I kill either one, the watchdog dies, the server
continues to run.
  If I kill -9 the parent, the child continues to run.
  If I kill -9 the child, the server is restarted.
Something seems not quite right here...

I'm a bit confused about how the code works.  For example,
NsWatchdog() seems to ignore all of it's arguments?  Here's
the code which calls it:

if (mode == 0) {
        i = ns_fork();
        if (i < 0) {
            Ns_Fatal("nsmain: fork() failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        setsid();
        i = NsWatchdog(argc, argv, initProc);
        if (i < 0) {
            Ns_Fatal("nsmain: watchdog failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        nsconf.pid = getpid();
    }

NsWatchdog() says it returns the worker pid, it also sets
the global variable wpid.  The code above ignores the
returned value, and the global variable, and instead calls
getpid()...  The variable 'i' is existing code, but still
somehow doesn't seem suitable...

Could that Ns_Fatal() above be moved into the NsWatchdog()
function?  I think maybe some comment is needed here.  The
code structure is very like that just above where ns_fork()
is called, but this function will return *multiple* times,
right?  This is kind of suprising, and an extra twist on the
already confusing fork() semantics (or maybe it's just me
who gets confused by fork...).  A return value of 0 here is
a 'request for orderly shutdown', right?

How about some more logging to syslog?  For example,
distinguish between start and restart.  Mention when the
MAX_SLEEP_PERIOD has been reached, etc.


Couple of small things:

Can we refer to 'the server', rather than 'the worker'? 
Worker and Watchdog begin with 'w', and so does the global
variable wpid...  Maybe serverPid?

NsWatchdog is a static function, it doesn't need the Ns prefix.

In NsWatchdog(), the variable 'run' should be something like
'nretries', 'nap' should be something more like
'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS.

The comment for WaitForWorker() is misslabeled.

Should SigHandler() be called WatchdogSigHandler()?






----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-07 00:03

Message:
Logged In: YES 
user_id=184124

Looks good to me

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 13:06

Message:
Logged In: YES 
user_id=95086

Ah, correction: 
The restart option sends SIGINT to the worker process 
which causes the watchdog to restart it.  
And, the patchfile is now attached! 

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 13:03

Message:
Logged In: YES 
user_id=95086

Here is the patch. 
I have added "-restart" option to "ns_shutdown". It is rather
clumsy to parse but should do. We should rewrite this with
your args parsing routine. 
The restart option sends SIGTERM to the worker process
which causes the watchdog to restart it. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-05 00:46

Message:
Logged In: YES 
user_id=87254


I don't think this has to hide behind a config option.  It's
either a good idea or it's not.  Sounds good to me.  Is
there a patch?  I'm wondering about some of the
implementation details...



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

[ naviserver-Feature Requests-1156875 ] Watchdog process restarts failed server

From: SourceForge.net <no...@so...> - 2005-03-15 00:50:39

Feature Requests item #1156875, was opened at 2005-03-04 19:03
Message generated for change (Comment added) made by seryakov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Zoran Vasiljevic (vasiljevic)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Watchdog process restarts failed server

Initial Comment:
We have been using this for quite some time and it 
proved extremely useful. 
We doublefork the nsd process and make the first 
forked instance control the second. The first one 
(the watchdog) reacts on exit codes and signals 
caught during the watch and correspondingly restarts 
the second instance (the worker). 
 
Also, we have added the the "-restart" option to 
the "ns_shutdown" command. This just sends the 
SIGINT to the worker process. The watchdog is 
handling this signal and respawns the worker 
automatically. 
 
During operation, the watchdog logs events and their 
cause  into the system log file. This looks like: 
Feb 28 04:00:05 Develop nsd[19400]: worker: started. 
Mar  1 04:00:13 Develop nsd[4475]: watchdog: worker 
19400 exited (2). 
Mar  1 04:00:15 Develop nsd[21290]: worker: started. 
Mar  1 04:00:18 Develop nsd[14705]: watchdog: worker 
19399 exited (2). 
Mar  1 04:00:20 Develop nsd[21300]: worker: started. 
 
We have done all the changes with "--enable-watchdog" 
so anybody who needs this feature will have to compile 
with this option.  
 

----------------------------------------------------------------------

>Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-15 00:50

Message:
Logged In: YES 
user_id=184124

Last time addition:
Can we make syslog function public, Ns_Syslog?
Also Tcl interface to it would be usefull as well, below is
syslog from my nssys module, just a suggestion, no rush with
this:

static struct syslog_data {
   int opened;
   int facility;
   int options;
   char ident[32];
   Tcl_HashTable *priorities;
   Tcl_HashTable *facilities;
} syslog_data;

/* Syslog initialization */
void SyslogInit()
{
   memset(&syslog_data,0,sizeof(syslog_data));
   syslog_data.options = LOG_PID;
   syslog_data.facility = LOG_USER;
   if((argv0 = Tcl_GetVar(interp,"argv0",TCL_GLOBAL_ONLY)))
    
strncpy(syslog_data.ident,argv0,sizeof(syslog_data.ident)-1);
   syslog_data.facilities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.facilities,TCL_STRING_KEYS);
   AddEntry(syslog_data.facilities,"auth",LOG_AUTH);
   AddEntry(syslog_data.facilities,"authpriv",LOG_AUTHPRIV);
   AddEntry(syslog_data.facilities,"cron",LOG_CRON);
   AddEntry(syslog_data.facilities,"daemon",LOG_DAEMON);
   AddEntry(syslog_data.facilities,"kernel",LOG_KERN);
   AddEntry(syslog_data.facilities,"lpr",LOG_LPR);
   AddEntry(syslog_data.facilities,"mail",LOG_MAIL);
   AddEntry(syslog_data.facilities,"news",LOG_NEWS);
   AddEntry(syslog_data.facilities,"syslog",LOG_SYSLOG);
   AddEntry(syslog_data.facilities,"user",LOG_USER);
   AddEntry(syslog_data.facilities,"uucp",LOG_UUCP);
   AddEntry(syslog_data.facilities,"local0",LOG_LOCAL0);
   AddEntry(syslog_data.facilities,"local1",LOG_LOCAL1);
   AddEntry(syslog_data.facilities,"local2",LOG_LOCAL2);
   AddEntry(syslog_data.facilities,"local3",LOG_LOCAL3);
   AddEntry(syslog_data.facilities,"local4",LOG_LOCAL4);
   AddEntry(syslog_data.facilities,"local5",LOG_LOCAL5);
   AddEntry(syslog_data.facilities,"local6",LOG_LOCAL6);
   AddEntry(syslog_data.facilities,"local7",LOG_LOCAL7);
   syslog_data.priorities = (Tcl_HashTable
*)Tcl_Alloc(sizeof(Tcl_HashTable));
   Tcl_InitHashTable(syslog_data.priorities,TCL_STRING_KEYS);
   AddEntry(syslog_data.priorities,"emerg",LOG_EMERG);
   AddEntry(syslog_data.priorities,"alert",LOG_ALERT);
   AddEntry(syslog_data.priorities,"crit",LOG_CRIT);
   AddEntry(syslog_data.priorities,"err",LOG_ERR);
   AddEntry(syslog_data.priorities,"error",LOG_ERR);
   AddEntry(syslog_data.priorities,"warning",LOG_WARNING);
   AddEntry(syslog_data.priorities,"notice",LOG_NOTICE);
   AddEntry(syslog_data.priorities,"info",LOG_INFO);
   AddEntry(syslog_data.priorities,"debug",LOG_DEBUG);
}

/*
 * sys_log
 *
 * Usage:
 *   sys_log ?-facility f -options o -ident i? priority message
 *
 *  facility    - kernel, cron, authpriv, mail, local0,
local1, daemon, local2,
 *                news, local3, local4, local5, local6,
syslog, local7, auth, uucp, lpr, user
 *  options     - list with any of { CONS NDELAY PERROR PID
ODELAY NOWAIT }
 *  ident       - ident is prepended to every message, and
is typically the program name
 *  priority    - info, alert, emerg, err, notice, warning,
error, crit, debug
 *
 * Returns:
 *   nothing
 */
static int SysLog(ClientData data,Tcl_Interp *interp,int
argc,char **argv)
{
    int i,priority = 0;
    Tcl_HashEntry *entry;
    
    if(argc < 1) {
      Tcl_AppendResult(interp,argv[0]," ?-facility f
-options o -ident i? priority message",0);
      return TCL_ERROR;
    }
    for(i = 1;i < argc-1;) {
      if(!strcmp(argv[i],"-facility")) {
        entry =
Tcl_FindHashEntry(syslog_data.facilities,argv[i+1]);
        if(!entry) {
          Tcl_AppendResult(interp,"wrong facility
",argv[i+1],": available facilities: ",NULL);
          ListHash(interp,syslog_data.facilities);
          return TCL_ERROR;
        }
        syslog_data.facility = (int)Tcl_GetHashValue(entry);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-options")) {
        syslog_data.options = 0;
        if(strstr(argv[i+1],"CONS")) syslog_data.options |=
LOG_CONS;
        if(strstr(argv[i+1],"NDELAY")) syslog_data.options
|= LOG_NDELAY;
        if(strstr(argv[i+1],"PERROR")) syslog_data.options
|= LOG_PERROR;
        if(strstr(argv[i+1],"PID")) syslog_data.options |=
LOG_PID;
        if(strstr(argv[i+1],"ODELAY")) syslog_data.options
|= LOG_ODELAY;
        if(strstr(argv[i+1],"NOWAIT")) syslog_data.options
|= LOG_NOWAIT;
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      if(!strcmp(argv[i],"-ident")) {
        memset(syslog_data.ident,0,sizeof(syslog_data.ident));
       
strncpy(syslog_data.ident,argv[i+1],sizeof(syslog_data.ident)-1);
        closelog();
        syslog_data.opened = 0;
        i += 2;
        continue;
      }
      break;
    }
    if(i < argc) {
      entry = Tcl_FindHashEntry(syslog_data.priorities,argv[i]);
      if(!entry) {
        Tcl_AppendResult(interp,"wrong level ",argv[i],":
available levels: ",NULL);
        ListHash(interp,syslog_data.priorities);
        return TCL_ERROR;
      }
      priority = (int)Tcl_GetHashValue(entry);
      i++;
    }
    if(i < argc) {
      if(!syslog_data.opened) {
       
openlog(syslog_data.ident,syslog_data.options,syslog_data.facility);
        syslog_data.opened = 1;
      }
      syslog(priority,argv[i]);
      return TCL_OK;
    }
    return TCL_OK;
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 23:44

Message:
Logged In: YES 
user_id=87254


Oops, looks like I forgot to attach the patch...  Here it is.

Yes, the server will attempt to restart a number of times
before giving up.  I think this is as appropriate for config
errors as it is for others, then there doesn't need to be a
dual-standard fatal error.


----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 18:12

Message:
Logged In: YES 
user_id=95086

I'd rather wait for Stephens code and see what's done there 
and then make final changes and commit. It should not 
take long. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 18:07

Message:
Logged In: YES 
user_id=184124

Then i do not see any reason not committing it, we can have
discussions about it for another couple of years, but until
we have it and use it, it is just discussions. We aready
agreed on having it in the core.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 18:00

Message:
Logged In: YES 
user_id=95086

Stephen, 
You mind uploading your patch changes so I can peek into it? 
I think we should soon settle on some solution and move on. 
 
Vlad, 
The patch already does that. After 16 unsuccessfull restarts 
the watchdog exits. Between every restart we wait 1, 2, 4, 8,  
16, 32, .... 16384 seconds and then exit. This would be: 32767 
seconds or about 9 hours.  
 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-14 14:53

Message:
Logged In: YES 
user_id=184124

I would say, repeat configured number of time and then exit.
This way all socket related problems will be cleared after
couple of restartes, config problems will make server exit .

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 14:41

Message:
Logged In: YES 
user_id=95086

Well, because I could not differentiate between config 
and later runtime errors, I thought its wiser to abort 
rather to repeatedly restart broken server. 
One possibility is to add different Ns_Fatal call like: 
 
  Ns_FatalEx(int exitcode, char *fmt, ...); 
 
Or? 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-14 14:12

Message:
Logged In: YES 
user_id=87254


Are you sure Ns_Fatal should not restart the server?  Many
of the fatal errors are caused by bad configuration and I
can see why you might want the server to exit immediately. 
But there are also runtime fatal errors, and here I'd expect
the server to be restarted.

Some config errors are due to external factors like missing
directories or file system permissions and it would be nice
for the server to come back up as soon as the issue resolved
itself.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-14 09:52

Message:
Logged In: YES 
user_id=95086

Calling watchdog before or after chroot has no real 
implications I believe. 
Also, when the server exits with Ns_Fatal, it hsould not be 
restarted, I will look into your changes today. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 06:27

Message:
Logged In: YES 
user_id=87254


Oh yeah, and I moved the watchdog stuff eaven earlier,
before the prebind and chroot.  Hmm, now that I think about
it it's only the prebind that really needs to happen after
the watchdog is started, to ensure the sockets are always in
a sane state...



----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-11 06:17

Message:
Logged In: YES 
user_id=87254


Am I reading this right?  It looks like if the server calls
Ns_Fatal() it will exit with code 1, but  0 and 1 are
treated as OK exit codes and the server will not be restarted.

I've attached a patch which changes the above, fixes the pid
problem in a different way because I completely forgot you
already posted below about that small glitch, use
Ns_ParseObjv(), adds the -w switch, and a couple of small
name changes.

I don't want to go overboard with command line switches, but
an option to specify not to give up trying to restart the
server would be nice.  If you have that, you also need to
turn off the restart timeout doubling at a certain point.  I
don't know what the cleanest way to do that is.



----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 20:09

Message:
Logged In: YES 
user_id=95086

This is correct. As soon as we agree on some 
implementation, I will put the -i processing 
and avoid starting watchdog for inittab starts. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 19:48

Message:
Logged In: YES 
user_id=184124


There still should be possibility to run nsd from inittab,
so when -i switch is given, no watchdog should be running,
let /sbin/init to handle restarts

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 16:19

Message:
Logged In: YES 
user_id=95086

small glich: 
After applying the patch, change nsmain.c 
from: 
    nsconf.pid = serverPid: 
to 
    nsconf.pid = getpid(); 
 
otherwise the pid file will contain the bogus 
server pid if the watchdog restarted it later. 
   

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-08 15:26

Message:
Logged In: YES 
user_id=95086

Another try... 
Important to note: SIGKILL (signal 9) cannot be handled 
hence if somebody kills the watchdog with SIGKILL, the 
server will be left lingering w/o the watchdog. This is  
important to know. I do not see any possibility how to 
recover in such cases (i.e. how to stop the server). 
 
Apart from that, all objections from Stephen are taken 
into account. Please try again. A new copy of watchdog.patch 
file is attached. 

----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-08 05:20

Message:
Logged In: YES 
user_id=184124

I just found this code in my old server sources, i just
chnaged internal name to Ns/NS, used to run pretty stable.

-------------------------------------------------------------------

#define NS_EXIT 99

static void NsTerminate(int sig)
{
    printf("NS[%d] signal %d received...",getpid(),sig);
    // Kill the server with the same signal
    if(nsPid > 0) kill(nsPid,sig);

    // Exit in case of fatal signal
    if(sig != SIGHUP) {
      printf("NS[%d] terminating...",getpid());
      exit(0);
    }
    // Reassign signal handler
    signal(sig,NsTerminate);
}

void NsWatchdog(int argc, char *argv[], char *envp[])
{
    int failcount = 0;
    time_t start;
    int status;
    pid_t pid;

    signal(SIGTTOU,SIG_IGN);
    signal(SIGTTIN,SIG_IGN);
    signal(SIGTSTP,SIG_IGN);
    signal(SIGPIPE,SIG_IGN);
    signal(SIGQUIT,SIG_IGN);
    signal(SIGHUP,NsTerminate);
    signal(SIGINT,NsTerminate);
    signal(SIGTERM,NsTerminate);

    // Go background
    if((pid = fork())) {
      if(pid < 0) err_logger("warpConfigure: fork:
%s",strerror(errno));
       exit(0);
    }
    setsid();
       
    for(;;) {
               
        // Execute the real server
        nsPid = fork();
        // Child, continue as server
        if(nsPid == 0) {
          exit(nsMain(argc, argv, ServerInit));
        }
        
        /* parent, behaves like a guardian */
        time(&start);
        printf("NS[%d] server process started",getpid());
        pid = waitpid(-1, &status, 0);
        if(WIFEXITED(status))
          printf("NS[%d] child process exited with status
%d",pid,WEXITSTATUS(status));
        else
        if(WIFSIGNALED(status))
          printf("NS[%d] child process exited due to signal
%d",pid,WTERMSIG(status));
        else
          printf("NS[%d] child process exited", pid);

        // Special exit code
        if(WIFEXITED(status)) {
          if(WEXITSTATUS(status) == NS_EXIT) {
            printf("NS[%d] child configuration error,
exiting",getpid());
            exit(0);
          } else
          if(WEXITSTATUS(status) == SIGHUP) {
          } else {
            if(time(0) - start < 10) failcount++;
            if(failcount == 10) {
              printf("Exiting due to repeated, frequent
failures");
              exit(1);
            }
          }
        }
        sleep(3);
    }
}


----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-08 04:23

Message:
Logged In: YES 
user_id=87254


NsWatchdog() is called after the server drops root privs, so
both the watchdog and the server run as the defined user. 
What happens if the server dies and on restart needs to
rebind privileged ports?

A ps listing shows two running processes, parent and child.
  If I kill either one, the watchdog dies, the server
continues to run.
  If I kill -9 the parent, the child continues to run.
  If I kill -9 the child, the server is restarted.
Something seems not quite right here...

I'm a bit confused about how the code works.  For example,
NsWatchdog() seems to ignore all of it's arguments?  Here's
the code which calls it:

if (mode == 0) {
        i = ns_fork();
        if (i < 0) {
            Ns_Fatal("nsmain: fork() failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        setsid();
        i = NsWatchdog(argc, argv, initProc);
        if (i < 0) {
            Ns_Fatal("nsmain: watchdog failed: '%s'",
strerror(errno));
        }
        if (i > 0) {
            return 0;
        }
        nsconf.pid = getpid();
    }

NsWatchdog() says it returns the worker pid, it also sets
the global variable wpid.  The code above ignores the
returned value, and the global variable, and instead calls
getpid()...  The variable 'i' is existing code, but still
somehow doesn't seem suitable...

Could that Ns_Fatal() above be moved into the NsWatchdog()
function?  I think maybe some comment is needed here.  The
code structure is very like that just above where ns_fork()
is called, but this function will return *multiple* times,
right?  This is kind of suprising, and an extra twist on the
already confusing fork() semantics (or maybe it's just me
who gets confused by fork...).  A return value of 0 here is
a 'request for orderly shutdown', right?

How about some more logging to syslog?  For example,
distinguish between start and restart.  Mention when the
MAX_SLEEP_PERIOD has been reached, etc.


Couple of small things:

Can we refer to 'the server', rather than 'the worker'? 
Worker and Watchdog begin with 'w', and so does the global
variable wpid...  Maybe serverPid?

NsWatchdog is a static function, it doesn't need the Ns prefix.

In NsWatchdog(), the variable 'run' should be something like
'nretries', 'nap' should be something more like
'retrySeconds' and MAX_SLEEP_PERIOD should be MAX_RETRY_SECONDS.

The comment for WaitForWorker() is misslabeled.

Should SigHandler() be called WatchdogSigHandler()?






----------------------------------------------------------------------

Comment By: Vlad Seryakov (seryakov)
Date: 2005-03-06 23:03

Message:
Logged In: YES 
user_id=184124

Looks good to me

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 12:06

Message:
Logged In: YES 
user_id=95086

Ah, correction: 
The restart option sends SIGINT to the worker process 
which causes the watchdog to restart it.  
And, the patchfile is now attached! 

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2005-03-05 12:03

Message:
Logged In: YES 
user_id=95086

Here is the patch. 
I have added "-restart" option to "ns_shutdown". It is rather
clumsy to parse but should do. We should rewrite this with
your args parsing routine. 
The restart option sends SIGTERM to the worker process
which causes the watchdog to restart it. 

----------------------------------------------------------------------

Comment By: Stephen Deasey (sdeasey)
Date: 2005-03-04 23:46

Message:
Logged In: YES 
user_id=87254


I don't think this has to hide behind a config option.  It's
either a good idea or it's not.  Sounds good to me.  Is
there a patch?  I'm wondering about some of the
implementation details...



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=719009&aid=1156875&group_id=130646

Flat | Threaded

<< < 1 .. 7 8 9 10 11 .. 16 > >> (Page 9 of 16)

2005	Jan	Feb (61)	Mar (153)	Apr (39)	May (10)	Jun (15)	Jul (15)	Aug (2)	Sep	Oct (17)	Nov (2)	Dec (13)
2006	Jan (18)	Feb (9)	Mar	Apr	May	Jun	Jul (7)	Aug (1)	Sep (2)	Oct	Nov (1)	Dec
2007	Jan (8)	Feb (3)	Mar	Apr	May (2)	Jun	Jul (2)	Aug	Sep	Oct	Nov	Dec
2008	Jan	Feb	Mar	Apr (6)	May (1)	Jun	Jul	Aug	Sep	Oct	Nov	Dec