Hi,
when sending the PUBLISH messages to self, using:
modparam("pua_dialoginfo", "presence_server", "sip:a.b.c.d:5060") # send PUBLISH messages to self
I get a deadlock of the entire opensips daemon.
If I send the messages to somewhere else, the deadlock does not occur.
If I release the locks as per the attached patch, the deadlock doesn't occur either.
But according to the comments in the code, the lock is not meant to be freed just yet. So I'm not sure what my patch breaks instead. (I'm not even sure I'm supposed to publish to self, but it seems like it, as I want to generate notifies based on the state changes found in the publish.)
I'm running 1.6-svn (r6568)
Regards,
Walter Doekes
OSSO B.V.
Patch to work around deadlock
Hi Walter,
That is not the right fix, because that lock must be held until the reply is received. I would like to reproduce this myself so that I can investigate. I don't understand exactly in which case you get the deadlock. Are you subscribing to the dialog event for yourself? Is that it?
Regards,
Anca
Hi Anca,
Thanks for your quick reply.
I've tried to reduce/declutter my config file to make the problem easily reproducible, but during this reduction the problem goes away. I can hardly give you the full config file as it's a bit of a complex hack, and incomplete at that ;)
Yes, what I want to do is implement BLF on OpenSIPS. To do this, I do basically this:
route {
    if (uri == myself) {
        if ($si == "e.f.g.h") { # my IP
            if (method == "PUBLISH") {
                handle_publish("sip:myself@myself.myself");
                exit;
            }
        }
        if (method == "SUBSCRIBE") {
            handle_subscribe();
            exit;
        }
        if (method == "INVITE") {
            dialoginfo_set();
            record_route();
        }
        if ($si == "a.b.c.d") {
            $var(local) = "sip:" + $(hdr(X-PIDstAccount)[-1]) + "@anydomain";
            lookup("opensips_location", "", "$var(local)");
        } else {
            $rd = "a.b.c.d";
        }
        t_relay();
    }
}
(Add a bit of nat handling, registration handling and transaction handling.)
Modules loaded are:
loadmodule "xlog.so" # logging (xlog)
loadmodule "sl.so" # stateless functions (sl_*)
loadmodule "tm.so" # t_*: transactions in memory
loadmodule "signaling.so" # send reply according to state
loadmodule "rr.so" # record route
loadmodule "nathelper.so" # fix_nated_*()
loadmodule "textops.so" # append_hf
loadmodule "uri.so" # has_totag
loadmodule "db_mysql.so" # mysql
loadmodule "auth.so" # digest auth
loadmodule "auth_db.so" # digest auth
loadmodule "group.so" # groups (for acls)
loadmodule "permissions.so" # permissions (provide acls together with groups)
loadmodule "usrloc.so" # user location
loadmodule "registrar.so" # lookup/save/registered
loadmodule "dialog.so" # ...
loadmodule "presence.so" # handle SUBSCRIBE events
loadmodule "presence_dialoginfo.so" # handle SUBSCRIBE events for dialoginfo
loadmodule "presence_xml.so" # handle SUBSCRIBE events for dialoginfo
loadmodule "pua.so" # ...
loadmodule "pua_dialoginfo.so" # ...
...
modparam("pua_dialoginfo", "presence_server", "sip:e.f.g.h:5060") # send PUBLISH messages to self
Now, it's quite possible that I'm doing things wrong. My Grandstream test phone has not responded at all to the OpenSIPS NOTIFYs sent by handle_publish().
I'll try re-adding and reorganising my config file to get back the complete behaviour I want (with or without the deadlock). In the meantime you can consider this report INVALID/WORKSFORME and I can file a new one if the problem reappears.
Regards,
Walter
Furthermore, I can add that I used children=1.
(Having children as 1 also led me to believe that setting $var()s in startup_route could be used as process-wide constants, which they cannot. This should clear up some more of the odd issues I was having. How would you feel about a process_startup_route? I cannot believe I'm the only one who likes constants at the top of the file.)
Hi Walter,
The reason you get a deadlock is precisely that you use only one child. The process can block while trying to send a new PUBLISH request, and then there is no other process to handle the reply to the previous one and release the lock. So it is compulsory to have more than one child when using the pua module.
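As a rough illustration, the relevant part of the config might look like the sketch below. The value 4 is only an assumed example and not a recommended number; the point is simply that children is greater than 1. The presence_server address is the placeholder from the config above.

```
# Assumed example value: any number greater than 1 leaves another worker
# free to handle the PUBLISH reply and release the pua lock.
children=4

loadmodule "pua.so"
loadmodule "pua_dialoginfo.so"

modparam("pua_dialoginfo", "presence_server", "sip:e.f.g.h:5060") # send PUBLISH messages to self
```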
Regards,
Anca
Okay, thank you for that information.
Isn't this something that should be documented? Or is it already and have I simply missed it?
Regards,
Walter
Thank you, Walter. I have updated the documentation.
Regards,
Anca