From: Koichi S. <koi...@gm...> - 2013-02-21 10:28:44
|
Hello,

I found that "select 1" does not work to detect a datanode/coordinator crash correctly when gtm/gtm_proxy crashes. When gtm/gtm_proxy crashes, "select 1" returns an error, and the monitoring program (HA middleware or another operation support program) determines that the coordinator/datanode has crashed, which is wrong.

So we need another means to detect that a coordinator/datanode is running while gtm/gtm_proxy has crashed. One solution would be to make "select 1" not return an error. In that case, we may need yet another means to detect whether the coordinator/datanode itself has crashed. That could be very complicated and could expose a very inconsistent view.

I think a cleaner solution is to provide a "watchdog" that tells whether the server loop is running and ready to accept connections. I understand this duplicates what PostgreSQL itself would implement, but it is needed for XC. I also understand that this could conflict if PG itself implements a similar feature. This kind of risk is found in many other places in XC, and I believe a watchdog timer is a good solution for monitoring a coordinator/datanode independently of gtm status.

Any feedback?
----------
Koichi Suzuki
|
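The misclassification described above can be sketched as a small decision function. This is only an illustration under stated assumptions, not code from any XC patch: the names `NodeState`, `classify_node` and `should_fail_over` are hypothetical, and the two boolean inputs stand for whatever connection-level probe and "select 1" check a monitor actually uses.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical verdicts a monitor could reach for one coordinator/datanode."""
    HEALTHY = "healthy"
    NODE_UP_QUERY_FAILED = "node up, query failed (gtm/gtm_proxy suspected)"
    NODE_DOWN = "node down"


def classify_node(accepts_connections: bool, select_ok: bool) -> NodeState:
    """Combine two independent probes into one verdict.

    accepts_connections: result of a connection-level probe (assumed input).
    select_ok: whether a trivial query such as "select 1" succeeded.
    """
    if not accepts_connections:
        # The server does not even accept connections: treat the node as down.
        return NodeState.NODE_DOWN
    if not select_ok:
        # The node answers, but queries fail. In XC this can happen when
        # gtm/gtm_proxy is down, so the node itself must not be failed over.
        return NodeState.NODE_UP_QUERY_FAILED
    return NodeState.HEALTHY


def should_fail_over(state: NodeState) -> bool:
    """Only a node that is really down justifies a node failover."""
    return state is NodeState.NODE_DOWN
```

A monitor that looks only at `select_ok` cannot tell the second case from the third, which is the wrong failover described in the message.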
From: Koichi S. <koi...@gm...> - 2013-03-08 02:52:03
Attachments:
20120702_02_xc_watchdog.patch
|
I didn't get any reactions to this. Again, we need to detect whether a coordinator/datanode is running even when gtm is down. "select 1" or "select now()" does not work for this purpose (it works for a log-shipping slave, though).

I'd like to start with the watchdog patch I submitted last July, attached just in case. It includes a watchdog for gtm/gtm_proxy, which may not be needed for now.

An alternative is just to test whether a connection made with one of the PQ* functions succeeds. A bit of handling at the server is involved in such a function, so it could be used to detect whether the server accepts connections.

Please understand this is specific to XC, not to PG.

Any input is welcome.

Regards;
----------
Koichi Suzuki
|
From: Michael P. <mic...@gm...> - 2013-03-08 03:03:00
|
On Fri, Mar 8, 2013 at 11:51 AM, Koichi Suzuki <koi...@gm...> wrote:
> I'd like to start with the watchdog patch I submitted last July,
> attached just in case. This includes watchdog for gtm/gtmproxies.
> This may not be needed so far.
>
> An alternative is just to test if connection with one of PQ* functions
> succeeds. A bit of handling at the server is involved in this
> function and it could be used to detect if the server accepts
> connections.
>
> Please understand this is specific to XC, not to PG.

Watchdog processes have no place inside the core code. I think the merge with 9.3 will be done in the near future, so why not use an extension based on the custom background worker facility introduced in 9.3? It could even be used with Postgres itself if it is nicely implemented, you know?
--
Michael
|
From: Koichi S. <koi...@gm...> - 2013-03-08 03:13:21
|
Because the 9.3 merge will not be done in 1.1, I don't think that is feasible at present. The second option is to use the PQ* functions. Either way, this will be provided by pgxc_monitor. Using a custom background worker may be a good idea, but it could be too much because the requirement is very small.

Regards;
----------
Koichi Suzuki
|
From: Michael P. <mic...@gm...> - 2013-03-08 03:32:18
|
On Fri, Mar 8, 2013 at 12:13 PM, Koichi Suzuki <koi...@gm...> wrote:
> Because 9.3 merge will not be done in 1.1, I don't think it's feasible
> at present. Second means will be to use PQ* functions. Anyway,
> this will be provided by pgxc_monitor. May be a good idea to use
> custom background, but this could be too much because the requirement
> is very small.

In this case, use something like PQping or similar, but simply do not involve core; that would certainly have an underlying performance impact.
--
Michael
|
From: Koichi S. <koi...@gm...> - 2013-03-08 07:18:55
Attachments:
pgxc_monitor_20130308.patch
|
Okay, here's a patch which uses PQping. PQping is new in 9.1, is extremely simple, and matches my needs.

Regards;
----------
Koichi Suzuki
|
From: Nikhil S. <ni...@st...> - 2013-03-08 08:09:35
|
I use a simple 'psql -c "\x"' call to monitor coordinators/datanodes. The psql call ensures that the connection protocol is followed and accepted by that node. It then performs an innocuous action on the psql side before exiting. Works well for me.

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
Postgres-XC Support and Service
|
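A monitoring script could wrap this psql call roughly as follows. This is a minimal sketch, not the poster's actual setup: the helper names `build_probe_command` and `probe_succeeds`, the explicit host/port/database arguments, the `-w` flag and the timeout are all assumptions added for illustration. The original command was just `psql -c "\x"`.

```python
import subprocess
from typing import List


def build_probe_command(host: str, port: int, dbname: str = "postgres") -> List[str]:
    """Build the psql invocation used as a connection probe.

    -w makes psql fail instead of prompting for a password, and
    -c "\\x" only toggles expanded display on the client side.
    """
    return ["psql", "-h", host, "-p", str(port), "-d", dbname, "-w", "-c", "\\x"]


def probe_succeeds(command: List[str], timeout_s: float = 5.0) -> bool:
    """Run the probe and report whether it exited with status 0.

    A missing binary or a timeout counts as a failed probe.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Usage would look like `probe_succeeds(build_probe_command("coord1", 5432))`, where `coord1` is a placeholder host name. Because of `-w`, the probe also fails when the node requires a password that psql cannot supply non-interactively, so this sketch assumes the monitoring user can authenticate without a prompt.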
From: Michael P. <mic...@gm...> - 2013-03-08 10:32:27
|
On Fri, Mar 8, 2013 at 5:09 PM, Nikhil Sontakke <ni...@st...> wrote:
> I use a simple 'psql -c "\x"' query to monitor coordinator/datanodes.
> The psql call ensures that the connection protocol is followed and
> accepted by that node. It then does an innocuous activity on the psql
> side before exiting. Works well for me.

+1.
--
Michael
|
From: Koichi S. <koi...@gm...> - 2013-03-08 08:35:11
|
Does it work correctly if gtm/gtm_proxy is not running?

I found PQping lighter and easier to use; it is a dedicated API to check whether the server is running. It is independent of users/databases and does not require any password. It just checks that the target is working.

I think this is more flexible for use in various setups.

Regards;
----------
Koichi Suzuki
|
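For readers who want to try PQping without writing C, the call can be reached from Python through ctypes. This is a sketch under assumptions, not the attached pgxc_monitor patch: the helper names `ping`, `describe_ping` and `accepts_connections` are hypothetical, and it assumes a libpq shared library (9.1 or later) is installed where `ctypes.util.find_library` can locate it. The numeric values follow the order of the `PGPing` enum in `libpq-fe.h`.

```python
import ctypes
import ctypes.util

# PGPing enum values, in the order declared in libpq-fe.h.
PQPING_OK = 0           # server is accepting connections
PQPING_REJECT = 1       # server is alive but rejecting connections
PQPING_NO_RESPONSE = 2  # could not establish a connection
PQPING_NO_ATTEMPT = 3   # no attempt made (for example, bad parameters)

_MEANING = {
    PQPING_OK: "accepting connections",
    PQPING_REJECT: "alive but rejecting connections",
    PQPING_NO_RESPONSE: "no response",
    PQPING_NO_ATTEMPT: "no attempt made",
}


def describe_ping(code: int) -> str:
    """Translate a PQping result code into a short description."""
    return _MEANING.get(code, "unknown status")


def accepts_connections(code: int) -> bool:
    """True only when the server reported that it accepts connections."""
    return code == PQPING_OK


def ping(conninfo: str) -> int:
    """Call libpq's PQping(conninfo) and return its integer result."""
    path = ctypes.util.find_library("pq")
    if path is None:
        raise RuntimeError("libpq shared library not found")
    libpq = ctypes.CDLL(path)
    libpq.PQping.argtypes = [ctypes.c_char_p]
    libpq.PQping.restype = ctypes.c_int
    return libpq.PQping(conninfo.encode())
```

A call such as `describe_ping(ping("host=coord1 port=5432"))` would report the node status, with `coord1` again a placeholder. Whether `PQPING_REJECT` should count as "running" for failover purposes is a policy choice left to the monitor.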
From: Nikhil S. <ni...@st...> - 2013-03-08 09:23:15
|
> Does it work correctly if gtm/gtm_proxy is not running?

Yeah, it does. I faced the same issue: if gtm is down, the call would error out, and the HA infrastructure would wrongly assume that this node is down and fail over. With this simple psql call, all of that is avoided.

Regards,
Nikhils

> I found
> PQping is lighter and easier to use, which is dedicated API to check
> if the server is running. It is independent from users/databases and
> does not require any password. Just check the target is working.
>
> I think this is more flexible to be used in various setups.

--
StormDB - http://www.stormdb.com
The Database Cloud
Postgres-XC Support and Service
|