libpaxos-general Mailing List for libpaxos - General purpose Paxos library
Status: Beta
Brought to you by:
marco-tijuana
From: Vineet s. <vs....@gm...> - 2018-08-29 09:39:55
Hello Sir,

I checked out libpaxos. I am considering implementing the Paxos algorithm for my minor project, and your source code could really help me. If you could send proper documentation and an explanation of the code, it would be a great help for me. Thank you.
From: Jinglei R. <ji...@re...> - 2016-07-11 07:12:37
Hi Daniele,

BEV_OPT_THREADSAFE solves the issue. Thank you so much!

By the way, I made a PR to show you our C++ replica wrapper. It is more aligned with practical use scenarios, I think. It is located in a separate dir "c++".

Best,
Jinglei
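The fix agreed on in this thread (enable libevent threading before creating the event_base, and make the client bufferevent thread-safe) can be sketched as follows. This is only an illustration, not code from libpaxos itself: it assumes libevent 2.x, linking with `-levent -levent_pthreads`, and elides the actual connection to a proposer.

```c
/* Sketch: libevent setup for calling paxos_submit() from a thread other
 * than the one running event_base_dispatch(). Assumes libevent 2.x;
 * compile and link with -levent -levent_pthreads. */
#include <event2/event.h>
#include <event2/thread.h>
#include <event2/bufferevent.h>
#include <stdio.h>

int main(void)
{
    /* 1. Enable pthread-based locking *before* creating the event_base. */
    if (evthread_use_pthreads() != 0) {
        fprintf(stderr, "evthread_use_pthreads failed\n");
        return 1;
    }

    struct event_base* base = event_base_new();

    /* 2. Create the client bufferevent with BEV_OPT_THREADSAFE, since it
     *    is touched both by the submitting thread and by the event loop. */
    struct bufferevent* bev = bufferevent_socket_new(
        base, -1, BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE);

    /* ... connect bev to the proposer and use it with paxos_submit() ... */

    bufferevent_free(bev);
    event_base_free(base);
    return 0;
}
```

Note the ordering: `evthread_use_pthreads()` installs the locking callbacks that `event_base_new()` picks up, so calling it afterwards has no effect on an already-created base.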
From: Daniele S. <dan...@gm...> - 2016-06-15 06:25:56
Hi Jinglei,

Sorry, I misread your previous email: the errors you sent were traces. Perhaps you need to use the option BEV_OPT_THREADSAFE when creating the client bufferevent. The client bufferevent is potentially accessed by both the client and the event loop. This could explain paths 1 and 2.

Daniele
From: Daniele S. <dan...@gm...> - 2016-06-14 20:02:15
Hi Jinglei,

It's hard to tell what's going on. Can you provide a stack trace for one of those errors? That would help identify which part of the code is faulty. Even better would be a minimal client that reproduces the issue.

Daniele
From: Jinglei R. <ji...@re...> - 2016-06-13 23:40:45
Thank you, Daniele. But evthread_use_pthreads() has been called in the first place, and it returns zero (success). Strange...

Is there any possible issue in the event handling and dispatching code of evpaxos that involves buffers while individual client values arrive randomly and in multiple parts? It seems that if the client value size is large, the chance of failure increases.

Best,
Jinglei
From: Daniele S. <dan...@us...> - 2016-06-13 19:07:03
|
Hi Jinglei, I think it should be fine to use paxos_submit() from a different thread. Internally it makes no use of shared resources, except for the bufferevent which operates on a common the shared event_base. You have to setup libevent for multithreading: call evthread_use_pthreads *before* creating the event_base, and link to libevent_pthreads. Daniele > On Jun 12, 2016, at 10:40 AM, Jinglei Ren <ji...@re...stems> wrote: > > Hi guys, > > Sorry for my late reply, and thank you all very much! I am switching back to this issue. > > But before that, I have seen occasional segment faults or memory errors. Is there any problem if I call paxos_submit() in a different thread than the thread blocking on event_base_dispatch()? In contrast, the library's sample code in sample/client.c calls paxos_submit() on deliver callback and seems working well. > > More details about my implementation: > The main thread creates an event_base, and initializes an evpaxos_replica with it. > The main thread spawns another thread that calls event_base_dispatch(). > The main thread connects to a proposer and creates a new bufferevent on the same event_base. > The main thread submit data to the bufferevent by paxos_submit(). > > Any potential concurrency bug? > > Some typical exception paths: > 1. "free(): invalid pointer" -- in evbuffer_add () -- in bufferevent_write () -- in bufferevent_pack_data () -- in msgpack_pack_raw_body () -- in msgpack_pack_string () -- in msgpack_pack_paxos_value () -- in msgpack_pack_paxos_client_value () -- in msgpack_pack_paxos_message () -- in send_paxos_message () -- in paxos_submit () > > 2. SIGSEGV in evbuffer_drain () -- in evbuffer_drain () -- in evbuffer_write_atmost () -- in event_base_loop () > > 3. 
SIGSEGV in __memcpy_sse2_unaligned () -- in msgpack_unpack_string_at () -- in msgpack_unpack_paxos_value_at () -- in msgpack_unpack_paxos_client_value () -- in msgpack_unpack_paxos_message () -- in recv_paxos_message () -- in on_read () -- in event_base_loop () > > Thanks, > Jinglei > From: Daniele Sciascia <dan...@us...> > Sent: Sunday, May 8, 2016 8:39:31 PM > To: Sciascia Daniele > Cc: Jinglei Ren; libpaxos > Subject: Re: [Libpaxos-general] Submit rate skew > > Hi Jinglei, > I modified the sample client included in libpaxos such that it to report statistics about its own delivered values only. > I tried to start 3 replicas and 2 clients and it looks like the clients manage to submit values at approximately the same rate. > So far I can’t tell if this problem is due to libpaxos itself. > > Cheers, > > Daniele > > > On Apr 28, 2016, at 7:29 PM, Daniele Sciascia <dan...@us...> wrote: > > > > Hi Jinglei, > > > > Thanks for reporting your issue. I have never observed this behavior, so I don’t know why is that happening. > > Generally speaking, the proposer processes incoming client values in arrival order, so I doubt that something is wrong in there. > > > > Your replicas are running on the same machine, are the clients also in the same machine? > > > > I can’t easily verify this right now (the provided sample client.c does not keep track of which client id submitted the delivered value). > > Would it be possible for you to share a minimal client that reproduces your issue? > > > > Thanks, > > > > Daniele > > > >> On Apr 22, 2016, at 2:03 AM, Jinglei Ren <ji...@re...stems> wrote: > >> > >> Hi guys, > >> > >> When multiple clients submit values to libpaxos3, I find the submit rate of each client is biased, not level. For example, to submit 1 million values in total, Node 0 makes 439k, Node 1 makes only 81k and Node 2 makes 480k. Sometimes even worse. Each client submits a new value only when its previous value is delivered. All clients submit to Node 0. 
What's the problem? > >> > >> Setup: Three processes (or "nodes") on a 4-core dedicated machine. Each process spawns a thread to act as a replica, blocking on event_base_dispatch(). Then the main thread submits values by paxos_submit(), using a separate bufferevent but sharing the event_base (multi-threading support in libevent should be correctly configured). It does not trigger submission by on-deliver callback as in sample/client.c. Instead, it uses a condition variable to sync between the main thread and the replica thread. Any submitted value includes the replica ID so one replica knows which delivered value is from local and which is from peers. > >> > >> Many thanks, > >> Jinglei 
|
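The locking requirement Daniele describes can be illustrated with a small standalone sketch. This is not libpaxos code: a mutex-guarded ring buffer stands in for the bufferevent's internal evbuffer, and `queue_t`, `submit()`, and `drain()` are hypothetical names. It only shows why two threads touching the same buffer need the synchronization that evthread_use_pthreads() and BEV_OPT_THREADSAFE enable inside libevent.

```c
/* Sketch (not libpaxos code): why a bufferevent shared between a
 * client thread and the event-loop thread needs locking.  A plain
 * mutex-guarded ring stands in for the bufferevent's evbuffer;
 * queue_t, submit() and drain() are illustrative names only. */
#include <pthread.h>

#define QCAP 4096

typedef struct {
    pthread_mutex_t lock;
    int buf[QCAP];
    long head, tail;
} queue_t;

/* Client-thread side, analogous to paxos_submit(). */
static void submit(queue_t *q, int value) {
    pthread_mutex_lock(&q->lock);     /* without this, concurrent access */
    q->buf[q->tail++ % QCAP] = value; /* corrupts the buffer (cf. the    */
    pthread_mutex_unlock(&q->lock);   /* evbuffer_add/drain crashes)     */
}

/* Event-loop side, analogous to the bufferevent write callback. */
static long drain(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    long n = q->tail - q->head;
    q->head = q->tail;
    pthread_mutex_unlock(&q->lock);
    return n;
}

static void *submitter(void *arg) {
    queue_t *q = arg;
    for (int i = 0; i < 1000; i++)
        submit(q, i);
    return NULL;
}

/* Two submitter threads, one drainer: returns total values handed over. */
long run_demo(void) {
    queue_t q = { .head = 0, .tail = 0 };
    pthread_mutex_init(&q.lock, NULL);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, submitter, &q);
    pthread_create(&t2, NULL, submitter, &q);
    long total = 0;
    while (total < 2000)              /* poll-drain until all arrived */
        total += drain(&q);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return total;
}
```

Removing the mutex from submit() is the moral equivalent of writing to a non-threadsafe bufferevent from a second thread: the two threads interleave inside the buffer update, which matches the corruption seen in crash paths 1 and 2.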
From: Jinglei R. <ji...@re...> - 2016-06-12 09:10:54
|
Hi guys, Sorry for my late reply, and thank you all very much! I am switching back to this issue. But before that, I have seen occasional segmentation faults or memory errors. Is there any problem if I call paxos_submit() in a different thread than the thread blocking on event_base_dispatch()? In contrast, the library's sample code in sample/client.c calls paxos_submit() in the deliver callback and seems to work well. More details about my implementation: The main thread creates an event_base, and initializes an evpaxos_replica with it. The main thread spawns another thread that calls event_base_dispatch(). The main thread connects to a proposer and creates a new bufferevent on the same event_base. The main thread submits data to the bufferevent by paxos_submit(). Any potential concurrency bug? Some typical exception paths: 1. "free(): invalid pointer" -- in evbuffer_add () -- in bufferevent_write () -- in bufferevent_pack_data () -- in msgpack_pack_raw_body () -- in msgpack_pack_string () -- in msgpack_pack_paxos_value () -- in msgpack_pack_paxos_client_value () -- in msgpack_pack_paxos_message () -- in send_paxos_message () -- in paxos_submit () 2. SIGSEGV in evbuffer_drain () -- in evbuffer_drain () -- in evbuffer_write_atmost () -- in event_base_loop () 3. SIGSEGV in __memcpy_sse2_unaligned () -- in msgpack_unpack_string_at () -- in msgpack_unpack_paxos_value_at () -- in msgpack_unpack_paxos_client_value () -- in msgpack_unpack_paxos_message () -- in recv_paxos_message () -- in on_read () -- in event_base_loop () Thanks, Jinglei ________________________________ From: Daniele Sciascia <dan...@us...> Sent: Sunday, May 8, 2016 8:39:31 PM To: Sciascia Daniele Cc: Jinglei Ren; libpaxos Subject: Re: [Libpaxos-general] Submit rate skew Hi Jinglei, I modified the sample client included in libpaxos such that it reports statistics about its own delivered values only. 
I tried to start 3 replicas and 2 clients and it looks like the clients manage to submit values at approximately the same rate. So far I can’t tell if this problem is due to libpaxos itself. Cheers, Daniele > On Apr 28, 2016, at 7:29 PM, Daniele Sciascia <dan...@us...> wrote: > > Hi Jinglei, > > Thanks for reporting your issue. I have never observed this behavior, so I don’t know why is that happening. > Generally speaking, the proposer processes incoming client values in arrival order, so I doubt that something is wrong in there. > > Your replicas are running on the same machine, are the clients also in the same machine? > > I can’t easily verify this right now (the provided sample client.c does not keep track of which client id submitted the delivered value). > Would it be possible for you to share a minimal client that reproduces your issue? > > Thanks, > > Daniele > >> On Apr 22, 2016, at 2:03 AM, Jinglei Ren <ji...@re...stems> wrote: >> >> Hi guys, >> >> When multiple clients submit values to libpaxos3, I find the submit rate of each client is biased, not level. For example, to submit 1 million values in total, Node 0 makes 439k, Node 1 makes only 81k and Node 2 makes 480k. Sometimes even worse. Each client submits a new value only when its previous value is delivered. All clients submit to Node 0. What's the problem? >> >> Setup: Three processes (or "nodes") on a 4-core dedicated machine. Each process spawns a thread to act as a replica, blocking on event_base_dispatch(). Then the main thread submits values by paxos_submit(), using a separate bufferevent but sharing the event_base (multi-threading support in libevent should be correctly configured). It does not trigger submission by on-deliver callback as in sample/client.c. Instead, it uses a condition variable to sync between the main thread and the replica thread. Any submitted value includes the replica ID so one replica knows which delivered value is from local and which is from peers. 
>> >> Many thanks, >> Jinglei |
From: Daniele S. <dan...@gm...> - 2016-05-30 07:32:22
|
Hi Xinxi, If you would like to contribute, please submit a pull request and I will review your changes. You can open pull requests here https://bitbucket.org/sciascid/libpaxos/pull-requests/ Thanks for submitting the bug report, I will look into it. Daniele > On May 30, 2016, at 8:58 AM, Xinxi Wang <wan...@me...> wrote: > > Hi Daniele, > > I implemented RAFT's heartbeat-based leader election function on top of your code. It seems to work, but I still need to test more. Is there any way to submit the code after the test? > > The instance ID problem is still there. I don't know how to solve it and have just submitted a bug report: https://bitbucket.org/sciascid/libpaxos/issues/5/wrong-instance-ids-generated-by-a. > > Best Regards, > Xinxi Wang > >> On 30 May 2016, at 2:45 pm, Daniele Sciascia <dan...@gm...> wrote: >> >> Hi, >> >> There is currently no plan for adding leader election. Perhaps one approach that might be slightly easier is the following: >> >> If you deploy a few proposers/replicas, your clients should try to submit values to one proposer only. You could achieve this by having your clients connect to all of the proposers. Initially the clients submit their values to the proposer with the smallest id. Should this proposer fail, then the clients will eventually notice because their connection to the proposer drops. When this happens, clients should switch to the proposer with the smallest id among the non-faulty ones. >> If at some point one client thinks that proposer 0 is the leader, and another client thinks that proposer 1 is the leader, Paxos will still guarantee safety, but progress might not be guaranteed. >> >> >> I’m not sure I understand the issue with recovering replicas. What do you mean when you say that a recovering replica generates wrong ids? If you could open a bug report here https://bitbucket.org/sciascid/libpaxos/issues?status=new&status=open, and explain the problem and how to reproduce it, that would be appreciated. 
>> >> Thanks, >> >> Daniele >> >>> On May 23, 2016, at 10:00 AM, Xinxi Wang <an...@gm...> wrote: >>> >>> Hi guys, >>> >>> I am trying to build a distributed system on top of Paxos/Raft but find no good library for Raft, although Raft seems simpler and gives a natural way for leader election. >>> >>> I have been studying the library for a few days. The quality of the code is pretty good. It is also fantastic for understanding multi-paxos. However, I am wondering if there is a plan for adding a leader election function into the code or not. >>> >>> I also tested the node recovering function. It seems that when a replica is being recovered, I can still connect a client to that replica, which generates wrong instance ids. If there is a leader election algorithm, the recovering node shouldn’t become the leader, avoiding being requested by a client. >>> >>> If I try to implement the leader election algorithm myself, what’s the easiest way to do that? Shall we copy the heartbeat approach used by Raft? How to solve the recovering node issue? >>> >>> >>> Best Regards, >>> Xinxi Wang |
From: Daniele S. <dan...@gm...> - 2016-05-30 06:45:14
|
Hi, There is currently no plan for adding leader election. Perhaps one approach that might be slightly easier is the following: If you deploy a few proposers/replicas, your clients should try to submit values to one proposer only. You could achieve this by having your clients connect to all of the proposers. Initially the clients submit their values to the proposer with the smallest id. Should this proposer fail, then the clients will eventually notice because their connection to the proposer drops. When this happens, clients should switch to the proposer with the smallest id among the non-faulty ones. If at some point one client thinks that proposer 0 is the leader, and another client thinks that proposer 1 is the leader, Paxos will still guarantee safety, but progress might not be guaranteed. I’m not sure I understand the issue with recovering replicas. What do you mean when you say that a recovering replica generates wrong ids? If you could open a bug report here https://bitbucket.org/sciascid/libpaxos/issues?status=new&status=open, and explain the problem and how to reproduce it, that would be appreciated. Thanks, Daniele > On May 23, 2016, at 10:00 AM, Xinxi Wang <an...@gm...> wrote: > > Hi guys, > > I am trying to build a distributed system on top of Paxos/Raft but find no good library for Raft, although Raft seems simpler and gives a natural way for leader election. > > I have been studying the library for a few days. The quality of the code is pretty good. It is also fantastic for understanding multi-paxos. However, I am wondering if there is a plan for adding a leader election function into the code or not. > > I also tested the node recovering function. It seems that when a replica is being recovered, I can still connect a client to that replica, which generates wrong instance ids. If there is a leader election algorithm, the recovering node shouldn’t become the leader, avoiding being requested by a client. 
> > If I try to implement the leader election algorithm myself, what’s the easiest way to do that? Shall we copy the heartbeat approach used by Raft? How to solve the recovering node issue? > > > Best Regards, > Xinxi Wang |
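The client-side failover rule Daniele describes can be sketched in a few lines. This is not a libpaxos API: `pick_proposer()` is a hypothetical helper, and in a real client the `alive[]` flags would be driven by libevent connect/error callbacks on the per-proposer connections.

```c
/* Sketch (not a libpaxos API): choose the lowest-id proposer whose
 * connection is still up.  In a real client the alive[] flags would
 * be flipped by libevent connection/error callbacks. */

/* Returns the id of the lowest-numbered live proposer, or -1 if none. */
int pick_proposer(const int alive[], int n) {
    for (int i = 0; i < n; i++)
        if (alive[i])
            return i;
    return -1;
}
```

With three proposers all up, every client submits to proposer 0; once that connection drops, pick_proposer() yields 1, so well-behaved clients converge on the same proposer. As noted above, two clients briefly disagreeing is safe in Paxos; it only risks stalling progress.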
From: Xinxi W. <an...@gm...> - 2016-05-23 08:00:41
|
Hi guys, I am trying to build a distributed system on top of Paxos/Raft but find no good library for Raft, although Raft seems simpler and gives a natural way for leader election. I have been studying the library for a few days. The quality of the code is pretty good. It is also fantastic for understanding multi-paxos. However, I am wondering if there is a plan for adding a leader election function into the code or not. I also tested the node recovering function. It seems that when a replica is being recovered, I can still connect a client to that replica, which generates wrong instance ids. If there is a leader election algorithm, the recovering node shouldn’t become the leader, avoiding being requested by a client. If I try to implement the leader election algorithm myself, what’s the easiest way to do that? Shall we copy the heartbeat approach used by Raft? How to solve the recovering node issue? Best Regards, Xinxi Wang |
From: Daniele S. <dan...@us...> - 2016-05-08 12:39:40
|
Hi Jinglei, I modified the sample client included in libpaxos such that it reports statistics about its own delivered values only. I tried to start 3 replicas and 2 clients and it looks like the clients manage to submit values at approximately the same rate. So far I can’t tell if this problem is due to libpaxos itself. Cheers, Daniele > On Apr 28, 2016, at 7:29 PM, Daniele Sciascia <dan...@us...> wrote: > > Hi Jinglei, > > Thanks for reporting your issue. I have never observed this behavior, so I don’t know why it is happening. > Generally speaking, the proposer processes incoming client values in arrival order, so I doubt that something is wrong in there. > > Your replicas are running on the same machine; are the clients also on the same machine? > > I can’t easily verify this right now (the provided sample client.c does not keep track of which client id submitted the delivered value). > Would it be possible for you to share a minimal client that reproduces your issue? > > Thanks, > > Daniele > >> On Apr 22, 2016, at 2:03 AM, Jinglei Ren <ji...@re...stems> wrote: >> >> Hi guys, >> >> When multiple clients submit values to libpaxos3, I find the submit rates of the clients are skewed, not even. For example, to submit 1 million values in total, Node 0 makes 439k, Node 1 makes only 81k and Node 2 makes 480k. Sometimes even worse. Each client submits a new value only when its previous value is delivered. All clients submit to Node 0. What's the problem? >> >> Setup: Three processes (or "nodes") on a 4-core dedicated machine. Each process spawns a thread to act as a replica, blocking on event_base_dispatch(). Then the main thread submits values by paxos_submit(), using a separate bufferevent but sharing the event_base (multi-threading support in libevent should be correctly configured). It does not trigger submission via the on-deliver callback as in sample/client.c. Instead, it uses a condition variable to sync between the main thread and the replica thread. 
Any submitted value includes the replica ID so each replica knows which delivered values are local and which are from peers. >> >> Many thanks, >> Jinglei |
From: Daniele S. <dan...@us...> - 2016-04-28 17:29:24
|
Hi Jinglei, Thanks for reporting your issue. I have never observed this behavior, so I don’t know why it is happening. Generally speaking, the proposer processes incoming client values in arrival order, so I doubt that something is wrong in there. Your replicas are running on the same machine; are the clients also on the same machine? I can’t easily verify this right now (the provided sample client.c does not keep track of which client id submitted the delivered value). Would it be possible for you to share a minimal client that reproduces your issue? Thanks, Daniele > On Apr 22, 2016, at 2:03 AM, Jinglei Ren <ji...@re...stems> wrote: > > Hi guys, > > When multiple clients submit values to libpaxos3, I find the submit rates of the clients are skewed, not even. For example, to submit 1 million values in total, Node 0 makes 439k, Node 1 makes only 81k and Node 2 makes 480k. Sometimes even worse. Each client submits a new value only when its previous value is delivered. All clients submit to Node 0. What's the problem? > > Setup: Three processes (or "nodes") on a 4-core dedicated machine. Each process spawns a thread to act as a replica, blocking on event_base_dispatch(). Then the main thread submits values by paxos_submit(), using a separate bufferevent but sharing the event_base (multi-threading support in libevent should be correctly configured). It does not trigger submission via the on-deliver callback as in sample/client.c. Instead, it uses a condition variable to sync between the main thread and the replica thread. Any submitted value includes the replica ID so each replica knows which delivered values are local and which are from peers. > > Many thanks, > Jinglei 
|
From: Jinglei R. <ji...@re...> - 2016-04-22 00:18:27
|
Hi guys, When multiple clients submit values to libpaxos3, I find the submit rates of the clients are skewed, not even. For example, to submit 1 million values in total, Node 0 makes 439k, Node 1 makes only 81k and Node 2 makes 480k. Sometimes even worse. Each client submits a new value only when its previous value is delivered. All clients submit to Node 0. What's the problem? Setup: Three processes (or "nodes") on a 4-core dedicated machine. Each process spawns a thread to act as a replica, blocking on event_base_dispatch(). Then the main thread submits values by paxos_submit(), using a separate bufferevent but sharing the event_base (multi-threading support in libevent should be correctly configured). It does not trigger submission via the on-deliver callback as in sample/client.c. Instead, it uses a condition variable to sync between the main thread and the replica thread. Any submitted value includes the replica ID so each replica knows which delivered values are local and which are from peers. Many thanks, Jinglei |
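The "one outstanding value per client" synchronization described in the setup above can be sketched as follows. All names are illustrative and the real paxos_submit() call is stubbed out so the sketch is self-contained: the submitting thread blocks on a condition variable until the replica thread's deliver path signals that the previous value was learned.

```c
/* Sketch of the one-outstanding-value handshake: the submitter waits
 * on a condition variable; the replica thread's deliver path signals.
 * Names are illustrative; the real paxos_submit() call is stubbed out. */
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static int outstanding = 0;      /* value submitted, not yet delivered */
static int delivered   = 0;      /* total values delivered so far      */

/* What the deliver callback on the replica thread would do. */
static void on_deliver(void) {
    pthread_mutex_lock(&mtx);
    outstanding = 0;
    delivered++;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);
}

/* What the main thread does around each submission. */
static void submit_and_wait(void) {
    pthread_mutex_lock(&mtx);
    outstanding = 1;
    /* paxos_submit(bev, value, size) would go here */
    while (outstanding)              /* loop guards spurious wakeups */
        pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
}

/* Stand-in for the replica/event-loop thread: "delivers" each value. */
static void *replica_thread(void *arg) {
    int target = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&mtx);
        int done = delivered >= target;
        int pending = outstanding;
        pthread_mutex_unlock(&mtx);
        if (done) break;
        if (pending) on_deliver();
    }
    return NULL;
}

int run_demo(int n) {
    pthread_t t;
    pthread_create(&t, NULL, replica_thread, &n);
    for (int i = 0; i < n; i++)
        submit_and_wait();
    pthread_join(t, NULL);
    return delivered;
}
```

Because the submitter sets `outstanding` and enters pthread_cond_wait() while holding the mutex, the deliver side can never signal before the submitter is waiting, so no wakeup is lost.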
From: Daniele S. <dan...@us...> - 2016-03-16 06:37:06
|
Right, it should go in all sample programs. Thanks for the pull requests! Daniele > On Mar 15, 2016, at 7:07 AM, Jinglei Ren <ji...@re...stems> wrote: > > Daniele, Thank you very much for taking care of these issues. Your responsiveness adds to our confidence in building our work atop libpaxos3 :D > > Right, signal(SIGPIPE, SIG_IGN) solves the crash issue -- I noticed it because you used it in counter.c. So, maybe it should be applied to all sample programs, right? > > Another tiny defect is that the learner of a client is not correctly freed. > > I sent out two pull requests for the above. > > Best, > Jinglei > > ________________________________________ > From: Daniele Sciascia <dan...@us...> > Sent: Monday, March 14, 2016 4:16 PM > To: Jinglei Ren > Cc: libpaxos > Subject: Re: [Libpaxos-general] Issues with libpaxos3? > > Hi Jinglei, > > Thanks for reporting these issues. I managed to reproduce and fix them. > There are two things going on: > > 1) Replicas sometimes just exit when you kill the client. This is due to SIGPIPE. Basically the replica tries to write into a socket which has been closed on the other side. This has a very simple fix: just ignore the signal. > > 2) The huge delays when you connect from one proposer to another are due to the fact that the new proposer doesn’t know the current paxos instance number, and therefore it will start proposing from instance 1. Since these instances are already taken, paxos mandates that they are re-proposed with their original value, which will take some time. > This also has a simple fix, but I’m not totally satisfied with it. I will think about it a bit more and see if I can come up with something better. > > I made these changes in a branch called ‘replica-fixes’, if you want to give it a try. I’ll try to properly incorporate these fixes later this week. > > Cheers, > Daniele > >> On Mar 12, 2016, at 9:16 AM, Jinglei Ren <ji...@re...stems> wrote: >> >> Hi Daniele, >> >> Right, I also feel it is strange. 
Nothing has changed from the latest commit in your repo. >> Below is a failure path, and attached is the output of the client. >> >> git clone https://bitbucket.org/sciascid/libpaxos.git >> mkdir libpaxos/build >> cd libpaxos/build >> cmake .. >> make >> ./sample/replica 0 ../paxos.conf -v > output0.txt & >> ./sample/replica 1 ../paxos.conf -v > output1.txt & >> ./sample/replica 2 ../paxos.conf -v > output2.txt & >> ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # normal >> ./sample/client ../paxos.conf -o 10000 -p 0 >> output-c.txt # delay >> ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # delay; replica 1 fails on end >> ./sample/client ../paxos.conf -o 10000 -p 2 >> output-c.txt # run on the remaining two >> >> I tested it on a physical Dell PC (4 cores, 8GB mem) and on a VM. Both use Ubuntu 14.04 desktop. I feel a higher chance of failure in a VM. >> >> Did any recent commit break the lib? >> >> Many thanks, >> Jinglei >> >> ________________________________________ >> From: Daniele Sciascia <dan...@us...> >> Sent: Saturday, March 12, 2016 2:10 AM >> To: Jinglei Ren >> Cc: libpaxos >> Subject: Re: [Libpaxos-general] Issues with libpaxos3? >> >> Hi Jinglei, >> >> What you describe should not happen. >> Are you using the samples provided with libpaxos? If so, can you provide precise steps to reproduce? >> >> Cheers, >> Daniele >> >>> On Mar 11, 2016, at 3:34 AM, Jinglei Ren <ji...@re...stems> wrote: >>> >>> Hi guys, >>> >>> I am testing libpaxos3. Just one client submits requests to one of three replicas. >>> >>> Here are some phenomena I encountered. Are they known issues, or my mistakes in use? Thanks in advance for your info! >>> >>> 1. When I specify a very large outstanding value of the client, say, 1000, replicas work well. But when I stop the client, the "leader" replica (i.e., the one the client connects to) stops as well. Should the outstanding value be set below the pre-execute window or something? >>> >>> 2. 
When the client connects to the replica with minimal ID (i.e., 0 in my case), everything runs well. But if I choose another replica by, say, "-p 1", it seems more likely that the connected replica would fail when I stop the client. What typically causes a replica to fail? >>> >>> 3. If I first connect a client to one replica, then stop it and start a new one connecting to another replica, there will be a long delay before replicas output delivered records. But the new client already begins to output statistics during the delay. Why is there a delay? Are client statistics broken? >>> >>> Best regards, >>> Jinglei |
From: Jinglei R. <ji...@re...> - 2016-03-15 06:23:11
|
Daniele, Thank you very much for taking care of these issues. Your responsiveness adds to our confidence in building our work atop libpaxos3 :D Right, signal(SIGPIPE, SIG_IGN) solves the crash issue -- I noticed it because you used it in counter.c. So, maybe it should be applied to all sample programs, right? Another tiny defect is that the learner of a client is not correctly freed. I sent out two pull requests for the above. Best, Jinglei ________________________________________ From: Daniele Sciascia <dan...@us...> Sent: Monday, March 14, 2016 4:16 PM To: Jinglei Ren Cc: libpaxos Subject: Re: [Libpaxos-general] Issues with libpaxos3? Hi Jinglei, Thanks for reporting these issues. I managed to reproduce and fix them. There are two things going on: 1) Replicas sometimes just exit when you kill the client. This is due to SIGPIPE. Basically the replica tries to write into a socket which has been closed on the other side. This has a very simple fix: just ignore the signal. 2) The huge delays when you connect from one proposer to another are due to the fact that the new proposer doesn’t know the current paxos instance number, and therefore it will start proposing from instance 1. Since these instances are already taken, paxos mandates that they are re-proposed with their original value, which will take some time. This also has a simple fix, but I’m not totally satisfied with it. I will think about it a bit more and see if I can come up with something better. I made these changes in a branch called ‘replica-fixes’, if you want to give it a try. I’ll try to properly incorporate these fixes later this week. Cheers, Daniele > On Mar 12, 2016, at 9:16 AM, Jinglei Ren <ji...@re...stems> wrote: > > Hi Daniele, > > Right, I also feel it is strange. Nothing has changed from the latest commit in your repo. > Below is a failure path, and attached is the output of the client. > > git clone https://bitbucket.org/sciascid/libpaxos.git > mkdir libpaxos/build > cd libpaxos/build > cmake .. 
> make > ./sample/replica 0 ../paxos.conf -v > output0.txt & > ./sample/replica 1 ../paxos.conf -v > output1.txt & > ./sample/replica 2 ../paxos.conf -v > output2.txt & > ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # normal > ./sample/client ../paxos.conf -o 10000 -p 0 >> output-c.txt # delay > ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # delay; replica 1 fails on end > ./sample/client ../paxos.conf -o 10000 -p 2 >> output-c.txt # run on the remaining two > > I tested it on a physical Dell PC (4 cores, 8GB mem) and on a VM. Both use Ubuntu 14.04 desktop. I feel a higher chance of failure in a VM. > > Did any recent commit break the lib? > > Many thanks, > Jinglei > > ________________________________________ > From: Daniele Sciascia <dan...@us...> > Sent: Saturday, March 12, 2016 2:10 AM > To: Jinglei Ren > Cc: libpaxos > Subject: Re: [Libpaxos-general] Issues with libpaxos3? > > Hi Jinglei, > > What you describe should not happen. > Are you using the samples provided with libpaxos? If so, can you provide precise steps to reproduce? > > Cheers, > Daniele > >> On Mar 11, 2016, at 3:34 AM, Jinglei Ren <ji...@re...stems> wrote: >> >> Hi guys, >> >> I am testing libpaxos3. Just one client submits requests to one of three replicas. >> >> Here are some phenomena I encountered. Are they known issues, or my mistakes in use? Thanks in advance for your info! >> >> 1. When I specify a very large outstanding value of the client, say, 1000, replicas work well. But when I stop the client, the "leader" replica (i.e., the one the client connects to) stops as well. Should the outstanding value be set below the pre-execute window or something? >> >> 2. When the client connects to the replica with minimal ID (i.e., 0 in my case), everything runs well. But if I choose another replica by, say, "-p 1", it seems more likely that the connected replica would fail when I stop the client. What typically causes a replica to fail? >> >> 3. 
If I first connect a client to one replica, then stop it and start a new one connecting to another replica, there will be a long delay before replicas output delivered records. But the new client already begins to output statistics during the delay. Why is there a delay? Are client statistics broken? >> >> Best regards, >> Jinglei > > <output-c.txt>------------------------------------------------------------------------------ > Transform Data into Opportunity. > Accelerate data analysis in your applications with > Intel Data Analytics Acceleration Library. > Click to learn more. > http://pubads.g.doubleclick.net/gampad/clk?id=278785111&iu=/4140_______________________________________________ > Libpaxos-general mailing list > Lib...@li... > https://lists.sourceforge.net/lists/listinfo/libpaxos-general |
From: Daniele S. <dan...@us...> - 2016-03-14 08:16:26
|
Hi Jinglei, Thanks for reporting these issues. I managed to reproduce and fix them. There are two things going on: 1) Replicas sometimes just exit when you kill the client. This is due to SIGPIPE. Basically the replica tries to write into a socket which has been closed on the other side. This has a very simple fix: just ignore the signal. 2) The huge delays when you connect from one proposer to another are due to the fact that the new proposer doesn’t know the current paxos instance number, and therefore it will start proposing from instance 1. Since these instances are already taken, paxos mandates that they are re-proposed with their original value, which will take some time. This also has a simple fix, but I’m not totally satisfied with it. I will think about it a bit more and see if I can come up with something better. I made these changes in a branch called ‘replica-fixes’, if you want to give it a try. I’ll try to properly incorporate these fixes later this week. Cheers, Daniele > On Mar 12, 2016, at 9:16 AM, Jinglei Ren <ji...@re...stems> wrote: > > Hi Daniele, > > Right, I also feel it is strange. Nothing has changed in the latest commit of your repo. > Below is a failure path, and attached is the output of the client. > > git clone https://bitbucket.org/sciascid/libpaxos.git > mkdir libpaxos/build > cd libpaxos/build > cmake .. > make > ./sample/replica 0 ../paxos.conf -v > output0.txt & > ./sample/replica 1 ../paxos.conf -v > output1.txt & > ./sample/replica 2 ../paxos.conf -v > output2.txt & > ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # normal > ./sample/client ../paxos.conf -o 10000 -p 0 >> output-c.txt # delay > ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # delay; replica 1 fails on end > ./sample/client ../paxos.conf -o 10000 -p 2 >> output-c.txt # run on the remaining two > > I tested it on a physical Dell PC (4 cores, 8GB mem) and on a VM. Both use Ubuntu 14.04 desktop. Failures seem more likely in the VM. 
> > Any recent commit pollutes the lib? > > Many thanks, > Jinglei > > ________________________________________ > From: Daniele Sciascia <dan...@us...> > Sent: Saturday, March 12, 2016 2:10 AM > To: Jinglei Ren > Cc: libpaxos > Subject: Re: [Libpaxos-general] Issues with libpaxos3? > > Hi Jinglei, > > What you describe should not happen. > Are you using the samples provided with libpaxos? If so, can you provide precise steps to reproduce? > > Cheers, > Daniele > >> On Mar 11, 2016, at 3:34 AM, Jinglei Ren <ji...@re...stems> wrote: >> >> Hi guys, >> >> I am testing libpaxos3. Just one client submits requests to one of three replicas. >> >> Here are some phenomena I encountered. Are they known issues, or my mistakes in use? Thanks in advance for your info! >> >> 1. When I specify a very large outstanding value of the client, say, 1000, replicas work well. But when I stop the client, the "leader" replica (i.e., the one the client connects to) stops as well. Should the outstanding value be set below the pre-execute window or something? >> >> 2. When the client connects to the replica with minimal ID (i.e., 0 in my case), everything runs well. But if I choose another replica by, say, "-p 1", it seems more likely that the connected replica would fail when I stop the client. What typically causes a replica to fail? >> >> 3. If I first connect a client to one replica, then stop it and start a new one connecting to another replica, there will be a long delay before replicas output delivered records. But the new client already begins to output statistics during the delay. Why is there a delay? Are client statistics broken? >> >> Best regards, >> Jinglei > > <output-c.txt> |
From: Jinglei R. <ji...@re...> - 2016-03-12 08:16:13
|
Hi Daniele, Right, I also feel it is strange. Nothing has changed in the latest commit of your repo. Below is a failure path, and attached is the output of the client. git clone https://bitbucket.org/sciascid/libpaxos.git mkdir libpaxos/build cd libpaxos/build cmake .. make ./sample/replica 0 ../paxos.conf -v > output0.txt & ./sample/replica 1 ../paxos.conf -v > output1.txt & ./sample/replica 2 ../paxos.conf -v > output2.txt & ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # normal ./sample/client ../paxos.conf -o 10000 -p 0 >> output-c.txt # delay ./sample/client ../paxos.conf -o 10000 -p 1 >> output-c.txt # delay; replica 1 fails on end ./sample/client ../paxos.conf -o 10000 -p 2 >> output-c.txt # run on the remaining two I tested it on a physical Dell PC (4 cores, 8GB mem) and on a VM. Both use Ubuntu 14.04 desktop. Failures seem more likely in the VM. Could a recent commit have broken the lib? Many thanks, Jinglei ________________________________________ From: Daniele Sciascia <dan...@us...> Sent: Saturday, March 12, 2016 2:10 AM To: Jinglei Ren Cc: libpaxos Subject: Re: [Libpaxos-general] Issues with libpaxos3? Hi Jinglei, What you describe should not happen. Are you using the samples provided with libpaxos? If so, can you provide precise steps to reproduce? Cheers, Daniele > On Mar 11, 2016, at 3:34 AM, Jinglei Ren <ji...@re...stems> wrote: > > Hi guys, > > I am testing libpaxos3. Just one client submits requests to one of three replicas. > > Here are some phenomena I encountered. Are they known issues, or mistakes in my usage? Thanks in advance for your info! > > 1. When I specify a very large outstanding value of the client, say, 1000, replicas work well. But when I stop the client, the "leader" replica (i.e., the one the client connects to) stops as well. Should the outstanding value be set below the pre-execute window or something? > > 2. When the client connects to the replica with minimal ID (i.e., 0 in my case), everything runs well. 
But if I choose another replica by, say, "-p 1", it seems more likely that the connected replica would fail when I stop the client. What typically causes a replica to fail? > > 3. If I first connect a client to one replica, then stop it and start a new one connecting to another replica, there will be a long delay before replicas output delivered records. But the new client already begins to output statistics during the delay. Why is there a delay? Are client statistics broken? > > Best regards, > Jinglei |
From: Daniele S. <dan...@us...> - 2016-03-11 18:23:59
|
Hi Jinglei, What you describe should not happen. Are you using the samples provided with libpaxos? If so, can you provide precise steps to reproduce? Cheers, Daniele > On Mar 11, 2016, at 3:34 AM, Jinglei Ren <ji...@re...stems> wrote: > > Hi guys, > > I am testing libpaxos3. Just one client submits requests to one of three replicas. > > Here are some phenomena I encountered. Are they known issues, or my mistakes in use? Thanks in advance for your info! > > 1. When I specify a very large outstanding value of the client, say, 1000, replicas work well. But when I stop the client, the "leader" replica (i.e., the one the client connects to) stops as well. Should the outstanding value be set below the pre-execute window or something? > > 2. When the client connects to the replica with minimal ID (i.e., 0 in my case), everything runs well. But if I choose another replica by, say, "-p 1", it seems more likely that the connected replica would fail when I stop the client. What typically causes a replica to fail? > > 3. If I first connect a client to one replica, then stop it and start a new one connecting to another replica, there will be a long delay before replicas output delivered records. But the new client already begins to output statistics during the delay. Why is there a delay? Are client statistics broken? > > Best regards, > Jinglei |
From: Jinglei R. <ji...@re...> - 2016-03-11 03:09:23
|
Hi guys, I am testing libpaxos3. Just one client submits requests to one of three replicas. Here are some phenomena I encountered. Are they known issues, or mistakes in my usage? Thanks in advance for your info! 1. When I specify a very large outstanding value of the client, say, 1000, replicas work well. But when I stop the client, the "leader" replica (i.e., the one the client connects to) stops as well. Should the outstanding value be set below the pre-execute window or something? 2. When the client connects to the replica with minimal ID (i.e., 0 in my case), everything runs well. But if I choose another replica by, say, "-p 1", it seems more likely that the connected replica would fail when I stop the client. What typically causes a replica to fail? 3. If I first connect a client to one replica, then stop it and start a new one connecting to another replica, there will be a long delay before replicas output delivered records. But the new client already begins to output statistics during the delay. Why is there a delay? Are client statistics broken? Best regards, Jinglei |
From: Gaurav A. <ga...@co...> - 2015-06-28 00:04:14
|
Hi All, I am trying to use libpaxos as per the sample code of replica and client. I supply the following configuration in the config file: replica 0 127.0.0.1 8800 replica 1 127.0.0.1 8801 replica 2 127.0.0.1 8802 Then I start the Paxos replicas by providing IDs 0, 1 and 2 respectively in the function evpaxos_replica_init. I do not register timers because I do not need them. I start a client similar to the sample code and use paxos_submit to send messages. When I start sending messages, replicas stop receiving them after a few hundred messages. Am I missing something or doing something wrong? Is it necessary to send the next message using paxos_submit after on_deliver gets called? Thanks, Gaurav |
From: JFW <ww...@as...> - 2014-06-17 08:22:45
|
Hi Maurice, this message is not strictly related to libpaxos, but it might be interesting to you, as you seem to be in an experimenting stage too. I understand you are looking for some DHT semantics with resilience against byzantine faults, correct? If so, you may be interested in http://ball.askemos.org . For PUT/GET/DELETE semantics, look for the WebDAV component. BTW: it does not use libpaxos (yet). I'm lurking here, because I'm looking for alternatives to our homegrown atomic broadcast implementation. Best /Jörg Am 16.06.2014 18:27, schrieb Marco Primi: > Hello Maurice, > > Paxos does not offer GET/PUT/DELETE interface, paxos implements Atomic > broadcast, which is a messaging primitive, not a database/storage kind > of thing. > The interface offered to applications built on top of paxos are > generally submit() (to send a message to all listeners) and deliver() > (a callback invoked by all listeners when the next message is accepted). > > The functions you see in storage are reserved for internal usage (by > the processes running the "acceptor" role). > > If you want to implement a database-like system, it's not hard to > imagine one built on top of this submit/deliver interface, but you're > going to have to implement in your own application. > > The file acceptor_0 and similar are not logfiles, I believe they are > the actual (binary) storage databases maintained by acceptors. > > Cheers, > Marco > > > > On 6/15/14, 6:25 AM, Nitin Arya wrote: >> Hello, >> >> I am new to Paxos learning i was going through the sample code and there is >> a function paxos_submit() that client.c calls . As i wish to perform the >> GET , PUT and DELETE operations on the database how can I implement that ? >> >> There is no option to invoke storage_get or a delete function and I would >> like to call a paxos_delete() function so a value gets deleted from >> database in all the replicas by using the paxos algorithm. So how can i do >> that ? Please guide. 
>> Also the log file there is a log file in /tmp/acceptor_0 but its garbage is >> there anywhere else the log file is getting saved ? >> >> Thanking you for your excellent libpaxos. >> Maurice >> >> >> >> ------------------------------------------------------------------------------ >> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions >> Find What Matters Most in Your Big Data with HPCC Systems >> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data. >> Leverages Graph Analysis for Fast Processing & Easy Data Exploration >> http://p.sf.net/sfu/hpccsystems >> >> >> _______________________________________________ >> Libpaxos-general mailing list >> Lib...@li... >> https://lists.sourceforge.net/lists/listinfo/libpaxos-general -- A Sch(K)EMatic Operating System |
From: Marco P. <mar...@gm...> - 2014-06-16 16:28:19
|
Sorry, I forgot to include you in the reply (unless you're subscribed to the list). On 6/15/14, 6:25 AM, Nitin Arya wrote: > Hello, > > I am new to Paxos learning i was going through the sample code and there is > a function paxos_submit() that client.c calls . As i wish to perform the > GET , PUT and DELETE operations on the database how can I implement that ? > > There is no option to invoke storage_get or a delete function and I would > like to call a paxos_delete() function so a value gets deleted from > database in all the replicas by using the paxos algorithm. So how can i do > that ? Please guide. > Also the log file there is a log file in /tmp/acceptor_0 but its garbage is > there anywhere else the log file is getting saved ? > > Thanking you for your excellent libpaxos. > Maurice |
From: Marco P. <mar...@gm...> - 2014-06-16 16:27:29
|
Hello Maurice, Paxos does not offer a GET/PUT/DELETE interface; paxos implements atomic broadcast, which is a messaging primitive, not a database/storage kind of thing. The interface offered to applications built on top of paxos generally consists of submit() (to send a message to all listeners) and deliver() (a callback invoked by all listeners when the next message is accepted). The functions you see in storage are reserved for internal usage (by the processes running the "acceptor" role). If you want to implement a database-like system, it's not hard to imagine one built on top of this submit/deliver interface, but you're going to have to implement it in your own application. Files like acceptor_0 are not logfiles; I believe they are the actual (binary) storage databases maintained by the acceptors. Cheers, Marco On 6/15/14, 6:25 AM, Nitin Arya wrote: > Hello, > > I am new to Paxos learning i was going through the sample code and there is > a function paxos_submit() that client.c calls . As i wish to perform the > GET , PUT and DELETE operations on the database how can I implement that ? > > There is no option to invoke storage_get or a delete function and I would > like to call a paxos_delete() function so a value gets deleted from > database in all the replicas by using the paxos algorithm. So how can i do > that ? Please guide. > Also the log file there is a log file in /tmp/acceptor_0 but its garbage is > there anywhere else the log file is getting saved ? > > Thanking you for your excellent libpaxos. > Maurice |
From: Nitin A. <ar...@uw...> - 2014-06-15 13:25:13
|
Hello, I am new to Paxos. While going through the sample code, I noticed a function paxos_submit() that client.c calls. As I wish to perform GET, PUT and DELETE operations on the database, how can I implement that? There is no option to invoke storage_get or a delete function, and I would like to call a paxos_delete() function so that a value gets deleted from the database in all the replicas using the Paxos algorithm. How can I do that? Please guide. Also, there is a log file in /tmp/acceptor_0, but it is garbage. Is the log saved anywhere else? Thank you for your excellent libpaxos. Maurice |
From: Daniele S. <dan...@gm...> - 2013-09-02 13:26:43
|
Hi Jörg, > Nevertheless I went over the source, trying to figure out > how hard it would be to use it. However I ran into some > corners where I figured I'd rather ask. I'm glad you asked. > What I want to do is replace the implementation of the protocol for > byzantine agreement in http://ball.askemos.org . BALL is a practical > programming environment on top of a persistent memory updated in BA. > The protocol implementation does work "good enough" in daily use > (delivers its own website among others), but _does_ have some > problems when it comes to large loads. To avoid reinventing the wheel > I'd rather go with something else. Libpaxos does not guarantee anything with respect to byzantine processes. Although there are versions of the Paxos protocol that tolerate byzantine faults, that is not what is implemented in libpaxos. > Browsing the libpaxos source I'm left unsure; can somebody judge the > feasibility to tweak it to do the following things (which are used by > BALL's upper level): > > 1. BALL provides communicating sequential processes. Each process > implemented as a state machine of its own. Thereby the set of > hosts out of which a majority has to agree upon progress is a > property of the process/part of the state of the particular > machine. > > Hence the implementation must not use one set of hosts for > everything. So far libpaxos appears to me to rely on a single > config file containing the peers. Am I missing something? Libpaxos does not provide CSP. Libpaxos gives you primitives that implement Atomic Broadcast. You have a primitive that lets you submit messages through paxos, and you have a set of learner processes that are guaranteed to deliver messages in total order, through a callback. The configuration file lists acceptors and proposers. These are typically fixed and known to the rest of the system. An application on top of libpaxos implements a learner. Learners are not listed in the configuration file. > 2. 
BALL goes to possibly two upcalls per state request. The first is > a pure function from request and state to some state update and a > hash code to be used within the agreement process. (This is to > more easily catch accidental inclusion of non-deterministic input, > e.g. local file dates. If that happens, we get different hash > values and therefore no agreement.) > > This first upcall is typically called in its own thread right > after a node receives a 'prepare' request. It's also kinda > optional: replace it with the identity function and use a hash > over the input request to end up exactly like those paxos > deployments I've seen so far. > > I'm not aware that this trick is used by any other consensus > protocol implementation. Ergo I doubt the API is already in > libpaxos. Would it be hard to add it? > > (The second upcall is just like with any paxos driven state > machine: do whatever with the request. Called when the majority > is confirmed.) In libpaxos, learners provide a callback that is called whenever a value has been delivered (accepted by a majority of acceptors). It should not be a big deal to add a callback for other purposes. Since this looks application-specific, can't you simply include a hash value in the payload? Or is this upcall called before delivery (i.e. before a majority of accepts)? > 3. BALL normally uses encrypted communication among nodes. > (typically SSL, though that's a plugin.) The identity of the > other nodes is derived from the certificates in use. How would > one do this in libpaxos? Libpaxos does not use encrypted communication. However, the library is divided into two parts: - The actual paxos implementation does not depend on any particular network library. See libpaxos/paxos. - On top of that there's a layer which is responsible for sending out and receiving messages. This part is implemented using libevent. See libpaxos/evpaxos. 
You could modify this part to provide SSL communication (libevent provides some SSL stuff, but I have never played with that). You could perhaps modify the config file accordingly. Another option is to replace this part altogether, for instance if you want to rely on a different networking library. > 4. Wrt. testing: BALL was never really tested on local networks. > Typically nodes are plug computers behind some customer grade > ADSL. This has resulted in quite some debug efforts to handle > failing network connections. How is that with libpaxos? Should I > expect to have to go through this again? Libpaxos has been used in some research projects, and has been deployed over local area networks and over EC2 instances in different availability zones. I have never tested it over consumer grade ADSL. The library includes some unit tests that cover the protocol part (i.e. the part that does not rely on networking). There's no automated test that covers nodes that need to reconnect. However this *should* work. > 5. I have not yet found any code to actually re-sync state among > nodes. Should I look harder? Recovery is a bit rough at the moment, and probably not very practical. Learners can ask acceptors for *old* messages. If a learner has to start from zero (i.e. after a failure), without any initial state, it has to deliver every message that has been submitted through paxos. Libpaxos handles that for you. In a practical system, a learner would probably take snapshots of its state periodically. A snapshot can then be used as the initial state for the learner (assuming that the persistent snapshot has not been destroyed as part of the failure). From this snapshot a learner should then re-deliver only the messages that have been delivered after the snapshot was taken (i.e. you can tag the snapshot with the paxos instance number and re-deliver only subsequent paxos instances). Alternatively a learner could use the state of another learner as initial state. 
However, snapshots or recovery from another learner are application specific. What is missing is some API or parameter that lets you start a learner from a given paxos instance number. > 6. On reentrance: especially our SQLite engine will regularly call > back into the replication layer during the upcalls (see 2. above). > Does libpaxos expect such a reentry or should I expect havoc? Libpaxos is based on libevent, which provides you with a single threaded event loop (although, as I mentioned, you could rewrite the upper layer using something else). A learner in libpaxos delivers messages through a callback, within the same thread. If you have to do heavy processing on the messages you deliver, you should hand the messages over to another thread. But, there are no other calls into libpaxos, at least for learners. > 7. The performance of libpaxos is stunning in comparison to the > current implementation of BALL. However the latter was merely > meant as a prototype. For instance it goes through HTTPS to convey > even those small messages in the agreement protocol. That makes > for a lot of overhead. Nevertheless I wonder: are those timings > taken on a local, low latency network? How badly would they be > affected in a WAN deployment? Which timings? Are you referring to the graph on the sourceforge project page? If so, note that that graph refers to a version of a protocol called ringpaxos, which is optimized for throughput over LANs. In terms of latency it would probably not do very well on a WAN. The graph also shows throughput of an older version of libpaxos. Notice that those numbers were taken on a cluster of machines connected through a 1gigabit LAN. I don't have any timings specifically for libpaxos 3, although it probably performs better than the older version of libpaxos shown on the graph. The performance really depends on the network; in the best case, paxos requires two message delays to deliver a message. You can do the math from here... 
Libpaxos can pipeline the paxos instances in parallel, which can significantly speed up message delivery, especially in WANs. In WANs, the overall throughput is probably limited by the network capacity and the number of acceptors and learners in the system. Please note that libpaxos is also a prototype; as far as I know it has been used only in our research projects. It has rough corners, probably some features are missing, things might break. It is slowly improving, though! Daniele |