This list is closed; nobody may subscribe to it.
Archived messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | | 19 | 8 | 25 | 16 | 77 | 131 | 76 | 30 | 7 | 3 | |
| 2011 | | | | | 2 | 2 | 16 | 3 | 1 | | 7 | 7 |
| 2012 | 10 | 1 | 8 | 6 | 1 | 3 | 1 | | 1 | | 8 | 2 |
| 2013 | 5 | 12 | 2 | 1 | 1 | 1 | 22 | 50 | 31 | 64 | 83 | 28 |
| 2014 | 31 | 18 | 27 | 39 | 45 | 15 | 6 | 27 | 6 | 67 | 70 | 1 |
| 2015 | 3 | 18 | 22 | 121 | 42 | 17 | 8 | 11 | 26 | 15 | 66 | 38 |
| 2016 | 14 | 59 | 28 | 44 | 21 | 12 | 9 | 11 | 4 | 2 | 1 | |
| 2017 | 20 | 7 | 4 | 18 | 7 | 3 | 13 | 2 | 4 | 9 | 2 | 5 |
| 2018 | | | | 2 | | | | | | | | |
| 2019 | | | 1 | | | | | | | | | |
From: husdon <no...@no...> - 2010-06-24 19:38:10

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 18:38:03

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 17:31:23

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 16:41:34

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 15:46:58

See <http://localhost/job/BigData/changes>

From: Bryan T. <br...@sy...> - 2010-06-24 15:27:08

Brian,

No, I was confused. I did not see zookeeper and had assumed that it was not running. Hence the rest of my questions.

I do think that we should schedule a call to talk about how services will be started and restarted, because this all interacts with the HA quorum logic. For example, hot spare recruitment, the target replication factor, and the actual replication factor for a highly available service all interact. The logic for starting those services therefore has to coordinate with the HA logic.

The quorums depend on having a simple majority. This is built around a service replication factor, k, which is an odd positive integer. k := 1 is not highly available. k := 3 is highly available, and there must be a minimum of (k+1)/2 = 2 services running for the quorum to meet. If we start more than k services, then this can break the quorum logic. Right now I have presumed a dependency on zookeeper and the existing services manager service to provide a distributed guarantee that we start exactly k services.

Planned down time and hot spare recruitment are both tricky issues for HA. We have to actually annotate the service when it is brought down, e.g., for a rolling code base update, to prevent it being treated as a failure and having a hot spare automatically recruited. Likewise, we have to pay careful attention when a hot spare is recruited to how it joins the write replication pipeline and when it joins the quorum.

If we follow a path where service start is not linked to the configuration information in zookeeper and the service management services, then this is all stuff that we need to work through together. I think that we should do this soon -- ideally before I proceed with the zookeeper quorum integration based on the existing design. I'd be happy to talk through the quorum design on the call as well.

Maybe we can do this in three pieces: one on the quorum work that I have been doing, one on the deploy/config work that you have been doing, and then an open discussion on how these things could be used to provide the flexibility and high availability, and how they interact with hot spare recruitment.

Thanks,
Bryan

________________________________
From: Brian Murphy [mailto:btm...@gm...]
Sent: Thursday, June 24, 2010 11:07 AM
To: big...@li...
Subject: Re: [Bigdata-developers] Alternate install/deploy mechanism

On Wed, Jun 23, 2010 at 8:23 PM, Bryan Thompson <br...@sy...> wrote:

> Right now, bigdata depends on leader election semantics from zookeeper to start the appropriate mixture of services. I did not see zookeeper running so I presume that you are handling that differently in this example.

No, zookeeper was running. If you run the disco-tool (or a jini browser), you should see a service of type com.bigdata.service.QuorumPeerService, which is zookeeper wrapped in a Jini service. Wrapping zookeeper in a Jini service not only provides a means to more easily start and stop zookeeper, but also provides a means to dynamically discover zookeeper in the federation. Furthermore, the QuorumPeerService interface provides a mechanism to customize how the services interact with zookeeper if desired.

> I would like to understand how we would handle the distributed decision making necessary to start an appropriate mixture of services with this proposal, and also how we would handle the distributed decision making required to support the HA quorums. I've attached an updated version of my draft for the HA quorum design and the proposed zookeeper integration.

Rather than using zookeeper to decide what gets started, this mechanism allows one to configure what individual services get started where, including the appropriate number of zookeeper instances. Zookeeper would then be viewed as a discoverable resource that can be used by the other services to determine who is the leader and whether or not a quorum exists before those services are used.

> I realize that some jini implementations do provide capabilities similar to what zookeeper provides.

I'm not sure what jini implementations you're talking about. Something not in the Jini starter kit?

> Are you suggesting that, or did you simply leave zookeeper and its roles in configuration management, leader elections, etc. out of the demo?

As I said above, zookeeper was not left out. But I also said in my original posting that this work is not anywhere near complete, and was posted to give folks an idea of what could be done with install and deployment if the services are re-implemented to a smart proxy model and moved to a shared-nothing architecture; all of which I believe will be a significant amount of work. Perhaps in the future I should hold off on posting until the work is more complete. Sorry if I caused confusion.

BrianM
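The quorum arithmetic in the message above is simple but easy to get wrong: for an odd replication factor k, a quorum meets only when at least (k+1)/2 services are joined, and launching more than k services breaks the model. The following minimal sketch illustrates just that rule; the class and method names are hypothetical and are not bigdata's quorum API.

```java
/**
 * Illustrative sketch of the simple-majority rule described in the thread.
 * Hypothetical names; this is NOT bigdata's quorum implementation.
 */
public final class QuorumMath {

    /** Minimum number of joined services for a quorum to meet, given an odd k. */
    public static int minimumToMeet(final int k) {
        if (k < 1 || k % 2 == 0) {
            throw new IllegalArgumentException("k must be an odd positive integer: " + k);
        }
        return (k + 1) / 2; // simple majority: k=1 -> 1, k=3 -> 2, k=5 -> 3
    }

    /** True iff the quorum can meet with the given number of running services. */
    public static boolean canMeet(final int k, final int runningServices) {
        // Starting more than k services violates the assumption that exactly k
        // services participate, which is why the start logic must guarantee
        // that exactly k services are launched.
        return runningServices >= minimumToMeet(k) && runningServices <= k;
    }

    public static void main(String[] args) {
        System.out.println(minimumToMeet(3)); // 2
        System.out.println(canMeet(3, 2));    // true
        System.out.println(canMeet(3, 1));    // false
        System.out.println(canMeet(3, 4));    // false (more than k started)
    }
}
```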
From: Brian M. <btm...@gm...> - 2010-06-24 15:07:03

On Wed, Jun 23, 2010 at 8:23 PM, Bryan Thompson <br...@sy...> wrote:

> Right now, bigdata depends on leader election semantics from zookeeper to start the appropriate mixture of services. I did not see zookeeper running so I presume that you are handling that differently in this example.

No, zookeeper was running. If you run the disco-tool (or a jini browser), you should see a service of type com.bigdata.service.QuorumPeerService, which is zookeeper wrapped in a Jini service. Wrapping zookeeper in a Jini service not only provides a means to more easily start and stop zookeeper, but also provides a means to dynamically discover zookeeper in the federation. Furthermore, the QuorumPeerService interface provides a mechanism to customize how the services interact with zookeeper if desired.

> I would like to understand how we would handle the distributed decision making necessary to start an appropriate mixture of services with this proposal, and also how we would handle the distributed decision making required to support the HA quorums. I've attached an updated version of my draft for the HA quorum design and the proposed zookeeper integration.

Rather than using zookeeper to decide what gets started, this mechanism allows one to configure what individual services get started where, including the appropriate number of zookeeper instances. Zookeeper would then be viewed as a discoverable resource that can be used by the other services to determine who is the leader and whether or not a quorum exists before those services are used.

> I realize that some jini implementations do provide capabilities similar to what zookeeper provides.

I'm not sure what jini implementations you're talking about. Something not in the Jini starter kit?

> Are you suggesting that, or did you simply leave zookeeper and its roles in configuration management, leader elections, etc. out of the demo?

As I said above, zookeeper was not left out. But I also said in my original posting that this work is not anywhere near complete, and was posted to give folks an idea of what could be done with install and deployment if the services are re-implemented to a smart proxy model and moved to a shared-nothing architecture; all of which I believe will be a significant amount of work. Perhaps in the future I should hold off on posting until the work is more complete. Sorry if I caused confusion.

BrianM
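Brian's point above is that zookeeper, once wrapped as a com.bigdata.service.QuorumPeerService, can be discovered like any other service in the federation. As a rough, hedged illustration only: the sketch below uses plain Jini lookup (net.jini.*), not bigdata's disco-tool; the group name is taken from the deploy example later in the thread, and the exact use of the QuorumPeerService interface, security/codebase configuration, and error handling are all assumptions.

```java
import net.jini.core.lookup.ServiceItem;
import net.jini.core.lookup.ServiceTemplate;
import net.jini.discovery.LookupDiscoveryManager;
import net.jini.lease.LeaseRenewalManager;
import net.jini.lookup.ServiceDiscoveryManager;

/**
 * Rough sketch of discovering a service by type in a Jini federation.
 * The QuorumPeerService interface name comes from the thread; the group
 * name and everything else here is assumed for illustration.
 */
public class DiscoverQuorumPeer {

    public static void main(String[] args) throws Exception {
        // Discover lookup services advertising the federation's group.
        LookupDiscoveryManager ldm = new LookupDiscoveryManager(
                new String[] { "com.bigdata.group.0" }, // group from the deploy example
                null,   // no unicast locators
                null);  // no discovery listener

        ServiceDiscoveryManager sdm =
                new ServiceDiscoveryManager(ldm, new LeaseRenewalManager());

        // Match any service implementing the QuorumPeerService interface.
        ServiceTemplate tmpl = new ServiceTemplate(
                null,
                new Class[] { com.bigdata.service.QuorumPeerService.class },
                null);

        // Block up to 30 seconds waiting for a matching service to appear.
        ServiceItem item = sdm.lookup(tmpl, null, 30000L);

        System.out.println(item == null
                ? "No QuorumPeerService discovered"
                : "Discovered: " + item.service);

        sdm.terminate();
        ldm.terminate();
    }
}
```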
From: husdon <no...@no...> - 2010-06-24 14:57:28

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 14:07:19

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 13:16:37

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 11:46:15

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 02:12:51

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-24 01:24:27

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 21:21:39

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 20:34:16

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 16:13:55

See <http://localhost/job/BigData/changes>

From: Bryan T. <br...@sy...> - 2010-06-23 15:52:01

Brian,

Do you have any insight on how to ensure that only and all processes started by the services manager are taken down by a 'bigdata stop' without using "killall -9 java"? E.g., do you know if child processes will be destroyed when the parent terminates across platforms?

When running this, I had the following errors related to the bigdata init.d script.

[root@dutl-57 ~]# cp /opt/bigdata/etc/bigdata.initd /etc/init.d/bigdata
[root@dutl-57 ~]# /etc/init.d/bigdata start
/etc/init.d/bigdata: line 15: /lib/lsb/init-functions: No such file or directory
/opt/bigdata/bin/initd-processes.sh: line 7: log_begin_msg: command not found
/opt/bigdata/bin/initd-processes.sh: line 14: sudo: command not found
/opt/bigdata/bin/initd-processes.sh: line 15: sudo: command not found
/opt/bigdata/bin/initd-processes.sh: line 33: sudo: command not found

I took the following steps to resolve those dependencies.

yum install sudo
yum install redhat-lsb

However, that gives me only:

[root@dutl-57 ~]# ls -l /etc/redhat-lsb
total 32
-rwxr-xr-x 1 root root 70 Nov 10 2007 lsb_killproc
-rwxr-xr-x 1 root root 243 Nov 10 2007 lsb_log_message
-rwxr-xr-x 1 root root 59 Nov 10 2007 lsb_pidofproc
-rwxr-xr-x 1 root root 254 Nov 10 2007 lsb_start_daemon

I do not have log_begin_msg. I worked around this using "echo". However, things are still not starting. Maybe you can take a look at this host and see what is wrong with the configuration?

Thanks,
Bryan

________________________________
From: Brian Murphy [mailto:btm...@gm...]
Sent: Monday, June 21, 2010 11:42 AM
To: big...@li...
Subject: [Bigdata-developers] Alternate install/deploy mechanism

Just an FYI to those who might be interested. Over the last few weeks I've been looking into a deployment mechanism that might be used as an alternative to 'ant install'. The investigation has currently taken the form of some code that I've recently checked in to a personal branch (dev-btm). If anyone is interested in taking a look at this work and seeing how it might be used, one can follow the steps outlined below.

BrianM

---------------------------------------------------------------------------
> cd <baseDir>
> svn checkout https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/dev-btm <baseDir>/bigdata/branches
> ant release-dist
> ls -al <baseDir>/bigdata/branches/dev-btm
  REL.bigdata-<version>-<date>.tgz  (ex. REL.bigdata-0.82.0-210610.tgz)

Open 3 command windows, WinA, WinB, and WinC (use sudo or login as root).

---------------------------------------------------------------------------
-- WinA -- [install and deploy]

> su
# tar xzvof <baseDir>/bigdata/branches/dev-btm/REL.bigdata-<version>-<date>.tgz -C /opt
# cp /opt/bigdata/var/config/deploy/example-deploy.properties /opt/bigdata/var/config/deploy/deploy.properties
# vi /opt/bigdata/var/config/deploy/deploy.properties

Un-comment the following items in deploy.properties:

  #federation.name=com.bigdata.group.0
  #node.type=standalone
  #node.layout=1-of-1
  #node.role=bigdata

Uncomment and set the node.serviceNetwork item to the name of the node's network interface card (NIC), which can be found by typing 'ifconfig' on linux or 'ipconfig /all' on Windows.

  #node.serviceNetwork=eth0

Next,

# cp /opt/bigdata/etc/bigdata.initd /etc/init.d/bigdata
# /etc/init.d/bigdata start
# /etc/init.d/bigdata status

---------------------------------------------------------------------------
-- WinB -- [for non-graphical command line discovery tool]

> su
# /opt/bigdata/bin/disco-tool -v -g com.bigdata.group.0

---------------------------------------------------------------------------
-- WinC -- [for testing restart capability]

> su
# ps -elf | grep java

Pick one of the pids from the output and kill that process. For example, suppose the java process associated with the "shardlocator" process is 24539 (that is, '# ps -elf | grep java | grep shardlocator' ==> 24539).

# kill -9 24539

Observe the removed-then-added events displayed by the discovery tool in WinB.

# /etc/init.d/bigdata status

Observe that the output indicates that the shardlocator process is in the RUNNING state.

# ps -elf | grep java | grep shardlocator

Note that the service with process tag "shardlocator" appears, but its pid is no longer 24539, because the process was restarted upon the death of process 24539.

# /etc/init.d/bigdata stop
# /etc/init.d/bigdata status

Observe that all processes are in the STOPPED state.

---------------------------------------------------------------------------
-- START ON BOOT --

To achieve start/restart on boot/reboot, for X=0-5, one can create the appropriate soft links from /etc/rcX.d/KXXbigdata and /etc/rcX.d/SXXbigdata to /etc/init.d/bigdata. For example, on Ubuntu, one would do the following:

# update-rc.d bigdata defaults

[to remove the soft links, type '# update-rc.d -f bigdata remove']

---------------------------------------------------------------------------
-- NOTES & CAVEATS --

- The mechanism above is intended to support the installation and deployment of a system that may include other components as well as bigdata, or a system that includes only bigdata. Thus, although the files named default-deploy.properties and example-deploy.properties reference only a role value of "bigdata", other roles can be easily added.

- The file example-deploy.properties is intended to be a template for the deploy.properties file that is used to communicate the top-level configuration to the mechanism. A single deploy.properties file cannot be used, since the contents of that file will generally be different on different nodes in the system; although the goal of the deploy.properties file is to minimize the number of items that do differ from node to node. The deploy.properties file can be created by copying example-deploy.properties and then modifying the resulting file for the desired configuration (as shown above), or it can be auto-generated by some tool (ex. scripts, awk/sed, puppet, etc.).

- To avoid breaking existing code, the beginnings of smart proxy based counterparts to the bigdata services were created. Currently, those smart proxy based implementations include only the smart proxy pattern, the required public service interfaces, a common service attribute, and the necessary infrastructure for starting and stopping each service. None of these service implementations currently provide any bigdata-specific functionality. The smart proxy based service implementations are intended to share nothing but convenient helper utilities and the top-level deploy.properties configuration file (when different service implementations run on the same node). In addition to sharing a common Jini configuration file, the current, purely remote, service implementations share up to eight layers of common ancestry in the form of abstract and concrete super classes, which makes it unclear how much work it will take to either add the necessary functionality (and tests) to the smart proxy based implementations, or convert the current layered implementations to a smart proxy model. Thus, much more investigation and work needs to be done.
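On Bryan's opening question in the message above -- stopping only the processes the services manager started, without "killall -9 java" -- child JVMs are generally not destroyed automatically when the parent exits, and the behavior differs across platforms. One common pattern is for the launcher itself to keep handles to every process it starts and destroy exactly those on stop. The sketch below is an editorial illustration of that pattern only; it is not bigdata's services manager, and all names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of a launcher that remembers exactly which child JVMs it
 * started so that "stop" terminates those -- and only those -- processes.
 * Hypothetical example; not bigdata's services manager.
 */
public class ChildProcessTracker {

    private final List<Process> children = new ArrayList<Process>();

    public ChildProcessTracker() {
        // Best effort: also clean up if this manager JVM is itself shut down,
        // since child processes are NOT automatically killed with the parent.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                stopAll();
            }
        });
    }

    /** Launch a java child process and remember its handle. */
    public synchronized Process start(final String mainClass, final String... args)
            throws IOException {
        final List<String> cmd = new ArrayList<String>();
        cmd.add(new File(System.getProperty("java.home"), "bin/java").getPath());
        cmd.add(mainClass);
        for (String a : args) {
            cmd.add(a);
        }
        final Process p = new ProcessBuilder(cmd).inheritIO().start();
        children.add(p);
        return p;
    }

    /** Stop only the processes this tracker started (no killall -9 java). */
    public synchronized void stopAll() {
        for (Process p : children) {
            p.destroy(); // normal termination (SIGTERM on POSIX)
        }
        children.clear();
    }
}
```

A pid file per launched service would serve the same purpose for a shell-based init.d script: record each pid at start time and kill only the recorded pids on stop.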
From: husdon <no...@no...> - 2010-06-23 15:25:09

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 14:34:51

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 13:06:38

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 11:28:52

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 00:55:15

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-23 00:08:36

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-22 14:58:36

See <http://localhost/job/BigData/changes>

From: husdon <no...@no...> - 2010-06-22 14:12:44

See <http://localhost/job/BigData/changes>