This list is closed; nobody may subscribe to it.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | | 19 | 8 | 25 | 16 | 77 | 131 | 76 | 30 | 7 | 3 | |
| 2011 | | | | | 2 | 2 | 16 | 3 | 1 | | 7 | 7 |
| 2012 | 10 | 1 | 8 | 6 | 1 | 3 | 1 | | 1 | | 8 | 2 |
| 2013 | 5 | 12 | 2 | 1 | 1 | 1 | 22 | 50 | 31 | 64 | 83 | 28 |
| 2014 | 31 | 18 | 27 | 39 | 45 | 15 | 6 | 27 | 6 | 67 | 70 | 1 |
| 2015 | 3 | 18 | 22 | 121 | 42 | 17 | 8 | 11 | 26 | 15 | 66 | 38 |
| 2016 | 14 | 59 | 28 | 44 | 21 | 12 | 9 | 11 | 4 | 2 | 1 | |
| 2017 | 20 | 7 | 4 | 18 | 7 | 3 | 13 | 2 | 4 | 9 | 2 | 5 |
| 2018 | | | | 2 | | | | | | | | |
| 2019 | | | 1 | | | | | | | | | |

From: Bryan T. <br...@sy...> - 2010-07-28 21:53:47
|
Fred,

I was trying to set up the CI performance machine for this purpose but ran into OS configuration issues, which I passed to Brad to look at. We really need to test against a cluster to have confidence here. Can I get access to the 16-node cluster for a few days?

Bryan

Fred Oliver <fko...@gm...> wrote:

I'm running a test which generates (essentially LUBM) data files and invokes MappedRDFDataLoadMaster.main(). The configuration is multiple services running on a single machine. After an svn update to the current branch, the test appears to run forever, with many exceptions seen in the log file, apparently relating to I/O. I've attached two versions of the error.log file. One shows errors from FileChannelUtility, BufferOverflowException, etc. The other shows AssertionErrors being thrown between processes. Would one of you take a look, please?

Fred |
|
From: husdon <no...@no...> - 2010-07-28 19:40:16
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-28 18:48:51
|
See <http://localhost/job/BigData/changes> |
|
From: Mike P. <mi...@sy...> - 2010-07-28 16:58:23
|
We've just released a new version of bigdata(r). This is a bigdata(r) snapshot release. This release is capable of loading 1B triples in under one hour on a 15-node cluster and has been used to load up to 13B triples on the same cluster. JDK 1.6 is required.

See [1] for instructions on installing bigdata(r), [2] for the javadoc, and [3] and [4] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [5].

Please note that we recommend checking out the code from SVN using the tag for this release. The code will build automatically under Eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can check out this release from the following URL:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_0_83_1

New features:

- Inlining XSD numerics, xsd:boolean, or custom datatype extensions into the statement indices. Inlining provides a smaller footprint and faster queries for data using XSD numeric datatypes. In order to introduce inlining we were forced to make a change in the physical schema for the RDF database which breaks binary compatibility for existing stores. The recommended migration path is to export the data and import it into a new bigdata instance.

- Refactor of the dynamic sharding mechanism for higher performance.

- The SparseRowStore has been modified to make Unicode primary keys decodable by representing Unicode primary keys using UTF8 rather than Unicode sort keys. This change also allows the SparseRowStore to work with the JDK collator option, which embeds nul bytes into Unicode sort keys. This change breaks binary compatibility, but there is an option for historical compatibility.

The roadmap for the next releases includes:

- Query optimizations;
- Support for high-volume analytic query workloads and SPARQL aggregations;
- High availability for the journal and the cluster;
- Simplified deployment, configuration, and administration for clusters.

For more information, please see the following links:

[1] http://bigdata.wiki.sourceforge.net/GettingStarted
[2] http://www.bigdata.com/bigdata/docs/api/
[3] http://sourceforge.net/projects/bigdata/
[4] http://www.bigdata.com/blog
[5] http://www.systap.com/bigdata.htm

About bigdata:

Bigdata(r) is a horizontally-scaled, general-purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines, and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum-level provenance.

---
Mike Personick
SYSTAP, LLC.
801.328.3945 (office)
801.243.3678 (mobile)
801.938.5320 (skype)
mi...@sy...<mailto:mi...@sy...> |
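The inlining feature described above is easier to see with a small worked sketch. The code below is not bigdata's actual encoding — the one-byte flag layout, the method names, and the key format are all assumptions made for illustration — but it shows the idea the release notes describe: a small XSD numeric is packed directly into the statement-index key, instead of the key carrying a term identifier that must be resolved against the lexicon indices.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch only: inline a small xsd:int value directly into an
 * index key, versus the non-inline case where the key holds a term identifier
 * that must later be joined against the lexicon. The flag values and key
 * layout here are hypothetical, not bigdata's real physical schema.
 */
public class InlineKeySketch {

    static final byte FLAG_TERM_ID = 0x00;    // key component is a lexicon term id
    static final byte FLAG_INLINE_INT = 0x01; // key component is an inline xsd:int

    /** Inline case: [flag][4-byte int] — the value lives in the key itself. */
    static byte[] encodeInlineInt(final int value) {
        return ByteBuffer.allocate(5).put(FLAG_INLINE_INT).putInt(value).array();
    }

    /** Non-inline case: [flag][8-byte term id] — requires a lexicon lookup. */
    static byte[] encodeTermId(final long termId) {
        return ByteBuffer.allocate(9).put(FLAG_TERM_ID).putLong(termId).array();
    }

    public static void main(String[] args) {
        // 5 bytes, and the value is recoverable from the key alone ...
        byte[] inline = encodeInlineInt(42);
        // ... versus 9 bytes plus a join against the lexicon to get the value.
        byte[] byId = encodeTermId(0x00000000DEADBEEFL);
        System.out.println("inline key component = " + inline.length + " bytes");
        System.out.println("termId key component = " + byId.length + " bytes");
    }
}
```

This also makes concrete why the release notes say the change breaks binary compatibility: an existing store holds keys written entirely in the term-identifier form, so it cannot be read under the new layout and must be exported and re-imported.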
|
From: husdon <no...@no...> - 2010-07-28 13:59:34
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 23:33:28
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 22:19:58
|
See <http://localhost/job/BigData/changes> |
|
From: Bryan T. <br...@sy...> - 2010-07-27 21:33:50
|
This fixes two import statements that referenced a type which had been removed from the source tree.

Bryan |
|
From: Bryan T. <br...@sy...> - 2010-07-27 21:30:11
|
Oops. It looks like a last-minute edit broke the build on this. I am going to tag 0.83.1.

Bryan

> -----Original Message-----
> From: Bryan Thompson [mailto:br...@sy...]
> Sent: Tuesday, July 27, 2010 5:23 PM
> To: Bigdata Developers
> Subject: [Bigdata-developers] RELEASE 0.83.0
>
> This is a bigdata (R) snapshot release. This release is capable of loading 1B triples in under one hour on a 15 node cluster and has been used to load up to 13B triples on the same cluster. JDK 1.6 is required.
>
> See [1] for instructions on installing bigdata(R), [2] for the javadoc and [3] and [4] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [5].
>
> Please note that we recommend checking out the code from SVN using the tag for this release. The code will build automatically under eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can check out this release from the following URL:
>
> https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_0_83_0
>
> This corresponds to revision 3326.
>
> New features:
>
> - Inlining XSD numerics, xsd:boolean, or custom datatype extensions into the statement indices. Inlining provides a smaller footprint and faster queries for data using XSD numeric datatypes. In order to introduce inlining we were forced to make a change in the physical schema for the RDF database which breaks binary compatibility for existing stores. The recommended migration path is to export the data and import it into a new bigdata instance.
>
> - Refactor of the dynamic sharding mechanism for higher performance.
>
> - The SparseRowStore has been modified to make Unicode primary keys decodable by representing Unicode primary keys using UTF8 rather than Unicode sort keys. This change also allows the SparseRowStore to work with the JDK collator option which embeds nul bytes into Unicode sort keys. This change breaks binary compatibility, but there is an option for historical compatibility.
>
> The roadmap for the next releases includes:
>
> - Query optimizations;
> - Support for high-volume analytic query workloads and SPARQL aggregations;
> - High availability for the journal and the cluster;
> - Simplified deployment, configuration, and administration for clusters.
>
> For more information, please see the following links:
>
> [1] http://bigdata.wiki.sourceforge.net/GettingStarted
> [2] http://www.bigdata.com/bigdata/docs/api/
> [3] http://sourceforge.net/projects/bigdata/
> [4] http://www.bigdata.com/blog
> [5] http://www.systap.com/bigdata.htm
>
> About bigdata:
>
> Bigdata(r) is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance.
|
|
From: husdon <no...@no...> - 2010-07-27 21:26:20
|
See <http://localhost/job/BigData/96/changes> Changes: [thompsonbry] bumped the revision number.... [thompsonbry] Added the release notes. [thompsonbry] Cleaned up IndexMetadata to remove support for IAddressSerializer and ISplitHandler. Those interfaces and their implementations are now gone. [mrpersonick] getting rid of ITermIdCodes ------------------------------------------ Started by an SCM change Started by an SCM change Started by an SCM change Started by an SCM change Updating https://bigdata.svn.sourceforge.net/svnroot/bigdata/trunk U bigdata-jini/src/java/com/bigdata/service/jini/benchmark/ThroughputMaster.java U build.properties D bigdata/src/java/com/bigdata/btree/AddressSerializer.java D bigdata/src/java/com/bigdata/btree/IAddressSerializer.java D bigdata/src/java/com/bigdata/btree/PackedAddressSerializer.java D bigdata/src/java/com/bigdata/btree/ISplitHandler.java U bigdata/src/java/com/bigdata/btree/IndexMetadata.java U bigdata/src/java/com/bigdata/btree/BloomFilterFactory.java U bigdata/src/java/com/bigdata/btree/isolation/IsolatedFusedView.java U bigdata/src/java/com/bigdata/btree/view/FusedView.java U bigdata/src/java/com/bigdata/search/ReadIndexTask.java D bigdata/src/java/com/bigdata/resources/DefaultSplitHandler.java U bigdata/src/java/com/bigdata/resources/ViewMetadata.java U bigdata/src/releases/RELEASE_0_83_0.txt U bigdata-rdf/src/test/com/bigdata/rdf/spo/TestSPORelation.java U bigdata-rdf/src/test/com/bigdata/rdf/spo/TestSPO.java U bigdata-rdf/src/test/com/bigdata/rdf/spo/TestSPOTupleSerializer.java U bigdata-rdf/src/test/com/bigdata/rdf/spo/TestSPOValueCoders.java A bigdata-rdf/src/test/com/bigdata/rdf/internal/ITermIdCodes.java U bigdata-rdf/src/test/com/bigdata/rdf/internal/LegacyTermIdUtility.java D bigdata-rdf/src/java/com/bigdata/rdf/lexicon/ITermIdFilter.java D bigdata-rdf/src/java/com/bigdata/rdf/lexicon/ITermIdCodes.java U bigdata-rdf/src/java/com/bigdata/rdf/lexicon/TermIdEncoder.java A bigdata-rdf/src/java/com/bigdata/rdf/lexicon/ITermIVFilter.java U bigdata-rdf/src/java/com/bigdata/rdf/spo/SPORelation.java U bigdata-rdf/src/java/com/bigdata/rdf/inf/BackchainTypeResourceIterator.java U bigdata-rdf/src/java/com/bigdata/rdf/internal/VTE.java U bigdata-rdf/src/java/com/bigdata/rdf/store/IRawTripleStore.java U bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java At revision 3329 [trunk] $ ant clean jar junit javadoc Buildfile: <http://localhost/job/BigData/ws/trunk/build.xml> clean: [delete] Deleting directory <http://localhost/job/BigData/ws/trunk/ant-build> [delete] Deleting directory <http://localhost/job/BigData/ws/trunk/bigdata-test> [delete] Deleting directory <http://localhost/job/BigData/ws/trunk/dist> prepare: [echo] version=bigdata-0.83.0-270710 [mkdir] Created dir: <http://localhost/job/BigData/ws/trunk/ant-build> [mkdir] Created dir: <http://localhost/job/BigData/ws/trunk/ant-build/classes> [mkdir] Created dir: <http://localhost/job/BigData/ws/trunk/ant-build/docs> [mkdir] Created dir: <http://localhost/job/BigData/ws/trunk/ant-build/lib> compile: [javac] <http://localhost/job/BigData/ws/trunk/build.xml>:75: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds [javac] Compiling 1261 source files to <http://localhost/job/BigData/ws/trunk/ant-build/classes> [javac] javac 1.6.0_17 [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:48: warning: sun.misc.Signal is Sun proprietary API and may be removed in a 
future release [javac] import sun.misc.Signal; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:49: warning: sun.misc.SignalHandler is Sun proprietary API and may be removed in a future release [javac] import sun.misc.SignalHandler; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:383: warning: sun.misc.SignalHandler is Sun proprietary API and may be removed in a future release [javac] private class SigHUPHandler implements SignalHandler { [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:385: warning: sun.misc.SignalHandler is Sun proprietary API and may be removed in a future release [javac] private final SignalHandler oldHandler; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:417: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] public void handle(final Signal sig) { [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:26: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] import sun.misc.Signal; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:27: warning: sun.misc.SignalHandler is Sun proprietary API and may be removed in a future release [javac] import sun.misc.SignalHandler; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:163: warning: sun.misc.SignalHandler is Sun proprietary API and may be removed in a future release [javac] private class SigHUPHandler implements SignalHandler { [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:165: warning: sun.misc.SignalHandler is Sun proprietary API and may be removed in a future release [javac] private final SignalHandler oldHandler; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:194: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] public void handle(final Signal sig) { [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java>:83: cannot find symbol [javac] symbol : class ITermIdCodes [javac] location: package com.bigdata.rdf.internal [javac] import com.bigdata.rdf.internal.ITermIdCodes; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-rdf/src/java/com/bigdata/rdf/store/IRawTripleStore.java>:32: cannot find symbol [javac] symbol : class ITermIdCodes [javac] location: package com.bigdata.rdf.internal [javac] import com.bigdata.rdf.internal.ITermIdCodes; [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:406: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] final Signal signal = new Signal(signalName); [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:406: warning: sun.misc.Signal is Sun proprietary API and may be 
removed in a future release [javac] final Signal signal = new Signal(signalName); [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/jini/start/ServicesManagerServer.java>:408: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] this.oldHandler = Signal.handle(signal, this); [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:183: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] final Signal signal = new Signal(signalName); [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:183: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] final Signal signal = new Signal(signalName); [javac] ^ [javac] <http://localhost/job/BigData/ws/trunk/bigdata-jini/src/java/com/bigdata/service/jini/LoadBalancerServer.java>:185: warning: sun.misc.Signal is Sun proprietary API and may be removed in a future release [javac] this.oldHandler = Signal.handle(signal, this); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors [javac] 16 warnings BUILD FAILED <http://localhost/job/BigData/ws/trunk/build.xml>:75: Compile failed; see the compiler error output for details. Total time: 6 seconds Publishing Javadoc Archiving artifacts Recording test results Performing Post build task... Could not match :JUNIT RUN COMPLETE : False Logical operation result is FALSE Skipping script : /root/rsync.sh END OF POST BUILD TASK : 0 |
|
From: Bryan T. <br...@sy...> - 2010-07-27 21:24:10
|
This is a bigdata (R) snapshot release. This release is capable of loading 1B triples in under one hour on a 15-node cluster and has been used to load up to 13B triples on the same cluster. JDK 1.6 is required.

See [1] for instructions on installing bigdata(R), [2] for the javadoc, and [3] and [4] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [5].

Please note that we recommend checking out the code from SVN using the tag for this release. The code will build automatically under Eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can check out this release from the following URL:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_0_83_0

This corresponds to revision 3326.

New features:

- Inlining XSD numerics, xsd:boolean, or custom datatype extensions into the statement indices. Inlining provides a smaller footprint and faster queries for data using XSD numeric datatypes. In order to introduce inlining we were forced to make a change in the physical schema for the RDF database which breaks binary compatibility for existing stores. The recommended migration path is to export the data and import it into a new bigdata instance.

- Refactor of the dynamic sharding mechanism for higher performance.

- The SparseRowStore has been modified to make Unicode primary keys decodable by representing Unicode primary keys using UTF8 rather than Unicode sort keys. This change also allows the SparseRowStore to work with the JDK collator option, which embeds nul bytes into Unicode sort keys. This change breaks binary compatibility, but there is an option for historical compatibility.

The roadmap for the next releases includes:

- Query optimizations;
- Support for high-volume analytic query workloads and SPARQL aggregations;
- High availability for the journal and the cluster;
- Simplified deployment, configuration, and administration for clusters.

For more information, please see the following links:

[1] http://bigdata.wiki.sourceforge.net/GettingStarted
[2] http://www.bigdata.com/bigdata/docs/api/
[3] http://sourceforge.net/projects/bigdata/
[4] http://www.bigdata.com/blog
[5] http://www.systap.com/bigdata.htm

About bigdata:

Bigdata(r) is a horizontally-scaled, general-purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines, and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum-level provenance. |
|
From: husdon <no...@no...> - 2010-07-27 20:58:05
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 20:08:49
|
See <http://localhost/job/BigData/changes> |
|
From: Bryan T. <br...@sy...> - 2010-07-27 19:33:28
|
All, I've moved these benchmarks from their top-level directories into the bigdata-perf directory. Bryan |
|
From: husdon <no...@no...> - 2010-07-27 18:59:15
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 18:11:14
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 17:26:48
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 16:43:53
|
See <http://localhost/job/BigData/changes> |
|
From: husdon <no...@no...> - 2010-07-27 15:58:04
|
See <http://localhost/job/BigData/changes> |
|
From: Bryan T. <br...@sy...> - 2010-07-27 13:27:06
|
The lexicon branch is now closed. I am going to merge it into the trunk and prepare a release. Please suspend commits on the trunk other than bug fixes for this release. Thanks, Bryan |
|
From: Bryan T. <br...@sy...> - 2010-07-26 22:22:10
|
Fred,

Can you provide some examples for the HA-enabled per-instance configuration? I would like the instance-based configuration to be compatible with the rules-based approach. In addition, the services will need to publish certain metadata about the logical service instances into zookeeper for the HA quorums. I've attached a current copy of the HA/zookeeper integration document, which specifies the zookeeper paths that are used by the quorum. Everything is organized under the zpath of the logical service. Take a look, and then let's see if we can bring this down to some concrete points for both mechanisms to operate.

Bryan

> -----Original Message-----
> From: Bryan Thompson
> Sent: Monday, July 26, 2010 6:16 PM
> To: 'Fred Oliver'
> Cc: Bigdata Developers
> Subject: RE: [Bigdata-developers] Why zookeeper?
>
> Fred,
>
> > The term "logical service" seems overloaded in that (as far as I have figured out) it has different meanings in the pre-HA and post-HA discussions.
>
> It is the same usage. The pre-HA code base was designed with the HA feature set in mind. A logical service corresponds to some collection of actual service instances which provide the highly available logical service.
>
> I get that you are not fond of the rules-based scheme. What I would like to know is how HA will be handled within the scheme that you are proposing.
>
> Thanks,
> Bryan
>
> > -----Original Message-----
> > From: Fred Oliver [mailto:fko...@gm...]
> > Sent: Monday, July 26, 2010 6:05 PM
> > To: Bryan Thompson
> > Cc: Bigdata Developers
> > Subject: Re: [Bigdata-developers] Why zookeeper?
> >
> > On Mon, Jul 26, 2010 at 2:37 PM, Bryan Thompson <br...@sy...> wrote:
> > > I've been out for a bit with my head wrapped around other things. Can you remind me how we are going to handle the assignment of physical service nodes to logical service nodes in this design?
> >
> > What is a node (physical or logical) in this context? I think you mean that a physical node is a machine. If so, then what is a logical node?
> >
> > As I understand the services, all service instances are physical. The logical service construct exists only as an abstraction on which the rules in the rules-based specification may operate. If I understand correctly, then with the instance-level specification, there are no logical services and no logical nodes. But still, what's a logical node?
> >
> > > Concerning your points below, either scheme can be made fully deterministic. It is only a matter of specifying that a specific service must run on a specific host (a constraint on what services can run on a given host).
> >
> > If you can make the rules-based scheme deterministic, then please do! But I think you meant only that you can write rules that constrain the behavior such that the results in those particular cases are deterministic, which is an entirely different matter. The rules-based scheme makes for much more code, locking and synchronization, very difficult testing, and a less maintainable environment.
> >
> > > I see rules-based specification as more flexible because you can administer the rule set rather than making a decision for each node that you bring online. I agree that it is more adaptive since the constraints are being specified at a level above the individual node. I see the rules as globally transparent because they are just data which could be edited using, for example, a web browser backed by an application looking at the data in zookeeper, whereas the instance-level specification must be edited on each node. I think of rules as more scalable because you do not have to figure out what you are going to do with each node. The node will be put to a purpose for which it is fit and for which there is a need.
> >
> > I think we're going to disagree about the merits of instance vs. rules schemes, and I hope we can modularize the system so that the schemes are separate modules and independent of the core functionality (which wouldn't need zookeeper).
> >
> > My biggest concern about that last paragraph (or the whole message?) is that this use of zookeeper seemed unnecessary and confusing. That is, why wouldn't the web app interact with the service instances directly to get/set configurations using well-defined, testable public interfaces, rather than use zookeeper as a hub? (That's the secret messages in dead drops thing.)
> >
> > > However, as long as we have a reasonable path for HA service allocation which respects the need to associate specific physical service instances with specific logical service instances then it seems reasonable that either approach could be used. It just becomes a matter of how we describe what services the node will start and whether or not we run the ServicesManagerService on that node.
> >
> > Clearly HA needs a set of like service instances to work in active/active or active/passive arrangements. The term "logical service" seems overloaded in that (as far as I have figured out) it has different meanings in the pre-HA and post-HA discussions. I can see that an HA logical data service would refer to the group of data service instances which together host a single shard. But this definition is very specific and differs from the more general meaning in the rules-based specification discussion, which is confusing.
> >
> > The instance-based scheme can be used for HA as well, as long as the service configurations are extended to indicate which "HA logical" group a service belonged to.
> >
> > Fred
> >
> > > PS: Concerning "flex", the big leverage for flexing the cluster will come with a shared-disk architecture (rather than the present shared-nothing architecture). Using a shared-disk architecture, the nodes can then be started or stopped without regard to the persistent state, which would be on managed storage. That would make it possible to trade off dynamically which nodes were assigned to which application, where the application might be bigdata, hadoop, etc. In that kind of scenario I find it difficult to imagine that an operator will be in the loop when a node is torn down and then repurposed to a different application. However, this kind of "flex" is outside the scope of the current effort.
> >
> > OK. I see the primary benefit of this arrangement as making hot spares become operational much more quickly, but I don't see how this applies to the rules vs. instance-based specifications discussion. Both schemes can handle this arrangement.
> >
> > Fred |
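A minimal sketch may make concrete what publishing quorum metadata under the zpath of the logical service could look like. The path layout and names below are assumptions for illustration — the actual layout is specified in the HA/zookeeper integration document mentioned above, which is not part of this archive — but the client calls are the standard Apache ZooKeeper Java API.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Minimal sketch: a physical service instance advertises itself under the
 * zpath of its logical service so the HA quorum can see its members. The
 * zpath layout is hypothetical and assumes the logical service's znode
 * already exists; only the ZooKeeper API calls themselves are real.
 */
public class QuorumMemberSketch {

    public static String advertise(final ZooKeeper zk,
                                   final String logicalServiceZPath,
                                   final byte[] serviceMetadata)
            throws KeeperException, InterruptedException {
        // EPHEMERAL_SEQUENTIAL: the child znode vanishes automatically if the
        // service's session dies, and the sequence number gives the quorum a
        // stable total order over the members (usable for leader election too).
        return zk.create(logicalServiceZPath + "/member-",
                serviceMetadata,
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
    }
}
```

The ephemeral/sequential pattern is also the standard ZooKeeper recipe behind the "limit zookeeper to HA leader election" option that Fred raises later in this thread, which is one reason the two positions are closer than they might appear.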
|
From: Bryan T. <br...@sy...> - 2010-07-26 22:17:18
|
Fred,

> The term "logical service" seems overloaded in that (as far as I have figured out) it has different meanings in the pre-HA and post-HA discussions.

It is the same usage. The pre-HA code base was designed with the HA feature set in mind. A logical service corresponds to some collection of actual service instances which provide the highly available logical service.

I get that you are not fond of the rules-based scheme. What I would like to know is how HA will be handled within the scheme that you are proposing.

Thanks,
Bryan

> -----Original Message-----
> From: Fred Oliver [mailto:fko...@gm...]
> Sent: Monday, July 26, 2010 6:05 PM
> To: Bryan Thompson
> Cc: Bigdata Developers
> Subject: Re: [Bigdata-developers] Why zookeeper?
>
> On Mon, Jul 26, 2010 at 2:37 PM, Bryan Thompson <br...@sy...> wrote:
> > I've been out for a bit with my head wrapped around other things. Can you remind me how we are going to handle the assignment of physical service nodes to logical service nodes in this design?
>
> What is a node (physical or logical) in this context? I think you mean that a physical node is a machine. If so, then what is a logical node?
>
> As I understand the services, all service instances are physical. The logical service construct exists only as an abstraction on which the rules in the rules-based specification may operate. If I understand correctly, then with the instance-level specification, there are no logical services and no logical nodes. But still, what's a logical node?
>
> > Concerning your points below, either scheme can be made fully deterministic. It is only a matter of specifying that a specific service must run on a specific host (a constraint on what services can run on a given host).
>
> If you can make the rules-based scheme deterministic, then please do! But I think you meant only that you can write rules that constrain the behavior such that the results in those particular cases are deterministic, which is an entirely different matter. The rules-based scheme makes for much more code, locking and synchronization, very difficult testing, and a less maintainable environment.
>
> > I see rules-based specification as more flexible because you can administer the rule set rather than making a decision for each node that you bring online. I agree that it is more adaptive since the constraints are being specified at a level above the individual node. I see the rules as globally transparent because they are just data which could be edited using, for example, a web browser backed by an application looking at the data in zookeeper, whereas the instance-level specification must be edited on each node. I think of rules as more scalable because you do not have to figure out what you are going to do with each node. The node will be put to a purpose for which it is fit and for which there is a need.
>
> I think we're going to disagree about the merits of instance vs. rules schemes, and I hope we can modularize the system so that the schemes are separate modules and independent of the core functionality (which wouldn't need zookeeper).
>
> My biggest concern about that last paragraph (or the whole message?) is that this use of zookeeper seemed unnecessary and confusing. That is, why wouldn't the web app interact with the service instances directly to get/set configurations using well-defined, testable public interfaces, rather than use zookeeper as a hub? (That's the secret messages in dead drops thing.)
>
> > However, as long as we have a reasonable path for HA service allocation which respects the need to associate specific physical service instances with specific logical service instances then it seems reasonable that either approach could be used. It just becomes a matter of how we describe what services the node will start and whether or not we run the ServicesManagerService on that node.
>
> Clearly HA needs a set of like service instances to work in active/active or active/passive arrangements. The term "logical service" seems overloaded in that (as far as I have figured out) it has different meanings in the pre-HA and post-HA discussions. I can see that an HA logical data service would refer to the group of data service instances which together host a single shard. But this definition is very specific and differs from the more general meaning in the rules-based specification discussion, which is confusing.
>
> The instance-based scheme can be used for HA as well, as long as the service configurations are extended to indicate which "HA logical" group a service belonged to.
>
> Fred
>
> > PS: Concerning "flex", the big leverage for flexing the cluster will come with a shared-disk architecture (rather than the present shared-nothing architecture). Using a shared-disk architecture, the nodes can then be started or stopped without regard to the persistent state, which would be on managed storage. That would make it possible to trade off dynamically which nodes were assigned to which application, where the application might be bigdata, hadoop, etc. In that kind of scenario I find it difficult to imagine that an operator will be in the loop when a node is torn down and then repurposed to a different application. However, this kind of "flex" is outside the scope of the current effort.
>
> OK. I see the primary benefit of this arrangement as making hot spares become operational much more quickly, but I don't see how this applies to the rules vs. instance-based specifications discussion. Both schemes can handle this arrangement.
>
> Fred |
|
From: Fred O. <fko...@gm...> - 2010-07-26 22:05:22
|
On Mon, Jul 26, 2010 at 2:37 PM, Bryan Thompson <br...@sy...> wrote:

> I've been out for a bit with my head wrapped around other things. Can you remind me how we are going to handle the assignment of physical service nodes to logical service nodes in this design?

What is a node (physical or logical) in this context? I think you mean that a physical node is a machine. If so, then what is a logical node?

As I understand the services, all service instances are physical. The logical service construct exists only as an abstraction on which the rules in the rules-based specification may operate. If I understand correctly, then with the instance-level specification, there are no logical services and no logical nodes. But still, what's a logical node?

> Concerning your points below, either scheme can be made fully deterministic. It is only a matter of specifying that a specific service must run on a specific host (a constraint on what services can run on a given host).

If you can make the rules-based scheme deterministic, then please do! But I think you meant only that you can write rules that constrain the behavior such that the results in those particular cases are deterministic, which is an entirely different matter. The rules-based scheme makes for much more code, locking and synchronization, very difficult testing, and a less maintainable environment.

> I see rules-based specification as more flexible because you can administer the rule set rather than making a decision for each node that you bring online. I agree that it is more adaptive since the constraints are being specified at a level above the individual node. I see the rules as globally transparent because they are just data which could be edited using, for example, a web browser backed by an application looking at the data in zookeeper, whereas the instance-level specification must be edited on each node. I think of rules as more scalable because you do not have to figure out what you are going to do with each node. The node will be put to a purpose for which it is fit and for which there is a need.

I think we're going to disagree about the merits of instance vs. rules schemes, and I hope we can modularize the system so that the schemes are separate modules and independent of the core functionality (which wouldn't need zookeeper).

My biggest concern about that last paragraph (or the whole message?) is that this use of zookeeper seemed unnecessary and confusing. That is, why wouldn't the web app interact with the service instances directly to get/set configurations using well-defined, testable public interfaces, rather than use zookeeper as a hub? (That's the secret messages in dead drops thing.)

> However, as long as we have a reasonable path for HA service allocation which respects the need to associate specific physical service instances with specific logical service instances then it seems reasonable that either approach could be used. It just becomes a matter of how we describe what services the node will start and whether or not we run the ServicesManagerService on that node.

Clearly HA needs a set of like service instances to work in active/active or active/passive arrangements. The term "logical service" seems overloaded in that (as far as I have figured out) it has different meanings in the pre-HA and post-HA discussions. I can see that an HA logical data service would refer to the group of data service instances which together host a single shard. But this definition is very specific and differs from the more general meaning in the rules-based specification discussion, which is confusing.

The instance-based scheme can be used for HA as well, as long as the service configurations are extended to indicate which "HA logical" group a service belonged to.

Fred

> PS: Concerning "flex", the big leverage for flexing the cluster will come with a shared-disk architecture (rather than the present shared-nothing architecture). Using a shared-disk architecture, the nodes can then be started or stopped without regard to the persistent state, which would be on managed storage. That would make it possible to trade off dynamically which nodes were assigned to which application, where the application might be bigdata, hadoop, etc. In that kind of scenario I find it difficult to imagine that an operator will be in the loop when a node is torn down and then repurposed to a different application. However, this kind of "flex" is outside the scope of the current effort.

OK. I see the primary benefit of this arrangement as making hot spares become operational much more quickly, but I don't see how this applies to the rules vs. instance-based specifications discussion. Both schemes can handle this arrangement.

Fred |
|
From: Bryan T. <br...@sy...> - 2010-07-26 18:38:44
|
Fred,

I've been out for a bit with my head wrapped around other things. Can you remind me how we are going to handle the assignment of physical service nodes to logical service nodes in this design?

Concerning your points below, either scheme can be made fully deterministic. It is only a matter of specifying that a specific service must run on a specific host (a constraint on what services can run on a given host).

Maybe another way to look at this is instance-level specification versus rules-based specification. With instance-level specification, you directly state what services are running on each node. With rules-based specification, you describe which kinds of services can run on which kinds of nodes.

I see rules-based specification as more flexible because you can administer the rule set rather than making a decision for each node that you bring online. I agree that it is more adaptive since the constraints are being specified at a level above the individual node. I see the rules as globally transparent because they are just data which could be edited using, for example, a web browser backed by an application looking at the data in zookeeper, whereas the instance-level specification must be edited on each node. I think of rules as more scalable because you do not have to figure out what you are going to do with each node. The node will be put to a purpose for which it is fit and for which there is a need.

However, as long as we have a reasonable path for HA service allocation which respects the need to associate specific physical service instances with specific logical service instances, then it seems reasonable that either approach could be used. It just becomes a matter of how we describe what services the node will start and whether or not we run the ServicesManagerService on that node.

Bryan

PS: Concerning "flex", the big leverage for flexing the cluster will come with a shared-disk architecture (rather than the present shared-nothing architecture). Using a shared-disk architecture, the nodes can then be started or stopped without regard to the persistent state, which would be on managed storage. That would make it possible to trade off dynamically which nodes were assigned to which application, where the application might be bigdata, hadoop, etc. In that kind of scenario I find it difficult to imagine that an operator will be in the loop when a node is torn down and then repurposed to a different application. However, this kind of "flex" is outside the scope of the current effort.

> -----Original Message-----
> From: Fred Oliver [mailto:fko...@gm...]
> Sent: Monday, July 26, 2010 2:19 PM
> To: Bryan Thompson
> Cc: Bigdata Developers
> Subject: Re: [Bigdata-developers] Why zookeeper?
>
> "Flexibility" seems to be in the eyes of the beholder. I would characterize the existing mechanism as more adaptive, but less flexible. That is, you get "flexing", but you lose fine-grained control. I think that adaptivity and fine-grained control are mutually exclusive goals.
>
> The finer-grained control has a number of advantages:
>
> * It is deterministic. Services always run on the same host and with the same configuration every time. Easier to diagnose faults. Easier to test. Every time the system starts up, it starts up the same way, and deviations are errors.
>
> * There is no need for locking. That is, I think the locking is needed as a result of the non-deterministic behavior. (No need for a service manager to wait to find out if another host's service manager grabbed a lock to start a service before attempting to start one itself.)
>
> * It allows for matching the distribution of services to heterogeneous hardware obtained to run them. The operators knew which machines were purchased to run which services and why, and should be able to specify that the services run on specific machines.
>
> * If the operator added hardware to a cluster for a specific need, the operator should be able to specify that the hardware be used to address the need.
>
> * It allows for more specific control of individual services. (e.g., How would you separate the service directory from the mass storage directory? How do you configure N data services per host to run on N independent drives instead of a RAID? How would that perform?)
>
> On the flip side, I am not clear on what the benefits of the adaptivity or "flexing" really are in this context. Flexing seems more related to "cloud" environments where hardware is instantly available but managing persistent data is very difficult. Could you elaborate on the benefits you perceive from adaptivity?
>
> Removing or separating out the adaptive behavior (and zookeeper, or limiting zookeeper's use to HA leader election) removes moving parts, increases visibility and understanding of the code, and improves maintainability significantly. We would like to see bigdata become modular, to the point where the service manager (and its use of zookeeper) can be implemented in its own optional module.
>
> Is the starting of each of the services individually from the command line or a script possible without need for zookeeper (if it really is limited to the services manager service)? If so, then this isn't a second mechanism at all.
>
> In either case, supporting a second service-starting arrangement seems like a small price relative to the simplicity gained.
>
> Fred |
|
From: Fred O. <fko...@gm...> - 2010-07-26 18:18:53
|
"Flexibility" seems to be in the eyes of the beholder. I would characterize the existing mechanism as more adaptive, but less flexible. That is, you get "flexing", but you lose fine grained control. I think that adaptivity and fine grained control are mutually exclusive goals. The finer grained control has a number of advantages: * It is deterministic. Services always run on the same host and with the same configuration every time. Easier to diagnose faults. Easier to test. Every time the system starts up, it starts up the same way, and deviations are errors. * There is no need for locking. That is, I think the locking is needed as a result of the non-deterministic behavior. (No need for a service manager to wait to find out if another host's service manager grabbed a lock to start a service before attempting to start one itself.) * It allows for matching the distribution of services to heterogeneous hardware obtained to run them. The operators knew which machines were purchased to run which services and why, and should be able to specify that the services run on specific machines. * If the operator added hardware to a cluster for a specific need, the operator should be able to specify that the hardware be used to address the need. * It allows for more specific control of individual services. (eg. How would you separate the service directory from the mass storage directory? How do you configure N data services per host to run on N independent drives instead of a RAID? How would that perform?) On the flip side, I am not clear on what the benefits of the adaptivity or "flexing" really are in this context. Flexing seems more related to "cloud" environments where hardware is instantly available but managing persistent data is very difficult. Could you elaborate on the benefits you perceive from adaptivity? Removing or separating out the adaptive behavior (and zookeeper, or limiting zookeeper's use to HA leader election) removes moving parts, increases visibility and understanding of the code, and improves maintainability significantly. We would like to see bigdata become modular, to the point where the service manager (and its use of zookeeper) can be implemented in its own optional module. Is the starting of each of the services individually from the command line or script possible without need for zookeeper (if it really is limited to the services manager service)? If so, then this isn't a second mechanism at all. In either case, supporting a second service starting arrangement seems like a small price relative to the simplicity gained. Fred |