This list is closed; nobody may subscribe to it.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | | 19 | 8 | 25 | 16 | 77 | 131 | 76 | 30 | 7 | 3 | |
| 2011 | | | | | 2 | 2 | 16 | 3 | 1 | | 7 | 7 |
| 2012 | 10 | 1 | 8 | 6 | 1 | 3 | 1 | | 1 | | 8 | 2 |
| 2013 | 5 | 12 | 2 | 1 | 1 | 1 | 22 | 50 | 31 | 64 | 83 | 28 |
| 2014 | 31 | 18 | 27 | 39 | 45 | 15 | 6 | 27 | 6 | 67 | 70 | 1 |
| 2015 | 3 | 18 | 22 | 121 | 42 | 17 | 8 | 11 | 26 | 15 | 66 | 38 |
| 2016 | 14 | 59 | 28 | 44 | 21 | 12 | 9 | 11 | 4 | 2 | 1 | |
| 2017 | 20 | 7 | 4 | 18 | 7 | 3 | 13 | 2 | 4 | 9 | 2 | 5 |
| 2018 | | | | 2 | | | | | | | | |
| 2019 | | | 1 | | | | | | | | | |

From: Bryan T. <br...@sy...> - 2012-01-23 18:48:29
|
Hello, I've implemented a peer for ganglia 3.1 in Java (Apache 2.0 license) [1,2,3]. By a peer, I mean that it not only reports metrics (as embedded-ganglia and hadoop-commons do) but also listens to and participates in the ganglia protocol. This means that it is capable of producing load-balanced host reports just like gstat.

It does not yet respond to telnet requests with an XML dump of the soft state, but that is an easy enough thing to do. I also have not yet worked through the unicast setup in any depth. Right now it joins a configured multicast group. It supports the 3.1 wire format and has most of the code required to support the earlier wire format.

The project does not yet come with any bundled host/app metrics collectors. However, we do have a variety of statistics collectors for the host level which I plan to port (vmstat, iostat, and typeperf, which allows metrics collection under Windows). There are also a per-process collector based on pidstat and JVM-specific collectors. Those collectors need to be decoupled from bigdata's internal performance counter hierarchy in order to refactor them into the bigdata-ganglia module. This is being managed as part of the bigdata [4] project, but the ganglia module is under the Apache 2.0 license.

Thanks, bryan

[1] http://www.bigdata.com/bigdata/blog/?p=359 (blog article on bigdata-ganglia)
[2] https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_1_1_0/bigdata-ganglia/ (SVN)
[3] https://sourceforge.net/projects/bigdata/files/bigdata-ganglia/1.1.0/ (download)
[4] https://sourceforge.net/projects/bigdata/ |
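[Editor's note] As a rough illustration of the "listens to and participates in" side of such a peer, the sketch below joins a multicast group and receives ganglia datagrams. The group address and port shown are the usual gmond defaults and are assumptions here, not values taken from the bigdata-ganglia module itself, and decoding of the XDR-encoded 3.1 wire format is omitted.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class GangliaListenerSketch {

    public static void main(String[] args) throws Exception {
        // Assumed defaults: 239.2.11.71:8649 is the usual gmond multicast
        // channel; use the group/port from your own gmond.conf instead.
        final InetAddress group = InetAddress.getByName("239.2.11.71");
        final int port = 8649;

        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(group);
            final byte[] buf = new byte[8192];
            while (true) {
                final DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                // Each datagram is an XDR-encoded ganglia 3.1 message;
                // decoding the wire format is omitted in this sketch.
                System.out.println("received " + packet.getLength()
                        + " bytes from " + packet.getAddress());
            }
        }
    }
}
```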
|
From: Gerjon de V. <g.d...@op...> - 2012-01-17 16:07:21
|
Thanks, that's quite clear. And interestingly has considerable overlap with some work we did on the text indexes of our USeekM indexer (too bad I didn't notice this upcoming feature before). On Tue, Jan 17, 2012, at 07:08, Bryan Thompson wrote: > Gerjon, > > I've filed an issue to track this [1]. Please let us know if you have > questions. > > Thanks, > Bryan > > [1] https://sourceforge.net/apps/trac/bigdata/ticket/447 > > > -----Original Message----- > > From: Gerjon de Vries [mailto:g.d...@op...] > > Sent: Tuesday, January 17, 2012 4:45 AM > > To: big...@li... > > Subject: [Bigdata-developers] New (subject centric) search index > > > > Hi, > > > > I see on the roadmap that Bigdata plans a "New (subject > > centric) search index". Is it possible to give some details > > about this plan? > > > > Thanks! > > > > -------------------------------------------------------------- > > ---------------- > > Keep Your Developer Skills Current with LearnDevNow! > > The most comprehensive online learning library for Microsoft > > developers is just $99.99! Visual Studio, SharePoint, SQL - > > plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future > > releases when you subscribe now! > > http://p.sf.net/sfu/learndevnow-d2d > > _______________________________________________ > > Bigdata-developers mailing list > > Big...@li... > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > > |
|
From: Bryan T. <br...@sy...> - 2012-01-17 13:08:55
|
Gerjon, I've filed an issue to track this [1]. Please let us know if you have questions. Thanks, Bryan [1] https://sourceforge.net/apps/trac/bigdata/ticket/447 > -----Original Message----- > From: Gerjon de Vries [mailto:g.d...@op...] > Sent: Tuesday, January 17, 2012 4:45 AM > To: big...@li... > Subject: [Bigdata-developers] New (subject centric) search index > > Hi, > > I see on the roadmap that Bigdata plans a "New (subject > centric) search index". Is it possible to give some details > about this plan? > > Thanks! > > -------------------------------------------------------------- > ---------------- > Keep Your Developer Skills Current with LearnDevNow! > The most comprehensive online learning library for Microsoft > developers is just $99.99! Visual Studio, SharePoint, SQL - > plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future > releases when you subscribe now! > http://p.sf.net/sfu/learndevnow-d2d > _______________________________________________ > Bigdata-developers mailing list > Big...@li... > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > |
|
From: Gerjon de V. <g.d...@op...> - 2012-01-17 10:02:11
|
Hi, I see on the roadmap that Bigdata plans a "New (subject centric) search index". Is it possible to give some details about this plan? Thanks! |
|
From: Bryan T. <br...@sy...> - 2012-01-06 15:11:42
|
Cut myself off there...

Mina, The 1.1.0 release introduces a BLOBS index. This index is for storing large RDF Literals and URIs. The index is based on a hash code (of the RDF Value) plus a collision counter. This gives it a fixed size key (8 bytes). That addresses an issue which existed before the 1.1.0 release where large RDF Values would turn into large keys and cause slow performance in the TERM2ID and ID2TERM indices.

The 1.1.0 release uses three different mechanisms for storing RDF Values. Very short values (basically, xsd numeric data type values) are inlined directly into the statement indices. Larger values are stored in the ID2TERM and TERM2ID indices. Above a threshold in size (default 256 bytes) the value is stored in the BLOBS index instead. This scheme makes it possible to have good fan out in all of the indices.

The BLOBS index stores only the hash + collision counter in the key and the address of the blob on the backing store in the value. It is capable of storing large RDF Literals. Megabytes is not a problem. However, it is not a "blob store" and you should not use it to store gigabytes (e.g., video).

The Bigdata File System was originally developed for exactly the purpose you are describing - a division between a metadata repository (the RDF database) and a content repository (blob store). If you have really big stuff, my suggestion is that you use webdav, S3, or openstack to store that stuff and then use the URI of the video content, etc. within your RDF. You can then reach through the URI to the very large objects in the blob storage system. Bigdata is not a good place to store those very large objects because the target size of a shard is ~ 200MB. A single very large object could easily be larger than a shard. Since the architecture is all B+Trees, you wind up with a shard having a single tuple which is too large to move around among the nodes on the cluster.

Thanks, Bryan

________________________________ From: Bryan Thompson Sent: Friday, January 06, 2012 10:05 AM To: 'Mina R Waheeb' Cc: big...@li... Subject: RE: [Bigdata-developers] BigdataFileSystem Mina, The 1.1.0 release introduces a BLOBS index. This index is for storing large RDF Literals and URIs. The index is based on a hash code (or the RDF Value) plus a collision counter. This gives it a fixed size key (8 bytes). That addresses an issue which existed before the 1.1.0 release where large RDF Values would turn into large keys and cause slow performance in the TERM2ID and ID2TERM indices. The 1.1.0 release uses three different mechanism ________________________________ From: Mina R Waheeb [mailto:sy...@gm...] Sent: Friday, January 06, 2012 10:01 AM To: Bryan Thompson Cc: big...@li... Subject: Re: [Bigdata-developers] BigdataFileSystem Thanks for the information. Actually, I'm not trying to use Bigdata as a true filesystem. I have large blobs and metadata describe it in RDF. the required function is to query the RDF to find the blobs. It will be much easier to store the blobs in the same store that handles the RDF to avoid a lot of problems such as distributed transactions. I tried to take a look on the SVN 1.1.0 tag but i still confused, dose Bigdata support BLOBs? if so, is there any API to access it in a stream manner? Thanks again for your reply Cheers, Mina R Waheeb On Wed, Jan 4, 2012 at 12:31 PM, Bryan Thompson <br...@sy...<mailto:br...@sy...>> wrote: Mina, Thanks for your interest. Unfortunately, we stopped development on the REST-ful bigdata file system a few years ago. 
There are now a variety of REST-ful options (open stack, s3, etc) in addition to the parallel file systems developed within the HPC space. Bigdata is primarily a system for dynamically key-range sharded B+Trees. Bigdata does include a key-value store, which is layered over the B+Tree. One instance of this is used to store global metadata about relations instantiated in the database. It can also be used to store application specific metadata within a different schema (aka column family). The bigdata FS was designed as a hybrid of the key-value store (metadata) and large objects. However, the objects for the FS were really too close in scale to the shards themselves. Combined with the emergence of other solutions, this led us to abandon the FS layer. Most of the focus of bigdata is on RDF. The RDF database is also layered over the distributed B+Trees. You can think of it as a graph database if you are not familiar with the W3C RDF standards. Query is through SPARQL, which is a high level language similar in expressivity to SQL but designed for "graph" structured data. The bigdata RDF database layer provides fast lookup, joins, etc. for graph query. There is a rest interface to the SPARQL engine which also provides for graph data insert/update/delete. The current release is 1.1.0. The trunk is not developed, so if you check out the code from SVN be sure to use the appropriate tag (releases) or branch (maintenance and development). Thanks, Bryan ________________________________ From: Mina R Waheeb [mailto:sy...@gm...<mailto:sy...@gm...>] Sent: Tuesday, January 03, 2012 11:32 PM To: big...@li...<mailto:big...@li...> Subject: [Bigdata-developers] BigdataFileSystem Hi Bigdata folks, I'm new to big data and really interested in the BigdataFileSystem API running in a single node cluster mode or embedded mode to store tons of files. After reading the wiki docs, I still have some questions: - Its not clear to me is the BigdataFileSystem designed for concurrent I/O? - How read OP works? dose it load the whole file blocks once read request happens? - is there Embedded mode API without client/server overhead? I tried the unit test EmbeddedFederation as an example but I came up with nothing seems I missed some configuration! - What are the minimum dependencies required to run BigdataFileSystem? because while playing around its looks requires JINI API? - is there any docs of there internal file format? Sorry, If the questions already answered in the docs please point me out there :) Great effort guys and thanks for sharing! Cheers, Mina R Waheeb |
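[Editor's note] To make the fixed-size key idea described above concrete, here is a minimal sketch of deriving an 8-byte key from a value's hash code plus a collision counter. The layout shown (a 4-byte int hash followed by a 4-byte counter) is illustrative only; the actual encoding used by the BLOBS index in bigdata 1.1.0 may differ in field widths and ordering.

```java
import java.nio.ByteBuffer;

public class BlobKeySketch {

    /**
     * Illustrative only: derive an 8-byte key from a value's hash code plus
     * a collision counter. The real BLOBS index key layout in bigdata 1.1.0
     * may use different field widths and extra flag bytes.
     */
    static byte[] blobKey(final String lexicalForm, final int collisionCounter) {
        return ByteBuffer.allocate(8)
                .putInt(lexicalForm.hashCode()) // 4-byte hash of the RDF Value
                .putInt(collisionCounter)       // 4-byte collision counter
                .array();
    }

    public static void main(String[] args) {
        // The key stays 8 bytes no matter how large the literal is.
        final String largeLiteral = new String(new char[10000]).replace('\0', 'x');
        System.out.println(blobKey(largeLiteral, 0).length); // prints 8
    }
}
```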
|
From: Bryan T. <br...@sy...> - 2012-01-06 15:05:15
|
Mina, The 1.1.0 release introduces a BLOBS index. This index is for storing large RDF Literals and URIs. The index is based on a hash code (or the RDF Value) plus a collision counter. This gives it a fixed size key (8 bytes). That addresses an issue which existed before the 1.1.0 release where large RDF Values would turn into large keys and cause slow performance in the TERM2ID and ID2TERM indices. The 1.1.0 release uses three different mechanism ________________________________ From: Mina R Waheeb [mailto:sy...@gm...] Sent: Friday, January 06, 2012 10:01 AM To: Bryan Thompson Cc: big...@li... Subject: Re: [Bigdata-developers] BigdataFileSystem Thanks for the information. Actually, I'm not trying to use Bigdata as a true filesystem. I have large blobs and metadata describe it in RDF. the required function is to query the RDF to find the blobs. It will be much easier to store the blobs in the same store that handles the RDF to avoid a lot of problems such as distributed transactions. I tried to take a look on the SVN 1.1.0 tag but i still confused, dose Bigdata support BLOBs? if so, is there any API to access it in a stream manner? Thanks again for your reply Cheers, Mina R Waheeb On Wed, Jan 4, 2012 at 12:31 PM, Bryan Thompson <br...@sy...<mailto:br...@sy...>> wrote: Mina, Thanks for your interest. Unfortunately, we stopped development on the REST-ful bigdata file system a few years ago. There are now a variety of REST-ful options (open stack, s3, etc) in addition to the parallel file systems developed within the HPC space. Bigdata is primarily a system for dynamically key-range sharded B+Trees. Bigdata does include a key-value store, which is layered over the B+Tree. One instance of this is used to store global metadata about relations instantiated in the database. It can also be used to store application specific metadata within a different schema (aka column family). The bigdata FS was designed as a hybrid of the key-value store (metadata) and large objects. However, the objects for the FS were really too close in scale to the shards themselves. Combined with the emergence of other solutions, this led us to abandon the FS layer. Most of the focus of bigdata is on RDF. The RDF database is also layered over the distributed B+Trees. You can think of it as a graph database if you are not familiar with the W3C RDF standards. Query is through SPARQL, which is a high level language similar in expressivity to SQL but designed for "graph" structured data. The bigdata RDF database layer provides fast lookup, joins, etc. for graph query. There is a rest interface to the SPARQL engine which also provides for graph data insert/update/delete. The current release is 1.1.0. The trunk is not developed, so if you check out the code from SVN be sure to use the appropriate tag (releases) or branch (maintenance and development). Thanks, Bryan ________________________________ From: Mina R Waheeb [mailto:sy...@gm...<mailto:sy...@gm...>] Sent: Tuesday, January 03, 2012 11:32 PM To: big...@li...<mailto:big...@li...> Subject: [Bigdata-developers] BigdataFileSystem Hi Bigdata folks, I'm new to big data and really interested in the BigdataFileSystem API running in a single node cluster mode or embedded mode to store tons of files. After reading the wiki docs, I still have some questions: - Its not clear to me is the BigdataFileSystem designed for concurrent I/O? - How read OP works? dose it load the whole file blocks once read request happens? - is there Embedded mode API without client/server overhead? 
I tried the unit test EmbeddedFederation as an example but I came up with nothing seems I missed some configuration! - What are the minimum dependencies required to run BigdataFileSystem? because while playing around its looks requires JINI API? - is there any docs of there internal file format? Sorry, If the questions already answered in the docs please point me out there :) Great effort guys and thanks for sharing! Cheers, Mina R Waheeb |
|
From: Mina R W. <sy...@gm...> - 2012-01-06 15:01:29
|
Thanks for the information. Actually, I'm not trying to use Bigdata as a true filesystem. I have large blobs and metadata describe it in RDF. the required function is to query the RDF to find the blobs. It will be much easier to store the blobs in the same store that handles the RDF to avoid a lot of problems such as distributed transactions. I tried to take a look on the SVN 1.1.0 tag but i still confused, dose Bigdata support BLOBs? if so, is there any API to access it in a stream manner? Thanks again for your reply Cheers, Mina R Waheeb On Wed, Jan 4, 2012 at 12:31 PM, Bryan Thompson <br...@sy...> wrote: > ** > Mina, > > Thanks for your interest. Unfortunately, we stopped development on the > REST-ful bigdata file system a few years ago. There are now a variety of > REST-ful options (open stack, s3, etc) in addition to the parallel file > systems developed within the HPC space. > > Bigdata is primarily a system for dynamically key-range sharded B+Trees. Bigdata > does include a key-value store, which is layered over the B+Tree. One > instance of this is used to store global metadata about relations > instantiated in the database. It can also be used to store application > specific metadata within a different schema (aka column family). The > bigdata FS was designed as a hybrid of the key-value store (metadata) and > large objects. However, the objects for the FS were really too close in > scale to the shards themselves. Combined with the emergence of other > solutions, this led us to abandon the FS layer. > > Most of the focus of bigdata is on RDF. The RDF database is also layered > over the distributed B+Trees. You can think of it as a graph database if > you are not familiar with the W3C RDF standards. Query is through SPARQL, > which is a high level language similar in expressivity to SQL but designed > for "graph" structured data. The bigdata RDF database layer provides fast > lookup, joins, etc. for graph query. There is a rest interface to the > SPARQL engine which also provides for graph data insert/update/delete. > > The current release is 1.1.0. The trunk is not developed, so if you check > out the code from SVN be sure to use the appropriate tag (releases) or > branch (maintenance and development). > > Thanks, > Bryan > > ------------------------------ > *From:* Mina R Waheeb [mailto:sy...@gm...] > *Sent:* Tuesday, January 03, 2012 11:32 PM > *To:* big...@li... > *Subject:* [Bigdata-developers] BigdataFileSystem > > Hi Bigdata folks, > > I'm new to big data and really interested in the BigdataFileSystem API > running in a single node cluster mode or embedded mode to store tons of > files. After reading the wiki docs, I still have some questions: > > - Its not clear to me is the BigdataFileSystem designed for concurrent I/O? > - How read OP works? dose it load the whole file blocks once read request > happens? > - is there Embedded mode API without client/server overhead? I tried the > unit test EmbeddedFederation as an example but I came up with nothing seems > I missed some configuration! > - What are the minimum dependencies required to run BigdataFileSystem? > because while playing around its looks requires JINI API? > - is there any docs of there internal file format? > > Sorry, If the questions already answered in the docs please point me out > there :) > > Great effort guys and thanks for sharing! > > Cheers, > Mina R Waheeb > > |
|
From: Bryan T. <br...@sy...> - 2012-01-04 12:31:12
|
Mina, Thanks for your interest. Unfortunately, we stopped development on the REST-ful bigdata file system a few years ago. There are now a variety of REST-ful options (open stack, s3, etc) in addition to the parallel file systems developed within the HPC space. Bigdata is primarily a system for dynamically key-range sharded B+Trees. Bigdata does include a key-value store, which is layered over the B+Tree. One instance of this is used to store global metadata about relations instantiated in the database. It can also be used to store application specific metadata within a different schema (aka column family). The bigdata FS was designed as a hybrid of the key-value store (metadata) and large objects. However, the objects for the FS were really too close in scale to the shards themselves. Combined with the emergence of other solutions, this led us to abandon the FS layer. Most of the focus of bigdata is on RDF. The RDF database is also layered over the distributed B+Trees. You can think of it as a graph database if you are not familiar with the W3C RDF standards. Query is through SPARQL, which is a high level language similar in expressivity to SQL but designed for "graph" structured data. The bigdata RDF database layer provides fast lookup, joins, etc. for graph query. There is a rest interface to the SPARQL engine which also provides for graph data insert/update/delete. The current release is 1.1.0. The trunk is not developed, so if you check out the code from SVN be sure to use the appropriate tag (releases) or branch (maintenance and development). Thanks, Bryan ________________________________ From: Mina R Waheeb [mailto:sy...@gm...] Sent: Tuesday, January 03, 2012 11:32 PM To: big...@li... Subject: [Bigdata-developers] BigdataFileSystem Hi Bigdata folks, I'm new to big data and really interested in the BigdataFileSystem API running in a single node cluster mode or embedded mode to store tons of files. After reading the wiki docs, I still have some questions: - Its not clear to me is the BigdataFileSystem designed for concurrent I/O? - How read OP works? dose it load the whole file blocks once read request happens? - is there Embedded mode API without client/server overhead? I tried the unit test EmbeddedFederation as an example but I came up with nothing seems I missed some configuration! - What are the minimum dependencies required to run BigdataFileSystem? because while playing around its looks requires JINI API? - is there any docs of there internal file format? Sorry, If the questions already answered in the docs please point me out there :) Great effort guys and thanks for sharing! Cheers, Mina R Waheeb |
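[Editor's note] The REST interface to the SPARQL engine mentioned above can be exercised with nothing more than an HTTP POST of a query string. The sketch below assumes a NanoSparqlServer reachable at http://localhost:8080/bigdata/sparql; the host, port, and context path are assumptions that depend on how the WAR or embedded server was deployed.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SparqlRestQuery {

    public static void main(String[] args) throws Exception {
        // Assumed endpoint; adjust to your actual deployment.
        final String endpoint = "http://localhost:8080/bigdata/sparql";
        final String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        final HttpURLConnection conn =
                (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        conn.setRequestProperty("Accept", "application/sparql-results+xml");

        // Submit the query as a form-encoded "query" parameter.
        final byte[] body =
                ("query=" + URLEncoder.encode(query, "UTF-8")).getBytes("UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }

        // Print the SPARQL results XML returned by the server.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```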
|
From: Mina R W. <sy...@gm...> - 2012-01-04 04:32:09
|
Hi Bigdata folks, I'm new to big data and really interested in the BigdataFileSystem API, running in single-node cluster mode or embedded mode, to store tons of files. After reading the wiki docs, I still have some questions:

- It's not clear to me whether the BigdataFileSystem is designed for concurrent I/O.
- How does a read operation work? Does it load all of the file's blocks as soon as a read request happens?
- Is there an embedded-mode API without client/server overhead? I tried the EmbeddedFederation unit test as an example but came up with nothing; it seems I missed some configuration!
- What are the minimum dependencies required to run BigdataFileSystem? While playing around it looks like it requires the JINI API.
- Is there any documentation of the internal file format?

Sorry if these questions are already answered in the docs; please point me to them :) Great effort guys and thanks for sharing! Cheers, Mina R Waheeb |
|
From: Bryan T. <br...@sy...> - 2011-12-20 18:44:26
|
This is a major version release of bigdata(R). Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF capable of loading 1B triples in under one hour on a 15 node cluster. Bigdata operates in both a single machine mode (Journal) and a cluster mode (Federation). The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion triples / quads. The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth. Both platforms support fully concurrent readers with snapshot isolation. Distributed processing offers greater throughput but does not reduce query or update latency. Choose the Journal when the anticipated scale and throughput requirements permit. Choose the Federation when the administrative and machine overhead associated with operating a cluster is an acceptable tradeoff to have essentially unlimited data scaling and throughput. See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and [3,5,6] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [7]. Starting with the 1.0.0 release, we offer a WAR artifact [8] for easy installation of the single machine RDF database. For custom development and cluster installations we recommend checking out the code from SVN using the tag for this release. The code will build automatically under eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can download the WAR from: http://sourceforge.net/projects/bigdata/ You can checkout this release from: https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/BIGDATA_RELEASE_1_1_0 New features: - Fast, scalable native support for SPARQL 1.1 analytic queries; - %100 Java memory manager leverages the JVM native heap (no GC); - New extensible hash tree index structure; Feature summary: - Single machine data storage to ~50B triples/quads (RWStore); - Clustered data storage is essentially unlimited; - Simple embedded and/or webapp deployment (NanoSparqlServer); - Triples, quads, or triples with provenance (SIDs); - Fast 100% native SPARQL 1.0 evaluation; - Integrated "analytic" query package; - Fast RDFS+ inference and truth maintenance; - Fast statement level provenance mode (SIDs). Road map [3]: - Simplified deployment, configuration, and administration for clusters; and - High availability for the journal and the cluster. Change log: Note: Versions with (*) require data migration. For details, see [9]. 1.1.0 (*) - http://sourceforge.net/apps/trac/bigdata/ticket/23 (Lexicon joins) - http://sourceforge.net/apps/trac/bigdata/ticket/109 (Store large literals as "blobs") - http://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM "how to" in wiki and build.xml are out of date.) - http://sourceforge.net/apps/trac/bigdata/ticket/203 (Implement an persistence capable hash table to support analytic query) - http://sourceforge.net/apps/trac/bigdata/ticket/209 (AccessPath should visit binding sets rather than elements for high level query.) - http://sourceforge.net/apps/trac/bigdata/ticket/227 (SliceOp appears to be necessary when operator plan should suffice without) - http://sourceforge.net/apps/trac/bigdata/ticket/232 (Bottom-up evaluation semantics). - http://sourceforge.net/apps/trac/bigdata/ticket/246 (Derived xsd numeric data types must be inlined as extension types.) 
- http://sourceforge.net/apps/trac/bigdata/ticket/254 (Revisit pruning of intermediate variable bindings during query execution) - http://sourceforge.net/apps/trac/bigdata/ticket/261 (Lift conditions out of subqueries.) - http://sourceforge.net/apps/trac/bigdata/ticket/300 (Native ORDER BY) - http://sourceforge.net/apps/trac/bigdata/ticket/324 (Inline predeclared URIs and namespaces in 2-3 bytes) - http://sourceforge.net/apps/trac/bigdata/ticket/330 (NanoSparqlServer does not locate "html" resources when run from jar) - http://sourceforge.net/apps/trac/bigdata/ticket/334 (Support inlining of unicode data in the statement indices.) - http://sourceforge.net/apps/trac/bigdata/ticket/364 (Scalable default graph evaluation) - http://sourceforge.net/apps/trac/bigdata/ticket/368 (Prune variable bindings during query evaluation) - http://sourceforge.net/apps/trac/bigdata/ticket/370 (Direct translation of openrdf AST to bigdata AST) - http://sourceforge.net/apps/trac/bigdata/ticket/373 (Fix StrBOp and other IValueExpressions) - http://sourceforge.net/apps/trac/bigdata/ticket/377 (Optimize OPTIONALs with multiple statement patterns.) - http://sourceforge.net/apps/trac/bigdata/ticket/380 (Native SPARQL evaluation on cluster) - http://sourceforge.net/apps/trac/bigdata/ticket/387 (Cluster does not compute closure) - http://sourceforge.net/apps/trac/bigdata/ticket/395 (HTree hash join performance) - http://sourceforge.net/apps/trac/bigdata/ticket/401 (inline xsd:unsigned datatypes) - http://sourceforge.net/apps/trac/bigdata/ticket/408 (xsd:string cast fails for non-numeric data) - http://sourceforge.net/apps/trac/bigdata/ticket/421 (New query hints model.) - http://sourceforge.net/apps/trac/bigdata/ticket/431 (Use of read-only tx per query defeats cache on cluster) 1.0.3 - http://sourceforge.net/apps/trac/bigdata/ticket/217 (BTreeCounters does not track bytes released) - http://sourceforge.net/apps/trac/bigdata/ticket/269 (Refactor performance counters using accessor interface) - http://sourceforge.net/apps/trac/bigdata/ticket/329 (B+Tree should delete bloom filter when it is disabled.) 
- http://sourceforge.net/apps/trac/bigdata/ticket/372 (RWStore does not prune the CommitRecordIndex) - http://sourceforge.net/apps/trac/bigdata/ticket/375 (Persistent memory leaks (RWStore/DISK)) - http://sourceforge.net/apps/trac/bigdata/ticket/385 (FastRDFValueCoder2: ArrayIndexOutOfBoundsException) - http://sourceforge.net/apps/trac/bigdata/ticket/391 (Release age advanced on WORM mode journal) - http://sourceforge.net/apps/trac/bigdata/ticket/392 (Add a DELETE by access path method to the NanoSparqlServer) - http://sourceforge.net/apps/trac/bigdata/ticket/393 (Add "context-uri" request parameter to specify the default context for INSERT in the REST API) - http://sourceforge.net/apps/trac/bigdata/ticket/394 (log4j configuration error message in WAR deployment) - http://sourceforge.net/apps/trac/bigdata/ticket/399 (Add a fast range count method to the REST API) - http://sourceforge.net/apps/trac/bigdata/ticket/422 (Support temp triple store wrapped by a BigdataSail) - http://sourceforge.net/apps/trac/bigdata/ticket/424 (NQuads support for NanoSparqlServer) - http://sourceforge.net/apps/trac/bigdata/ticket/425 (Bug fix to DEFAULT_RDF_FORMAT for bulk data loader in scale-out) - http://sourceforge.net/apps/trac/bigdata/ticket/426 (Support either lockfile (procmail) and dotlockfile (liblockfile1) in scale-out) - http://sourceforge.net/apps/trac/bigdata/ticket/427 (BigdataSail#getReadOnlyConnection() race condition with concurrent commit) - http://sourceforge.net/apps/trac/bigdata/ticket/435 (Address is 0L) - http://sourceforge.net/apps/trac/bigdata/ticket/436 (TestMROWTransactions failure in CI) 1.0.2 - http://sourceforge.net/apps/trac/bigdata/ticket/32 (Query time expansion of (foo rdf:type rdfs:Resource) drags in SPORelation for scale-out.) - http://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM "how to" in wiki and build.xml are out of date.) - http://sourceforge.net/apps/trac/bigdata/ticket/356 (Query not terminated by error.) - http://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.) - http://sourceforge.net/apps/trac/bigdata/ticket/361 (IRunningQuery not closed promptly.) - http://sourceforge.net/apps/trac/bigdata/ticket/371 (DataLoader fails to load resources available from the classpath.) - http://sourceforge.net/apps/trac/bigdata/ticket/376 (Support for the streaming of bigdata IBindingSets into a sparql query.) - http://sourceforge.net/apps/trac/bigdata/ticket/378 (ClosedByInterruptException during heavy query mix.) - http://sourceforge.net/apps/trac/bigdata/ticket/379 (NotSerializableException for SPOAccessPath.) - http://sourceforge.net/apps/trac/bigdata/ticket/382 (Change dependencies to Apache River 2.2.0) 1.0.1 (*) - http://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema names in the sparse row store). - http://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should use more bits for scale-out). - http://sourceforge.net/apps/trac/bigdata/ticket/225 (OSX requires specialized performance counter collection classes). - http://sourceforge.net/apps/trac/bigdata/ticket/348 (BigdataValueFactory.asValue() must return new instance when DummyIV is used). - http://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance). - http://sourceforge.net/apps/trac/bigdata/ticket/351 (SPO not Serializable exception in SIDS mode (scale-out)). 
- http://sourceforge.net/apps/trac/bigdata/ticket/352 (ClassCastException when querying with binding-values that are not known to the database). - http://sourceforge.net/apps/trac/bigdata/ticket/353 (UnsupportedOperatorException for some SPARQL queries). - http://sourceforge.net/apps/trac/bigdata/ticket/355 (Query failure when comparing with non materialized value). - http://sourceforge.net/apps/trac/bigdata/ticket/357 (RWStore reports "FixedAllocator returning null address, with freeBits".) - http://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.) - http://sourceforge.net/apps/trac/bigdata/ticket/362 (log4j - slf4j bridge.) For more information about bigdata(R), please see the following links: [1] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page [2] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=GettingStarted [3] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Roadmap [4] http://www.bigdata.com/bigdata/docs/api/ [5] http://sourceforge.net/projects/bigdata/ [6] http://www.bigdata.com/blog [7] http://www.systap.com/bigdata.htm [8] http://sourceforge.net/projects/bigdata/files/bigdata/ [9] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration About bigdata: Bigdata(r) is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance. |
|
From: Bryan T. <br...@sy...> - 2011-12-20 18:30:42
|
The 1.0.3 and 1.1.0 releases are out. 1.0.3 is a maintenance release which enjoys binary compatibilty with the 1.0.2 and 1.0.1 releases. 1.1.0 is a major release. It includes SPARQL 1.1 support and analytic query functionality. The 1.1.0 release notes are inline below. Those notes include the 1.0.3 change log. Maintenance and development for 1.1.x will continue on [1]. Developers, please switch to [1] now! Thanks, Bryan [1] https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/BIGDATA_RELEASE_1_1_0 This is a major version release of bigdata(R). Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF capable of loading 1B triples in under one hour on a 15 node cluster. Bigdata operates in both a single machine mode (Journal) and a cluster mode (Federation). The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion triples / quads. The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth. Both platforms support fully concurrent readers with snapshot isolation. Distributed processing offers greater throughput but does not reduce query or update latency. Choose the Journal when the anticipated scale and throughput requirements permit. Choose the Federation when the administrative and machine overhead associated with operating a cluster is an acceptable tradeoff to have essentially unlimited data scaling and throughput. See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and [3,5,6] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [7]. Starting with the 1.0.0 release, we offer a WAR artifact [8] for easy installation of the single machine RDF database. For custom development and cluster installations we recommend checking out the code from SVN using the tag for this release. The code will build automatically under eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script. You can download the WAR from: http://sourceforge.net/projects/bigdata/ You can checkout this release from: https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/BIGDATA_RELEASE_1_1_0 New features: - Fast, scalable native support for SPARQL 1.1 analytic queries; - %100 Java memory manager leverages the JVM native heap (no GC); - New extensible hash tree index structure; Feature summary: - Single machine data storage to ~50B triples/quads (RWStore); - Clustered data storage is essentially unlimited; - Simple embedded and/or webapp deployment (NanoSparqlServer); - Triples, quads, or triples with provenance (SIDs); - Fast 100% native SPARQL 1.0 evaluation; - Integrated "analytic" query package; - Fast RDFS+ inference and truth maintenance; - Fast statement level provenance mode (SIDs). Road map [3]: - Simplified deployment, configuration, and administration for clusters; and - High availability for the journal and the cluster. Change log: Note: Versions with (*) require data migration. For details, see [9]. 1.1.0 (*) - http://sourceforge.net/apps/trac/bigdata/ticket/23 (Lexicon joins) - http://sourceforge.net/apps/trac/bigdata/ticket/109 (Store large literals as "blobs") - http://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM "how to" in wiki and build.xml are out of date.) 
- http://sourceforge.net/apps/trac/bigdata/ticket/203 (Implement an persistence capable hash table to support analytic query) - http://sourceforge.net/apps/trac/bigdata/ticket/209 (AccessPath should visit binding sets rather than elements for high level query.) - http://sourceforge.net/apps/trac/bigdata/ticket/227 (SliceOp appears to be necessary when operator plan should suffice without) - http://sourceforge.net/apps/trac/bigdata/ticket/232 (Bottom-up evaluation semantics). - http://sourceforge.net/apps/trac/bigdata/ticket/246 (Derived xsd numeric data types must be inlined as extension types.) - http://sourceforge.net/apps/trac/bigdata/ticket/254 (Revisit pruning of intermediate variable bindings during query execution) - http://sourceforge.net/apps/trac/bigdata/ticket/261 (Lift conditions out of subqueries.) - http://sourceforge.net/apps/trac/bigdata/ticket/300 (Native ORDER BY) - http://sourceforge.net/apps/trac/bigdata/ticket/324 (Inline predeclared URIs and namespaces in 2-3 bytes) - http://sourceforge.net/apps/trac/bigdata/ticket/330 (NanoSparqlServer does not locate "html" resources when run from jar) - http://sourceforge.net/apps/trac/bigdata/ticket/334 (Support inlining of unicode data in the statement indices.) - http://sourceforge.net/apps/trac/bigdata/ticket/364 (Scalable default graph evaluation) - http://sourceforge.net/apps/trac/bigdata/ticket/368 (Prune variable bindings during query evaluation) - http://sourceforge.net/apps/trac/bigdata/ticket/370 (Direct translation of openrdf AST to bigdata AST) - http://sourceforge.net/apps/trac/bigdata/ticket/373 (Fix StrBOp and other IValueExpressions) - http://sourceforge.net/apps/trac/bigdata/ticket/377 (Optimize OPTIONALs with multiple statement patterns.) - http://sourceforge.net/apps/trac/bigdata/ticket/380 (Native SPARQL evaluation on cluster) - http://sourceforge.net/apps/trac/bigdata/ticket/387 (Cluster does not compute closure) - http://sourceforge.net/apps/trac/bigdata/ticket/395 (HTree hash join performance) - http://sourceforge.net/apps/trac/bigdata/ticket/401 (inline xsd:unsigned datatypes) - http://sourceforge.net/apps/trac/bigdata/ticket/408 (xsd:string cast fails for non-numeric data) - http://sourceforge.net/apps/trac/bigdata/ticket/421 (New query hints model.) - http://sourceforge.net/apps/trac/bigdata/ticket/431 (Use of read-only tx per query defeats cache on cluster) 1.0.3 - http://sourceforge.net/apps/trac/bigdata/ticket/217 (BTreeCounters does not track bytes released) - http://sourceforge.net/apps/trac/bigdata/ticket/269 (Refactor performance counters using accessor interface) - http://sourceforge.net/apps/trac/bigdata/ticket/329 (B+Tree should delete bloom filter when it is disabled.) 
- http://sourceforge.net/apps/trac/bigdata/ticket/372 (RWStore does not prune the CommitRecordIndex) - http://sourceforge.net/apps/trac/bigdata/ticket/375 (Persistent memory leaks (RWStore/DISK)) - http://sourceforge.net/apps/trac/bigdata/ticket/385 (FastRDFValueCoder2: ArrayIndexOutOfBoundsException) - http://sourceforge.net/apps/trac/bigdata/ticket/391 (Release age advanced on WORM mode journal) - http://sourceforge.net/apps/trac/bigdata/ticket/392 (Add a DELETE by access path method to the NanoSparqlServer) - http://sourceforge.net/apps/trac/bigdata/ticket/393 (Add "context-uri" request parameter to specify the default context for INSERT in the REST API) - http://sourceforge.net/apps/trac/bigdata/ticket/394 (log4j configuration error message in WAR deployment) - http://sourceforge.net/apps/trac/bigdata/ticket/399 (Add a fast range count method to the REST API) - http://sourceforge.net/apps/trac/bigdata/ticket/422 (Support temp triple store wrapped by a BigdataSail) - http://sourceforge.net/apps/trac/bigdata/ticket/424 (NQuads support for NanoSparqlServer) - http://sourceforge.net/apps/trac/bigdata/ticket/425 (Bug fix to DEFAULT_RDF_FORMAT for bulk data loader in scale-out) - http://sourceforge.net/apps/trac/bigdata/ticket/426 (Support either lockfile (procmail) and dotlockfile (liblockfile1) in scale-out) - http://sourceforge.net/apps/trac/bigdata/ticket/427 (BigdataSail#getReadOnlyConnection() race condition with concurrent commit) - http://sourceforge.net/apps/trac/bigdata/ticket/435 (Address is 0L) - http://sourceforge.net/apps/trac/bigdata/ticket/436 (TestMROWTransactions failure in CI) 1.0.2 - http://sourceforge.net/apps/trac/bigdata/ticket/32 (Query time expansion of (foo rdf:type rdfs:Resource) drags in SPORelation for scale-out.) - http://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM "how to" in wiki and build.xml are out of date.) - http://sourceforge.net/apps/trac/bigdata/ticket/356 (Query not terminated by error.) - http://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.) - http://sourceforge.net/apps/trac/bigdata/ticket/361 (IRunningQuery not closed promptly.) - http://sourceforge.net/apps/trac/bigdata/ticket/371 (DataLoader fails to load resources available from the classpath.) - http://sourceforge.net/apps/trac/bigdata/ticket/376 (Support for the streaming of bigdata IBindingSets into a sparql query.) - http://sourceforge.net/apps/trac/bigdata/ticket/378 (ClosedByInterruptException during heavy query mix.) - http://sourceforge.net/apps/trac/bigdata/ticket/379 (NotSerializableException for SPOAccessPath.) - http://sourceforge.net/apps/trac/bigdata/ticket/382 (Change dependencies to Apache River 2.2.0) 1.0.1 (*) - http://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema names in the sparse row store). - http://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should use more bits for scale-out). - http://sourceforge.net/apps/trac/bigdata/ticket/225 (OSX requires specialized performance counter collection classes). - http://sourceforge.net/apps/trac/bigdata/ticket/348 (BigdataValueFactory.asValue() must return new instance when DummyIV is used). - http://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance). - http://sourceforge.net/apps/trac/bigdata/ticket/351 (SPO not Serializable exception in SIDS mode (scale-out)). 
- http://sourceforge.net/apps/trac/bigdata/ticket/352 (ClassCastException when querying with binding-values that are not known to the database). - http://sourceforge.net/apps/trac/bigdata/ticket/353 (UnsupportedOperatorException for some SPARQL queries). - http://sourceforge.net/apps/trac/bigdata/ticket/355 (Query failure when comparing with non materialized value). - http://sourceforge.net/apps/trac/bigdata/ticket/357 (RWStore reports "FixedAllocator returning null address, with freeBits".) - http://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.) - http://sourceforge.net/apps/trac/bigdata/ticket/362 (log4j - slf4j bridge.) For more information about bigdata(R), please see the following links: [1] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page [2] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=GettingStarted [3] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Roadmap [4] http://www.bigdata.com/bigdata/docs/api/ [5] http://sourceforge.net/projects/bigdata/ [6] http://www.bigdata.com/blog [7] http://www.systap.com/bigdata.htm [8] http://sourceforge.net/projects/bigdata/files/bigdata/ [9] http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration About bigdata: Bigdata(r) is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance. |
|
From: Bryan T. <br...@sy...> - 2011-12-20 01:39:41
|
Jack, There is a setup guide for running a "cluster" on a single machine [1]. You can use that to experiment. I use it to run the SPARQL TCK and NanoSparqlServer test suites against a bigdata federation. I really recommend pretty heavy machines to run a "real" cluster. 4-8 cores, 32G+ RAM, 64bit OS. Fast disks. We have a "mini" cluster of 8 machines using Mac minis (low heat, low noise) which we use for testing and development. The minis are pretty short on RAM (8G) and that definitely limits what they can do. I did put SSDs into them to take out most of the IO Wait. You can read about the mini setup here [2,3]. Given the scale and performance of a single node, the real reasons to go to the cluster are either data scale beyond the single machine boundary or the throughput you get from a large number of nodes. The break even point where a cluster begins to do better than a single machine is somewhere around 3-4 nodes. However, remember that the cluster architecture is different -- shard-wise ACID, etc. You have to rethink the application somewhat in terms of patterns which scale-out. E.g., pinning a commit point which you use for readers while writers write ahead using a shard-wise ACID eventually consistent pattern. Then updating the readers when the writes checkpoint the change set (new globally consistent state). Thanks, Bryan [1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=SingleMachineCluster [2] http://www.bigdata.com/bigdata/blog/?p=311 [3] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=MiniCluster > -----Original Message----- > From: Jack Park [mailto:jac...@gm...] > Sent: Monday, December 19, 2011 8:25 PM > To: Bryan Thompson > Cc: big...@li... > Subject: Re: [Bigdata-developers] 1.0.3 and 1.1.1 releases > > To make a decent but small test of a cluster, how many boxes > would be required? > > Thanks > Jack > > On Mon, Dec 19, 2011 at 4:51 PM, Bryan Thompson > <br...@sy...> wrote: > > All, > > > > We will shortly be cutting a 1.0.3 (maintenance release) > and a 1.1.0 release (SPARQL 1.1 support, analytic query, > memory manager). We have one remaining issue open before we > can cut the 1.0.3 release. The 1.1.0 release looks like it > is ready to go now. I am currently running benchmarks on > these two releases. I will give the heads up when the > releases are out and also publish the new development branch > to carry forward the 1.1.0 release. > > > > Thanks, > > Bryan > > > ---------------------------------------------------------------------- > > -------- > > Write once. Port to many. > > Get the SDK and tools to simplify cross-platform app development. > > Create new or port existing apps to sell to consumers worldwide. > > Explore the Intel AppUpSM program developer opportunity. > > appdeveloper.intel.com/join http://p.sf.net/sfu/intel-appdev > > _______________________________________________ > > Bigdata-developers mailing list > > Big...@li... > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > |
|
From: Jack P. <jac...@gm...> - 2011-12-20 01:25:23
|
To make a decent but small test of a cluster, how many boxes would be required? Thanks Jack On Mon, Dec 19, 2011 at 4:51 PM, Bryan Thompson <br...@sy...> wrote: > All, > > We will shortly be cutting a 1.0.3 (maintenance release) and a 1.1.0 release (SPARQL 1.1 support, analytic query, memory manager). We have one remaining issue open before we can cut the 1.0.3 release. The 1.1.0 release looks like it is ready to go now. I am currently running benchmarks on these two releases. I will give the heads up when the releases are out and also publish the new development branch to carry forward the 1.1.0 release. > > Thanks, > Bryan > ------------------------------------------------------------------------------ > Write once. Port to many. > Get the SDK and tools to simplify cross-platform app development. Create > new or port existing apps to sell to consumers worldwide. Explore the > Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join > http://p.sf.net/sfu/intel-appdev > _______________________________________________ > Bigdata-developers mailing list > Big...@li... > https://lists.sourceforge.net/lists/listinfo/bigdata-developers |
|
From: Bryan T. <br...@sy...> - 2011-12-20 00:51:15
|
All, We will shortly be cutting a 1.0.3 (maintenance release) and a 1.1.0 release (SPARQL 1.1 support, analytic query, memory manager). We have one remaining issue open before we can cut the 1.0.3 release. The 1.1.0 release looks like it is ready to go now. I am currently running benchmarks on these two releases. I will give the heads up when the releases are out and also publish the new development branch to carry forward the 1.1.0 release. Thanks, Bryan |
|
From: tousif <to...@mo...> - 2011-12-08 06:37:09
|
Hi, I have installed zookeeper and started the server at port 2181, and specified the same in the build properties. When I start bigdata following your instructions at http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterGuide, bigdata fails to start, reporting that zookeeper is not found; it is trying to connect to 192.168.16.0, whereas my host is different (a local IP set by the router). Where can I set the zookeeper host? -- Regards Tousif |
|
From: tousif <to...@mo...> - 2011-12-08 06:31:42
|
Here is the log message:

org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1041): Opening socket connection to server 192.168.20.28/192.168.20.28:2181
Zookeeper: not connected: state=CONNECTING, elapsed=10
Zookeeper is not running.
Discovered 0 jini service registrars.
Discovered 0 services
Discovered 0 stale bigdata services.
Discovered 0 live bigdata services.

---------- Forwarded message ---------- From: tousif <to...@mo...> Date: Thu, Dec 8, 2011 at 11:42 AM Subject: Zookeeper connection problem in bigdata federation To: big...@li... Hi, I have installed zookeeper and started server at port 2181 and specified the same in build properties. when i start bigdata following your instructions at http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterGuide bigdata fails to starts reporting zoookeeper is not found and it is trying to connect to 192.168.16.0 , whereas my host is different and local ip set by router. where can i set zookeeper host?? -- Regards Tousif -- Regards Tousif |
|
From: Bryan T. <br...@sy...> - 2011-11-30 12:42:13
|
Not yet. If your question is about SPARQL 1.1 federation support, you could probably link bigdata 1.0.x against Sesame 2.6.1, but this is not tested. Our next release (bigdata 1.1.0) takes over all evaluation from Sesame. We will have to look at how to best support native SPARQL 1.1 federated queries. If we can delegate the graph pattern of a SERVICE addressed to a remote URI to Sesame then this will be an easy enough integration (we already have the SERVICE construct in our internal evaluation model and use it for integrating the bigdata free text search engine, but support for external SERVICEs is not yet there). Thanks, Bryan > -----Original Message----- > From: Neil Brittliff [mailto:Nei...@ho...] > Sent: Wednesday, November 30, 2011 6:01 AM > To: big...@li... > Subject: [Bigdata-developers] New to Big Data > > I am new to Big Data does Big Data support Sesame 2.6.1 ? > > > -------------------------------------------------------------- > ---------------- > All the data continuously generated in your IT infrastructure > contains a definitive record of customers, application > performance, security threats, fraudulent activity, and more. > Splunk takes this data and makes sense of it. IT sense. And > common sense. > http://p.sf.net/sfu/splunk-novd2d > _______________________________________________ > Bigdata-developers mailing list > Big...@li... > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > |
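[Editor's note] For readers unfamiliar with the SERVICE construct discussed above, the sketch below shows a SPARQL 1.1 federated query issued through the plain Sesame (openrdf) API. It deliberately uses an in-memory Sesame repository rather than a bigdata-backed one, since bigdata's own evaluation did not yet handle external SERVICEs at this point; whether the SERVICE clause is actually dispatched to the remote endpoint depends on the federation support of the Sesame version you link against, so treat this as an assumption-laden example.

```java
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class FederatedServiceSketch {

    public static void main(String[] args) throws Exception {
        // Plain in-memory Sesame repository (not bigdata-backed).
        final Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();

        // SPARQL 1.1 federated query: the SERVICE graph pattern is meant to
        // be evaluated against the remote endpoint named in the URI.
        final String query =
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
              + "SELECT ?label WHERE {\n"
              + "  SERVICE <http://dbpedia.org/sparql> {\n"
              + "    <http://dbpedia.org/resource/Resource_Description_Framework> rdfs:label ?label .\n"
              + "  }\n"
              + "}";

        final RepositoryConnection conn = repo.getConnection();
        try {
            final TupleQuery tq =
                    conn.prepareTupleQuery(QueryLanguage.SPARQL, query);
            final TupleQueryResult result = tq.evaluate();
            while (result.hasNext()) {
                final BindingSet bs = result.next();
                System.out.println(bs.getValue("label"));
            }
            result.close();
        } finally {
            conn.close();
        }
    }
}
```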
|
From: Bryan T. <br...@sy...> - 2011-11-30 12:08:52
|
Please see [1,2,3]. The cluster (federation) is installed from SVN per the instructions available from the links below. There are also OS setup guides linked from [1]. Thanks, Bryan [1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page [2] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterGuide [3] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterStartupFAQ ________________________________ From: tousif [mailto:to...@mo...] Sent: Tuesday, November 29, 2011 1:08 AM To: Bryan Thompson Cc: big...@li... Subject: Re: [Bigdata-developers] is there a way to give hdfs path of jnl file in com.bigdata.journal.AbstractJournal.file Where can i download bigdata federation, Are there any cluster setup instructions for the same. On Mon, Nov 28, 2011 at 5:31 PM, Bryan Thompson <br...@sy...<mailto:br...@sy...>> wrote: Yes, right now you need to deploy against local storage on the instance nodes, SAN, NAS, or a parallel file system. That will change with this refactor. I need to update our road map, but somewhere around the end of Q2 would be my guess. Thanks, Bryan ________________________________ From: tousif [mailto:to...@mo...<mailto:to...@mo...>] Sent: Monday, November 28, 2011 6:43 AM To: Bryan Thompson Subject: Re: [Bigdata-developers] is there a way to give hdfs path of jnl file in com.bigdata.journal.AbstractJournal.file Thank you Brayan. I wanted to cluster bigdata so was thinking to store jnl file in hdfs rather than nas. Since my jnl file is going to be bigger, I want to store it distributed. Is there any better way ? On Mon, Nov 28, 2011 at 4:51 PM, Bryan Thompson <br...@sy...<mailto:br...@sy...>> wrote: Bigdata journals are expected to be on real file systems. If you use FUSE to map HDFS into a file system then you could provide that path. However, the last time I looked, HDFS does not provide some important guarantees (such as flush actually flushing through) and is oriented toward large block operations rather than the fine grained read/write model used by the bigdata journal. You would be much better off using a parallel file system [1]. There are several that would be suitable. In fact, AWS, recognizing the difference between a blob store and a true parallel file system, has recently release a parallel file system service. We are looking at a refactor to support "cloud" style blob stores, such as HDFS or S3. However that would be only for the bigdata federation, not an individual Journal file. The federation architecture is very different. Each file has a maximum size of ~ 200M. This gives them a good size for efficient block fetch from a blob store without excessive latency. With the refactor, the authoritative copy of the data will be in the cloud/blob store but the working copies will be cached on the instance nodes in the compute side of the cluster. Thanks, Bryan [1] http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems ________________________________ From: tousif [mailto:to...@mo...<mailto:to...@mo...>] Sent: Monday, November 28, 2011 2:11 AM To: big...@li...<mailto:big...@li...> Subject: [Bigdata-developers] is there a way to give hdfs path of jnl file in com.bigdata.journal.AbstractJournal.file -- Regards Tousif -- Regards Tousif -- Regards Tousif |
|
From: Neil B. <Nei...@ho...> - 2011-11-30 11:01:04
|
I am new to Big Data. Does Big Data support Sesame 2.6.1? |
|
From: tousif <to...@mo...> - 2011-11-29 06:07:45
|
Where can i download bigdata federation, Are there any cluster setup instructions for the same. On Mon, Nov 28, 2011 at 5:31 PM, Bryan Thompson <br...@sy...> wrote: > ** > Yes, right now you need to deploy against local storage on the instance > nodes, SAN, NAS, or a parallel file system. That will change with this > refactor. I need to update our road map, but somewhere around the end of > Q2 would be my guess. > > Thanks, > Bryan > > ------------------------------ > *From:* tousif [mailto:to...@mo...] > *Sent:* Monday, November 28, 2011 6:43 AM > *To:* Bryan Thompson > *Subject:* Re: [Bigdata-developers] is there a way to give hdfs path of > jnl file in com.bigdata.journal.AbstractJournal.file > > Thank you Brayan. > > I wanted to cluster bigdata so was thinking to store jnl file in hdfs > rather than nas. Since my jnl file is going to be bigger, I want to store > it distributed. Is there any better way ? > > On Mon, Nov 28, 2011 at 4:51 PM, Bryan Thompson <br...@sy...> wrote: > >> ** >> Bigdata journals are expected to be on real file systems. If you use >> FUSE to map HDFS into a file system then you could provide that path. >> However, the last time I looked, HDFS does not provide some important >> guarantees (such as flush actually flushing through) and is oriented >> toward large block operations rather than the fine grained read/write model >> used by the bigdata journal. You would be much better off using a parallel >> file system [1]. There are several that would be suitable. In fact, AWS, >> recognizing the difference between a blob store and a true parallel file >> system, has recently release a parallel file system service. >> >> We are looking at a refactor to support "cloud" style blob stores, such >> as HDFS or S3. However that would be only for the bigdata federation, not >> an individual Journal file. The federation architecture is very >> different. Each file has a maximum size of ~ 200M. This gives them a good >> size for efficient block fetch from a blob store without excessive >> latency. With the refactor, the authoritative copy of the data will be in >> the cloud/blob store but the working copies will be cached on the instance >> nodes in the compute side of the cluster. >> >> Thanks, >> Bryan >> >> [1] >> http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems >> >> >> ------------------------------ >> *From:* tousif [mailto:to...@mo...] >> *Sent:* Monday, November 28, 2011 2:11 AM >> *To:* big...@li... >> *Subject:* [Bigdata-developers] is there a way to give hdfs path of jnl >> file in com.bigdata.journal.AbstractJournal.file >> >> >> >> >> >> -- >> Regards >> Tousif >> >> > > > -- > Regards > Tousif > > -- Regards Tousif |
|
From: Bryan T. <br...@sy...> - 2011-11-28 12:01:30
|
Yes, right now you need to deploy against local storage on the instance nodes, SAN, NAS, or a parallel file system. That will change with this refactor. I need to update our road map, but somewhere around the end of Q2 would be my guess. Thanks, Bryan ________________________________ From: tousif [mailto:to...@mo...] Sent: Monday, November 28, 2011 6:43 AM To: Bryan Thompson Subject: Re: [Bigdata-developers] is there a way to give hdfs path of jnl file in com.bigdata.journal.AbstractJournal.file Thank you Brayan. I wanted to cluster bigdata so was thinking to store jnl file in hdfs rather than nas. Since my jnl file is going to be bigger, I want to store it distributed. Is there any better way ? On Mon, Nov 28, 2011 at 4:51 PM, Bryan Thompson <br...@sy...<mailto:br...@sy...>> wrote: Bigdata journals are expected to be on real file systems. If you use FUSE to map HDFS into a file system then you could provide that path. However, the last time I looked, HDFS does not provide some important guarantees (such as flush actually flushing through) and is oriented toward large block operations rather than the fine grained read/write model used by the bigdata journal. You would be much better off using a parallel file system [1]. There are several that would be suitable. In fact, AWS, recognizing the difference between a blob store and a true parallel file system, has recently release a parallel file system service. We are looking at a refactor to support "cloud" style blob stores, such as HDFS or S3. However that would be only for the bigdata federation, not an individual Journal file. The federation architecture is very different. Each file has a maximum size of ~ 200M. This gives them a good size for efficient block fetch from a blob store without excessive latency. With the refactor, the authoritative copy of the data will be in the cloud/blob store but the working copies will be cached on the instance nodes in the compute side of the cluster. Thanks, Bryan [1] http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems ________________________________ From: tousif [mailto:to...@mo...<mailto:to...@mo...>] Sent: Monday, November 28, 2011 2:11 AM To: big...@li...<mailto:big...@li...> Subject: [Bigdata-developers] is there a way to give hdfs path of jnl file in com.bigdata.journal.AbstractJournal.file -- Regards Tousif -- Regards Tousif |
|
From: Bryan T. <br...@sy...> - 2011-11-28 11:49:15
|
Bigdata journals are expected to be on real file systems. If you use FUSE to map HDFS into a file system then you could provide that path. However, the last time I looked, HDFS does not provide some important guarantees (such as flush actually flushing through) and is oriented toward large block operations rather than the fine-grained read/write model used by the bigdata journal. You would be much better off using a parallel file system [1]. There are several that would be suitable. In fact, AWS, recognizing the difference between a blob store and a true parallel file system, has recently released a parallel file system service.

We are looking at a refactor to support "cloud" style blob stores, such as HDFS or S3. However, that would be only for the bigdata federation, not an individual Journal file. The federation architecture is very different. Each file has a maximum size of ~200M. This gives them a good size for efficient block fetch from a blob store without excessive latency. With the refactor, the authoritative copy of the data will be in the cloud/blob store but the working copies will be cached on the instance nodes in the compute side of the cluster.

Thanks, Bryan

[1] http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems

________________________________
From: tousif [mailto:to...@mo...]
Sent: Monday, November 28, 2011 2:11 AM
To: big...@li...
Subject: [Bigdata-developers] is there a way to give hdfs path of jnl file in com.bigdata.journal.AbstractJournal.file

-- Regards Tousif |
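For concreteness, below is a minimal sketch of opening a standalone Journal against a path on a real (or FUSE-mounted) file system, using the property named in the thread's subject line. The Options.FILE and BufferMode names are assumptions from memory of the bigdata API and the chosen path is invented, so check everything against the javadoc before relying on it; this is not a tested recipe.

import java.util.Properties;

import com.bigdata.journal.BufferMode;
import com.bigdata.journal.Journal;
import com.bigdata.journal.Options;

public class JournalFileSketch {
    public static void main(String[] args) throws Exception {
        final Properties props = new Properties();

        // The property from the subject line (com.bigdata.journal.AbstractJournal.file);
        // its value must be a path on a real file system (local disk, SAN/NAS, or a
        // parallel file system), not an hdfs:// URI.
        props.setProperty(Options.FILE, "/var/bigdata/store.jnl");

        // RWStore-style persistence (assumed option/enum names; verify in the javadoc).
        props.setProperty(Options.BUFFER_MODE, BufferMode.DiskRW.toString());

        final Journal journal = new Journal(props);
        try {
            System.out.println("Journal opened at: " + props.getProperty(Options.FILE));
        } finally {
            journal.close();
        }
    }
}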
|
From: tousif <to...@mo...> - 2011-11-28 07:36:47
|
-- Regards Tousif |
|
From: Bryan T. <br...@sy...> - 2011-09-27 19:55:34
|
This is a minor version release of bigdata(R). Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF capable of loading 1B triples in under one hour on a 15 node cluster.

Bigdata operates in both a single machine mode (Journal) and a cluster mode (Federation). The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion triples / quads. The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth. Both platforms support fully concurrent readers with snapshot isolation. Distributed processing offers greater throughput but does not reduce query or update latency. Choose the Journal when the anticipated scale and throughput requirements permit. Choose the Federation when the administrative and machine overhead associated with operating a cluster is an acceptable tradeoff to have essentially unlimited data scaling and throughput.

See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and [3,5,6] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [7].

Starting with the 1.0.0 release, we offer a WAR artifact [8] for easy installation of the single machine RDF database. For custom development and cluster installations we recommend checking out the code from SVN using the tag for this release. The code will build automatically under eclipse. You can also build the code using the ant script. The cluster installer requires the use of the ant script.

You can download the WAR from: https://sourceforge.net/projects/bigdata/

You can checkout this release from: https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/BIGDATA_RELEASE_1_0_2

Feature summary:
- Single machine data storage to 50 billion triples/quads (RWStore);
- Clustered data storage is essentially unlimited;
- Simple embedded and/or webapp deployment (NanoSparqlServer);
- Triples, quads, or triples with provenance (SIDs);
- 100% native SPARQL 1.0 evaluation with lots of query optimizations;
- Fast RDFS+ inference and truth maintenance;
- Fast statement level provenance mode (SIDs).

The road map [3] for the next releases includes:
- High-volume analytic query and SPARQL 1.1 query, including aggregations;
- Simplified deployment, configuration, and administration for clusters; and
- High availability for the journal and the cluster.

Change log:

1.0.2
- https://sourceforge.net/apps/trac/bigdata/ticket/32 (Query time expansion of (foo rdf:type rdfs:Resource) drags in SPORelation for scale-out.)
- https://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM "how to" in wiki and build.xml are out of date.)
- https://sourceforge.net/apps/trac/bigdata/ticket/356 (Query not terminated by error.)
- https://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.)
- https://sourceforge.net/apps/trac/bigdata/ticket/361 (IRunningQuery not closed promptly.)
- https://sourceforge.net/apps/trac/bigdata/ticket/371 (DataLoader fails to load resources available from the classpath.)
- https://sourceforge.net/apps/trac/bigdata/ticket/376 (Support for the streaming of bigdata IBindingSets into a sparql query.)
- https://sourceforge.net/apps/trac/bigdata/ticket/378 (ClosedByInterruptException during heavy query mix.)
- https://sourceforge.net/apps/trac/bigdata/ticket/379 (NotSerializableException for SPOAccessPath.)
- https://sourceforge.net/apps/trac/bigdata/ticket/382 (Change dependencies to Apache River 2.2.0)

1.0.1
- https://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema names in the sparse row store).
- https://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should use more bits for scale-out).
- https://sourceforge.net/apps/trac/bigdata/ticket/225 (OSX requires specialized performance counter collection classes).
- https://sourceforge.net/apps/trac/bigdata/ticket/348 (BigdataValueFactory.asValue() must return new instance when DummyIV is used).
- https://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance).
- https://sourceforge.net/apps/trac/bigdata/ticket/351 (SPO not Serializable exception in SIDS mode (scale-out)).
- https://sourceforge.net/apps/trac/bigdata/ticket/352 (ClassCastException when querying with binding-values that are not known to the database).
- https://sourceforge.net/apps/trac/bigdata/ticket/353 (UnsupportedOperatorException for some SPARQL queries).
- https://sourceforge.net/apps/trac/bigdata/ticket/355 (Query failure when comparing with non materialized value).
- https://sourceforge.net/apps/trac/bigdata/ticket/357 (RWStore reports "FixedAllocator returning null address, with freeBits".)
- https://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.)
- https://sourceforge.net/apps/trac/bigdata/ticket/362 (log4j - slf4j bridge.)

Note: Some of these bug fixes in the 1.0.1 release require data migration. For details, see https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration

For more information about bigdata, please see the following links:

[1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page
[2] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=GettingStarted
[3] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Roadmap
[4] http://www.bigdata.com/bigdata/docs/api/
[5] http://sourceforge.net/projects/bigdata/
[6] http://www.bigdata.com/blog
[7] http://www.systap.com/bigdata.htm
[8] https://sourceforge.net/projects/bigdata/files/bigdata/

About bigdata:

Bigdata(r) is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(r) may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(r) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance. |
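As a companion to the GettingStarted guide [2], here is a minimal, untested sketch of embedding the single-machine RDF database through the Sesame SAIL API. The BigdataSail and BigdataSailRepository class names and the journal file property are assumptions from memory and may differ between releases; verify them against the javadoc [4] before relying on them.

import java.util.Properties;

import org.openrdf.model.ValueFactory;
import org.openrdf.model.vocabulary.RDFS;
import org.openrdf.repository.RepositoryConnection;

import com.bigdata.journal.Options;
import com.bigdata.rdf.sail.BigdataSail;
import com.bigdata.rdf.sail.BigdataSailRepository;

public class EmbeddedJournalSketch {
    public static void main(String[] args) throws Exception {
        final Properties props = new Properties();
        // Backing file for the single-machine Journal (path is an example only).
        props.setProperty(Options.FILE, "/tmp/bigdata-demo.jnl");

        // Wrap the bigdata SAIL in a Sesame repository.
        final BigdataSail sail = new BigdataSail(props);
        final BigdataSailRepository repo = new BigdataSailRepository(sail);
        repo.initialize();

        final RepositoryConnection cxn = repo.getConnection();
        try {
            final ValueFactory vf = cxn.getValueFactory();
            // Load one statement and commit it.
            cxn.add(vf.createURI("http://example.org/a"), RDFS.LABEL, vf.createLiteral("hello"));
            cxn.commit();
        } finally {
            cxn.close();
        }
        repo.shutDown();
    }
}

From there the handle behaves like any other Sesame repository (prepareTupleQuery(QueryLanguage.SPARQL, ...) and so on), and the same .jnl file should be usable behind the WAR/NanoSparqlServer deployment described above.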
|
From: Bryan T. <br...@sy...> - 2011-08-05 10:16:21
|
All, I've published a high-level guide to performance tuning for queries on the blog [1]. Thanks, Bryan [1] http://www.bigdata.com/bigdata/blog/?p=281 |