From: <tho...@us...> - 2014-01-23 11:50:46
Revision: 7820
          http://bigdata.svn.sourceforge.net/bigdata/?rev=7820&view=rev
Author:   thompsonbry
Date:     2014-01-23 11:50:35 +0000 (Thu, 23 Jan 2014)

Log Message:
-----------
Removed the stale overview.html. This has been replaced by a 25-page
whitepaper linked from the blog [1]. Modified the README to point people to
the release notes. The release notes have pointers for getting started with
the platform. Moved the JINI README into bigdata-jini/src/resources.

[1] http://www.bigdata.com/whitepapers/bigdata_architecture_whitepaper.pdf

Modified Paths:
--------------
    branches/BIGDATA_RELEASE_1_3_0/README

Added Paths:
-----------
    branches/BIGDATA_RELEASE_1_3_0/bigdata-jini/src/resources/README-JINI

Removed Paths:
-------------
    branches/BIGDATA_RELEASE_1_3_0/overview.html

Modified: branches/BIGDATA_RELEASE_1_3_0/README
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/README       2014-01-23 11:33:50 UTC (rev 7819)
+++ branches/BIGDATA_RELEASE_1_3_0/README       2014-01-23 11:50:35 UTC (rev 7820)
@@ -0,0 +1,4 @@
+Please see the release notes in bigdata/src/releases for getting started
+links. This will point you to the installation instructions for the
+different deployment modes, the online documentation, the wiki, etc. It
+will also point you to resources for support, subscriptions, and licensing.

Copied: branches/BIGDATA_RELEASE_1_3_0/bigdata-jini/src/resources/README-JINI (from rev 7775, branches/BIGDATA_RELEASE_1_3_0/README-JINI)
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata-jini/src/resources/README-JINI       (rev 0)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata-jini/src/resources/README-JINI       2014-01-23 11:50:35 UTC (rev 7820)
@@ -0,0 +1,160 @@
Some notes on installation and use follow:

JINI

- jini is used as a service fabric for bigdata (start up jini and then
  configure your data and metadata services; clients then discover those
  services).

- jini 2.1 may report errors from awk, dirname, basename and grep about
  locating their shared libraries when installing under un*x. The problem is
  an assumption about the kernel version. It is resolved by editing the
  installer and the Launch-All script.
  See http://www.jini.org/wiki/Category:Getting_Started and
  http://www.linuxquestions.org/questions/showthread.php?t=370056 for a
  resolution. Here it is, in case that link goes away:

  Open the bin installer file in an editor. Look for the line

      export LD_ASSUME_KERNEL=2.2.5

  and replace it with

      #xport LD_ASSUME_KERNEL=2.2.5

  Save the file and launch.

  Once jini is installed, you need to do exactly the same thing for the
  Launch-All script in the installverify directory - this is the script that
  you use to start jini.

- Here are some notes on getting things working on a Fedora Core 6 platform.
  The symptom was that ifconfig was reporting MULTICAST for the interface,
  but the jini install was complaining that multicast was not enabled for
  that interface.

  Here's what I did:

  First, verify that multicast is enabled on eth0 by typing ifconfig and
  looking for MULTICAST.

  If it is not enabled, type: ifconfig eth0 multicast

  After that, add a default route for multicast broadcasts and bind it to
  eth0:

      route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0

  Additionally, I disabled the firewall and I disabled SELinux (but I think
  the firewall was the big culprit here).
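  As a convenience (this is an illustration added here, not part of the jini
  installer), you can also cross-check from Java which interfaces the JVM
  itself reports as up and multicast-capable, using nothing but the standard
  java.net.NetworkInterface API:

      import java.net.NetworkInterface;
      import java.util.Collections;

      public class MulticastCheck {
          public static void main(String[] args) throws Exception {
              // Print each interface and whether the OS reports it as up
              // and multicast-capable.
              for (NetworkInterface ni
                      : Collections.list(NetworkInterface.getNetworkInterfaces())) {
                  System.out.println(ni.getDisplayName()
                          + " up=" + ni.isUp()
                          + " multicast=" + ni.supportsMulticast());
              }
          }
      }

  If an interface that should carry jini multicast discovery prints
  multicast=false here, fix the interface configuration (as above) before
  troubleshooting jini itself.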
- Downloadable code is NOT required for deployments, but MAY be useful
  for the following purposes:

  (a) exposing services to the Jini service browser;

  (b) running procedures against services which were deployed before
      the procedures were written.

  If you have a complete classpath when running the various services,
  then jini will not seek to transfer code from the client, as the code
  will be resolved locally by the service.

  In order to support downloadable code you need to have an HTTP
  server that will serve class files to jini, including the interfaces
  for all remote services. You can use any HTTP server for this
  purpose, and the server can be located on any machine accessible
  from the host(s) on which you are running jini. As a convenience,
  jini bundles a simple HTTP server that you can start using a command
  like the following:

      java -jar ${JINI_HOME}/lib/classserver.jar -port 8080 -dir classes -trees -verbose&

  The javadoc describes the logging and command line options for this
  HTTP server:

      https://java.sun.com/products/jini/2.1/doc/api/com/sun/jini/tool/ClassServer.html

  The directory from which downloadable code will be served should
  contain at least the bigdata jar(s) plus any remote application code
  that you have defined (i.e., code that will run in the server
  process).

  The recommended approach to downloadable code is to extract the
  relevant classes into a directory that is then named to the HTTP
  server (the -dir option above). Assuming that bigdata.jar is located
  in the current directory:

      mkdir classes
      cd classes
      jar xf ../bigdata.jar

  If you deploy a new version of any JAR, then you SHOULD delete the
  classes directory and redeploy all relevant JARs to make sure that
  old class files are not left lying around.

- You can enable NIO support with JERI over TCP by specifying the
  following property to the JVM. Note that JRMP does NOT allow for
  the possibility of NIO.

      -Dcom.sun.jini.jeri.tcp.useNIO=true

  More information on JERI and NIO is available using the following links:

      http://archives.java.sun.com/cgi-bin/wa?A2=ind0504&L=jini-users&P=33490
      http://archives.java.sun.com/cgi-bin/wa?A2=ind0506&L=jini-users&P=9626
      http://archives.java.sun.com/cgi-bin/wa?A2=ind0504&L=jini-users&D=0&P=26542
      http://java.sun.com/products/jini/2.0.1/doc/api/net/jini/jeri/tcp/package-summary.html

  Note that one server thread will still be required per concurrent RPC
  request owing to the semantics of RPC (call and wait for response) and the
  definition of JERI.

- Clients providing downloadable code that will be run on the bigdata
  services MUST set:

      -Djava.rmi.server.codebase=http://.../

  where "..." is your host and path,

  in order for the correct codebase property to be communicated to clients
  that will then download code from that HTTP server. Note: the trailing '/'
  is REQUIRED in your codebase or the generated URLs will NOT resolve
  correctly.

  There is an example of how to do this with the "ant lubm-install" target
  and the "lubmMaster.sh" script.

- Debugging with jini.

  See http://java.sun.com/j2se/1.4.2/docs/guide/rmi/javarmiproperties.html
  for some guidance. Among other things, it suggests:

      -Djava.rmi.server.logCalls=true

  as an aid to debugging. Also try setting

      -Dcom.sun.jini.reggie.proxy.debug=1

  for the client, e.g., the service browser. Also see:

      http://www.adtmag.com/java/articleold.aspx?id=1159

  for some (very good) guidance on debugging jini services.
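  The flags above are ordinary JVM system properties. As an illustration
  (added here for convenience; the property names are exactly those shown
  above, the class is hypothetical), they can also be set programmatically
  early in a client's main(), before any RMI/JERI activity, when you cannot
  easily edit the launch script:

      public class DebugFlags {
          public static void main(String[] args) {
              // Same effect as passing the -D options on the command line,
              // provided this runs before RMI/JERI classes are initialized.
              System.setProperty("java.rmi.server.logCalls", "true");
              System.setProperty("com.sun.jini.reggie.proxy.debug", "1");
              // ... start or look up services here ...
          }
      }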
  Note: You may have to restart jini locally in order to force download of
  updated classes from the codebase!

  See http://archives.java.sun.com/cgi-bin/wa?A2=ind0512&L=jini-users&P=R391&I=-3
  for instructions on setting up a "download jar" (dljar) ANT task that can
  make life much simpler (one supposes).

  See http://archives.java.sun.com/cgi-bin/wa?A2=ind0311&L=jini-users&F=&S=&P=7182
  for a description of policy files and
  http://www.dancres.org/cottage/jini-start-examples-2_1.zip for the
  policy files described.

  See http://jan.newmarch.name/java/jini/tutorial/Ant.xml for a description
  of one (simple) approach to using ant for jini projects (it does not use
  the dljar ant task but explicitly enumerates what goes where).

  See http://jan.newmarch.name/java/jini/tutorial/TroubleShooting.xml#RMI%20Stubs
  for common errors when using RMI stubs.

  See https://java.sun.com/products/jini/2.1/doc/api/com/sun/jini/example/browser/package-summary.html
  for the details on the jini Service Browser.

Deleted: branches/BIGDATA_RELEASE_1_3_0/overview.html
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/overview.html       2014-01-23 11:33:50 UTC (rev 7819)
+++ branches/BIGDATA_RELEASE_1_3_0/overview.html       2014-01-23 11:50:35 UTC (rev 7820)
@@ -1,422 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>bigdata®</title>
<!-- $Id$ -->
</head>
<body>

<p>
  <em>bigdata®</em> is a scale-out data and computing fabric designed for
  commodity hardware. The bigdata architecture provides named scale-out
  indices that are transparently and dynamically key-range partitioned and
  distributed across a cluster or grid of commodity server platforms. The
  scale-out indices are B+Trees and remain balanced under insert and removal
  operations. Keys and values for btrees are variable-length byte[]s (the
  keys are interpreted as unsigned byte[]s).

  Atomic "row" operations are supported for very high concurrency. However,
  full transactions are also available for applications needing less
  concurrency but requiring atomic operations that read or write on more
  than one row, index partition, or index.

  Writes are absorbed on mutable btree instances in append-only "journals"
  of ~200M capacity. On overflow, data in a journal is evicted onto
  read-optimized, immutable "index segments". The metadata service manages
  the index definitions, the index partitions, and the assignment of index
  partitions to data services. A data service encapsulates a journal and
  zero or more index partitions assigned to that data service, including the
  logic to handle overflow.
</p><p>
  A deployment is made up of a transaction service, a load balancer service,
  a metadata service (with failover redundancy), and many data service
  instances. bigdata can provide data redundancy internally (by pipelining
  writes bound for a partition across primary, secondary, ... data service
  instances for that partition) or it can be deployed over NAS/SAN. bigdata
  itself is 100% Java and requires JDK 1.6. There are additional
  dependencies for the installer (un*x) and for collecting performance
  counters from the OS
  (<a href="http://pagesperso-orange.fr/sebastien.godard/">sysstat</a>).
</p>
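<p>
  To make "the keys are interpreted as unsigned byte[]s" concrete, the
  following is a minimal, illustrative sketch (added for this overview; it is
  not bigdata's actual comparator class) of how two keys compare when each
  byte is treated as an unsigned value:
</p>

<pre>
  public class UnsignedKeyOrder {

      // Lexicographic comparison treating each byte as unsigned (0..255).
      static int compare(final byte[] a, final byte[] b) {
          final int len = Math.min(a.length, b.length);
          for (int i = 0; i != len; i++) {
              final int x = a[i] & 0xff; // mask to recover the unsigned value
              final int y = b[i] & 0xff;
              if (x != y)
                  return x - y;
          }
          return a.length - b.length; // the shorter key orders first on a tie
      }

      public static void main(String[] args) {
          // 0x80 is negative as a signed Java byte, but orders AFTER 0x01
          // when the bytes are interpreted as unsigned values.
          System.out.println(compare(new byte[] { (byte) 0x80 },
                  new byte[] { (byte) 0x01 }) > 0);
      }
  }
</pre>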
<h2>Architecture</h2>

<p>
  The bigdata SOA defines three essential services and some additional
  services. The essential services are the metadata service (provides a
  locator service for index partitions on data services), the data service
  (provides read, write, and concurrency control for index partitions), and
  the transaction service (provides consistent timestamps for commits,
  facilitates the release of data associated with older commit points, read
  locks, and transactions). Full transactions are NOT required, so you can
  use bigdata as a scale-out row store. The load balancer service guides the
  dynamic redistribution of data across a bigdata federation. There are also
  client services, which are containers for executing distributed jobs.
</p>

<p>
  While other service fabric architectures are contemplated, bigdata
  services today use JINI 2.1 to advertise themselves and do service
  discovery. This means that you must be running a JINI registrar in order
  for services to be able to register themselves or discover other services.
  The JINI integration is bundled and installed automatically by default.
</p>

<p>
  Zookeeper handles master election, configuration management, and global
  synchronous locks. Zookeeper was developed by Yahoo! as a distributed lock
  and configuration management service and is now an Apache subproject (part
  of Hadoop). Among other things, it gets master election protocols right.
  Zookeeper is bundled and installed automatically by default.
</p>

<p>
  The main building blocks for the bigdata architecture are the journal
  (both an append-only persistence store and a recently introduced
  read/write store with the ability to recycle historical commit points),
  the mutable B+Tree (used to absorb writes), and the read-optimized
  immutable B+Tree (aka the index segment). Highly efficient bulk index
  builds are used to transfer data absorbed by a mutable B+Tree on a journal
  into index segment files. Each index segment contains data for a single
  partition of a scale-out index. In order to read from an index partition,
  a consistent view is created by dynamically fusing data for that index
  partition, including any recent writes on the current journal, any
  historical writes that are in the process of being transferred onto index
  segments, and any historical index segments that also contain data for
  that view. Periodically, index segments are merged together, at which
  point deleted tuples are purged from the view.
</p>

<p>
  Bigdata periodically releases data associated with older commit points,
  freeing up disk resources. The transaction service is configured with a
  minimum release age in milliseconds. This can be ZERO (0L) milliseconds,
  in which case historical views may be released if there are no read locks
  for that commit point. The minimum release age can also be hours or days
  if you want to keep historical states around for a while. When a data
  service overflows, it will consult the transaction service to determine
  the effective release time and release any old journals or index segments
  no longer required to maintain views GT (greater than) that release time.
</p><p>
  An immortal or temporal database can be realized by specifying
  Long#MAX_VALUE for the minimum release age. In this case, the old journals
  and index segments will all be retained and you can query any historical
  commit point of the database at any time.
</p><p>
  A detailed architecture whitepaper for bigdata is posted online and linked
  from our <a href="http://www.bigdata.com/blog">blog</a>.
</p>

<h2>Sparse row store</h2>

<p>
  The <em>SparseRowStore</em> provides a column-flexible row store similar
  to Google's bigtable or HBase, including very high read/write concurrency
  and ACID operations on logical "rows". Internally, a "global row store"
  instance is used to maintain metadata on relations declared within a
  bigdata federation. You can use this instance to store your own data, or
  you can create your own named row store instances. However, there is no
  REST API for the row store at this time (trivial to be sure, but not
  something that we have gotten around to yet).
</p><p>
  In fact, it is trivial to realize bigtable semantics with bigdata - all
  you need to do is exercise a specific protocol when forming the keys for
  your scale-out indices and then you simply choose NOT to use transactions.
  A bigtable-style key-value is formed as:
</p>

<pre>
  [columnFamily][primaryKey][columnName][timestamp] : [value]
</pre>

<p>
  By placing the column family identifier up front, all data in the same
  column family will be clustered together by the index. The next component
  is the "row" identifier, what you would think of as the primary key in a
  relational table. The column name comes next - only column names for
  non-null columns are written into the index. Finally, there is a timestamp
  column that is used to record either a timestamp specified by the
  application or the datum write time. The value associated with the key is
  simply the datum for that column in that row. The use of nul byte
  separators makes it possible to parse the key, which is required for
  various operations including index partition splits and filtering key
  scans based on column names or timestamps. See the <em>KeyBuilder</em>
  class in <code>com.bigdata.btree.keys</code> for utilities that may be
  used to construct keys for a variety of data types.
</p>
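<p>
  As a purely illustrative sketch (added for this overview; the class and
  method names below are hypothetical and this is not the actual
  <code>KeyBuilder</code> API), the following shows one way such a key could
  be assembled by hand, using nul byte separators between the
  variable-length components; the sign-bit flip on the timestamp is just one
  way to make a signed long order correctly as unsigned bytes:
</p>

<pre>
  import java.io.ByteArrayOutputStream;

  public class RowKeySketch {

      /** Build [columnFamily][0][primaryKey][0][columnName][0][timestamp]. */
      static byte[] encodeKey(String columnFamily, String primaryKey,
              String columnName, long timestamp) throws Exception {
          final ByteArrayOutputStream out = new ByteArrayOutputStream();
          out.write(columnFamily.getBytes("UTF-8"));
          out.write(0); // nul separator
          out.write(primaryKey.getBytes("UTF-8"));
          out.write(0); // nul separator
          out.write(columnName.getBytes("UTF-8"));
          out.write(0); // nul separator
          // Flip the sign bit so the 8-byte big-endian timestamp orders
          // correctly when the bytes are compared as unsigned values.
          final long t = timestamp ^ Long.MIN_VALUE;
          for (int i = 7; i >= 0; i--) {
              out.write((int) (t >>> (8 * i)) & 0xff);
          }
          return out.toByteArray();
      }

      public static void main(String[] args) throws Exception {
          byte[] key = encodeKey("account", "user-42", "lastLogin",
                  System.currentTimeMillis());
          System.out.println("key length = " + key.length);
      }
  }
</pre>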
<h2>Map/reduce, Asynchronous Write API, and Query</h2>

<p>
  Google's map/reduce architecture has received a lot of attention, along
  with its bigtable architecture. Map/reduce provides a means to
  transparently decompose processing across a cluster. The "map" process
  examines a series of key-value pairs, emitting a set of intermediate
  key-value pairs for each input. Those intermediate key-values are then
  hashed (modulo R) onto R reduce processes. The inputs for the reduce
  processes are pre-sorted. The reduce process then runs some arbitrary
  operation on the sorted data, such as computing an inverted index file or
  loading the data into a scale-out index.
</p>

<p>
  bigdata® supports an <em>asynchronous index write API</em>, which delivers
  extremely high throughput for scattered writes. While map/reduce is
  tremendously effective when there is good locality in the data, it is not
  the right tool for processing ordered data. Instead, you execute a master
  job, which spawns client(s) running in <em>client service</em>(s)
  associated with the bigdata federation. Those clients process data,
  writing onto blocking buffers. The writes are automatically split and
  buffered for each key-range shard. This maximizes the chunk size for
  ordered writes and provides a tremendous throughput boost. Bigdata can
  work well in combination with map/reduce. The basic paradigm is that you
  use map/reduce jobs to generate data, which is then bulk loaded into a
  bigdata federation using bigdata jobs and the asynchronous write API.
</p><p>
  bigdata® has built-in support for distributed rule execution. This can be
  used for high-level query or for materializing derived views, including
  maintaining the RDFS+ closure of a semantic web database. The
  implementation is highly efficient and propagates binding sets to the data
  service for each key-range shard touched by the query, so the JOINs happen
  right up against the data. Unlike using map/reduce for join processing,
  bigdata query processing is very low latency. Distributed query execution
  can be substantially faster than local query execution, even for
  low-latency queries.
</p>

<h2>Standalone Journals</h2>

<p>
  While bigdata® is targeted at scale-out federations, it can also be
  deployed as a simple persistence store using just the
  <code>com.bigdata.journal.Journal</code> API.
</p><p>
  The read-write (RWStore) version of the journal can scale up to 50 billion
  triples or quads and is capable of reclaiming storage by releasing
  historical commit points, aging them out of the backing file in a manner
  very similar to how the scale-out database releases historical commit
  points. The read/write store is good for standalone database instances,
  especially when the data have a lot of skew and when new data are arriving
  continually while older data should be periodically released (for example,
  in a monitoring application where historical events may be released after
  30 days).
</p><p>
  The read/write store is also used in the scale-out architecture for the
  transaction manager and the aggregated performance counters. However, the
  data services use a WORM store to buffer writes and asynchronously migrate
  the buffered writes onto read-optimized B+Tree files using index segment
  builds and compacting merges. Once an index partition (aka shard) reaches
  ~200MB on the disk, it is split (dynamic sharding). Index partitions are
  moved from time to time to load balance the cluster.
</p>

<h2>Status</h2>

<p>
  bigdata® is a petabyte-scale database architecture. It has been tested on
  clusters of up to 16 nodes. We have loaded data sets of 10B+ rows, at
  rates of over 300,000 rows per second. Overall, 100s of billions of rows
  have been put down safely on disk. Work in progress and near-term roadmap
  items include:
</p>

<ul>

  <li>Hadoop integration points, including a REST API for the sparse row
  store and scanners for HDFS files, making it easier to target bigdata
  distributed jobs from Hadoop map/reduce jobs.</li>

  <li>Online backup and point-in-time recovery.</li>

  <li>Full distributed read/write transaction support (read-only transaction
  support is done, but we still have some work to do on the commit protocol
  for read-write transactions).</li>

  <li>OODBMS. We will be introducing an OODBMS layer based on the Generic
  Object Model shortly. This will be layered over the RDF database and will
  have bindings and clients for Java, JSON, and other client
  environments.</li>

</ul>

<h2>Getting Started</h2>

<p>
  See the wiki for <a
  href="http://bigdata.wiki.sourceforge.net/GettingStarted">Getting
  Started</a> and our <a
  href="http://www.bigdata.com/bigdata/blog/">blog</a> for what's new. The
  javadoc is <a href="http://www.bigdata.com/bigdata/docs/api/">online</a>
  and you can also build it with the ant script.
If you have a - question, please post it on the blog or the forum. - - </p> - -<h2>Getting Involved</h2> - -<p> - - bigdata® is an open source project. Contributors and - contributions are welcome. Like most open source project, - contributions must be submitted under a contributor - agreement, which must be signed by someone with the - appropriate authority. This is necessary to ensure that the - code base remains open. - - </p><p> - - If you want to help out, please check out what is going on - our <a href="http://www.bigdata.com/bigdata/blog/">blog</a> - and on the <a - href="https://sourceforge.net/projects/bigdata/">main project - site</a>. Post your questions and we will help you figure - out where you can contribute or how to create that new - feature that you need. - - </p> - - <h2>Licenses and Services</h2> - - <p> - - bigdata® is distributed under GPL(v2). SYSTAP, LLC - offers commercial licenses for customers who either want the - value add (warranty, technical support, additional - regression testing), who want to redistribute bigdata with - their own commercial products, or who are not "comfortable" - with the GPL license. For inquiries or further information, - please write <a - href="mailto:lic...@bi...">lic...@bi...</a>. - - </p><p> - - Please let us know if you need specific feature development - or help in applying bigdata® to your problem. We are - especially interested in working directly with people who - are trying to handle massive data, especially for the - semantic web. Please <a - href="http://www.systap.com/contact.htm">contact us</a> - directly. - - </p> - - <h2>Related links</h2> - - <dl> - - <dt>CouchDB</dt> - <dd>http://couchdb.org/CouchDB/CouchDBWeb.nsf/Home?OpenForm</dd> - - <dt>bigtable</dt> - <dd>http://labs.google.com/papers/bigtable.html, http://www.techcrunch.com/2008/04/04/source-google-to-launch-bigtable-as-web-service/</dd> - - <dt>map/reduce</dt> - <dd>http://labs.google.com/papers/mapreduce.html</dd> - - <dt>Hadoop</dt> - <dd>http://lucene.apache.org/hadoop/</dd> - - <dt>Zookeeper</dt> - <dd>http://hadoop.apache.org/zookeeper/</dd> - - <dt>Jini/River</dt> - <dd>http://www.jini.org/wiki/Main_Page, http://incubator.apache.org/river/RIVER/index.html</dd> - - <dt>Pig</dt> - <dd>http://research.yahoo.com/node/90</dd> - - <dt>Sawzall</dt> - <dd>http://labs.google.com/papers/sawzall.html</dd> - - <dt>Boxwood</dt> - <dd>http://research.microsoft.com/research/sv/Boxwood/</dd> - - <dt>Blue Cloud</dt> - <dd>http://www.techcrunch.com/2007/11/15/ibms-blue-cloud-is-web-computng-by-another-name/</dd> - - <dt>SimpleDB</dt> - <dd>http://www.techcrunch.com/2007/12/14/amazon-takes-on-oracle-and-ibm-with-simple-db-beta/</dd> - - <dt>mg4j</dt> - <dd>http://mg4j.dsi.unimi.it/</dd> - - </dl> - -</body> -</html> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |