Open-source, enterprise-proven platform for Big Data analysis
HPCC Systems from LexisNexis Risk Solutions offers an open source, proven, data-intensive supercomputing platform designed for the enterprise to process and solve Big Data analytical problems. As an alternative to legacy technology, HPCC Systems consists of a single architecture, a consistent data-centric programming language, and two processing platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster. The core of the technology platform is the Enterprise Control Language (ECL), a declarative, data-centric programming language optimized for large-scale data management and query processing. ECL is used to express data algorithms across the entire HPCC Systems platform. The built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution, from data ingestion and data processing to data delivery. Both Thor and Roxie are offered on AWS and can be configured through the Instant Cloud Solution.
The download here is a VM.
- The HPCC Systems Data Refinery engine (Thor) helps clean, link, transform and analyze Big Data. Thor supports ETL (Extraction, Transformation and Loading) functions such as ingesting unstructured/structured data, data profiling, data hygiene, and data linking out of the box. In addition, Thor supports flexible record-oriented data structures.
- The HPCC Systems Data Delivery engine (Roxie) provides highly concurrent, low-latency, real-time query capability. Data processed by Thor can be accessed by a large number of users concurrently and in real time using Roxie. Roxie queries are typically complex and can include embedded rules logic.
- The programming language, Enterprise Control Language (ECL), is used to program both the data processing jobs on Thor and the queries on Roxie. ECL is a declarative, implicitly parallel, data-flow-oriented programming language that abstracts complex data processing tasks behind a simple programming interface.
- Free, self-paced online introductory classes on the ECL programming language: http://learn.lexisnexis.com/hpcc
- The benefits of the HPCC Systems platform can be summed up in two words: Speed and Scale. Learn more at http://hpccsystems.com/why-hpcc/benefits
- See more features at http://hpccsystems.com/Why-HPCC/features
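To make the data hygiene, data profiling, and data linking steps named in the Thor description more concrete, here is a minimal sketch in plain Python. It is not ECL and does not use any HPCC Systems API; the sample records and the function names `clean`, `profile`, and `link` are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; not taken from HPCC Systems.
RAW = [
    {"id": 1, "name": " Alice Smith ", "city": "boston"},
    {"id": 2, "name": "alice smith", "city": "Boston"},
    {"id": 3, "name": "Bob Jones", "city": ""},
]

def clean(record):
    """Data hygiene: collapse whitespace and normalize capitalization."""
    return {
        "id": record["id"],
        "name": " ".join(record["name"].split()).title(),
        "city": record["city"].strip().title(),
    }

def profile(records, field):
    """Data profiling: count how often each value of a field occurs."""
    return Counter(r[field] for r in records)

def link(records, key):
    """Data linking: group the ids of records that share the same key value."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["id"])
    return dict(groups)

cleaned = [clean(r) for r in RAW]
city_profile = profile(cleaned, "city")
linked = link(cleaned, "name")
print(city_profile)
print(linked)
```

In this toy run, the two differently formatted "Alice Smith" records end up linked under one name after cleaning, and the profile exposes one record with an empty city. A platform like the one described here is meant to apply the same kinds of steps across a cluster rather than in a single process.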
This has so far been the most fun and satisfying experience I have had when it comes to adopting a system for my everyday work. The tagline says it all: 'Simple, Fast, Scalable'. I have worked on datasets with billions of records in computing scenarios complex enough to make a Hadoop developer cringe. Granted, it uses a declarative language, a paradigm that is not the most familiar to programmers in 2013, and the system has its own non-standard quirks, but it is still a platform that makes you feel at ease.
I give this a five-star rating! In my experience, I was able to download the HPCC VM image and get started with loading and transforming data in a few minutes. Thanks to their central deployment tool, setting up a 3-node Ubuntu cluster was really simple. The inherent parallelism and data flow nature of the powerful ECL language removes the worry about trying to parallelize my jobs, which was a concern in my experience with Hadoop MapReduce. In fact, I have to say ECL is somewhat similar to SQL in that both are declarative data programming languages. So if you are a good SQL developer, ECL should be a breeze to understand and use. It is a mature platform and provides a data delivery engine together with a data transformation and linking system. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL programming model.