Cascalog is a powerful Clojure (and Java) data processing and querying library built atop Hadoop (via Cascading), providing a high-level, Datalog-inspired abstraction for both big data processing and local computation. Cascalog is hosted at Clojars, and some of its dependencies are hosted at Conjars. Both Clojars and Conjars are Maven repositories that are easy to use with Maven or Leiningen. The Cascalog website contains more information and links to various articles and tutorials. The best way to get started with Cascalog is to experiment with the toy datasets that ship with the project.
Features
- Expressive, Datalog-like query language that runs on Hadoop or locally
- Simplified abstraction over Cascading to avoid low-level Hadoop complexity
- Seamless handling of distributed Big Data workflows
- Pure Java API (JCascalog) available for Java integration and experimentation
- Useful for prototyping data flows that scale from local tests to production clusters
- Draws inspiration from existing tools like Pig, Hive, and Cascading while providing richer abstraction
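To make the "Datalog-like query" idea concrete, here is a minimal plain-Python sketch of what such a query means: predicates that share a variable are joined on it, and extra conditions act as filters. This is not Cascalog's API (Cascalog queries are written in Clojure), and the relations `AGE` and `GENDER` and the function `query_young_people` are hypothetical names invented for this illustration.

```python
# Toy relations, invented for illustration only (not Cascalog's bundled datasets).
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]
GENDER = [("alice", "f"), ("bob", "m"), ("chris", "m"), ("david", "m")]


def query_young_people(age_rel, gender_rel, max_age):
    """Emulate the Datalog-style rule

        result(person, g) :- age(person, a), gender(person, g), a < max_age

    The variable `person` appears in both predicates, which implies a join.
    """
    # Index one side of the join; assumes each person appears once in gender_rel.
    gender_by_person = dict(gender_rel)
    return sorted(
        (person, gender_by_person[person])
        for person, a in age_rel
        if a < max_age and person in gender_by_person
    )


if __name__ == "__main__":
    print(query_young_people(AGE, GENDER, 30))
```

In a declarative system the join and the filter are stated once and the engine decides how to execute them, locally or on a cluster. The hand-written dictionary lookup above is what that abstraction hides.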
Categories
Data Management
License
MIT License