sparklyr is an R package that connects R to Apache Spark clusters, either local or remote, so that users can work with Spark through familiar R idioms. It provides a dplyr-compatible backend, Spark machine learning pipelines, SQL integration, and I/O utilities for manipulating and analyzing large datasets distributed across a cluster.
Features
- Connects to Spark via YARN, Mesos, Kubernetes, Livy or local mode
- Enables dplyr-style data transformation on Spark DataFrames
- Supports SQL queries and ML pipelines (ml_* API)
- Includes tools for distributed computing, window functions, and streaming
- Extensible with packages like sparkxgb, graphframes, H2O
- Handles reading/writing CSV, Parquet, JSON, and caching operations
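
The features above can be sketched in a short R session. This is a minimal illustration rather than an official example: it assumes a local Spark installation is already available (for instance one set up with `spark_install()`), uses R's built-in `mtcars` dataset as stand-in data, and picks the table name `mtcars` and the output directory arbitrarily.

```r
library(sparklyr)
library(dplyr)
library(DBI)

# Connect in local mode (assumes Spark is installed locally,
# e.g. via sparklyr::spark_install()).
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame to Spark as a stand-in dataset.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr-style transformation; the verbs are translated to Spark SQL
# and run in Spark, and collect() brings the result back into R.
result <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  arrange(cyl) %>%
  collect()

# The same table can be queried with plain SQL through DBI.
sql_counts <- dbGetQuery(sc, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")

# Fit a model with the ml_* API.
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

# Write the Spark DataFrame out as Parquet (illustrative path).
out_path <- file.path(tempdir(), "mtcars_parquet")
spark_write_parquet(mtcars_tbl, path = out_path, mode = "overwrite")

spark_disconnect(sc)
```

For a remote cluster, the `master` argument to `spark_connect()` would change (for example to `"yarn"`), while the rest of the code would stay largely the same.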
Categories
Data Management
License
Apache License V2.0