sparklyr is an R package that connects R to Apache Spark clusters, either local or remote, while letting users write code in familiar R paradigms. It supplies a dplyr-compatible backend, access to Spark machine learning pipelines, SQL integration, and I/O utilities for manipulating and analyzing large datasets distributed across a cluster.
Features
- Connects to Spark via YARN, Mesos, Kubernetes, Livy or local mode
- Enables dplyr-style data transformation on Spark DataFrames
- Supports SQL queries and ML pipelines (ml_* API)
- Includes tools for distributed computing, window functions, streaming
- Extensible with packages like sparkxgb, graphframes, H2O
- Handles reading/writing CSV, Parquet, JSON, and caching operations
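The features above can be sketched in a few lines of R. This is a minimal illustration rather than a reference workflow. It assumes a local Spark installation is available (for example, one installed with `spark_install()`), and the table names `mtcars_spark` and `mtcars_back` and the temporary Parquet path are chosen here purely for the example.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally, e.g. via sparklyr::spark_install().
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark (table name is illustrative).
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed by Spark;
# collect() brings the small result back into R.
mpg_by_cyl <- cars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

# The same table can be queried with plain SQL through the DBI interface.
sql_counts <- DBI::dbGetQuery(
  sc,
  "SELECT cyl, COUNT(*) AS n FROM mtcars_spark GROUP BY cyl"
)

# Fit a model with the ml_* API using an R formula.
fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)

# Write the table to Parquet and read it back (path is illustrative).
parquet_path <- file.path(tempdir(), "mtcars_parquet")
spark_write_parquet(cars_tbl, path = parquet_path, mode = "overwrite")
cars_back <- spark_read_parquet(sc, name = "mtcars_back", path = parquet_path)

spark_disconnect(sc)
```

For a remote cluster, the `master` argument would point at the cluster manager instead of `"local"`; the rest of the code is intended to stay the same.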
Categories
Data Management
License
Apache License V2.0