sparklyr is an R package that connects R to Apache Spark clusters, either local or remote, so that users can work with Spark using familiar R idioms. It provides a dplyr-compatible backend, access to Spark machine learning pipelines, SQL integration, and I/O utilities for manipulating and analyzing large datasets distributed across a cluster.

Features

  • Connects to Spark via YARN, Mesos, Kubernetes, Livy or local mode
  • Enables dplyr-style data transformation on Spark DataFrames
  • Supports SQL queries and ML pipelines (ml_* API)
  • Includes tools for distributed computing, window functions, and streaming
  • Extensible with packages like sparkxgb, graphframes, H2O
  • Handles reading/writing CSV, Parquet, JSON, and caching operations
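
The features above can be sketched in a short R session. This is a minimal illustration rather than an official example: it assumes a local Spark installation is available (for instance one set up with spark_install()), and the table names "mtcars" and "mtcars_pq", the output path, and the model formula are arbitrary choices made for demonstration.

```r
library(sparklyr)
library(dplyr)

# Connect in local mode (assumes Spark is installed, e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark and cache it in memory.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)
tbl_cache(sc, "mtcars")

# dplyr-style transformation: translated to Spark SQL and run on the cluster;
# collect() brings the small result back into R.
by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

# The same table can be queried with plain SQL through the DBI interface.
counts <- DBI::dbGetQuery(sc, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")

# Fit a model with the ml_* API (the formula is illustrative only).
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

# Write the table to Parquet and read it back as a new Spark table.
out_path <- file.path(tempdir(), "mtcars_parquet")
spark_write_parquet(mtcars_tbl, path = out_path, mode = "overwrite")
mtcars_pq <- spark_read_parquet(sc, name = "mtcars_pq", path = out_path)

spark_disconnect(sc)
```

For a remote cluster, the master argument would name the cluster manager instead of "local"; the remaining calls are intended to work unchanged, since the dplyr verbs operate on the Spark table reference rather than on in-memory data.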

Categories

Data Management

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

R

Related Categories

R Data Management System

Registered

2025-07-30