SageMaker Spark

SageMaker Spark is an open-source Spark library for Amazon SageMaker. With SageMaker Spark, you construct Spark ML Pipelines using Amazon SageMaker stages. These pipelines interleave native Spark ML stages with stages that interact with SageMaker training and model hosting. You can train on Amazon SageMaker from Spark DataFrames using Amazon-provided ML algorithms such as K-Means clustering or XGBoost, and make predictions on DataFrames against SageMaker endpoints hosting your trained models. If you have your own ML algorithms built into SageMaker-compatible Docker containers, you can also use SageMaker Spark to train and infer on DataFrames with those algorithms, all at Spark scale.

SageMaker Spark depends on hadoop-aws-2.8.1. To run Spark applications that depend on SageMaker Spark, you need to build Spark with Hadoop 2.8. However, if you are running Spark applications on EMR, you can use Spark built with Hadoop 2.7.

Features

SageMaker Spark needs to be added to both the driver and executor classpaths
You can run SageMaker Spark applications on an EMR cluster
EMR allows you to read and write data using the EMR FileSystem
Create your Spark Session and load your training and test data into DataFrames
SageMaker Spark provides several classes that extend SageMakerEstimator to run particular algorithms
Use SageMakerEstimator and SageMakerModel in a Spark Pipeline
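
The features above (creating a Spark session, loading DataFrames, and running an algorithm-specific estimator) can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the library's KMeansSageMakerEstimator and IAMRole classes with the constructor parameters shown, that SageMaker Spark is on the driver and executor classpaths, and that AWS credentials are available. The role ARN, region, S3 paths, and instance types are placeholders chosen for illustration, and the object name KMeansSketch is hypothetical.

```scala
import com.amazonaws.services.sagemaker.sparksdk.IAMRole
import com.amazonaws.services.sagemaker.sparksdk.algorithms.KMeansSageMakerEstimator
import org.apache.spark.sql.SparkSession

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    // Create (or reuse) the Spark session.
    val spark = SparkSession.builder.getOrCreate()

    // Placeholder values: replace with your own region, data locations, and role.
    val region = "us-east-1"
    val trainPath = s"s3://sagemaker-sample-data-$region/spark/mnist/train/" // assumed sample location
    val testPath = s"s3://sagemaker-sample-data-$region/spark/mnist/test/"   // assumed sample location
    val roleArn = "arn:aws:iam::account-id:role/rolename"                    // placeholder ARN

    // Load training and test data into DataFrames (LibSVM format, 784 features).
    val trainingData = spark.read.format("libsvm")
      .option("numFeatures", "784")
      .load(trainPath)
    val testData = spark.read.format("libsvm")
      .option("numFeatures", "784")
      .load(testPath)

    // Configure a K-Means estimator; instance types here are illustrative choices.
    val estimator = new KMeansSageMakerEstimator(
      sagemakerRole = IAMRole(roleArn),
      trainingInstanceType = "ml.p2.xlarge",
      trainingInstanceCount = 1,
      endpointInstanceType = "ml.c4.xlarge",
      endpointInitialInstanceCount = 1)
      .setK(10)
      .setFeatureDim(784)

    // fit() trains on SageMaker and returns a model backed by a hosted endpoint.
    val model = estimator.fit(trainingData)

    // transform() sends the test DataFrame to the endpoint and appends predictions.
    val predictions = model.transform(testData)
    predictions.show()
  }
}
```

Because the estimator and the resulting model follow the Spark ML Estimator and Transformer pattern, the same objects should also be usable as stages in a Spark Pipeline alongside native stages. Note that running this sketch starts billable SageMaker training and hosting resources.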

License

Apache License V2.0

Additional Project Details

Programming Language

Scala

Related Categories

Scala Libraries

Registered

2022-07-11
