Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. This repository is used to build the pre-built container images that run Spark jobs on Amazon SageMaker through the SageMaker Python SDK. The pre-built images are available in the Amazon Elastic Container Registry (Amazon ECR), and this repository serves as a reference for those who want to build their own customized Spark containers for use in Amazon SageMaker.

Features

  • Licensed under the Apache-2.0 License
  • Pre-built images, used via the SageMaker Python SDK, are the simplest way to get started
  • Building and testing the container requires setting up a local development environment
  • Many SageMaker Spark images are available
  • Builds the pre-built container images used when running Spark jobs on Amazon SageMaker
  • Apache Spark provides high-level APIs in Scala, Java, Python, and R
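
As a rough illustration of the "pre-built images via the SageMaker Python SDK" route, the sketch below collects the settings for a Spark processing job and passes them to the SDK's PySparkProcessor class. It is a minimal sketch, not the project's documented procedure: the helper names (build_processor_kwargs, run_spark_job), the job name prefix, the instance type, the Spark version "3.1", and the runtime limit are all assumptions chosen for illustration. The SDK import is deferred so the configuration helper can be inspected without the SDK or AWS credentials.

```python
from typing import Any, Dict, List


def build_processor_kwargs(role_arn: str,
                           framework_version: str = "3.1",
                           instance_type: str = "ml.m5.xlarge",
                           instance_count: int = 2) -> Dict[str, Any]:
    """Collect constructor arguments for PySparkProcessor.

    All default values here are illustrative assumptions, not
    recommendations from the project.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "base_job_name": "spark-example",        # hypothetical job name prefix
        "framework_version": framework_version,  # selects the pre-built Spark image
        "role": role_arn,                        # IAM role the job runs under
        "instance_type": instance_type,
        "instance_count": instance_count,
        "max_runtime_in_seconds": 1200,          # assumed time limit
    }


def run_spark_job(role_arn: str, script_path: str, arguments: List[str]) -> None:
    """Submit a PySpark script as a SageMaker processing job.

    Requires the sagemaker package and valid AWS credentials.
    """
    # Imported lazily so build_processor_kwargs works without the SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(**build_processor_kwargs(role_arn))
    processor.run(submit_app=script_path, arguments=arguments)
```

Calling run_spark_job with a role ARN, a local PySpark script path, and a list of script arguments would start a job on the pre-built image for the chosen Spark version. A custom image built from this repository could presumably be used instead by supplying the processor's image_uri argument.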

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Frameworks, Python Business Performance Management Software, Python Data Analytics Tool, Python Stream Processing Tool

Registered

2022-07-04