Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. The images defined in this repository are the basis for the pre-built container images used when running Spark jobs on Amazon SageMaker through the SageMaker Python SDK. The pre-built images are available in the Amazon Elastic Container Registry (Amazon ECR), and this repository serves as a reference for those wishing to build their own customized Spark containers for use in Amazon SageMaker.

Features

  • Licensed under the Apache-2.0 License
  • The simplest way to get started with the SageMaker Spark Container is to use the pre-built images via the SageMaker Python SDK
  • To build and test the SageMaker Spark container yourself, you will have to set up a local development environment
  • Many SageMaker Spark images are available
  • Builds the pre-built container images that are used when running Spark jobs on Amazon SageMaker
  • Apache Spark provides high-level APIs in Scala, Java, Python, and R
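
As a rough illustration of the "pre-built images via the SageMaker Python SDK" route mentioned above, the sketch below collects arguments for the SDK's `PySparkProcessor` class and submits a job. The job name, role ARN, instance settings, Spark version "3.1", and the script name `preprocess.py` are placeholder assumptions, not values taken from this project. The SDK import is deferred so the argument helper can be tried without the SDK or AWS credentials.

```python
def build_processor_kwargs(role_arn, framework_version="3.1",
                           instance_type="ml.m5.xlarge", instance_count=2):
    """Collect constructor arguments for PySparkProcessor.

    All defaults here are illustrative assumptions; adjust them to
    the Spark version and instance types available in your account.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "base_job_name": "sm-spark-example",  # hypothetical job name prefix
        "framework_version": framework_version,
        "role": role_arn,
        "instance_count": instance_count,
        "instance_type": instance_type,
        "max_runtime_in_seconds": 1200,
    }


def run_spark_job(kwargs, submit_app, arguments=None):
    """Start a Spark processing job on SageMaker (needs AWS credentials)."""
    # Imported lazily so build_processor_kwargs works without the SDK.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(**kwargs)
    # submit_app is the path to the PySpark script to execute.
    processor.run(submit_app=submit_app, arguments=arguments or [])
    return processor
```

A call might look like `run_spark_job(build_processor_kwargs("arn:aws:iam::123456789012:role/ExampleRole"), "preprocess.py")`, where both the role ARN and the script are hypothetical. When no `image_uri` is passed, the SDK is expected to select a pre-built Spark image matching `framework_version`.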

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Frameworks, Python Business Performance Management Software, Python Data Analytics Tool, Python Stream Processing Tool

Registered

2022-07-04