Elephas is an extension of Keras that brings deep learning to Spark, allowing you to run distributed deep learning models at scale. It currently supports several applications, including data-parallel training, distributed hyper-parameter optimization, and distributed training of ensemble models. Elephas aims to keep the simplicity and high usability of Keras, which allows fast prototyping of distributed models that can run on massive data sets.

Elephas implements a class of data-parallel algorithms on top of Keras, using Spark's RDDs and DataFrames. Keras models are initialized on the driver, then serialized and shipped to the workers along with the data and the broadcast model parameters. Each Spark worker deserializes the model, trains on its chunk of the data, and sends its gradients back to the driver. The "master" model on the driver is updated by an optimizer, which applies the gradients either synchronously or asynchronously.

Hyper-parameter optimization with Elephas is based on hyperas, a convenience wrapper for hyperopt and Keras.
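To make the synchronous update concrete, here is a toy simulation in plain Python. It is not Elephas code: a linear model y = w*x + b stands in for a Keras model, ordinary function calls stand in for Spark workers, and the names `partition`, `worker_gradient` and `train_synchronous` are hypothetical. In each round every worker computes a gradient on its own chunk using the same broadcast weights, and the driver averages those gradients before taking one gradient-descent step on the master weights.

```python
# Toy simulation of synchronous data-parallel training (hypothetical, NOT Elephas code).
# A linear model y = w*x + b stands in for a Keras model; function calls stand in for workers.


def partition(data, num_workers):
    """Split the data set into num_workers chunks (a stand-in for RDD partitions)."""
    return [data[i::num_workers] for i in range(num_workers)]


def worker_gradient(weights, chunk):
    """Gradient of the mean squared error of y = w*x + b over one worker's chunk."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in chunk:
        err = (w * x + b) - y
        grad_w += 2 * err * x
        grad_b += 2 * err
    return grad_w / len(chunk), grad_b / len(chunk)


def train_synchronous(data, num_workers=4, lr=0.5, rounds=500):
    """Driver loop: broadcast weights, collect worker gradients, average, update."""
    weights = (0.0, 0.0)  # the "master" model parameters held on the driver
    chunks = partition(data, num_workers)
    for _ in range(rounds):
        # Every worker sees the same (broadcast) weights and its own chunk of data.
        grads = [worker_gradient(weights, chunk) for chunk in chunks]
        # Synchronous step: wait for all workers, then average their gradients.
        avg_w = sum(g[0] for g in grads) / len(grads)
        avg_b = sum(g[1] for g in grads) / len(grads)
        weights = (weights[0] - lr * avg_w, weights[1] - lr * avg_b)
    return weights


# Noise-free toy data drawn from y = 3x + 1.
data = [(i / 100, 3 * (i / 100) + 1) for i in range(100)]
w, b = train_synchronous(data)
print(f"w={w:.3f}, b={b:.3f}")
```

In an asynchronous variant, the driver would instead apply each worker's gradient as soon as it arrives, so some workers compute their gradients against slightly stale weights.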
Features
- After installing Elephas, you can train a model
- Create an RDD from NumPy arrays
- Train with SparkModel, the basic model in Elephas
- Train a model with a Spark ML estimator on a DataFrame
- Distributed hyper-parameter optimization
- Distributed training of ensemble models
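Creating an RDD from NumPy arrays essentially means pairing each feature row with its label and spreading those pairs across partitions. The sketch below shows that idea with plain Python lists in place of NumPy arrays and a list of lists in place of an RDD; `parallelize_pairs` is a hypothetical helper. Its contiguous slicing is similar in spirit to how PySpark's `SparkContext.parallelize(c, numSlices)` splits a local collection, although Spark's exact partition boundaries may differ.

```python
# Hypothetical stand-in for building an RDD of (features, label) pairs.


def parallelize_pairs(features, labels, num_partitions):
    """Zip features with labels and cut the pairs into contiguous partitions."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    pairs = list(zip(features, labels))
    n = len(pairs)
    # Partition i covers positions [i*n // p, (i+1)*n // p).
    return [
        pairs[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


features = [[float(i), float(i) + 1.0] for i in range(10)]
labels = [i % 2 for i in range(10)]
parts = parallelize_pairs(features, labels, 3)
for idx, part in enumerate(parts):
    # Each "worker" would only ever see its own partition.
    print(idx, len(part), part[0])
```

On a real cluster the pairs would be handed to Spark instead, for example with `sc.parallelize(list(zip(features, labels)), numSlices=3)`.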
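Distributed hyper-parameter search and ensembling share one pattern: the driver splits candidate configurations across workers, each worker trains and scores its share, and the driver ranks the results. Elephas builds on hyperas and hyperopt for the search itself; this sketch swaps in a plain grid search so it runs with only the standard library. The one-parameter model, the data, and all function names are hypothetical, and averaging the predictions of the k best models is just one simple way to form an ensemble.

```python
import itertools

# Hypothetical toy task: learn y = 2x with a one-parameter model y = w * x.
TRAIN = [(i / 10, 2 * i / 10) for i in range(10)]
VALID = [(i / 10 + 0.05, 2 * (i / 10 + 0.05)) for i in range(10)]


def train_model(lr, steps):
    """Fit w by full-batch gradient descent on TRAIN and return the learned weight."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in TRAIN) / len(TRAIN)
        w -= lr * grad
    return w


def validation_loss(w):
    """Mean squared error of y = w * x on VALID."""
    return sum((w * x - y) ** 2 for x, y in VALID) / len(VALID)


def worker_search(configs):
    """One worker's job: train and score every (lr, steps) config it was assigned."""
    results = []
    for lr, steps in configs:
        w = train_model(lr, steps)
        results.append((validation_loss(w), (lr, steps), w))
    return results


def distributed_search(space, num_workers=3):
    """Driver: spread configs across workers, gather results, rank by loss."""
    assignments = [space[i::num_workers] for i in range(num_workers)]
    gathered = []
    for configs in assignments:  # on a cluster these calls would run in parallel
        gathered.extend(worker_search(configs))
    return sorted(gathered, key=lambda r: r[0])


def ensemble_predict(ranked, x, k=3):
    """Average the predictions of the k best-scoring models."""
    top = ranked[:k]
    return sum(w * x for _, _, w in top) / len(top)


space = list(itertools.product([0.01, 0.1, 1.0], [5, 50]))
ranked = distributed_search(space)
best_loss, best_config, best_w = ranked[0]
print("best config:", best_config)
print("ensemble prediction at x=1:", ensemble_predict(ranked, 1.0))
```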