CoreWeave
CoreWeave is a cloud infrastructure provider specializing in GPU-based compute for AI workloads. The platform offers scalable, high-performance GPU clusters optimized for training and inference of AI models, making it well suited to workloads in machine learning, visual effects (VFX), and high-performance computing (HPC). CoreWeave also provides flexible storage, networking, and managed services, with a focus on reliability, cost efficiency, and enterprise-grade security. AI labs, research organizations, and businesses use the platform to accelerate their AI work.
Amazon EC2 Trn2 Instances
Amazon EC2 Trn2 instances, powered by AWS Trainium2 chips, are purpose-built for high-performance deep learning training of generative AI models, including large language models and diffusion models. They offer up to 50% cost-to-train savings over comparable Amazon EC2 instances. Trn2 instances support up to 16 Trainium2 accelerators, providing up to 3 petaflops of FP16/BF16 compute power and 512 GB of high-bandwidth memory. To facilitate efficient data and model parallelism, Trn2 instances feature NeuronLink, a high-speed, nonblocking interconnect, and support up to 1600 Gbps of second-generation Elastic Fabric Adapter (EFAv2) network bandwidth. They are deployed in EC2 UltraClusters, enabling scaling up to 30,000 Trainium2 chips interconnected with a nonblocking petabit-scale network, delivering 6 exaflops of compute performance. The AWS Neuron SDK integrates natively with popular machine learning frameworks like PyTorch and TensorFlow.
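As a rough consistency check on the figures quoted above, the per-instance and cluster-level numbers can be related with simple arithmetic. This is only a sketch: the inputs are the listing's own numbers (16 accelerators, 3 petaflops per instance, 30,000 chips per cluster), not independently verified specifications, and the variable names are illustrative.

```python
# Inputs copied from the description above (assumed, not independently verified).
ACCELERATORS_PER_INSTANCE = 16
PETAFLOPS_PER_INSTANCE = 3.0      # FP16/BF16, as quoted
CHIPS_PER_ULTRACLUSTER = 30_000

# Implied peak throughput of a single accelerator, in petaflops.
petaflops_per_chip = PETAFLOPS_PER_INSTANCE / ACCELERATORS_PER_INSTANCE

# Implied peak throughput of a full cluster, converted to exaflops
# (1 exaflop = 1000 petaflops).
cluster_exaflops = CHIPS_PER_ULTRACLUSTER * petaflops_per_chip / 1000

print(f"per chip: {petaflops_per_chip:.4f} PFLOPS")
print(f"cluster:  {cluster_exaflops:.3f} EFLOPS")
```

Under these assumptions the cluster figure works out to 5.625 exaflops, which rounds to the 6 exaflops quoted in the description.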
Amazon EC2 Trn1 Instances
Amazon Elastic Compute Cloud (EC2) Trn1 instances, powered by AWS Trainium chips, are purpose-built for high-performance deep learning training of generative AI models, including large language models and latent diffusion models. Trn1 instances offer up to 50% cost-to-train savings over other comparable Amazon EC2 instances. You can use Trn1 instances to train deep learning and generative AI models with more than 100 billion parameters across a broad set of applications, such as text summarization, code generation, question answering, image and video generation, recommendation, and fraud detection. The AWS Neuron SDK helps developers train models on AWS Trainium and deploy them on AWS Inferentia chips. It integrates natively with frameworks such as PyTorch and TensorFlow, so you can keep using your existing code and workflows to train models on Trn1 instances.
TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. Its comprehensive, flexible ecosystem of tools, libraries, and community resources lets researchers push the state of the art in ML and lets developers build and deploy ML-powered applications. High-level APIs such as Keras, combined with eager execution, allow immediate model iteration and easy debugging. Models can be trained and deployed in the cloud, on-premises, in the browser, or on-device, regardless of the language you use. Its simple, flexible architecture helps take new ideas from concept to code, to state-of-the-art models, and to publication faster.
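To make the mention of Keras and eager execution concrete, the following is a minimal sketch, assuming TensorFlow 2.x is installed. The toy data (a noiseless line y = 3x + 1), the single-layer model, and the hyperparameters are illustrative choices, not taken from the description.

```python
import numpy as np
import tensorflow as tf

# Fix seeds so repeated runs behave the same way.
tf.keras.utils.set_random_seed(0)

# Eager execution is the default in TensorFlow 2.x, so ops run immediately.
eager = tf.executing_eagerly()

# Toy regression data: y = 3x + 1 (illustrative, not from the source).
x = np.linspace(-1.0, 1.0, 64, dtype="float32").reshape(-1, 1)
y = 3.0 * x + 1.0

# A one-layer Keras model: a single dense unit fits a straight line.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# Train quietly and keep the per-epoch loss for inspection.
history = model.fit(x, y, epochs=50, verbose=0)
losses = history.history["loss"]

print("eager execution:", eager)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Because execution is eager, intermediate tensors and losses can be printed or inspected with ordinary Python tools while iterating on the model.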