Amazon EC2 UltraClusters
Amazon EC2 UltraClusters let you scale to thousands of GPUs or purpose-built machine learning accelerators, such as AWS Trainium, and provide on-demand access to supercomputing-class performance. They make supercomputing accessible to ML, generative AI, and high-performance computing developers through a pay-as-you-go model with no setup or maintenance costs. An UltraCluster consists of thousands of accelerated EC2 instances co-located in a single AWS Availability Zone and interconnected with Elastic Fabric Adapter (EFA) networking in a petabit-scale nonblocking network. This architecture also provides access to Amazon FSx for Lustre, a fully managed shared storage service built on a high-performance parallel file system, so massive datasets can be processed with sub-millisecond latencies. EC2 UltraClusters provide scale-out capabilities for distributed ML training and tightly coupled HPC workloads, which helps reduce training times.
AWS ParallelCluster
AWS ParallelCluster is an open-source cluster management tool that simplifies the deployment and management of high-performance computing (HPC) clusters on AWS. It automates the setup of required resources, including compute nodes, a shared file system, and a job scheduler, and it supports multiple instance types and job submission queues. Users can interact with ParallelCluster through a graphical user interface, a command-line interface, or an API, which allows flexible cluster configuration and management. The tool integrates with job schedulers such as AWS Batch and Slurm, so existing HPC workloads can be migrated to the cloud with minimal modifications. AWS ParallelCluster is available at no additional charge; users pay only for the AWS resources consumed by their applications. With AWS ParallelCluster, you can use a simple text file to model, provision, and dynamically scale the resources needed for your applications in an automated and secure manner.
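As a rough illustration of the "simple text file" idea, the sketch below builds a cluster description in Python and prints it as JSON, which YAML parsers generally accept. The key names (HeadNode, Scheduling, SlurmQueues, ComputeResources, SharedStorage) reflect our understanding of the ParallelCluster 3 configuration schema and should be checked against the current documentation. The helper name build_cluster_config, the instance types, the subnet ID, and the key pair name are hypothetical placeholders, not values from this page.

```python
import json


def build_cluster_config(region, subnet_id, key_name, queues):
    """Return a dict shaped like a ParallelCluster 3 cluster configuration.

    Assumption: key names follow the ParallelCluster 3 schema as we
    understand it; verify against the official reference before use.
    """
    slurm_queues = []
    for q in queues:
        slurm_queues.append({
            "Name": q["name"],
            "ComputeResources": [{
                "Name": f"{q['name']}-cr",
                "InstanceType": q["instance_type"],
                # MinCount 0 lets the queue scale down to no instances when idle.
                "MinCount": q.get("min_count", 0),
                "MaxCount": q["max_count"],
            }],
            "Networking": {"SubnetIds": [subnet_id]},
        })
    return {
        "Region": region,
        "Image": {"Os": "alinux2"},
        "HeadNode": {
            "InstanceType": "c5.xlarge",  # placeholder instance type
            "Networking": {"SubnetId": subnet_id},
            "Ssh": {"KeyName": key_name},
        },
        "Scheduling": {"Scheduler": "slurm", "SlurmQueues": slurm_queues},
        "SharedStorage": [{
            "MountDir": "/shared",
            "Name": "shared-ebs",
            "StorageType": "Ebs",
        }],
    }


# Hypothetical values: replace the subnet ID and key pair with your own.
config = build_cluster_config(
    region="us-east-1",
    subnet_id="subnet-EXAMPLE",
    key_name="my-key-pair",
    queues=[
        {"name": "cpu", "instance_type": "c5.2xlarge", "max_count": 16},
        {"name": "gpu", "instance_type": "p4d.24xlarge", "max_count": 4},
    ],
)

# Print the configuration as indented JSON.
print(json.dumps(config, indent=2))
```

Each queue here maps to one compute resource whose instance count can range between MinCount and MaxCount, which is one way the multiple queues and dynamic scaling described above could be expressed. The printed output could be saved to a file and reviewed before being passed to the ParallelCluster tooling.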
Simr (formerly UberCloud)
Simr (formerly UberCloud) is a platform for Simulation Operations Automation (SimOps). It streamlines and automates complex simulation workflows to improve productivity and collaboration. Built on cloud infrastructure, Simr offers scalable, cost-effective solutions for industries such as automotive, aerospace, and electronics, and is used by leading global companies. Simr supports a variety of CFD, FEA, and other CAE software, including Ansys, COMSOL, Abaqus, CST, STAR-CCM+, MATLAB, Lumerical, and more. Simr automates simulation workflows on every major cloud, including Microsoft Azure, AWS, and Google Cloud.
Rocky Linux
CIQ provides innovative and stable software infrastructure solutions for all computing needs. From the base operating system through containers, orchestration, provisioning, computing, and cloud applications, CIQ works with every part of the technology stack to deliver stable, scalable, and secure production environments for customers and communities. CIQ is the founding support and services partner of Rocky Linux and the creator of the next-generation federated computing stack.
- Rocky Linux: open, secure enterprise Linux
- Apptainer: application containers for high-performance computing
- Warewulf: cluster management and operating system provisioning
- HPC2.0: the next generation of high-performance computing, a cloud-native federated computing platform
- Traditional HPC: a turnkey computing stack for traditional HPC