The Transformer architecture has improved the performance of deep learning models in domains such as computer vision and natural language processing. Better performance, however, comes with larger model sizes, which run up against the memory wall of current accelerator hardware such as GPUs. Training large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine is no longer practical, so there is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture, and implementing complex distributed training solutions remains a challenge for AI researchers. Colossal-AI provides a collection of parallel components so that you can write your distributed deep learning models just like you write models on your laptop.
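
As a minimal sketch of what this looks like in practice, the snippet below follows the launch-and-initialize workflow from the project's legacy documentation. The entry points `colossalai.launch_from_torch` and `colossalai.initialize` come from that documented workflow; the toy model, random data, and the `config.py` file are hypothetical placeholders, not a definitive implementation.

```python
# train.py -- a minimal sketch of the Colossal-AI training workflow
# (legacy API). The toy model and random data are hypothetical placeholders.
import colossalai
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Set up the distributed environment from the launcher's environment
    # variables, reading parallel settings from a configuration file.
    colossalai.launch_from_torch(config='config.py')

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    dataset = TensorDataset(torch.randn(256, 32),
                            torch.randint(0, 10, (256,)))
    train_dataloader = DataLoader(dataset, batch_size=32)

    # Wrap model, optimizer, loss, and data into an engine that hides the
    # parallelism details behind a familiar training-loop interface.
    engine, train_dataloader, _, _ = colossalai.initialize(
        model, optimizer, criterion, train_dataloader)

    engine.train()
    for inputs, labels in train_dataloader:
        engine.zero_grad()
        outputs = engine(inputs)
        loss = engine.criterion(outputs, labels)
        engine.backward(loss)
        engine.step()


if __name__ == '__main__':
    main()
```

Launched with a standard distributed launcher (for example `torchrun --nproc_per_node 8 train.py`), the same training-loop code then runs under whatever parallelism the configuration file describes.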

Features

  • Heterogeneous Memory Management
  • Up to 24x larger model size on the same hardware
  • Pull from DockerHub
  • Build On Your Own
  • Parallelism strategies
  • Parallelism based on a configuration file (see the sketch after this list)
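
As a hypothetical illustration of the configuration-file approach, the sketch below uses the dict-style `config.py` format from the project's legacy documentation. The `parallel` field follows that documented format, but the specific sizes, the `2d` tensor-parallel mode, and the extra training settings are assumptions for illustration only.

```python
# config.py -- a hypothetical Colossal-AI configuration sketch in the
# legacy dict-style format; sizes and mode are illustrative assumptions.

# Run on 8 GPUs: 2 pipeline stages x 4-way tensor parallelism in 2D mode.
parallel = dict(
    pipeline=2,                      # number of pipeline stages
    tensor=dict(size=4, mode='2d'),  # tensor-parallel group size and mode
)

# Additional training settings the script can read after
# colossalai.launch_from_torch(config='config.py').
BATCH_SIZE = 128
NUM_EPOCHS = 10
```

The appeal of this design is that the parallelism strategy lives entirely in the configuration file, so switching from, say, pipeline parallelism to tensor parallelism does not require rewriting the training script itself.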

License

Apache License 2.0
