FastChat is an open platform for training, serving, and evaluating chatbots based on large language models.

If you do not have enough memory, you can enable 8-bit compression by adding --load-8bit to the serving commands. This roughly halves memory usage at the cost of slightly degraded model quality, and it is compatible with the CPU, GPU, and Metal backends. With 8-bit compression, Vicuna-13B can run on a single NVIDIA RTX 3090/4080, T4, or V100 (16 GB) GPU. In addition, you can pass --cpu-offloading to offload weights that do not fit on your GPU into CPU memory. CPU offloading requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on Linux.
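
The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. This is only a sketch of the weight storage cost: runtime overhead, activations, and the KV cache are ignored, so real usage is somewhat higher.

```python
# Rough weight-memory estimate for a 13B-parameter model such as Vicuna-13B.
# Illustrative only: activations, KV cache, and framework overhead are ignored.
PARAMS = 13e9  # ~13 billion parameters

fp16_gb = PARAMS * 2 / 1e9  # 2 bytes per parameter in fp16
int8_gb = PARAMS * 1 / 1e9  # 1 byte per parameter with 8-bit compression

print(f"fp16 weights: ~{fp16_gb:.0f} GB")  # ~26 GB: exceeds a 16 GB GPU
print(f"int8 weights: ~{int8_gb:.0f} GB")  # ~13 GB: fits on a 16 GB GPU
```

This is why 8-bit compression is the difference between Vicuna-13B fitting on a single 16 GB card or not.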

Features

  • The weights, training code, and evaluation code for state-of-the-art models
  • A distributed multi-model serving system with a web UI and OpenAI-compatible RESTful APIs
  • Training, serving, and evaluation of large language models
  • Reduced CPU RAM requirements for weight conversion
  • Inference through a command-line interface
  • Support for many popular models
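
To illustrate the OpenAI-compatible RESTful API mentioned above, the sketch below builds a standard chat-completion request body. The endpoint URL and model name are assumptions for a locally running server, not values taken from this page; adjust them for your own deployment.

```python
import json

# Hypothetical local endpoint: an OpenAI-compatible FastChat server is
# assumed to be listening here. Change host, port, and model as needed.
API_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "vicuna-13b",  # assumed model name
    "messages": [
        {"role": "user", "content": "Hello! What can you do?"},
    ],
    "temperature": 0.7,
}

# The request body is plain JSON, so any HTTP client can POST it to API_URL.
body = json.dumps(payload)
print(body)
```

Because the API follows the OpenAI chat-completion schema, existing OpenAI client libraries can be pointed at the local server without code changes beyond the base URL.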


License

Apache License V2.0



Additional Project Details

Operating Systems

Linux, Mac

Programming Language

Python

Related Categories

Python Artificial Intelligence Software, Python LLM Inference Tool

Registered

2023-06-01