Originally created by: Athena-I
I tried to run inference on an A30, but got this error: RuntimeError: CUDA out of memory. How can I run inference across multiple GPUs?
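In case it helps, here is a minimal sketch of one common multi-GPU approach, assuming the model is loaded through Hugging Face Transformers (the checkpoint name below is a placeholder and `accelerate` must be installed): `device_map="auto"` shards the model's layers across all visible GPUs so no single card has to hold the full weights.

```python
# Sketch only: shard a Transformers model across all visible GPUs.
# Assumes `transformers` and `accelerate` are installed; model_id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-model-name"  # placeholder, replace with the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to lower per-GPU memory use
    device_map="auto",          # split layers across all available GPUs
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the model still does not fit even when sharded, lowering precision further (e.g. 8-bit loading) or using an inference server with tensor parallelism are other options worth checking for your specific model.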