...I realize that this is a workaround for the authoritarian mentality that MS has switched to, one that does not care about users' suggestions.
Note: It could take up to 120 seconds for the first move; from then on it will be every 60 seconds.
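One way to read the timing in the note above is sketched here. This is an assumption for illustration, not the tool's actual code. It supposes a timer that ticks every 60 seconds and only triggers a move once the user has been idle for at least 60 seconds. The function name and parameters are hypothetical.

```python
def simulate_move_times(last_input_at, tick_interval=60, idle_threshold=60, horizon=300):
    """Return the tick times (in seconds) at which a move would fire.

    Assumed model, not taken from the tool itself: a timer ticks every
    `tick_interval` seconds, and a move fires on a tick only if at least
    `idle_threshold` seconds have passed since the last real user input.
    """
    moves = []
    tick = tick_interval
    while tick <= horizon:
        # Idle time is measured from the last real input, not from the last move.
        if tick - last_input_at >= idle_threshold:
            moves.append(tick)
        tick += tick_interval
    return moves


# Last input just after the timer started: the first tick sees only 59 s idle.
print(simulate_move_times(last_input_at=1))
```

Under this model, input that lands just after a tick delays the first move to the second tick, which is close to 120 seconds later. Every tick after that fires a move, giving the steady 60-second rhythm.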
The readme file includes instructions for compiling, along with a few other important details.