Page 7 | gpu max performance free download

Grok-2.5

Large-scale xAI model for local inference with SGLang

...The model is distributed as raw weights that require specialized infrastructure to run, rather than being hosted by inference providers. To use it, users must download over 500 GB of files and serve them locally with the SGLang inference engine. Grok-2.5 runs in multi-GPU configurations and needs at least 8 GPUs, each with more than 40 GB of memory. SGLang handles serving, testing, and chat-style interactions. Because the model is post-trained, it requires the correct chat template to function properly. It is released under the Grok 2 Community License Agreement, which encourages community experimentation and responsible use.
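The hardware figures quoted above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the Grok-2.5 release: the function name `meets_stated_requirements` and its parameters are hypothetical, and the thresholds are simply the numbers from the description (8 GPUs, more than 40 GB each, over 500 GB of files).

```python
def meets_stated_requirements(gpu_memory_gb, free_disk_gb,
                              min_gpus=8, min_gpu_memory_gb=40, min_disk_gb=500):
    """Check a machine against the figures quoted in the listing.

    gpu_memory_gb: per-GPU memory sizes in GB (hypothetical input, one per GPU).
    free_disk_gb:  free disk space in GB available for the downloaded weights.
    """
    # Count only GPUs with strictly more than the per-GPU threshold ("more than 40 GB").
    eligible = [m for m in gpu_memory_gb if m > min_gpu_memory_gb]
    # The download is "over 500 GB", so require strictly more free space than that.
    return len(eligible) >= min_gpus and free_disk_gb > min_disk_gb


# Example: eight 80 GB GPUs and 600 GB of free disk pass the check.
print(meets_stated_requirements([80] * 8, 600))
```

The check says nothing about interconnect, driver versions, or the SGLang setup itself; it only mirrors the minimums stated in the description.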

Downloads: 0 This Week

Last Update: 2025-08-28


granite-timeseries-ttm-r2

Tiny pre-trained IBM model for multivariate time series forecasting

granite-timeseries-ttm-r2 is part of IBM’s TinyTimeMixers (TTM) series of compact, pre-trained models for multivariate time series forecasting. Unlike massive foundation models, TTM models are designed to be lightweight, with only ~805K parameters, so they run well even on CPU or single-GPU machines. The r2 version is pre-trained on ~700M samples (r2.1 expands to ~1B), delivering up to 15% better accuracy than the r1 version. TTM supports both zero-shot and fine-tuned forecasting at minutely, hourly, daily, and weekly resolutions. It can integrate exogenous variables and static categorical features, and it can perform channel mixing for richer multivariate forecasting. ...

Downloads: 0 This Week

Last Update: 2025-07-01


Mistral Large 3 675B Instruct 2512 NVFP4

Quantized 675B multimodal instruct model optimized for NVFP4

Mistral Large 3 675B Instruct 2512 NVFP4 is a frontier-scale multimodal Mixture-of-Experts model with 675B total parameters and 41B active parameters, trained from scratch on 3,000 H200 GPUs. This NVFP4 checkpoint is a post-training, activation-quantized version of the original instruct model, created through a collaboration between Mistral AI, vLLM, and Red Hat using llm-compressor. It retains the same instruction-tuned behavior as the FP8 model, making it ideal for production...
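To put the parameter counts above in perspective, here is a rough back-of-the-envelope sketch. It assumes a flat 4 bits per parameter for NVFP4 and 8 bits for FP8, and it ignores quantization scale factors, any layers left unquantized, the KV cache, and activations, so the results are lower bounds for weight storage only and not published figures. The helper name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


total_params = 675e9   # total parameters, from the description
active_params = 41e9   # parameters active per token, from the description

# Assumption: every weight stored at exactly 4 bits (NVFP4) or 8 bits (FP8).
nvfp4_gb = weight_footprint_gb(total_params, 4)
fp8_gb = weight_footprint_gb(total_params, 8)

# Share of the parameters that the Mixture-of-Experts routing activates per token.
active_fraction = active_params / total_params

print(f"NVFP4 weights: ~{nvfp4_gb:.1f} GB, FP8 weights: ~{fp8_gb:.1f} GB")
print(f"Active fraction per token: {active_fraction:.1%}")
```

Under these assumptions the 4-bit checkpoint needs about half the weight storage of the FP8 one, while only around 6% of the parameters take part in any single forward pass.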

Downloads: 0 This Week

Last Update: 2025-12-03


Ministral 3 3B Instruct 2512

Ultra-efficient 3B multimodal instruct model built for edge deployment

...As an FP8 instruct-fine-tuned model, it is optimized for chat, instruction following, and compact agentic tasks while maintaining strong adherence to system prompts. Despite its small size, it delivers efficient real-time performance and can run locally on a single 8 GB GPU, with further memory reductions available through quantization. It supports dozens of languages across major global regions, making it well suited for multilingual and embedded applications. The model also provides function calling, clean JSON output, and stable tool-use behavior, enabling it to serve as a small but effective agentic system.

Downloads: 0 This Week

Last Update: 2025-12-03


Search Results for "gpu max performance" - Page 7

Showing 154 open source projects for "gpu max performance"
