2 projects for "cpu usage" with 2 filters applied:

  • Atera - an All-in-one platform for IT management Icon
    Atera - an All-in-one platform for IT management

    Ideal for IT departments and MSPs (managed service providers)

    Your IT essentials, integrated & elevated. Take your IT management from automated to autonomous, download Atera's agent to start your free trial!
    Try Atera now
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 1
    shimmy

    shimmy

    Python-free Rust inference server

    ...This compatibility enables developers to replace remote AI services with locally hosted models while keeping their existing software architecture intact. Shimmy focuses on performance and simplicity, using efficient runtime components to minimize memory usage and startup time compared to heavier inference frameworks. It supports modern model formats such as GGUF and SafeTensors and can automatically discover models stored locally or in common directories used by other AI tools. Advanced capabilities include CPU offloading for Mixture-of-Experts models and GPU acceleration, enabling large models to run on consumer hardware with limited VRAM.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    FlexLLMGen

    FlexLLMGen

    Running large language models on a single GPU

    ...Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. The project is particularly useful for workloads that prioritize throughput over latency, including benchmarking experiments and large corpus analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo