
hls4ml

Machine learning on FPGAs using HLS

hls4ml is an open-source framework that enables machine learning models to be implemented directly on hardware such as FPGAs and ASICs using high-level synthesis techniques. The system converts trained neural network models from common machine learning frameworks into hardware description code suitable for ultra-low-latency inference. This approach allows machine learning algorithms to run directly on specialized hardware, making them suitable for applications that require extremely fast response times and minimal power consumption. The framework was originally developed for high-energy physics experiments where real-time decision systems must process large volumes of data with strict latency constraints. ...

Downloads: 2 This Week

Last Update: 2026-03-20


RF-DETR

RF-DETR is a real-time object detection and segmentation model

...The model is designed to detect objects and segment them within images or video streams using a unified detection pipeline. RF-DETR emphasizes strong performance across both accuracy and latency benchmarks, allowing developers to deploy high-quality detection models in applications that require immediate processing such as robotics, autonomous systems, and industrial inspection. The repository includes Python packages, training scripts, and model configurations that enable researchers and engineers to train and deploy detection models on custom datasets.

Downloads: 1 This Week

Last Update: 17 hours ago


Nixtla TimeGPT

TimeGPT-1: production ready pre-trained Time Series Foundation Model

TimeGPT is a production-ready, generative pretrained transformer for time series. It is capable of accurate forecasting in domains such as retail, electricity, finance, and IoT with just a few lines of code. Whether you're a bank forecasting market trends or a startup predicting product demand, TimeGPT democratizes access to cutting-edge predictive insights, eliminating the need for a dedicated team of machine learning engineers. A generative model for time series. TimeGPT is capable of...

Downloads: 0 This Week

Last Update: 2026-02-13


Quantitative Trading System

A comprehensive quantitative trading system with AI-powered analysis

Quantitative Trading System is a comprehensive quantitative trading platform that integrates artificial intelligence, financial data analysis, and automated strategy execution within a unified software system. The project is designed to provide an end-to-end infrastructure for building and operating algorithmic trading strategies in financial markets. It includes tools for collecting and processing market data from multiple sources, performing statistical and machine learning analysis, and...

Downloads: 1 This Week

Last Update: 2026-03-12


FlexLLMGen

Running large language models on a single GPU

...This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. The project is particularly useful for workloads that prioritize throughput over latency, including benchmarking experiments and large corpus analysis.

Downloads: 1 This Week

Last Update: 2026-03-10


AWS Neuron

Powering Amazon custom machine learning chips

AWS Neuron is a software development kit (SDK) for running machine learning inference on AWS Inferentia chips. It consists of a compiler, runtime, and profiling tools that enable developers to run high-performance, low-latency inference on AWS Inferentia-based Amazon EC2 Inf1 instances. Using Neuron, developers can train their machine learning models in any popular framework, such as TensorFlow, PyTorch, or MXNet, and run them optimally on Amazon EC2 Inf1 instances. You can continue to use the same ML frameworks you use today and migrate your software onto Inf1 instances with minimal code changes and without tie-in to vendor-specific solutions. ...

Downloads: 0 This Week

Last Update: 2026-03-14


TensorFlow Model Optimization Toolkit

A toolkit to optimize ML models for deployment for Keras & TensorFlow

The TensorFlow Model Optimization Toolkit is a suite of tools for optimizing ML models for deployment and execution. Among many uses, the toolkit supports techniques used to reduce latency and inference costs for cloud and edge devices (e.g. mobile, IoT). Deploy models to edge devices with restrictions on processing, memory, power consumption, network usage, and model storage space. Enable execution on and optimize for existing hardware or new special purpose accelerators. Choose the model and optimization tool depending on your task. ...

Downloads: 0 This Week

Last Update: 2024-02-08


CLIP-as-service

Embed images and sentences into fixed-length vectors

CLIP-as-service is a low-latency, high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions. Serve CLIP models with TensorRT, ONNX Runtime, and PyTorch without JIT at 800 QPS. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks. Horizontally scale multiple CLIP models up and down on a single GPU, with automatic load balancing.

Downloads: 0 This Week

Last Update: 2023-12-20


FEDML Open Source

The unified and scalable ML library for large-scale training

A Unified and Scalable Machine Learning Library for Running Training and Deployment Anywhere at Any Scale. TensorOpera AI is the next-gen cloud service for LLMs & Generative AI. It helps developers to launch complex model training, deployment, and federated learning anywhere on decentralized GPUs, multi-clouds, edge servers, and smartphones, easily, economically, and securely. Highly integrated with TensorOpera open source library, TensorOpera AI provides holistic support of three...

Downloads: 0 This Week

Last Update: 2024-08-05


Hugging Face Transformer

CPU/GPU inference server for Hugging Face transformer models

...At Lefebvre Dalloz we run semantic search engines in production in the legal domain. In non-marketing language, such an engine is a re-ranker, and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built on PyTorch and FastAPI. Both are great tools, but neither is very performant at inference. With some more effort, you can build something on ONNX Runtime and Triton Inference Server. ...

Downloads: 0 This Week

Last Update: 2022-08-22


Search Results for "latency"

Showing 10 open source projects for "latency"

