throughput free download

FlexLLMGen

Running large language models on a single GPU

...This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. The project is particularly useful for workloads that prioritize throughput over latency, including benchmarking experiments and large corpus analysis.

Downloads: 0 This Week

Last Update: 2026-03-10


TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. ...

Downloads: 19 This Week

Last Update: 2026-03-25


FlashAttention

Fast and memory-efficient exact attention

...The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. By improving both forward and backward pass efficiency, it enables training and inference of large language models with longer sequence lengths and higher throughput. The library integrates with PyTorch and supports various attention configurations, including causal masking, multi-query attention, and rotary embeddings.

Downloads: 69 This Week

Last Update: 2026-03-18


OpenMLDB

OpenMLDB is an open-source machine learning database

...However, a feature engineering script developed by data scientists (Python scripts in most cases) cannot be directly deployed into production for online inference because it usually cannot meet the engineering requirements, such as low latency, high throughput and high availability.

Downloads: 0 This Week

Last Update: 2025-02-21


DALI

A GPU-accelerated library containing highly optimized building blocks

...DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.

Downloads: 0 This Week

Last Update: 2026-04-16


Synapse Machine Learning

Simple and distributed Machine Learning

...With the HTTP on Spark project, users can embed any web service into their SparkML models. For production-grade deployment, the Spark Serving project enables high throughput, sub-millisecond latency web services, backed by your Spark cluster.

Downloads: 0 This Week

Last Update: 2026-04-04


OnnxStream

Lightweight inference library for ONNX files, written in C++

...The recommended minimum RAM/VRAM for Stable Diffusion 1.5 is typically 8GB. Generally, major machine learning frameworks and libraries focus on minimizing inference latency and/or maximizing throughput, at the cost of higher RAM usage. So I decided to write a super small and hackable inference library specifically focused on minimizing memory consumption: OnnxStream. OnnxStream is based on the idea of decoupling the inference engine from the component responsible for providing the model weights, which is a class derived from WeightsProvider. ...

Downloads: 24 This Week

Last Update: 2024-08-14


FFCV

Fast Forward Computer Vision (and other ML workloads!)

ffcv is a drop-in data loading system that dramatically increases data throughput in model training. From gridding to benchmarking to fast research iteration, there are many reasons to want faster model training. Below we present premade codebases for training on ImageNet and CIFAR, including both (a) extensible codebases and (b) numerous premade training configurations.

Downloads: 0 This Week

Last Update: 2024-08-07


OmicSelector

Feature selection and deep learning modeling for omic biomarker study

OmicSelector is an environment, Docker-based web application, and R package for biomarker signature selection (feature selection) from high-throughput experiments and others. It was initially developed for miRNA-seq (small RNA, smRNA-seq; hence its original name, miRNAselector), RNA-seq and qPCR, but can be applied to any problem where numeric features should be selected to counteract overfitting of the models. Using our tool, you can choose the features, such as miRNAs, with the most significant diagnostic potential (based on the results of miRNA-seq, for validation in qPCR experiments).

1 Review

Downloads: 0 This Week

Last Update: 2024-04-05


hora

Efficient approximate nearest neighbor search algorithm collections

...The library is written in Rust and emphasizes performance, safety, and efficient memory management, making it suitable for production-grade applications requiring low latency and high throughput.

Downloads: 0 This Week

Last Update: 2026-03-11


exchange-core

Ultra-fast matching engine written in Java based on LMAX Disruptor

Exchange-core is an open-source market exchange core based on LMAX Disruptor, Eclipse Collections (ex. Goldman Sachs GS Collections), Real Logic Agrona, OpenHFT Chronicle-Wire, LZ4 Java, and Adaptive Radix Trees. It is designed for high scalability and pauseless 24/7 operation under high-load conditions, and provides low-latency responses. A single order book configuration can process 5M operations per second on 10-year-old hardware (Intel® Xeon® X5690) with moderate latency degradation...

Downloads: 0 This Week

Last Update: 2022-04-15


X-DeepLearning

An industrial deep learning framework for high-dimension sparse data

...Complete streaming training features including feature admission, feature elimination, incremental model export, feature counting statistics, etc. Background: XDL1.0 focuses on throughput optimization and adopts a one-request-per-thread processing model, which can significantly raise the maximum throughput under ultra-high concurrency.

Downloads: 0 This Week

Last Update: 2022-02-02


Root Phenotyping Suite

Three different software tools for phenotyping plant root images

RootAnalyzer is a fully automated tool for efficiently extracting and analyzing anatomical traits from root cross-section images. RootAnalyzer segments the plant root from the image's background, classifies and characterizes the cortex, stele, endodermis and metaxylem, and produces statistics about the morphological properties of the root cells and tissues. RTipC is a system for the fully automated detection and classification of root tips in root images obtained either by 2D flatbed...

Downloads: 0 This Week

Last Update: 2018-10-23


JAABA

The Janelia Automated Animal Behavior Annotator

...JAABA uses machine learning techniques to convert these manual labels into behavior detectors that can then be used to automatically classify the behaviors of animals in large data sets with high throughput. JAABA combines an intuitive graphical user interface, a fast and powerful machine learning algorithm, and visualizations of the classifier into an interactive, usable system for creating automatic behavior detectors. Documentation is available at: http://jaaba.sourceforge.net/

1 Review

Downloads: 6 This Week

Last Update: 2015-09-08


Search Results for "throughput"

Showing 14 open source projects for "throughput"


