Showing 173 open source projects for "transformers"

  • 1
    BitNet

    BitNet: Scaling 1-bit Transformers for Large Language Models

    BitNet is a machine learning research implementation that explores extremely low-precision neural network architectures designed to dramatically reduce the computational cost of large language models. The project implements the BitNet architecture described in research on scaling transformer models using extremely low-bit quantization techniques. In this approach, neural network weights are quantized to approximately one bit per parameter, allowing models to operate with far lower memory...
    Downloads: 6 This Week
    Last Update:
    See Project
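    As a hedged, toy illustration of the 1-bit idea (not the project's actual code), weights can be binarized to ±1 and rescaled by their mean absolute value, with a straight-through estimator so gradients still flow to the full-precision weights:

```python
# Illustrative 1-bit weight quantization in the spirit of BitNet (toy sketch, not the project's code).
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, +1} scaled by their mean absolute value."""
    alpha = w.abs().mean()                    # per-tensor scale factor
    w_bin = torch.sign(w) * alpha             # roughly one bit of information per weight
    return w + (w_bin - w).detach()           # straight-through estimator for training

w = torch.randn(4096, 4096, requires_grad=True)
x = torch.randn(8, 4096)
y = x @ binarize_weights(w).t()               # linear layer using binarized weights
```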
  • 2
    Karpathy

    An agentic Machine Learning Engineer

    karpathy is an experimental agentic machine learning engineer framework designed to automate many aspects of the ML development workflow. The project sets up a sandboxed environment where an AI agent can access datasets, run experiments, and generate machine learning artifacts through a web interface. Its startup script automatically prepares the environment by creating a sandbox directory, installing key ML libraries, and launching the agent interface. The system is tightly integrated with...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Transformer Engine

    A library for accelerating Transformer models on NVIDIA GPUs

    ...TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code. TE also includes a framework-agnostic C++ API that can be integrated with other deep-learning libraries to enable FP8 support for Transformers. As the number of parameters in Transformer models continues to grow, training and inference for architectures such as BERT, GPT, and T5 become very memory- and compute-intensive. Most deep learning frameworks train with FP32 by default, but full FP32 precision is not essential for many deep learning models to reach full accuracy.
    Downloads: 3 This Week
    Last Update:
    See Project
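    A hedged sketch of the mixed-precision-style API mentioned above, assuming the module and argument names from Transformer Engine's published PyTorch examples (verify against your installed version):

```python
# Sketch of FP8 execution with Transformer Engine's PyTorch API (names per its docs; treat as assumptions).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(1024, 1024, bias=True).cuda()          # drop-in replacement for nn.Linear
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

x = torch.randn(16, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                                         # matmuls run in FP8 where supported
y.sum().backward()
```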
  • 4
    LLM-Finetuning

    LLM Finetuning with peft

    LLM-Finetuning is an open educational repository that provides practical notebooks and tutorials for fine-tuning large language models using modern machine learning frameworks. The project focuses on parameter-efficient fine-tuning methods such as LoRA and QLoRA, which allow large models to be adapted to new tasks without requiring full retraining. Instead of requiring specialized hardware or complex training pipelines, many examples are designed to run in cloud notebook environments such as...
    Downloads: 1 This Week
    Last Update:
    See Project
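    The parameter-efficient approach covered by these notebooks can be sketched with Hugging Face's peft library; the base model and target modules below are placeholders chosen for illustration:

```python
# Minimal LoRA setup with peft (illustrative; base model and target modules are placeholders).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")      # placeholder base model
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],                           # GPT-2's fused attention projection
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()                       # only the small adapter matrices train
```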
  • 5
    Flower

    Flower: A Friendly Federated Learning Framework

    ...Different machine learning frameworks have different strengths. Flower can be used with any machine learning framework, for example, PyTorch, TensorFlow, Hugging Face Transformers, PyTorch Lightning, scikit-learn, JAX, TFLite, MONAI, fastai, MLX, XGBoost, Pandas for federated analytics, or even raw NumPy for users who enjoy computing gradients by hand.
    Downloads: 1 This Week
    Last Update:
    See Project
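    A minimal client-side sketch, assuming Flower's documented NumPyClient pattern (a toy weight vector stands in for a real model; recent releases may prefer start_client over start_numpy_client):

```python
# Toy Flower federated client (illustrative; a NumPy vector stands in for real model weights).
import numpy as np
import flwr as fl

class ToyClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = [np.zeros(10, dtype=np.float32)]

    def get_parameters(self, config):
        return self.weights

    def fit(self, parameters, config):
        self.weights = parameters                 # a real client would run local training here
        return self.weights, 1, {}

    def evaluate(self, parameters, config):
        return 0.0, 1, {"accuracy": 1.0}          # a real client would compute real metrics

fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=ToyClient())
```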
  • 6
    Giskard

    Collaborative & Open-Source Quality Assurance for all AI models

    ...Giskard automatically generates relevant tests based on the vulnerabilities detected by the scan. You can easily customize the tests depending on your use case by defining domain-specific data slicers and transformers as fixtures of your test suites.
    Downloads: 1 This Week
    Last Update:
    See Project
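    A hedged sketch of the scan workflow described above, assuming Giskard's documented Model/Dataset/scan entry points (the classifier is a stub; check the current API before relying on these names):

```python
# Hedged sketch of a Giskard scan (API names assumed from its docs; the model is a stub).
import pandas as pd
import giskard

df = pd.DataFrame({"text": ["great product", "terrible service"], "label": [1, 0]})

def predict(batch: pd.DataFrame):
    return [[0.3, 0.7]] * len(batch)               # stub probabilities for [negative, positive]

model = giskard.Model(model=predict, model_type="classification",
                      classification_labels=[0, 1], feature_names=["text"])
dataset = giskard.Dataset(df, target="label")

report = giskard.scan(model, dataset)              # automatic vulnerability scan
suite = report.generate_test_suite("demo suite")   # tests generated from detected issues
```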
  • 7
    MiniMax-M2.1

    MiniMax M2.1, a SOTA model for real-world dev & agents.

    MiniMax-M2.1 is an open-source, state-of-the-art agentic language model released to democratize high-performance AI capabilities. It goes beyond a simple parameter upgrade, delivering major gains in coding, tool use, instruction following, and long-horizon planning. The model is designed to be transparent, controllable, and accessible, enabling developers to build autonomous systems without relying on closed platforms. MiniMax-M2.1 excels in real-world software engineering tasks, including...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    TorchDistill

    A coding-free framework built on PyTorch

    torchdistill (formerly kdkit) offers various state-of-the-art knowledge distillation methods and enables you to design (new) experiments simply by editing a declarative yaml config file instead of Python code. Even when you need to extract intermediate representations from teacher/student models, you will NOT need to reimplement the models (which often means changing the interface of the forward pass); instead, you specify the module path(s) in the yaml file. In addition to knowledge distillation, this...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    flair

    A very simple framework for state-of-the-art NLP

    ...A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. A PyTorch NLP framework. Our framework builds directly on PyTorch, making it easy to train your own models and experiment with new approaches using Flair embeddings and classes.
    Downloads: 0 This Week
    Last Update:
    See Project
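    A small usage sketch of the embedding interface described above (the Hugging Face model id is a placeholder):

```python
# Embedding a sentence with flair's transformer embeddings (model id is a placeholder).
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

embedding = TransformerWordEmbeddings("bert-base-uncased")
sentence = Sentence("Flair combines word and document embeddings.")
embedding.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)       # one contextual vector per token
```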
  • 10
    Super comprehensive deep learning notes

    Super Comprehensive Deep Learning Notes

    ...The repository contains hundreds of Jupyter notebooks that are richly annotated and organized by topic, progressing from basic Python and PyTorch fundamentals to advanced neural network designs like ResNet, transformers, and object detection algorithms. It’s not just a dry code repository; it includes theoretical explanations alongside hands-on examples, loss function explorations, optimization routines, and full end-to-end experiments on real datasets, making it highly suitable for both self-study and classroom use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    HY-MT

    Hunyuan Translation Model Version 1.5

    HY-MT (Hunyuan Translation) is a high-quality multilingual machine translation model suite developed to support mutual translation across dozens of languages with strong performance even at smaller model scales. It ships with both a 1.8B-parameter model and a larger 7B model, the latter optimized not only for direct translation but also for formatted and contextualized output, allowing better handling of terminology and mixed-language content. The project emphasizes both speed and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    LoggingExtras.jl

    Composable Loggers for the Julia Logging StdLib

    LoggingExtras allows routing logged information to different places when constructing complicated "log plumbing" systems. It is built on the concept of simple parts composed together, giving you a powerful and flexible way to define your logging system without needing to write your own AbstractLogger subtypes. Composability here means that the composition of any set of Loggers is itself a Logger; LoggingExtras is a composable logging system.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Tokenizers

    Fast State-of-the-Art Tokenizers optimized for Research and Production

    Fast State-of-the-art tokenizers, optimized for both research and production. Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in Transformers. Train new vocabularies and tokenize, using today’s most used tokenizers. Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server’s CPU. Easy to use, but also extremely versatile. Designed for both research and production. Full alignment tracking. ...
    Downloads: 0 This Week
    Last Update:
    See Project
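    A short sketch of training and using a BPE tokenizer with the library (the corpus path is a placeholder):

```python
# Train a BPE tokenizer and encode text (corpus path is a placeholder).
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

encoding = tokenizer.encode("Fast tokenizers, optimized for research and production.")
print(encoding.tokens, encoding.offsets)           # offsets give full alignment tracking
```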
  • 14
    Laravel Fractal

    An easy to use Fractal wrapper built for Laravel and Lumen

    ...Imagine you want to add some stats to the metadata of your request, you can do so without cluttering your code. You can run the make:transformer command to quickly generate a dummy transformer. By default it will be stored in the app\Transformers directory.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    DeepSpeed

    Deep learning optimization library making distributed training easy

    ...With just a single GPU, DeepSpeed's ZeRO-Offload can train models with over 10B parameters, 10x bigger than the state of the art, democratizing multi-billion-parameter model training so that many deep learning scientists can explore bigger and better models. DeepSpeed's sparse attention powers input sequences an order of magnitude longer and achieves up to 6x faster execution compared with dense transformers.
    Downloads: 0 This Week
    Last Update:
    See Project
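    A hedged sketch of enabling ZeRO with CPU offload through a DeepSpeed config (config keys follow its documentation; the model is a toy stand-in):

```python
# Sketch of ZeRO-Offload via deepspeed.initialize (config keys per DeepSpeed docs; toy model).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},    # keep optimizer state in CPU memory
    },
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```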
  • 16
    Deep-Learning-Interview-Book

    Interview guide for machine learning, mathematics, and deep learning

    Deep-Learning-Interview-Book collects structured notes, Q&A, and concept summaries tailored to deep-learning interviews, turning scattered study into a coherent playbook. It spans the core math (linear algebra, probability, optimization) and the practitioner topics candidates actually face, like CNNs, RNNs/Transformers, attention, regularization, and training tricks. Explanations emphasize intuition first, then key formulas and common pitfalls, so you can reason through unseen questions rather than memorize trivia. Many entries connect theory to implementation details, including how choices in activation, initialization, or normalization affect convergence and stability. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual...
    Downloads: 1 This Week
    Last Update:
    See Project
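    The text-serialization idea can be illustrated with a generic OBJ-style encoding (a toy example only, not necessarily the exact token format the project uses):

```python
# Toy example: serializing a triangle mesh into plain text a language model could consume.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(1, 2, 3)]                                # 1-indexed references into the vertex list

lines = [f"v {x:.2f} {y:.2f} {z:.2f}" for x, y, z in vertices]
lines += [f"f {a} {b} {c}" for a, b, c in faces]
mesh_text = "\n".join(lines)
print(mesh_text)                                   # "v 0.00 0.00 0.00" ... "f 1 2 3"
```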
  • 18
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    SHAP

    A game theoretic approach to explain the output of ml models

    SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. While SHAP can explain the output of any machine learning model, we have developed a high-speed exact algorithm for tree ensemble methods. Fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, scikit-learn and pyspark...
    Downloads: 1 This Week
    Last Update:
    See Project
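    A typical usage sketch for the tree-ensemble fast path (the dataset and model are illustrative choices):

```python
# Explain an XGBoost classifier with SHAP's TreeExplainer (dataset/model chosen for illustration).
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)              # fast exact algorithm for tree ensembles
shap_values = explainer.shap_values(X)             # per-feature contribution for each prediction
shap.summary_plot(shap_values, X)                  # global view of feature importance
```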
  • 20
    rust-bert

    Rust native ready-to-use NLP pipelines and transformer-based models

    rust-bert is a Rust-based implementation of transformer-based natural language processing models that provides ready-to-use pipelines for tasks such as text classification, summarization, and question answering. The project ports many capabilities of the Hugging Face Transformers ecosystem into the Rust programming language. It allows developers to run state-of-the-art NLP models like BERT, GPT-2, and DistilBERT directly within Rust applications while maintaining high performance and memory efficiency. The library integrates with Rust machine learning infrastructure using crates such as tch-rs and ONNX Runtime for model execution. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    MatMul-Free LM

    Implementation for MatMul-free LM

    MatMul-Free LM is an experimental implementation of a large language model architecture designed to eliminate traditional matrix multiplication operations used in transformer networks. Since matrix multiplication is one of the most computationally expensive components of modern language models, the project explores alternative computational strategies that reduce hardware requirements while maintaining comparable performance. The architecture relies on quantization-aware training and...
    Downloads: 0 This Week
    Last Update:
    See Project
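    The core intuition can be shown with a toy NumPy sketch (a conceptual illustration only, not the project's implementation): with ternary weights in {-1, 0, +1}, a matrix multiply collapses into selective additions and subtractions.

```python
# Conceptual illustration: ternary weights turn a matmul into additions/subtractions (not the project's code).
import numpy as np

x = np.array([0.5, -1.2, 2.0, 0.3])
W = np.array([[ 1, 0, -1,  1],
              [ 0, 1,  1, -1]])                    # ternary weight matrix

y_matmul = W @ x                                   # standard dense matrix multiply
y_addsub = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])
assert np.allclose(y_matmul, y_addsub)             # same result, no multiplications needed
```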
  • 22
    Intel LLM Library for PyTorch

    Accelerate local LLM inference and finetuning

    ...IPEX-LLM supports a wide range of popular models, including architectures such as LLaMA, Mistral, Qwen, and other transformer-based systems. The library can integrate with common AI frameworks and serving tools such as Hugging Face Transformers, LangChain, and vLLM, allowing developers to incorporate optimized inference into existing pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Vision Transformer Pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA

    ...Because it stays close to vanilla PyTorch, you can integrate custom datasets and training loops without framework lock-in. It’s widely used as an educational reference for people learning transformers in vision and as a lightweight baseline for research prototypes. The project encourages experimentation—swap optimizers, change augmentations, or plug the transformer backbone into downstream tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
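    A usage sketch, assuming the entry refers to the lucidrains-style vit_pytorch package (constructor arguments follow its README; treat them as assumptions):

```python
# Hedged ViT usage sketch (assumes the vit_pytorch package; arguments per its README).
import torch
from vit_pytorch import ViT

model = ViT(
    image_size=256, patch_size=32, num_classes=1000,
    dim=1024, depth=6, heads=16, mlp_dim=2048,
)
img = torch.randn(1, 3, 256, 256)
logits = model(img)                                # shape (1, 1000): class predictions
```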
  • 24
    xFormers

    Hackable and optimized Transformers building blocks

    xformers is a modular, performance-oriented library of transformer building blocks, designed to allow researchers and engineers to compose, experiment, and optimize transformer architectures more flexibly than monolithic frameworks. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and...
    Downloads: 0 This Week
    Last Update:
    See Project
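    A short sketch of the memory-efficient attention entry point (function name per the xFormers docs; requires a CUDA build of the library):

```python
# Memory-efficient attention with xFormers (requires a GPU build of the library).
import torch
import xformers.ops as xops

# q, k, v have shape (batch, sequence, heads, head_dim)
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xops.memory_efficient_attention(q, k, v)     # dispatches to an optimized kernel
```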
  • 25
    NeuralForecast

    Scalable and user friendly neural forecasting algorithms.

    NeuralForecast offers a large collection of neural forecasting models focusing on their performance, usability, and robustness. The models range from classic networks like RNNs to the latest transformers: MLP, LSTM, GRU, RNN, TCN, TimesNet, BiTCN, DeepAR, NBEATS, NBEATSx, NHITS, TiDE, DeepNPTS, TSMixer, TSMixerx, MLPMultivariate, DLinear, NLinear, TFT, Informer, AutoFormer, FedFormer, PatchTST, iTransformer, StemGNN, and TimeLLM. There is a shared belief in neural forecasting methods' capacity to improve the accuracy and efficiency of forecasting pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
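    An illustrative end-to-end sketch (column names follow the library's expected long-format schema; the toy series and hyperparameters are placeholders):

```python
# Fit an NBEATS model and forecast with NeuralForecast (toy monthly series for illustration).
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS

df = pd.DataFrame({
    "unique_id": ["series_1"] * 24,
    "ds": pd.date_range("2020-01-01", periods=24, freq="MS"),
    "y": range(24),
})
nf = NeuralForecast(models=[NBEATS(input_size=12, h=6, max_steps=50)], freq="MS")
nf.fit(df)
forecast = nf.predict()                            # 6 months ahead for each series
```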