59 projects for "transformers" with 2 filters applied:

  • 1
    Transformers in Time Series

    A professionally curated list of awesome resources

    ...It compiles literature from major conferences and journals and categorizes them by application domains such as forecasting, anomaly detection, and classification. The repository also provides a taxonomy that helps researchers understand different architectural variations of transformers designed for time series data. These models are particularly important because transformers can capture long-range dependencies in sequential data, which makes them well suited for complex temporal patterns in real-world datasets.
    Downloads: 1 This Week
  • 2
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    ...DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. The model supports multiple backbone architectures, including Vision Transformers (ViT), and can handle larger image resolutions with improved stability during training. The learned embeddings generalize robustly across tasks like classification, retrieval, and segmentation without fine-tuning, showing state-of-the-art transfer performance among self-supervised models.
    Downloads: 19 This Week
  • 3
    DeiT (Data-efficient Image Transformers)
    DeiT (Data-efficient Image Transformers) shows that Vision Transformers can be trained competitively on ImageNet-1k without external data by using strong training recipes and knowledge distillation. Its key idea is a specialized distillation strategy—including a learnable “distillation token”—that lets a transformer learn effectively from a CNN or transformer teacher on modest-scale datasets.
    Downloads: 0 This Week
  • 4
    GeoAI

    GeoAI: Artificial Intelligence for Geospatial Data

    GeoAI is a comprehensive open-source Python package designed to integrate artificial intelligence techniques with geospatial data analysis, enabling users to perform advanced geographic modeling and visualization tasks with ease. It provides a unified framework that combines machine learning libraries such as PyTorch and Transformers with geospatial tools, allowing users to process satellite imagery, aerial photos, and vector datasets in a streamlined workflow. The platform supports a wide range of tasks including image classification, object detection, segmentation, and change detection, making it suitable for applications in environmental monitoring, urban planning, and disaster response. ...
    Downloads: 1 This Week
  • 5
    SwanLab

    An open-source, modern-design AI training tracking and visualization

    ...SwanLab supports both cloud and self-hosted deployments, allowing organizations to run the system privately or integrate it into shared development environments. The platform integrates with a wide range of machine learning frameworks including PyTorch, Transformers, Keras, and other widely used training ecosystems.
    Downloads: 2 This Week
  • 6
    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    ...The project includes support for multiple draft models, example integration code, and scripts to benchmark performance, and it is structured to work with popular model serving stacks like SGLang and the Hugging Face Transformers ecosystem.
    Downloads: 2 This Week
  • 7
    Torch Pruning

    DepGraph: Towards Any Structural Pruning

    ...It introduces a graph-based algorithm called DepGraph that automatically identifies dependencies between layers, allowing parameters to be pruned safely across complex architectures. This dependency analysis makes it possible to prune large networks such as transformers, convolutional networks, and diffusion models without breaking the computational graph. Torch-Pruning physically removes parameters rather than masking them, which results in smaller and faster models during both training and inference. The toolkit supports a wide variety of architectures used in computer vision and large language models, making it a flexible solution for model compression tasks.
    Downloads: 3 This Week
  • 8
    Attention Residuals (AttnRes)

    Drop-in replacement for standard residual connections in Transformers

    Attention Residuals is a research-driven architectural innovation for transformer-based models that replaces traditional residual connections with an attention-based mechanism to improve information flow across layers. In standard transformers, residual connections simply sum outputs from previous layers, which can lead to uncontrolled growth of hidden states and dilution of early-layer information in deep networks. Attention Residuals introduces a learnable softmax attention mechanism that allows each layer to selectively retrieve and weight useful representations from earlier layers, making depth dynamically adaptive rather than uniformly aggregated. ...
    Downloads: 0 This Week
  • 9
    AI Engineering Transition Path

    Research papers and blogs to transition to AI Engineering

    AI Engineering Resources is an open educational repository that compiles research papers, tutorials, and learning materials for software engineers transitioning into artificial intelligence engineering roles. The project organizes resources that cover fundamental topics required to understand modern AI systems, including transformers, vector embeddings, tokenization, infrastructure design, and mixture-of-experts architectures. Instead of presenting isolated tutorials, the repository provides a structured pathway that guides engineers through the technical knowledge needed to build and deploy large language model systems. The materials include curated research papers, blog posts, and code examples that explain both theoretical foundations and practical implementation strategies. ...
    Downloads: 0 This Week
  • 10
    AiLearning-Theory-Applying

    Quickly get started with AI theory and practical applications

    ...The project also introduces important concepts such as probability theory, linear algebra, regression models, clustering methods, and neural network architectures. Advanced sections explore modern AI topics including transformers, BERT-based natural language processing systems, and practical competition-style machine learning workflows.
    Downloads: 0 This Week
  • 11
    OuteTTS

    Interface for OuteTTS models

    ...It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends, including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, vLLM, and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. ...
    Downloads: 1 This Week
  • 12
    DeepSeek-OCR 2

    Visual Causal Flow

    ...The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.
    Downloads: 10 This Week
  • 13
    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B),...
    Downloads: 14 This Week
  • 14
    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    ...Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic information while preserving fine-grained prosody, leading to more stable and expressive generation than many discrete-token systems. Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. ...
    Downloads: 47 This Week
  • 15
    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ChatGLM-6B is an open bilingual (Chinese + English) conversational language model based on the GLM architecture, with approximately 6.2 billion parameters. The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference...
    Downloads: 6 This Week
  • 16
    Vision Transformer Pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA

    ...Because it stays close to vanilla PyTorch, you can integrate custom datasets and training loops without framework lock-in. It’s widely used as an educational reference for people learning transformers in vision and as a lightweight baseline for research prototypes. The project encourages experimentation—swap optimizers, change augmentations, or plug the transformer backbone into downstream tasks.
    Downloads: 7 This Week
  • 17
    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    ...The GitHub repo includes code, scripts, model loading instructions, inference utilities, prompt handling, and integration with standard ML tooling (e.g., Hugging Face Transformers).
    Downloads: 8 This Week
  • 18
    MiniMax-M2.1

    MiniMax M2.1, a SOTA model for real-world dev & agents.

    MiniMax-M2.1 is an open-source, state-of-the-art agentic language model released to democratize high-performance AI capabilities. It goes beyond a simple parameter upgrade, delivering major gains in coding, tool use, instruction following, and long-horizon planning. The model is designed to be transparent, controllable, and accessible, enabling developers to build autonomous systems without relying on closed platforms. MiniMax-M2.1 excels in real-world software engineering tasks, including...
    Downloads: 7 This Week
  • 19
    Super comprehensive deep learning notes

    Super Comprehensive Deep Learning Notes

    ...The repository contains hundreds of Jupyter notebooks that are richly annotated and organized by topic, progressing from basic Python and PyTorch fundamentals to advanced neural network designs like ResNet, transformers, and object detection algorithms. It’s not just a dry code repository; it includes theoretical explanations alongside hands-on examples, loss function explorations, optimization routines, and full end-to-end experiments on real datasets, making it highly suitable for both self-study and classroom use.
    Downloads: 1 This Week
  • 20
    Karpathy

    An agentic Machine Learning Engineer

    Karpathy is an experimental agentic machine learning engineer framework designed to automate many aspects of the ML development workflow. The project sets up a sandboxed environment where an AI agent can access datasets, run experiments, and generate machine learning artifacts through a web interface. Its startup script automatically prepares the environment by creating a sandbox directory, installing key ML libraries, and launching the agent interface. The system is tightly integrated with...
    Downloads: 0 This Week
  • 21
    BitNet

    BitNet: Scaling 1-bit Transformers for Large Language Models

    BitNet is a machine learning research implementation that explores extremely low-precision neural network architectures designed to dramatically reduce the computational cost of large language models. The project implements the BitNet architecture described in research on scaling transformer models using extremely low-bit quantization techniques. In this approach, neural network weights are quantized to approximately one bit per parameter, allowing models to operate with far lower memory...
    Downloads: 1 This Week
  • 22
    LLM-Finetuning

    LLM Finetuning with peft

    LLM-Finetuning is an open educational repository that provides practical notebooks and tutorials for fine-tuning large language models using modern machine learning frameworks. The project focuses on parameter-efficient fine-tuning methods such as LoRA and QLoRA, which allow large models to be adapted to new tasks without requiring full retraining. Instead of requiring specialized hardware or complex training pipelines, many examples are designed to run in cloud notebook environments such as...
    Downloads: 0 This Week
  • 23
    HY-MT

    Hunyuan Translation Model Version 1.5

    HY-MT (Hunyuan Translation) is a high-quality multilingual machine translation model suite developed to support mutual translation across dozens of languages with strong performance even at smaller model scales. It ships with both a 1.8B-parameter model and a larger 7B model, the latter optimized not only for direct translation but also for formatted and contextualized output, allowing better handling of terminology and mixed-language content. The project emphasizes both speed and...
    Downloads: 0 This Week
  • 24
    Deep-Learning-Interview-Book

    Interview guide for machine learning, mathematics, and deep learning

    Deep-Learning-Interview-Book collects structured notes, Q&A, and concept summaries tailored to deep-learning interviews, turning scattered study into a coherent playbook. It spans the core math (linear algebra, probability, optimization) and the practitioner topics candidates actually face, like CNNs, RNNs/Transformers, attention, regularization, and training tricks. Explanations emphasize intuition first, then key formulas and common pitfalls, so you can reason through unseen questions rather than memorize trivia. Many entries connect theory to implementation details, including how choices in activation, initialization, or normalization affect convergence and stability. ...
    Downloads: 0 This Week
  • 25
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
    Downloads: 1 This Week