Showing 87 open source projects for "encoder"

  • 1
    fairseq-lua

    Facebook AI Research Sequence-to-Sequence Toolkit

    fairseq-lua is the original Lua/Torch7 version of Facebook AI Research’s sequence modeling toolkit, designed for neural machine translation (NMT) and sequence generation. It introduced early attention-based architectures and training pipelines that later evolved into the modern PyTorch-based fairseq. The framework implements sequence-to-sequence models with attention, beam search decoding, and distributed training, providing a research platform for exploring translation, summarization, and...
    Downloads: 0 This Week
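The beam search decoding mentioned above keeps the k best partial sequences at each generation step instead of greedily committing to one token. A minimal sketch, using a hypothetical toy scoring function rather than a real fairseq model:

```python
import math

def beam_search(step_scores, beam_size=2, length=3):
    """Toy beam search: step_scores(prefix) returns {token: log_prob}."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_scores(seq).items():
                candidates.append((seq + (tok,), score + logp))
        # keep only the beam_size highest-scoring prefixes
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

# Hypothetical "model": prefers token 'a' slightly over 'b' at every step
def toy_scores(prefix):
    return {"a": math.log(0.6), "b": math.log(0.4)}

best_seq, best_score = beam_search(toy_scores)[0]
print(best_seq)  # ('a', 'a', 'a')
```

Real decoders add stop tokens and length normalization, but the keep-top-k loop is the core idea.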
  • 2
    AliceMind

    ALIbaba's Collection of Encoder-decoders from MinD

    This repository provides pre-trained encoder-decoder models and related optimization techniques developed by Alibaba's MinD (Machine IntelligeNce of Damo) Lab, including pre-trained models for natural language understanding (NLU). We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively. ...
    Downloads: 0 This Week
  • 3
    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    Denoiser is a real-time speech enhancement model operating directly on raw waveforms, designed to clean noisy audio while running efficiently on CPU. It uses a causal encoder-decoder architecture with skip connections, optimized with losses defined both in the time domain and frequency domain to better suppress noise while preserving speech. Unlike models that operate on spectrograms alone, this design enables lower latency and coherent waveform output. The implementation includes data augmentation techniques applied to the raw waveforms (e.g. noise mixing, reverberation) to improve model robustness and generalization to diverse noise types. ...
    Downloads: 3 This Week
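The combined time-domain and frequency-domain objective described above can be sketched in a few lines. This is a simplified illustration, not Denoiser's actual loss (which uses STFT features and an L1/multiresolution spectrogram formulation); the naive DFT and the alpha weight here are assumptions for demonstration:

```python
import cmath

def dft_mag(x):
    """Naive DFT magnitude spectrum of a real signal (illustration only)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def combined_loss(clean, enhanced, alpha=0.5):
    """L1 distance in the time domain plus L1 distance between
    magnitude spectra, mixing both objectives as in the paper."""
    time_l1 = sum(abs(c - e) for c, e in zip(clean, enhanced)) / len(clean)
    spec_l1 = sum(abs(a - b) for a, b in
                  zip(dft_mag(clean), dft_mag(enhanced))) / len(clean)
    return time_l1 + alpha * spec_l1

clean = [0.0, 1.0, 0.0, -1.0]
noisy = [0.1, 0.9, 0.1, -0.9]
loss = combined_loss(clean, noisy)
```

Penalizing both domains lets the model match the waveform sample-by-sample while also suppressing residual noise that is easier to see in the spectrum.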
  • 4
    TTS

    Deep learning for text to speech

    ...Notebooks for extensive model benchmarking. Modular (but not too much) code base enabling easy testing for new ideas. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings efficiently. Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN). If you are only interested in synthesizing speech with the released TTS models, installing from PyPI is the easiest option.
    Downloads: 3 This Week
  • 5
    ALAE

    Adversarial Latent Autoencoders

    ...The project implements the architecture introduced in the CVPR research paper on Adversarial Latent Autoencoders, which focuses on improving generative modeling by learning latent representations aligned with adversarial training objectives. Unlike traditional GANs that directly generate images from random noise, ALAE uses an encoder-decoder architecture that maps images into a structured latent space and then reconstructs them through adversarial training. This design allows the model to learn interpretable latent representations that can be manipulated to control generated image attributes.
    Downloads: 0 This Week
  • 6
    Multilingual Speech Synthesis

    An implementation of Tacotron 2 that supports multilingual experiments

    ...The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second has separate encoders for each language.
    Downloads: 0 This Week
  • 7
    DETR

    End-to-end object detection with transformers

    ...Unlike traditional computer vision techniques, DETR approaches object detection as a direct set prediction problem. It consists of a set-based global loss, which forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. Due to this parallel nature, DETR is very fast and efficient.
    Downloads: 0 This Week
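The bipartite matching behind DETR's set-based loss assigns each prediction to at most one ground-truth object so that duplicate detections are penalized. DETR uses the Hungarian algorithm; the brute-force version below over a made-up cost matrix shows the same idea on a toy scale:

```python
from itertools import permutations

def match_cost(cost):
    """Brute-force bipartite matching: find the assignment of predictions
    to ground-truth objects with minimal total cost (Hungarian in spirit,
    O(n!) here, so for illustration only)."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total

# Hypothetical cost[i][j]: cost of matching prediction i to object j
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.7, 0.3],
]
assignment, total = match_cost(cost)
print(assignment)  # (0, 1, 2)
```

In DETR the per-pair cost combines classification probability and box overlap, and the loss is then computed only over the matched pairs, which is what forces unique predictions.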
  • 8
    Texar

    Toolkit for Machine Learning, Natural Language Processing

    Texar is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation. Texar provides a library of easy-to-use ML modules and functionalities for composing arbitrary models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar was originally developed and is actively contributed to by Petuum and CMU in collaboration with other institutes. A mirror of this...
    Downloads: 0 This Week
  • 9
    InferSent

    InferSent sentence embeddings

    ...Because the encoder is compact and language-agnostic at the interface level, it’s easy to drop into production pipelines that need robust semantic features. InferSent helped popularize the idea that supervised objectives (like NLI) can yield strong general-purpose sentence encoders, and it remains a reliable baseline against which to compare newer models.
    Downloads: 0 This Week
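Dropping a sentence encoder like InferSent into a pipeline usually reduces to comparing embedding vectors with cosine similarity. A minimal sketch, with tiny made-up 4-dimensional vectors standing in for InferSent's 4096-dimensional outputs:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for three sentences; a and b are paraphrases
emb_a = [0.2, 0.8, 0.1, 0.3]
emb_b = [0.25, 0.75, 0.05, 0.35]
emb_c = [-0.9, 0.1, 0.8, -0.2]

print(cosine(emb_a, emb_b) > cosine(emb_a, emb_c))  # True
```

Because the interface is just "sentence in, fixed-size vector out", the downstream similarity or classification code stays identical when a newer encoder is swapped in, which is why InferSent remains a convenient baseline.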
  • 10
    CakeChat

    CakeChat: Emotional Generative Dialog System

    ...The code is flexible and allows conditioning the model's responses on an arbitrary categorical variable. For example, you can train your own persona-based neural conversational model or create an emotional chatting machine. It uses a Hierarchical Recurrent Encoder-Decoder (HRED) architecture for handling deep dialog context, built as a multilayer RNN with GRU cells; the first layer of the utterance-level encoder is always bidirectional. By default, the CuDNNGRU implementation is used for ~25% acceleration during inference. The thought vector is fed into the decoder at each decoding step, and the decoder can be conditioned on any categorical label, for example an emotion label or persona id. ...
    Downloads: 0 This Week
  • 11
    Butteraugli

    Estimates the psychovisual difference between two images

    ...These maps make it practical to tune compressor settings and confirm whether bitrate reductions are visually acceptable. The metric has become a common yardstick for objective image quality when comparing codecs or encoder tweaks that target web or mobile delivery. Because it is deterministic and fast, it can be used in automated pipelines to gate releases on visual quality, not just file size.
    Downloads: 0 This Week
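Gating a release on perceptual quality, as described above, amounts to failing the build when any image pair's distance exceeds a threshold. A minimal sketch; the function name, the score values, and the 1.5 cutoff are all illustrative assumptions (butteraugli scores near 1.0 are commonly treated as near-imperceptible, but the right threshold is project-specific):

```python
def gate_release(distances, threshold=1.5):
    """Fail the gate if any image pair exceeds the perceptual-distance
    threshold. Returns (passed, list of offending image names)."""
    failures = [name for name, d in distances.items() if d > threshold]
    return (len(failures) == 0, failures)

# Hypothetical butteraugli distances from a codec-comparison run
scores = {"hero.jpg": 1.1, "banner.png": 2.3, "thumb.webp": 0.9}
ok, failed = gate_release(scores)
print(ok, failed)  # False ['banner.png']
```

Because the metric is deterministic, the same inputs always produce the same gate decision, which is what makes it usable in CI.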
  • 12
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    ...This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. In practice, sg2im demonstrates how structured semantics can guide generative models to produce controllable, compositional imagery.
    Downloads: 0 This Week
  • 13
    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    OpenSeq2Seq is a TensorFlow-based toolkit for efficient experimentation with sequence-to-sequence models across speech and NLP tasks. Its core goal is to give researchers a flexible, modular framework for building and training encoder–decoder architectures while fully leveraging distributed and mixed-precision training. The toolkit includes ready-made models for neural machine translation, automatic speech recognition, speech synthesis, language modeling, and additional NLP tasks such as sentiment analysis. It supports multi-GPU and multi-node data-parallel training, and integrates with Horovod to scale out across large GPU clusters. ...
    Downloads: 0 This Week
  • 14
    Retrieval-Based Conversational Model

    Dual LSTM Encoder for Dialog Response Generation

    Retrieval-Based Conversational Model in Tensorflow implements a retrieval-based conversational model with a dual LSTM encoder architecture, illustrating how neural networks can be trained to select appropriate responses from a fixed set of candidate replies rather than generate them from scratch. The core idea is to embed both the conversation context and each potential reply into vector representations, score how well each candidate fits the current dialogue, and choose the best match. ...
    Downloads: 0 This Week
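The scoring step described above reduces to a dot product between the encoded context and each encoded candidate, followed by a sort. A minimal sketch; the vectors below are made up, standing in for the outputs of the two LSTM encoders:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_candidates(context_vec, candidate_vecs):
    """Score each candidate reply against the encoded context and return
    candidates sorted best-first, as a dual-encoder retriever would."""
    scored = [(name, dot(context_vec, vec))
              for name, vec in candidate_vecs.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)

# Hypothetical encoder outputs for a context "can we meet later today?"
context = [0.4, 0.9, -0.1]
candidates = {
    "sure, see you at 5": [0.5, 0.8, 0.0],
    "the weather is nice": [-0.2, 0.1, 0.9],
}
best = rank_candidates(context, candidates)[0][0]
print(best)  # sure, see you at 5
```

At training time the model learns encoders that push matching context/reply pairs toward high scores; at serving time only this cheap scoring loop runs, which is the main appeal of retrieval over generation.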
  • 15
    DjVuPlus

    Read DjVu documents with OCR technology (Arabic, English), small size

    ...The patents cover a particular aspect of the ZP-coder (the arithmetic coder used in DjVu and implemented in libdjvu/ZPCodec.cpp) and the background masking technique used in the IW44 wavelet encoder (implemented in libdjvu/IW44EncodeCodec.cpp). Most patents are owned by AT&T. LizardTech has very broad rights to them and grants free and permanent licenses to them for the purpose of building GPL software with the DjVu Reference Library. The grant is materialized by two paragraphs in the headers of the DjVu Reference Library source files. ...
    Downloads: 0 This Week
  • 16
    Osmosis TTS

    Text to Speech application with searching capabilities.

    ...Text is displayed in a large window with configurable fonts and colors for users with low vision. Other features include saving text as a WAV file, MP3 encoding using the LAME MP3 encoder, maintaining a search history, and the ability to use and configure standard SAPI TTS voices.
    Downloads: 0 This Week
  • 17
    Qwen3.6-27B

    Dense multimodal Qwen model for coding, agents, and long context

    Qwen3.6-27B is an open-weight multimodal model built to deliver strong real-world coding, agent, and long-context performance in a dense 27B-parameter architecture. It combines a causal language model with a vision encoder and supports text, image, and video inputs, making it suitable for both software workflows and broader multimodal tasks. The model emphasizes stability and practical developer utility, with major improvements in agentic coding, frontend generation, and repository-level reasoning. It also introduces thinking preservation, allowing it to retain reasoning traces from earlier turns to improve consistency, reduce repeated computation, and support iterative agent workflows. ...
    Downloads: 0 This Week
  • 18
    Ministral 3 8B Instruct 2512

    Compact 8B multimodal instruct model optimized for edge deployment

    Ministral 3 8B Instruct 2512 is a balanced, efficient model in the Ministral 3 family, offering strong multimodal capabilities within a compact footprint. It combines an 8.4B-parameter language model with a 0.4B vision encoder, enabling both text reasoning and image understanding. This FP8 instruct-fine-tuned variant is optimized for chat, instruction following, and structured outputs, making it ideal for daily assistant tasks and lightweight agentic workflows. Designed for edge deployment, the model can run on a wide range of hardware and fits locally on a single 12GB GPU, with the option for even smaller quantized configurations. ...
    Downloads: 0 This Week
  • 19
    Qwen-Image-Edit

    Advanced bilingual image editing with semantic control

    Qwen-Image-Edit is the image editing extension of Qwen-Image, a 20B parameter model that combines advanced visual and text-rendering capabilities for creative and precise editing. It leverages both Qwen2.5-VL for semantic control and a VAE Encoder for appearance control, enabling users to edit at both the content and detail level. The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or removing elements without altering surrounding regions. A standout feature is its bilingual text editing in English and Chinese, which preserves original font, size, and style during modifications. ...
    Downloads: 0 This Week
  • 20
    Mistral Large 3 675B Base 2512

    Frontier-scale 675B multimodal base model for custom AI training

    ...The model is engineered for reliability, long-context comprehension, and stable performance across many enterprise, scientific, and knowledge-intensive workloads. Its architecture includes a powerful language MoE and a 2.5B-parameter vision encoder, enabling multimodal understanding out of the box. Mistral Large 3 Base supports deployment on-premises using FP8 or NVFP4 formats, enabling high-performance workflows on B200, H200, H100, or A100 hardware.
    Downloads: 0 This Week
  • 21
    Mistral Large 3 675B Instruct 2512 Eagle

    Speculative-decoding accelerator for the 675B Mistral Large 3

    ...It works alongside the primary 675B instruct model, enabling faster response times by predicting several tokens ahead using Mistral’s Eagle speculative method. Built on the same frontier-scale multimodal Mixture-of-Experts architecture, it complements a system featuring 41B active parameters and a 2.5B-parameter vision encoder. The Eagle variant is specialized rather than standalone, serving as a performance accelerator for production-grade assistants, agentic workflows, long-context applications, and retrieval-augmented reasoning pipelines. It supports the same multilingual, system-prompt-aligned, and function-calling behavior as the main instruct model when used in the recommended server-client configuration.
    Downloads: 0 This Week
  • 22
    Mistral Large 3 675B Instruct 2512 NVFP4

    Quantized 675B multimodal instruct model optimized for NVFP4

    ...It retains the same instruction-tuned behavior as the FP8 model, making it ideal for production assistants, agentic workflows, scientific tasks, and long-context enterprise systems. The model integrates a 673B-parameter MoE language backbone with a 2.5B-parameter vision encoder, enabling rich multimodal analysis across text and images. Designed for efficient deployment, it runs on a single H100 or A100 node in NVFP4 while delivering performance similar to FP8 for short- and mid-context workloads.
    Downloads: 0 This Week
  • 23
    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    Ministral 3 3B Base 2512 is the smallest model in the Ministral 3 family, offering a compact yet capable multimodal architecture suited for lightweight AI applications. It combines a 3.4B-parameter language model with a 0.4B vision encoder, enabling both text and image understanding in a tiny footprint. As the base pretrained model, it is not fine-tuned for instructions or reasoning, making it the ideal foundation for custom post-training, domain adaptation, or specialized downstream tasks. The model is fully optimized for edge deployment and can run locally on a single GPU, fitting in 16GB VRAM in BF16 or less than 8GB when quantized. ...
    Downloads: 0 This Week
  • 24
    Ministral 3 8B Reasoning 2512

    Efficient 8B multimodal model tuned for advanced reasoning tasks.

    Ministral 3 8B Reasoning 2512 is a balanced midsize model in the Ministral 3 family, delivering strong multimodal reasoning capabilities within an efficient footprint. It combines an 8.4B-parameter language model with a 0.4B vision encoder, enabling it to process both text and images for advanced reasoning tasks. This version is specifically post-trained for reasoning, making it well-suited for math, coding, and STEM applications requiring multi-step logic and problem-solving. Despite its reasoning-focused training, the model remains edge-optimized and can run locally on a single 24GB GPU in BF16, or under 12GB when quantized. ...
    Downloads: 0 This Week
  • 25
    Ministral 3 14B Reasoning 2512

    High-precision 14B multimodal model built for advanced reasoning tasks

    Ministral 3 14B Reasoning 2512 is the largest model in the Ministral 3 series, delivering frontier-level performance with capabilities comparable to the Mistral Small 3.2 24B model. It pairs a 13.5B-parameter language model with a 0.4B vision encoder, enabling strong multimodal reasoning across both text and images. This version is specifically post-trained for reasoning tasks, making it highly effective for math, coding, STEM workloads, and complex multi-step problem-solving. Despite its scale, the model is engineered for practical deployment and can run locally on 32GB of VRAM in BF16 or under 24GB when quantized. ...
    Downloads: 0 This Week