Showing 83 open source projects for "encoder"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Reformer PyTorch

    Reformer PyTorch

    Reformer, the efficient Transformer, in Pytorch

    This is a Pytorch implementation of Reformer. It includes LSH attention, reversible network, and chunking. It has been validated with an auto-regressive task (enwik8).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Mocking Bird

    Mocking Bird

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    ...It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English. The codebase is implemented in Python (with PyTorch) and includes modules for encoder, synthesizer, vocoder, preprocessing, and inference, as well as demo scripts and a web-server interface for easier experimentation or deployment. MockingBird supports both using pretrained models and training your own synthesizer (with custom datasets), giving flexibility for voice-cloning or custom-voice synthesis depending on your needs.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    fairseq-lua

    fairseq-lua

    Facebook AI Research Sequence-to-Sequence Toolkit

    fairseq-lua is the original Lua/Torch7 version of Facebook AI Research’s sequence modeling toolkit, designed for neural machine translation (NMT) and sequence generation. It introduced early attention-based architectures and training pipelines that later evolved into the modern PyTorch-based fairseq. The framework implements sequence-to-sequence models with attention, beam search decoding, and distributed training, providing a research platform for exploring translation, summarization, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    AliceMind

    AliceMind

    ALIbaba's Collection of Encoder-decoders from MinD

    This repository provides pre-trained encoder-decoder models and its related optimization techniques developed by Alibaba's MinD (Machine IntelligeNce of Damo) Lab. Pre-trained models for natural language understanding (NLU). We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    Denoiser

    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    Denoiser is a real-time speech enhancement model operating directly on raw waveforms, designed to clean noisy audio while running efficiently on CPU. It uses a causal encoder-decoder architecture with skip connections, optimized with losses defined both in the time domain and frequency domain to better suppress noise while preserving speech. Unlike models that operate on spectrograms alone, this design enables lower latency and coherent waveform output. The implementation includes data augmentation techniques applied to the raw waveforms (e.g. noise mixing, reverberation) to improve model robustness and generalization to diverse noise types. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    TTS

    TTS

    Deep learning for text to speech

    ...Notebooks for extensive model benchmarking. Modular (but not too much) code base enabling easy testing for new ideas. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings efficiently. Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN). If you are only interested in synthesizing speech with the released TTS models, installing from PyPI is the easiest option.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    ALAE

    ALAE

    Adversarial Latent Autoencoders

    ...The project implements the architecture introduced in the CVPR research paper on Adversarial Latent Autoencoders, which focuses on improving generative modeling by learning latent representations aligned with adversarial training objectives. Unlike traditional GANs that directly generate images from random noise, ALAE uses an encoder-decoder architecture that maps images into a structured latent space and then reconstructs them through adversarial training. This design allows the model to learn interpretable latent representations that can be manipulated to control generated image attributes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Multilingual Speech Synthesis

    Multilingual Speech Synthesis

    An implementation of Tacotron 2 that supports multilingual experiments

    ...The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second has separate encoders for each language.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DETR

    DETR

    End-to-end object detection with transformers

    ...Unlike traditional computer vision techniques, DETR approaches object detection as a direct set prediction problem. It consists of a set-based global loss, which forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. Due to this parallel nature, DETR is very fast and efficient.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • 10
    Texar

    Texar

    Toolkit for Machine Learning, Natural Language Processing

    Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides a library of easy-to-use ML modules and functionalities for composing whatever models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar was originally developed and is actively contributed by Petuum and CMU in collaboration with other institutes. A mirror of this...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    CakeChat

    CakeChat

    CakeChat: Emotional Generative Dialog System

    ...The code is flexible and allows to condition model's responses by an arbitrary categorical variable. For example, you can train your own persona-based neural conversational model or create an emotional chatting machine. Hierarchical Recurrent Encoder-Decoder (HRED) architecture for handling deep dialog context. Multilayer RNN with GRU cells. The first layer of the utterance-level encoder is always bidirectional. By default, CuDNNGRU implementation is used for ~25% acceleration during inference. Thought vector is fed into decoder on each decoding step. Decoder can be conditioned on any categorical label, for example, emotion label or persona id. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Butteraugli

    Butteraugli

    Estimates the psychovisual difference between two images

    ...These maps make it practical to tune compressor settings and confirm whether bitrate reductions are visually acceptable. The metric has become a common yardstick for objective image quality when comparing codecs or encoder tweaks that target web or mobile delivery. Because it is deterministic and fast, it can be used in automated pipelines to gate releases on visual quality, not just file size.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    SG2Im

    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    ...This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. In practice, sg2im demonstrates how structured semantics can guide generative models to produce controllable, compositional imagery.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    OpenSeq2Seq

    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    OpenSeq2Seq is a TensorFlow-based toolkit for efficient experimentation with sequence-to-sequence models across speech and NLP tasks. Its core goal is to give researchers a flexible, modular framework for building and training encoder–decoder architectures while fully leveraging distributed and mixed-precision training. The toolkit includes ready-made models for neural machine translation, automatic speech recognition, speech synthesis, language modeling, and additional NLP tasks such as sentiment analysis. It supports multi-GPU and multi-node data-parallel training, and integrates with Horovod to scale out across large GPU clusters. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Retrieval-Based Conversational Model

    Retrieval-Based Conversational Model

    Dual LSTM Encoder for Dialog Response Generation

    Retrieval-Based Conversational Model in Tensorflow is a project implementing a retrieval-based conversational model using a dual LSTM encoder architecture in TensorFlow, illustrating how neural networks can be trained to select appropriate responses from a fixed set of candidate replies rather than generate them from scratch. The core idea is to embed both the conversation context and potential replies into vector representations, then score how well each candidate fits the current dialogue, choosing the best match accordingly. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    DjVuPlus

    DjVuPlus

    DjVu Read Documents,With OCR Technology(Arabic ,English ),Small Size

    ...The patents cover a particular aspect of the ZP-coder (the arithmetic coder used in DjVu and implemented in libdjvu/ZPCodec.cpp) and the background masking technique used in the IW44 wavelet encoder (implemented in libdjvu/IW44EncodeCodec.cpp). Most patents are owned by AT&T. LizardTech has very broad rights to them and grants free and permanent licenses to them for the purpose of building GPL software with the DjVu Reference Library. The grant is materialized by two paragraphs in the headers of the DjVu Reference Library source files. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Ministral 3 8B Instruct 2512

    Ministral 3 8B Instruct 2512

    Compact 8B multimodal instruct model optimized for edge deployment

    Ministral 3 8B Instruct 2512 is a balanced, efficient model in the Ministral 3 family, offering strong multimodal capabilities within a compact footprint. It combines an 8.4B-parameter language model with a 0.4B vision encoder, enabling both text reasoning and image understanding. This FP8 instruct-fine-tuned variant is optimized for chat, instruction following, and structured outputs, making it ideal for daily assistant tasks and lightweight agentic workflows. Designed for edge deployment, the model can run on a wide range of hardware and fits locally on a single 12GB GPU, with the option for even smaller quantized configurations. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Qwen-Image-Edit

    Qwen-Image-Edit

    An advanced bilingual image editing with semantic control

    Qwen-Image-Edit is the image editing extension of Qwen-Image, a 20B parameter model that combines advanced visual and text-rendering capabilities for creative and precise editing. It leverages both Qwen2.5-VL for semantic control and a VAE Encoder for appearance control, enabling users to edit at both the content and detail level. The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or removing elements without altering surrounding regions. A standout feature is its bilingual text editing in English and Chinese, which preserves original font, size, and style during modifications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Mistral Large 3 675B Base 2512

    Mistral Large 3 675B Base 2512

    Frontier-scale 675B multimodal base model for custom AI training

    ...The model is engineered for reliability, long-context comprehension, and stable performance across many enterprise, scientific, and knowledge-intensive workloads. Its architecture includes a powerful language MoE and a 2.5B-parameter vision encoder, enabling multimodal understanding out of the box. Mistral Large 3 Base supports deployment on-premises using FP8 or NVFP4 formats, enabling high-performance workflows on B200, H200, H100, or A100 hardware.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Mistral Large 3 675B Instruct 2512 Eagle

    Mistral Large 3 675B Instruct 2512 Eagle

    Speculative-decoding accelerator for the 675B Mistral Large 3

    ...It works alongside the primary 675B instruct model, enabling faster response times by predicting several tokens ahead using Mistral’s Eagle speculative method. Built on the same frontier-scale multimodal Mixture-of-Experts architecture, it complements a system featuring 41B active parameters and a 2.5B-parameter vision encoder. The Eagle variant is specialized rather than standalone, serving as a performance accelerator for production-grade assistants, agentic workflows, long-context applications, and retrieval-augmented reasoning pipelines. It supports the same multilingual, system-prompt-aligned, and function-calling behavior as the main instruct model when used in the recommended server-client configuration.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Mistral Large 3 675B Instruct 2512 NVFP4

    Mistral Large 3 675B Instruct 2512 NVFP4

    Quantized 675B multimodal instruct model optimized for NVFP4

    ...It retains the same instruction-tuned behavior as the FP8 model, making it ideal for production assistants, agentic workflows, scientific tasks, and long-context enterprise systems. The model integrates a 673B-parameter MoE language backbone with a 2.5B-parameter vision encoder, enabling rich multimodal analysis across text and images. Designed for efficient deployment, it runs on a single H100 or A100 node in NVFP4 while delivering performance similar to FP8 for short- and mid-context workloads.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Ministral 3 3B Base 2512

    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    Ministral 3 3B Base 2512 is the smallest model in the Ministral 3 family, offering a compact yet capable multimodal architecture suited for lightweight AI applications. It combines a 3.4B-parameter language model with a 0.4B vision encoder, enabling both text and image understanding in a tiny footprint. As the base pretrained model, it is not fine-tuned for instructions or reasoning, making it the ideal foundation for custom post-training, domain adaptation, or specialized downstream tasks. The model is fully optimized for edge deployment and can run locally on a single GPU, fitting in 16GB VRAM in BF16 or less than 8GB when quantized. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Ministral 3 8B Reasoning 2512

    Ministral 3 8B Reasoning 2512

    Efficient 8B multimodal model tuned for advanced reasoning tasks.

    Ministral 3 8B Reasoning 2512 is a balanced midsize model in the Ministral 3 family, delivering strong multimodal reasoning capabilities within an efficient footprint. It combines an 8.4B-parameter language model with a 0.4B vision encoder, enabling it to process both text and images for advanced reasoning tasks. This version is specifically post-trained for reasoning, making it well-suited for math, coding, and STEM applications requiring multi-step logic and problem-solving. Despite its reasoning-focused training, the model remains edge-optimized and can run locally on a single 24GB GPU in BF16, or under 12GB when quantized. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Ministral 3 14B Reasoning 2512

    Ministral 3 14B Reasoning 2512

    High-precision 14B multimodal model built for advanced reasoning tasks

    Ministral 3 14B Reasoning 2512 is the largest model in the Ministral 3 series, delivering frontier-level performance with capabilities comparable to the Mistral Small 3.2 24B model. It pairs a 13.5B-parameter language model with a 0.4B vision encoder, enabling strong multimodal reasoning across both text and images. This version is specifically post-trained for reasoning tasks, making it highly effective for math, coding, STEM workloads, and complex multi-step problem-solving. Despite its scale, the model is engineered for practical deployment and can run locally on 32GB of VRAM in BF16 or under 24GB when quantized. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Ministral 3 3B Instruct 2512

    Ministral 3 3B Instruct 2512

    Ultra-efficient 3B multimodal instruct model built for edge deployment

    Ministral 3 3B Instruct 2512 is the smallest model in the Ministral 3 family, offering a lightweight yet capable multimodal architecture designed for edge and low-resource deployments. It includes a 3.4B-parameter language model paired with a 0.4B vision encoder, enabling it to understand both text and visual inputs. As an FP8 instruct-fine-tuned model, it is optimized for chat, instruction following, and compact agentic tasks while maintaining strong adherence to system prompts. Despite its small size, it delivers efficient real-time performance and can run locally on a single 8GB GPU, with further memory reductions through quantization. ...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB