Reformer, the efficient Transformer, in PyTorch
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Facebook AI Research Sequence-to-Sequence Toolkit
ALIbaba's Collection of Encoder-decoders from MinD
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Deep learning for text to speech
Adversarial Latent Autoencoders
An implementation of Tacotron 2 that supports multilingual experiments
End-to-end object detection with transformers
Toolkit for Machine Learning, Natural Language Processing
CakeChat: Emotional Generative Dialog System
Estimates the psychovisual difference between two images
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Toolkit for efficient experimentation with Speech Recognition
Dual LSTM Encoder for Dialog Response Generation
DjVu document reader with OCR technology (Arabic, English), small size
Compact 8B multimodal instruct model optimized for edge deployment
Advanced bilingual image editing with semantic control
Frontier-scale 675B multimodal base model for custom AI training
Speculative-decoding accelerator for the 675B Mistral Large 3
Quantized 675B multimodal instruct model optimized for NVFP4
Small 3B-base multimodal model ideal for custom AI on edge hardware
Efficient 8B multimodal model tuned for advanced reasoning tasks
High-precision 14B multimodal model built for advanced reasoning tasks
Ultra-efficient 3B multimodal instruct model built for edge deployment