Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Deep learning for text to speech
Adversarial Latent Autoencoders
An implementation of Tacotron 2 that supports multilingual experiments
End-to-end object detection with transformers
Toolkit for Machine Learning, Natural Language Processing
CakeChat: Emotional Generative Dialog System
Estimates the psychovisual difference between two images
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Toolkit for efficient experimentation with Speech Recognition
Dual LSTM Encoder for Dialog Response Generation
Compact 8B multimodal instruct model optimized for edge deployment
Advanced bilingual image editing with semantic control
Frontier-scale 675B multimodal base model for custom AI training
Speculative-decoding accelerator for the 675B Mistral Large 3
Quantized 675B multimodal instruct model optimized for NVFP4
Small 3B-base multimodal model ideal for custom AI on edge hardware
Efficient 8B multimodal model tuned for advanced reasoning tasks
High-precision 14B multimodal model built for advanced reasoning tasks
Ultra-efficient 3B multimodal instruct model built for edge deployment
Efficient 14B multimodal instruct model with edge deployment and FP8
Frontier-scale 675B multimodal instruct MoE model for enterprise AI
Compact 3B-param multimodal model for efficient on-device reasoning
Versatile 8B-base multimodal LLM, flexible foundation for custom AI
Powerful 14B-base multimodal model — flexible base for fine-tuning