BGE-M3 is a multilingual embedding model
Llama-2-7B is a 7B-parameter transformer model for text generation
Vision-language-action model for robot control via images and text
Detects speech activity in audio using a pyannote.audio 2.1 pipeline
Time series forecasting model using the T5 architecture with 46M parameters
Multimodal ERNIE 4.5 MoE model for image-text reasoning and chat