TRIBE v2 is a multimodal foundation model developed by Meta AI for predicting human brain activity from naturalistic stimuli such as video, audio, and text. It is designed for in-silico neuroscience: modeling how the brain responds to complex, real-world inputs without requiring new human experiments. The system feeds each modality through a state-of-the-art pretrained encoder (LLaMA for text, V-JEPA for video, Wav2Vec2-BERT for audio) and fuses the resulting representations in a unified Transformer architecture. The fused representation is then mapped onto the cortical surface to predict fMRI responses across thousands of brain regions, giving researchers a tool for studying perception, cognition, and multimodal processing in the brain.
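The overall pattern (per-modality encoders, a shared fusion Transformer, a regression head over cortical parcels) can be sketched in a few lines of PyTorch. This is a minimal illustration of that design, not the released implementation: the dimensions, module names (`TrimodalEncoder`, `proj_text`, etc.), token-concatenation fusion, and mean pooling are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class TrimodalEncoder(nn.Module):
    """Illustrative sketch of a TRIBE-style encoder: per-modality features
    are projected to a shared width, concatenated along the token axis, and
    fused by a Transformer before a linear head predicts per-parcel fMRI
    activity. All sizes and names here are assumptions, not the real model."""

    def __init__(self, d_text=4096, d_video=1024, d_audio=1024,
                 d_model=768, n_parcels=1000, n_layers=4, n_heads=8):
        super().__init__()
        # Linear adapters map frozen encoder activations (e.g. from LLaMA,
        # V-JEPA, Wav2Vec2-BERT) into a shared embedding space.
        self.proj_text = nn.Linear(d_text, d_model)
        self.proj_video = nn.Linear(d_video, d_model)
        self.proj_audio = nn.Linear(d_audio, d_model)
        fusion_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(fusion_layer, num_layers=n_layers)
        # One regression output per cortical parcel.
        self.head = nn.Linear(d_model, n_parcels)

    def forward(self, text_feats, video_feats, audio_feats):
        # Each input: (batch, time, feature_dim), aligned to the fMRI grid.
        tokens = torch.cat([
            self.proj_text(text_feats),
            self.proj_video(video_feats),
            self.proj_audio(audio_feats),
        ], dim=1)
        fused = self.fusion(tokens)
        # Pool over tokens and predict one value per parcel.
        return self.head(fused.mean(dim=1))
```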
Features
- Multimodal modeling of video, audio, and text for brain response prediction.
- Transformer-based architecture mapping inputs to fMRI cortical activity.
- Integration of advanced encoders such as LLaMA, V-JEPA, and Wav2Vec2-BERT.
- Pretrained models available for inference on real-world media inputs (see the inference sketch after this list).
- Support for training and experimentation with neuroscience datasets.
- Visualization tools for analyzing predicted brain activity across regions (see the surface-plotting example after this list).
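For the inference feature above, usage would look roughly like the following. Every name here is a hypothetical stand-in (the `tribe` package, `load_pretrained`, the `"tribe-v2-base"` checkpoint id, and `model.predict` are illustrative assumptions, not the repository's documented API); consult the repository for the actual entry points.

```python
import numpy as np
import tribe  # hypothetical package name, for illustration only

# Hypothetical: load a pretrained checkpoint and run it on a media file.
model = tribe.load_pretrained("tribe-v2-base")  # illustrative checkpoint id
response = model.predict("stimuli/movie_clip.mp4")

# Expected output under the assumed parcellation: one row per fMRI
# timepoint, one column per cortical parcel.
print(response.shape)  # e.g. (num_timepoints, 1000)
```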
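For the visualization feature, one common way to render such predictions is with nilearn's surface plotting, shown below. This is a generic example rather than the project's own tooling, and it assumes the per-parcel predictions have already been projected to per-vertex values on the fsaverage5 surface; random values stand in for real predictions here.

```python
import numpy as np
from nilearn import datasets, plotting

# Fetch the fsaverage5 surface meshes bundled with nilearn.
fsaverage = datasets.fetch_surf_fsaverage()

# Stand-in data: one value per left-hemisphere vertex (10242 on fsaverage5).
# In practice these would come from projecting the model's per-parcel
# predictions onto the surface.
predicted = np.random.RandomState(0).randn(10242)

plotting.plot_surf_stat_map(
    fsaverage.infl_left,         # inflated left-hemisphere mesh
    predicted,                   # per-vertex values to display
    hemi="left",
    bg_map=fsaverage.sulc_left,  # sulcal depth for anatomical shading
    colorbar=True,
    title="Predicted response (illustrative)",
)
plotting.show()
```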