LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual tokenizers. The project includes a supervised fine-tuning dataset composed of interleaved text and mesh data, allowing the model to learn relationships between textual descriptions and 3D structures. As a result, the model can generate mesh models directly from text prompts, explain mesh structures in natural language, or output mixed text-and-mesh sequences. This unified representation enables a single model to operate across both textual and spatial domains.
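To make the serialization idea concrete, here is a minimal sketch of how a mesh could be turned into plain text. It assumes an OBJ-style layout (`v x y z` lines for vertices, `f i j k` lines for 1-based face indices) and quantizes coordinates to a small integer grid to keep the token count low. The function name `mesh_to_text` and the default of 64 bins are illustrative assumptions, not details confirmed by the description above.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]  # 0-based vertex indices


def mesh_to_text(vertices: Sequence[Vertex], faces: Sequence[Face], bins: int = 64) -> str:
    """Serialize a triangle mesh as OBJ-style text (hypothetical helper).

    Coordinates are shifted to the bounding-box origin, scaled uniformly by
    the longest bounding-box edge, and rounded to integers in [0, bins - 1].
    The bin count is an assumption chosen for illustration.
    """
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    # Uniform scale keeps the aspect ratio; fall back to 1.0 for a degenerate mesh.
    scale = max(h - l for h, l in zip(hi, lo)) or 1.0

    lines: List[str] = []
    for v in vertices:
        q = [int(round((v[i] - lo[i]) / scale * (bins - 1))) for i in range(3)]
        lines.append("v " + " ".join(str(c) for c in q))
    for f in faces:
        # OBJ face indices are 1-based.
        lines.append("f " + " ".join(str(i + 1) for i in f))
    return "\n".join(lines)


# A unit tetrahedron as a small example.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
tetra_text = mesh_to_text(tetra_vertices, tetra_faces)
print(tetra_text)
```

The resulting string is ordinary text, so it can be placed in a prompt or a training example next to a natural-language description without any change to the tokenizer.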
Features
- Text-based representation of 3D meshes using vertex and face data
- Unified model capable of generating both text and 3D geometry
- Supervised fine-tuning dataset combining textual and mesh data
- Text-to-3D mesh generation from natural language prompts
- Ability to interpret and describe existing mesh structures
- Integration with common LLM frameworks such as Hugging Face Transformers
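Because the model can emit mixed text-and-mesh sequences, a consumer needs to pull the geometry back out of a reply. The sketch below shows one way this might be done, assuming the mesh portion uses OBJ-style `v` and `f` lines. The helper name `parse_mesh_text` and the sample reply are hypothetical and are not taken from the project's code.

```python
from typing import List, Tuple


def parse_mesh_text(text: str) -> Tuple[List[Tuple[float, ...]], List[Tuple[int, ...]]]:
    """Extract vertices and faces from text that may mix prose with
    OBJ-style mesh lines (hypothetical helper, assumed format)."""
    vertices: List[Tuple[float, ...]] = []
    faces: List[Tuple[int, ...]] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "v":
            try:
                vertices.append(tuple(float(p) for p in parts[1:]))
            except ValueError:
                continue  # not a numeric vertex line, treat as prose
        elif len(parts) >= 4 and parts[0] == "f":
            try:
                # OBJ entries may look like "3/1/2"; keep only the vertex index
                # and convert from 1-based to 0-based.
                faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
            except ValueError:
                continue
    # Drop faces that reference vertices the text never defined.
    n = len(vertices)
    faces = [f for f in faces if all(0 <= i < n for i in f)]
    return vertices, faces


# Illustrative reply string, written by hand for this example.
reply = (
    "Here is a tetrahedron:\n"
    "v 0 0 0\nv 63 0 0\nv 0 63 0\nv 0 0 63\n"
    "f 1 2 3\nf 1 2 4\nf 1 2 9\n"
    "Let me know if you want a different shape."
)
verts, tris = parse_mesh_text(reply)
print(len(verts), "vertices,", len(tris), "faces")
```

Filtering out faces with out-of-range indices is a defensive choice: generated text is not guaranteed to be a valid mesh, so downstream tools are safer receiving only faces that refer to defined vertices.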