Towards Real-World Vision-Language Understanding
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Ongoing research training transformer models at scale
Fast multimodal LLM for real-time voice interaction and AI apps
AI multi-agent framework for automating data-driven R&D workflows
Hackable and optimized Transformers building blocks
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Large Multimodal Models for Video Understanding and Editing
Build high-quality LLM apps
Practice made Claude perfect
Models for object and human mesh reconstruction
AIConfig is a config-based framework to build generative AI apps
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Document (PDF, Word, PPTX ...) extraction and parse API
Build GenAI applications quickly and easily
Data and tools for generating and inspecting OLMo pre-training data
Your open-source LLM evaluation toolkit
An on-premises, OCR-free unstructured data extraction
A large language model for live-stream sales anchors
ChatGLM-6B: An Open Bilingual Dialogue Language Model
A Python library for audio
The standard data-centric AI package for data quality and ML
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
The lightweight PyTorch wrapper for high-performance AI research
Diversity-driven optimization and large-model reasoning ability