Large Audio Language Model built for natural interactions
Tools for publishing transcripts for Claude Code sessions
95% token savings. 155x faster queries. 16 languages
Refine and quantize messy AI pixel art into clean, perfect pixels
Specification and documentation for the Universal Commerce Protocol
Context engineering is the new vibe coding
Chinese XLNet pre-trained model
Inference script for Oasis 500M
Extract audio and video content and organize it into a Markdown note
Document Image Parsing via Heterogeneous Anchor Prompting
Framework for building neural networks
StreamSpeech is a seamless model for offline speech recognition
Automatic SSRF fuzzer and exploitation tool
A best practices guide for day 2 operations
Mini website for both testing general CS knowledge and enforcing coding practice
Blazing-fast vector DB with similarity search and metadata filtering
Library for reading and writing large multi-dimensional arrays
A JAX-native LLM Post-Training Library
A Model Context Protocol server for searching and analyzing arXiv
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Set of tools to assess and improve LLM security
Open-source platform for building enterprise-grade agents
TorchMultimodal is a PyTorch library