GLM-OCR
GLM-OCR is a multimodal optical character recognition model and open source repository that provides accurate, efficient, and comprehensive document understanding by combining text and visual modalities into a unified encoder–decoder architecture derived from the GLM-V family. Built with a visual encoder pre-trained on large-scale image–text data and a lightweight cross-modal connector feeding into a GLM-0.5B language decoder, the model supports layout detection, parallel region recognition, and structured output for text, tables, formulas, and complicated real-world document formats. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization, achieving state-of-the-art benchmarks on major document understanding tasks.
Learn more
Amazon CodeWhisperer
Build apps faster with ML-powered coding companion. Accelerate application development with automatic code recommendations based on the code and comments in your IDE. Empower developers to use artificial intelligence (AI) responsibly to create syntactically correct and secure applications. Generate entire functions and logical code blocks without having to search and customize code snippets from the web. Stay focused and never leave the IDE, with real-time customized code recommendations for all your Java, Python, and JavaScript projects. Amazon CodeWhisperer is a machine learning (ML)–powered service that helps improve developer productivity by generating code recommendations based on their comments in natural language and code in the integrated development environment (IDE). Accelerate frontend and backend development by empowering developers with automatic code recommendations. Save time and effort by using CodeWhisperer to generate code to build and train your ML models.
Learn more
Gemini Code Assist
Increase software development and delivery velocity using generative AI assistance, with enterprise security and privacy protection.
Gemini Code Assist completes your code as you write, and generates whole code blocks or functions on demand. Code assistance is available in many popular IDEs, such as Visual Studio Code, JetBrains IDEs (IntelliJ, PyCharm, GoLand, WebStorm, and more), Cloud Workstations, Cloud Shell Editor, and supports 20+ programming languages, including Java, JavaScript, Python, C, C++, Go, PHP, and SQL.
Through a natural language chat interface, you can quickly chat with Gemini Code Assist to get answers to your coding questions, or receive guidance on coding best practices. Chat is available in all supported IDEs.
Enterprises can customize Gemini Code Assist using their organization’s private codebases and knowledge sources so that Gemini Code Assist can offer more tailored assistance.
Gemini Code Assist enables large-scale changes to entire codebases.
Learn more
Mu
Mu is a 330-million-parameter encoder–decoder language model designed to power the agent in Windows settings by mapping natural-language queries to Settings function calls, running fully on-device via NPUs at over 100 tokens per second while maintaining high accuracy. Drawing on Phi Silica optimizations, Mu’s encoder–decoder architecture reuses a fixed-length latent representation to cut computation and memory overhead, yielding 47 percent lower first-token latency and 4.7× higher decoding speed on Qualcomm Hexagon NPUs compared to similar decoder-only models. Hardware-aware tuning, including a 2/3–1/3 encoder–decoder parameter split, weight sharing between input and output embeddings, Dual LayerNorm, rotary positional embeddings, and grouped-query attention, enables fast inference at over 200 tokens per second on devices like Surface Laptop 7 and sub-500 ms response times for settings queries.
Learn more