Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
AI Models
Search Results

Search Results for "intelligence" - Page 6

x

Sort By:

Relevance

Clear All Filters

OS

Linux 257
Mac 230
Windows 224
More...
BSD 158
ChromeOS 158
Mobile Operating Systems 6

Category

Artificial Intelligence 267
Multimedia 7
Scientific/Engineering 6
Software Development 2
Business 1
Education 1
Security 1

License

OSI-Approved Open Source 213
Creative Commons Attribution License 10
Other License 2

Translations

English 5
Chinese (Simplified) 1
Chinese (Traditional) 1
Spanish 1

Programming Language

Python 267
Unix Shell 13
C++ 7
C 1
Go 1
More...
JavaScript 1
PowerShell 1
Rust 1
TypeScript 1

Status

Production/Stable 2

Showing 267 open source projects for "intelligence"

View related business solutions

AI Models Python Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

Kimi-Audio

Audio foundation model excelling in audio understanding

Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one...

Downloads: 0 This Week

Last Update: 2026-01-27
See Project
2

xFormers

Hackable and optimized Transformers building blocks

xformers is a modular, performance-oriented library of transformer building blocks, designed to allow researchers and engineers to compose, experiment, and optimize transformer architectures more flexibly than monolithic frameworks. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and...

Downloads: 1 This Week

Last Update: 2026-02-20
See Project
3

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
4

HRM-Text

1B text generation model based on the HRM architecture

HRM-Text is a one-billion-parameter text generation model and pretraining framework based on the Hierarchical Reasoning Model architecture. It is designed to make foundation model pretraining more accessible by reducing compute and data requirements compared with traditional scaling-heavy approaches. The system combines hierarchical recurrent design, task-completion strengthening, and latent-space reasoning. Its training stack includes PrefixLM sequence packing, FlashAttention 3 kernels,...

Downloads: 0 This Week

Last Update: 4 days ago
See Project
$300 Free Credits for Your Google Cloud Projects
Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
5

Gemma in PyTorch

The official PyTorch implementation of Google's Gemma models

gemma_pytorch provides the official PyTorch reference for running and fine-tuning Google’s Gemma family of open models. It includes model definitions, configuration files, and loading utilities for multiple parameter scales, enabling quick evaluation and downstream adaptation. The repository demonstrates text generation pipelines, tokenizer setup, quantization paths, and adapters for low-rank or parameter-efficient fine-tuning. Example notebooks walk through instruction tuning and evaluation...

Downloads: 0 This Week

Last Update: 2025-10-09
See Project
6

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
7

Tongyi DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

DeepResearch (Tongyi DeepResearch) is an open-source “deep research agent” developed by Alibaba’s Tongyi Lab designed for long-horizon, information-seeking tasks. It’s built to act like a research agent: synthesizing, reasoning, retrieving information via the web and documents, and backing its outputs with evidence. The model is about 30.5 billion parameters in size, though at any given token only ~3.3B parameters are active. It uses a mix of synthetic data generation, fine-tuning and...

Downloads: 0 This Week

Last Update: 2026-02-27
See Project
8

Janus

Unified Multimodal Understanding and Generation Models

Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing...

Downloads: 1 This Week

Last Update: 2025-10-20
See Project
9

Google DeepMind GraphCast and GenCast

Global weather forecasting model using graph neural networks and JAX

GraphCast, developed by Google DeepMind, is a research-grade weather forecasting framework that employs graph neural networks (GNNs) to generate medium-range global weather predictions. The repository provides complete example code for running and training both GraphCast and GenCast, two models introduced in DeepMind’s research papers. GraphCast is designed to perform high-resolution atmospheric simulations using the ERA5 dataset from ECMWF, while GenCast extends the approach with...

Downloads: 1 This Week

Last Update: 2026-03-31
See Project
Compliant and Reliable File Transfers Backed by Top Security Certifications
Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.

Start Free Trial
10

Step-Video-T2V

State-of-the-art (SoTA) text-to-video pre-trained model

Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible...

Downloads: 2 This Week

Last Update: 2025-12-02
See Project
11

Step-Audio

Open-source framework for intelligent speech interaction

Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and...

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
12

Vidi2

Large Multimodal Models for Video Understanding and Editing

Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
13

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...

Downloads: 0 This Week

Last Update: 2026-06-02
See Project
14

HunyuanWorld-Voyager

RGBD video generation model conditioned on camera input

HunyuanWorld-Voyager is a next-generation video diffusion framework developed by Tencent-Hunyuan for generating world-consistent 3D scene videos from a single input image. By leveraging user-defined camera paths, it enables immersive scene exploration and supports controllable video synthesis with high realism. The system jointly produces aligned RGB and depth video sequences, making it directly applicable to 3D reconstruction tasks. At its core, Voyager integrates a world-consistent video...

Downloads: 0 This Week

Last Update: 2026-04-15
See Project
15

gemini-web2api

Convert Google Gemini web into OpenAI-compatible API

gemini-web2api is a Python bridge that exposes Google Gemini web access through OpenAI-compatible API endpoints. It is designed to let OpenAI-style clients connect to Gemini-like models through routes such as chat completions, models, responses, and native Gemini-compatible endpoints. The project can run as a simple local server and uses a mostly single-file design with an optional dependency for streaming. It supports model aliases for Flash, Thinking, Pro-style routing, Auto, and Lite...

Downloads: 0 This Week

Last Update: 4 days ago
See Project
16

MiniMind-O

A 0.1B Omni model trained from scratch

MiniMind-O is an educational open-source project for building a small end-to-end Omni model from scratch. It extends the MiniMind family by exploring a model that can handle text, audio, and image inputs while producing text and streaming speech outputs. The project is designed to make multimodal AI training more accessible by keeping the model size small enough for ordinary personal hardware. It includes both mini and full training data paths, allowing learners to run a complete workflow...

Downloads: 0 This Week

Last Update: 2026-06-08
See Project
17

Cactus Needle

26m function call model that runs on incredibly small devices

Needle is an experimental 26-million-parameter function-calling model designed to run on extremely small devices such as phones, watches, glasses, and low-power personal AI hardware. It is based on a Simple Attention Network architecture and was distilled from a much larger model to focus on fast, compact tool-use behavior. The project provides open weights, training details, dataset generation resources, and a playground for testing the model with custom tools. Needle is optimized for...

Downloads: 0 This Week

Last Update: 2026-05-16
See Project
18

Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models

Qwen3-ASR is an automatic speech recognition system in the QwenLM family, developed to convert spoken language into text with strong accuracy and real-time performance. As a specialized ASR variant of the broader Qwen language model ecosystem, it focuses on capturing reliable transcriptions from audio sources such as recordings, live streams, or conversational inputs while supporting low latency use cases. The architecture combines advanced neural acoustic modeling with context-aware...

Downloads: 0 This Week

Last Update: 2026-02-09
See Project
19

LingBot-VLA

A Pragmatic VLA Foundation Model

LingBot-VLA is an open-source Vision-Language-Action (VLA) foundational AI model designed to serve as a general “brain” for real-world robotic manipulation by grounding multimodal perception and language into actionable motions. It has been pretrained on tens of thousands of hours of real robotic interaction data across multiple robot platforms, which enables it to generalize well to diverse morphologies and tasks without needing extensive retraining on each new bot. The model aims to bridge...

Downloads: 0 This Week

Last Update: 2026-06-11
See Project
20

OpenTinker

OpenTinker is an RL-as-a-Service infrastructure for foundation models

OpenTinker is an open-source Reinforcement Learning-as-a-Service (RLaaS) infrastructure intended to democratize reinforcement learning for large language model (LLM) agents. Traditional RL setups can be monolithic and difficult to configure, but OpenTinker separates concerns across agent definition, environment interaction, and execution, which lets developers focus on defining the logic of agents and environments separately from how training and inference are run. It introduces a...

Downloads: 0 This Week

Last Update: 2026-03-01
See Project
21

HY-MT

Hunyuan Translation Model Version 1.5

HY-MT (Hunyuan Translation) is a high-quality multilingual machine translation model suite developed to support mutual translation across dozens of languages with strong performance even at smaller model scales. It ships with both an 1.8 B parameter model and a larger 7 B model, the latter optimized not only for direct translation but also for formatted and contextualized output, allowing better handling of terminology and mixed-language content. The project emphasizes both speed and...

Downloads: 0 This Week

Last Update: 2026-06-01
See Project
22

DFlash

Block Diffusion for Ultra-Fast Speculative Decoding

DFlash is an open-source framework for ultra-fast speculative decoding using a lightweight block diffusion model to draft text in parallel with a target large language model, dramatically improving inference speed without sacrificing generation quality. It acts as a “drafter” that proposes likely continuations which the main model then verifies, enabling significant throughput gains compared to traditional autoregressive decoding methods that generate token by token. This approach has been...

Downloads: 0 This Week

Last Update: 2026-05-10
See Project
23

Qwen3-VL-Embedding

Multimodal embedding and reranking models built on Qwen3-VL

Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a...

Downloads: 0 This Week

Last Update: 2026-04-08
See Project
24

Z80-μLM

Z80-μLM is a 2-bit quantized language model

Z80-μLM is a retro-computing AI project that demonstrates a tiny language model (Z80-μLM) engineered to run on an 8-bit Z80 CPU by aggressively quantizing weights down to 2-bit precision. The repository provides a complete workflow where you train or fine-tune conversational models in Python, then export them into a format that can be executed on classic Z80 systems. A key deliverable is producing CP/M-compatible .COM binaries, enabling a genuinely vintage “chat with your computer”...

1 Review

Downloads: 0 This Week

Last Update: 2026-01-27
See Project
25

MedGemma

Collection of Gemma 3 variants that are trained for performance

MedGemma is a collection of specialized open-source AI models created by Google as part of its Health AI Developer Foundations initiative, built on the Gemma 3 family of transformer models and trained for medical text and image comprehension tasks that help accelerate the development of healthcare-focused AI applications. It includes multiple variants such as a 4 billion-parameter multimodal model that can process both medical images and text and a 27 billion-parameter text-only (and...

Downloads: 0 This Week

Last Update: 2026-04-07
See Project

Previous
2
3
4
5
You're on page 6
7
8
9
10
11
Next

Related Searches

forensic audio analysis

ocr

video ai

audio voice

zip

remove background

offline artificial intelligence\

midi converter

hunyuan

djv

Related Categories

Artificial Intelligence

Multimedia

Scientific/Engineering

Software Development

Business

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise