Showing 53 open source projects for "visual-cfd"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    Dify

    Dify

    One API for plugins and datasets, one interface for prompt engineering

    Dify is an easy-to-use LLMOps platform designed to empower more people to create sustainable, AI-native applications. With visual orchestration for various application types, Dify offers out-of-the-box, ready-to-use applications that can also serve as Backend-as-a-Service APIs. Unify your development process with one API for plugins and datasets integration, and streamline your operations using a single interface for prompt engineering, visual analytics, and continuous improvement. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 2
    InternGPT

    InternGPT

    Open source demo platform where you can easily showcase your AI models

    InternGPT is an open-source multimodal AI framework designed to extend large language models beyond text interactions into visual reasoning and image manipulation tasks. The system integrates conversational AI with computer vision models so users can interact with images, videos, and visual environments through natural language instructions. Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    InternLM-XComposer-2.5

    InternLM-XComposer-2.5

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

    ...It incorporates visual understanding modules that allow the model to analyze images and integrate them into coherent narrative outputs. The framework also supports tasks such as image captioning, multimodal reasoning, and layout generation for structured visual documents. By combining language generation with visual composition capabilities, the system enables new forms of content creation that integrate written explanations with automatically generated visual components.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Flowise

    Flowise

    Drag & drop UI to build your customized LLM flow

    Open source UI visual tool to build your customized LLM flow using LangchainJS, written in Node Typescript/Javascript. Conversational agent for a chat model which utilizes chat-specific prompts and buffer memory. Open source is the core of Flowise, and it will always be free for commercial and personal usage. Flowise support different environment variables to configure your instance.
    Downloads: 26 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    Kimi K2.5

    Kimi K2.5

    Moonshot's most powerful AI model

    Kimi K2.5 is Moonshot AI’s open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed vision and text tokens. Based on a 1T-parameter Mixture-of-Experts (MoE) architecture with 32B activated parameters, it integrates advanced language reasoning with strong visual understanding. K2.5 supports both “Thinking” and “Instant” modes, enabling either deep step-by-step reasoning or low-latency responses depending on the task. Designed for agentic workflows, it features an Agent Swarm mechanism that decomposes complex problems into coordinated sub-agents executing in parallel. With a 256K context length and MoonViT vision encoder, the model excels across reasoning, coding, long-context comprehension, image, and video benchmarks. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 6
    WeChatMsg

    WeChatMsg

    Project aimed at extracting, exporting, and analyzing chat records

    ...It provides tools that read local WeChat database files and allow users to convert chat data into readable formats such as HTML, Word, and CSV, making it possible to inspect conversations outside the mobile app environment. Beyond simple export, the project includes mechanisms for analyzing chat histories and generating annual reports or visual summaries about messaging trends, interaction patterns, and more. The original README communicates a guiding philosophy about owning personal data and using it responsibly to train personalized AI agents or preserve memories. Although the repository has seen periods of inactivity and may not receive frequent updates, its widespread use indicates community interest in preserving chat logs and understanding conversation data outside of the WeChat interface.
    Downloads: 236 This Week
    Last Update:
    See Project
  • 7
    Skywork-R1V4

    Skywork-R1V4

    Skywork-R1V is an advanced multimodal AI model series

    Skywork-R1V is an open-source multimodal reasoning model designed to extend the capabilities of large language models into vision-language tasks that require complex logical reasoning. The project introduces a model architecture that transfers the reasoning abilities of advanced text-based models into visual domains so the system can interpret images and perform multi-step reasoning about them. Instead of retraining both language and vision models from scratch, the framework uses a lightweight visual projection layer that connects a pretrained vision backbone with a reasoning-capable language model. This design allows the model to analyze images while maintaining strong textual reasoning performance, enabling tasks such as solving visual math problems, interpreting scientific diagrams, and answering questions about images.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Qwen3-VL

    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    Qwen3-VL is the latest multimodal large language model series from Alibaba Cloud’s Qwen team, designed to integrate advanced vision and language understanding. It represents a major upgrade in the Qwen lineup, with stronger text generation, deeper visual reasoning, and expanded multimodal comprehension. The model supports dense and Mixture-of-Experts (MoE) architectures, making it scalable from edge devices to cloud deployments, and is available in both instruction-tuned and reasoning-enhanced variants. Qwen3-VL is built for complex tasks such as GUI automation, multimodal coding (converting images or videos into HTML, CSS, JS, or Draw.io diagrams), long-context reasoning with support up to 1M tokens, and comprehensive video understanding. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    StarVector

    StarVector

    StarVector is a foundation model for SVG generation

    ...The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into structured vector representations, enabling high-quality vectorization and design generation. This approach allows StarVector to create scalable graphics that maintain visual quality regardless of resolution, which is especially useful for design tools and illustration workflows. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Autonomous Agents

    Autonomous Agents

    Autonomous Agents (LLMs) research papers. Updated Daily

    ...These methods allow agents to combine visual and geometric information while maintaining awareness of the spatial relationships between agents and objects.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    LLM Vision

    LLM Vision

    Visual intelligence for your home.

    ...The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. It also maintains a timeline of analyzed camera events that can be displayed in dashboards or queried through the assistant interface.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    InternVL

    InternVL

    A Pioneering Open-Source Alternative to GPT-4o

    InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a wide variety of tasks, including visual perception, image classification, and cross-modal retrieval between images and text. It can also be connected to language models to enable conversational interfaces that understand images, videos, and other visual content. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    UFO³

    UFO³

    Weaving the Digital Agent Galaxy

    ...The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be manipulated. This enables the agent to navigate complex software environments and perform tasks that normally require manual interaction. UFO integrates mechanisms for task decomposition, planning, and execution so that high-level user requests can be broken down into smaller steps performed by specialized agents. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    DriveLM

    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models answer structured questions about the environment to guide decision making. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    swark.io

    swark.io

    Create architecture diagrams from code automatically using LLMs

    Swark is an open-source developer tool and Visual Studio Code extension that automatically generates software architecture diagrams directly from source code using large language models. The project aims to help developers quickly understand complex codebases by analyzing repositories and producing visual diagrams that represent system architecture, dependencies, and component relationships.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    LlamaGen

    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image tokenization techniques can produce competitive results compared with modern diffusion-based image generators. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    LandPPT

    LandPPT

    An LLM-based presentation generation platform

    ...The application integrates multiple AI models from providers such as OpenAI, Anthropic, Google, and locally hosted models to generate text, images, and structured presentation layouts. It also includes template systems and style options that allow presentations to be customized for different industries, visual themes, or storytelling formats.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    LISA

    LISA

    LISA: Reasoning Segmentation via Large Language Model

    ...The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual segmentation outputs. This approach allows the system to identify objects or regions in images based on semantic descriptions, contextual reasoning, and world knowledge. The model integrates multimodal capabilities by combining language understanding with visual perception so that text instructions guide the segmentation process. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ML Ferret

    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Langflow

    Langflow

    Low-code app builder for RAG and multi-agent AI applications

    Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 22
    LangGraph Studio

    LangGraph Studio

    Desktop app for prototyping and debugging LangGraph applications

    LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. With visual graphs and the ability to edit state, you can better understand agent workflows and iterate faster. LangGraph Studio integrates with LangSmith so you can collaborate with teammates to debug failure modes. While in Beta, LangGraph Studio is available for free to all LangSmith users on any plan tier. LangGraph Studio requires docker-compose version 2.22.0+ or higher. ...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 23
    Floneum

    Floneum

    Instant, controllable, local pre-trained AI models in Rust

    Floneum is an open-source platform for building AI-powered workflows using large language models through a visual and extensible interface. The system allows users to design complex AI pipelines using a drag-and-drop workflow builder rather than writing extensive code. It focuses on enabling developers and researchers to create language model applications that combine different tools, data sources, and AI capabilities into automated workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    VLMEvalKit

    VLMEvalKit

    Open-source evaluation toolkit of large multi-modality models (LMMs)

    ...VLMEvalKit supports generation-based evaluation methods, allowing models to produce textual responses to visual inputs while measuring performance through techniques such as exact matching or language-model-assisted answer extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    opcode

    opcode

    A powerful GUI app and Toolkit for Claude Code

    opcode is an open source desktop application and toolkit designed to enhance the developer experience when working with Claude Code by providing a graphical interface and advanced workflow management tools. The project acts as a command center for AI-assisted programming, bridging the gap between command-line workflows and modern visual development environments. Built using the Tauri framework, Opcode enables developers to manage multiple Claude sessions, create custom agents, and track usage in a centralized interface. The platform is intended to make AI-assisted coding more intuitive by providing visual tools for monitoring agent activity, organizing projects, and reviewing development timelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB