26 Integrations with HeyVid.ai
View a list of HeyVid.ai integrations and software that integrates with HeyVid.ai below. Compare the best HeyVid.ai integrations as well as features, ratings, user reviews, and pricing of software that integrates with HeyVid.ai. Here are the current HeyVid.ai integrations in 2026:
1
GPT-4o
OpenAI
GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. Starting Price: $5.00 / 1M tokens
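The listing quotes input pricing of $5.00 per 1M tokens. A minimal Python sketch of per-request cost estimation at that rate follows; the output-token rate below is an assumed placeholder for illustration, not a quoted price.

```python
# Illustrative cost estimator for token-priced APIs.
INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens (from the listing)
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens (assumed, not quoted)

def estimate_cost(input_tokens: int, output_tokens: int = 0) -> float:
    """Return the estimated USD cost for one request, rounded to 4 decimals."""
    cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    cost += (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return round(cost, 4)

print(estimate_cost(1_000_000))     # 1M input tokens at the listed rate
print(estimate_cost(10_000, 2_000)) # a typical small request
```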
2
Wan2.1
Alibaba
Wan2.1 is an open-source suite of advanced video foundation models designed to push the boundaries of video generation. This cutting-edge model excels in various tasks, including Text-to-Video, Image-to-Video, Video Editing, and Text-to-Image, offering state-of-the-art performance across multiple benchmarks. Wan2.1 is compatible with consumer-grade GPUs, making it accessible to a broader audience, and supports multiple languages, including both Chinese and English for text generation. The model's powerful video VAE (Variational Autoencoder) ensures high efficiency and excellent temporal information preservation, making it ideal for generating high-quality video content. Its applications span across entertainment, marketing, and more. Starting Price: Free
3
DALL·E 2
OpenAI
DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles. DALL·E 2 can expand images beyond what’s in the original canvas, creating expansive new compositions. DALL·E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account. DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called “diffusion,” which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image. Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse. Starting Price: Free
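The diffusion process described above (start from a pattern of random dots, then gradually alter the pattern toward an image) can be sketched as a toy one-dimensional loop. This is purely illustrative and bears no relation to DALL·E 2's actual architecture:

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and iteratively nudge it toward `target`,
    loosely mimicking the reverse diffusion process."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in target]  # the "pattern of random dots"
    for step in range(steps):
        # Each step removes a little noise: blend toward the target,
        # with the correction strength growing as steps progress.
        alpha = (step + 1) / steps
        x = [xi + 0.2 * alpha * (ti - xi) for xi, ti in zip(x, target)]
    return x

target = [0.0, 0.5, 1.0, -0.5]
out = toy_denoise(target)
print([round(v, 2) for v in out])  # converges close to `target`
```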
4
Runway
Runway AI
Runway is an AI research and product company focused on building systems that simulate the world through generative models. The platform develops advanced video, world, and robotics models that can understand, generate, and interact with reality. Runway’s technology powers state-of-the-art generative video models like Gen-4.5 with cinematic motion and visual fidelity. It also pioneers General World Models (GWM) capable of simulating environments, agents, and physical interactions. Runway bridges art and science to transform media, entertainment, robotics, and real-time interaction. Its models enable creators, researchers, and organizations to explore new forms of storytelling and simulation. Runway is used by leading enterprises, studios, and academic institutions worldwide. Starting Price: $15 per user per month
5
Luma AI
Luma AI
Photorealistic, high quality 3D, now for everyone. Luma is a new way to create incredible lifelike 3D with AI using your iPhone. Easily capture products, objects, landscapes and scenes wherever you are. From your captures create cinematic product videos, impossible camera moves for TikTok, or just relive the moment. No Lidar or fancy capture equipment necessary, all you need is an iPhone 11 or newer. For the first time, now you can:
- Capture 3D scenes with intricate details, reflections, and lighting and share with everyone. Bring people where you are!
- Capture products in 3D and showcase them on your website exactly how they appear in real life. No more "fake 3D".
- Capture 3D game assets in unmatched quality and bring them to Blender, Unity or your 3D engine of choice.
Starting Price: Free
6
Grok
xAI
Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so it is intended to answer almost anything and, far harder, even suggest what questions to ask! Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use it if you hate humor! A unique and fundamental advantage of Grok is that it has real-time knowledge of the world via the 𝕏 platform. It will also answer spicy questions that are rejected by most other AI systems. Starting Price: Free
7
Imagen
Google
Imagen is a text-to-image generation model developed by Google Research. It uses advanced deep learning techniques, primarily leveraging large Transformer-based architectures, to generate high-quality, photorealistic images from natural language descriptions. Imagen's core innovation lies in combining the power of large language models (like those used in Google's NLP research) with the generative capabilities of diffusion models—a class of generative models known for creating images by progressively refining noise into detailed outputs. What sets Imagen apart is its ability to produce highly detailed and coherent images, often capturing fine-grained details and textures based on complex text prompts. It builds on the advancements in image generation made by models like DALL-E, but focuses heavily on semantic understanding and fine detail generation. Starting Price: Free
8
Qwen-Image
Alibaba
Qwen-Image is a multimodal diffusion transformer (MMDiT) foundation model offering state-of-the-art image generation, text rendering, editing, and understanding. It excels at complex text integration, seamlessly embedding alphabetic and logographic scripts into visuals with typographic fidelity, and supports diverse artistic styles from photorealism to impressionism, anime, and minimalist design. Beyond creation, it enables advanced image editing operations such as style transfer, object insertion or removal, detail enhancement, in-image text editing, and human pose manipulation through intuitive prompts. Its built-in vision understanding tasks, including object detection, semantic segmentation, depth and edge estimation, novel view synthesis, and super-resolution, extend its capabilities into intelligent visual comprehension. Qwen-Image is accessible via popular libraries like Hugging Face Diffusers and integrates prompt-enhancement tools for multilingual support. Starting Price: Free
9
Flux
Flux
Build hardware more efficiently with real-time collaboration, an easy-to-use simulator, and forkable community content. Leverage collective intelligence through modern sharing, permissions, and an easy-to-use version control system. We believe in the power of open-source. Get started quickly with an ever-expanding library of parts and schematics created by the Flux community. Finally, a programmable simulator that doesn’t require a PhD to use. Check your schematic before you build, all from the browser. Whether you're building a simple circuit board or designing hardware for the next Mars mission, Flux is where great hardware projects are born. Flux is a browser-based end-to-end electronic design tool that breaks down barriers. Flux is making something new, and we’re doing it in a new way. It’s called building in the open. Join our community of engineers, makers, and entrepreneurs who are passionate about improving hardware design tools. Starting Price: $7 per user per month
10
Stable Diffusion
Stability AI
Over the last few weeks we all have been overwhelmed by the response and have been working hard to ensure a safe and ethical release, incorporating data from our beta model tests and community feedback for developers to act on. In cooperation with the tireless legal, ethics and technology teams at HuggingFace and the amazing engineers at CoreWeave, we have developed an AI-based Safety Classifier that is included by default in the overall software package. It understands concepts and other factors in generations to remove outputs that may not be desired by the model user. Its parameters can be readily adjusted, and we welcome input from the community on how to improve it. Image generation models are powerful, but still need to improve to better understand how to represent what we want. Starting Price: $0.20 per image
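The entry describes a Safety Classifier with adjustable parameters that screens generations before they reach the user. A minimal sketch of that post-filtering pattern, entirely illustrative and not Stability AI's actual classifier, might look like:

```python
from dataclasses import dataclass

@dataclass
class SafetyFilter:
    """Toy post-generation filter: drop outputs whose unsafe-content
    score exceeds a user-adjustable threshold."""
    threshold: float = 0.5  # adjustable, as the release notes describe

    def filter(self, generations):
        # Each generation is (image_id, unsafe_score); in practice the
        # score would come from a real classifier, not be hand-supplied.
        return [img for img, score in generations if score <= self.threshold]

batch = [("a", 0.1), ("b", 0.7), ("c", 0.4)]
print(SafetyFilter().filter(batch))               # default threshold
print(SafetyFilter(threshold=0.8).filter(batch))  # relaxed threshold
```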
11
Midjourney
Midjourney
Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. You may also generate images with our tool on another server that has invited and set up the Midjourney Bot: read the instructions there or ask more experienced users to point you towards one of the Bot channels on that server. Once you're satisfied with the prompt you just wrote, press Enter or send your message. That will deliver your request to the Midjourney Bot, which will soon start generating your images. You can ask the Midjourney Bot to send you a Discord direct message containing your final results. Commands are functions of the Midjourney Bot that can be typed in any bot channel or thread under a bot channel. Starting Price: $10 per month
12
Recraft
Recraft
Recraft offers a best-in-class vectorizer that can convert any illustration into a vector with excellent quality using only a minimal number of points. Browse the community page to discover new techniques and gain inspiration for beautiful image generation with Recraft. Switch between various artistic styles to transform your images as you need. Starting Price: $10/month
13
Seedance
ByteDance
Seedance 1.0 API is officially live, giving creators and developers direct access to the world’s most advanced generative video model. Ranked #1 globally on the Artificial Analysis benchmark, Seedance delivers unmatched performance in both text-to-video and image-to-video generation. It supports multi-shot storytelling, allowing characters, styles, and scenes to remain consistent across transitions. Users can expect smooth motion, precise prompt adherence, and diverse stylistic rendering across photorealistic, cinematic, and creative outputs. The API provides a generous free trial with 2 million tokens and affordable pay-as-you-go pricing from just $1.8 per million tokens. With scalability and high concurrency support, Seedance enables studios, marketers, and enterprises to generate 5–10 second cinematic-quality videos in seconds.
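The listing quotes a 2-million-token free trial and pay-as-you-go pricing of $1.8 per million tokens. A small sketch of how billable cost could be computed under those terms (the function below is an illustration of the quoted rates, not part of the Seedance API):

```python
FREE_TRIAL_TOKENS = 2_000_000  # from the listing
PRICE_PER_M_TOKENS = 1.80      # USD per 1M tokens (from the listing)

def billable_cost(total_tokens: int, trial_remaining: int = FREE_TRIAL_TOKENS) -> float:
    """USD cost after applying any remaining free-trial tokens."""
    billable = max(0, total_tokens - trial_remaining)
    return round(billable / 1_000_000 * PRICE_PER_M_TOKENS, 4)

print(billable_cost(1_500_000))  # fully covered by the trial
print(billable_cost(5_000_000))  # 3M billable tokens after the trial
```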
14
Seedream
ByteDance
Seedream 3.0 is ByteDance’s newest high-aesthetic image generation model, officially available through its API with 200 free trial images. It supports native 2K resolution output for crisp, professional visuals across text-to-image and image-to-image tasks. The model excels at realistic character rendering, capturing nuanced facial details, natural skin textures, and expressive emotions while avoiding the artificial look common in older AI outputs. Beyond realism, Seedream provides advanced text typesetting, enabling designer-level posters with accurate typography, layout, and stylistic cohesion. Its image editing capabilities preserve fine details, follow instructions precisely, and adapt seamlessly to varied aspect ratios. With transparent pricing at just $0.03 per image, Seedream delivers professional-grade visuals at an accessible cost.
15
Pika
Pika Labs
A powerful Text-to-Video platform that can unleash your creativity simply by typing. Pika Labs introduces a groundbreaking solution that breathes life into your concepts by merely inputting your preferred text. The era of intricate video editing tools and time-consuming production procedures is now a thing of the past. This revolutionary platform lets you turn your text into compelling and visually stunning videos without breaking a sweat. Unlock your creative potential and marvel as your carefully crafted words effortlessly metamorphose into vibrant video content that rivets your viewers' attention.
16
PixVerse
PixVerse
Create breathtaking videos with AI. Transform your ideas into stunning visuals with our powerful video creation platform. Brush the area, mark the direction, and watch your image come to life. Create with a more friendly interface and explore amazing creations from the community. Manage all your videos in one place and view videos you liked in your collection. Dive into endless possibilities and narrate your stories like never before. Bring your characters to life with consistent identity across multiple scenes and transformations. Improved compatibility and responsiveness to motion parameters deliver results that better match the intended motion intensity. You can now control the movement of the camera in different directions: horizontal, vertical, roll, and zoom. We believe AI video generation injects new vitality into the content industry and ignites the imagination in every ordinary corner.
17
Vidu
Vidu
Vidu is an AI-powered video generation platform that allows users to create stunning videos from text, images, or reference materials in just seconds. With unique features such as Multi-Entity Consistency, Vidu enables creators to generate high-quality, dynamic videos that are consistent across various elements like characters, objects, and environments. The platform is ideal for industries such as film, anime, and advertising, offering tools to streamline production, enhance creativity, and produce realistic animations with powerful semantic understanding.
18
Hunyuan T1
Tencent
Hunyuan T1 is Tencent's deep-thinking AI model, now fully open to all users through the Tencent Yuanbao platform. This model excels in understanding multiple dimensions and potential logical relationships, making it suitable for handling complex tasks. Users can experience various AI models on the platform, including DeepSeek-R1 and Tencent Hunyuan Turbo. The official version of the Tencent Hunyuan T1 model will also be launched soon, providing external API access and other services. Built upon Tencent's Hunyuan large language model, Yuanbao excels in Chinese language understanding, logical reasoning, and task execution. It offers AI-based search, summaries, and writing capabilities, enabling users to analyze documents and engage in prompt-based interactions.
19
Veo 3
Google
Veo 3 is Google’s latest state-of-the-art video generation model, designed to bring greater realism and creative control to filmmakers and storytellers. With the ability to generate videos in 4K resolution and enhanced with real-world physics and audio, Veo 3 allows creators to craft high-quality video content with unmatched precision. The model’s improved prompt adherence ensures more accurate and consistent responses to user instructions, making the video creation process more intuitive. It also introduces new features that give creators more control over characters, scenes, and transitions, enabling seamless integration of different elements to create dynamic, engaging videos.
20
FLUX.1 Kontext
Black Forest Labs
FLUX.1 Kontext is a suite of generative flow matching models developed by Black Forest Labs, enabling users to generate and edit images using both text and image prompts. This multimodal approach allows for in-context image generation, facilitating seamless extraction and modification of visual concepts to produce coherent renderings. Unlike traditional text-to-image models, FLUX.1 Kontext unifies instant text-based image editing with text-to-image generation, offering capabilities such as character consistency, context understanding, and local editing. Users can perform targeted modifications on specific elements within an image without affecting the rest, preserve unique styles from reference images, and iteratively refine creations with minimal latency.
21
Nano Banana
Google
Nano Banana is Gemini’s fast, accessible image-creation model designed for quick, playful, and casual creativity. It lets users blend photos, maintain character consistency, and make small local edits with ease. The tool is perfect for transforming selfies, reimagining pictures with fun themes, or combining two images into one. With its ability to handle stylistic changes, it can turn photos into figurine-style designs, retro portraits, or aesthetic makeovers using simple prompts. Nano Banana makes creative experimentation easy and enjoyable, requiring no advanced skills or complex controls. It’s the ideal starting point for users who want simple, fast, and imaginative image editing inside the Gemini app.
22
Sora 2
OpenAI
Sora is OpenAI’s advanced text-to-video generation model that takes text, images, or short video inputs and produces new videos up to 20 seconds long (1080p, vertical or horizontal format). It also supports remixing or extending existing video clips and blending media inputs. Sora is accessible via ChatGPT Plus/Pro and through a web interface. The system includes a featured/recent feed showcasing community creations. It embeds strong content policies to restrict sensitive or copyrighted content, and videos generated include metadata tags to indicate AI provenance. With the announcement of Sora 2, OpenAI is pushing the next iteration: Sora 2 is being released with enhancements in physical realism, controllability, audio generation (speech and sound effects), and deeper expressivity. Alongside Sora 2, OpenAI launched a standalone iOS app called Sora, which resembles a short-video social experience.
23
Veo 3.1
Google
Veo 3.1 builds on the capabilities of the previous model to enable longer and more versatile AI-generated videos. With this version, users can create multi-shot clips guided by multiple prompts, generate sequences from three reference images, and build video workflows that transition between a start frame and an end frame, all with native, synchronized audio. The scene extension feature extends a clip from its final second with up to a full minute of newly generated visuals and sound. Veo 3.1 supports editing of lighting and shadow parameters to improve realism and scene consistency, and offers advanced object removal that reconstructs backgrounds to remove unwanted items from generated footage. These enhancements make Veo 3.1 sharper in prompt adherence, more cinematic in presentation, and broader in scale compared to shorter-clip models. Developers can access Veo 3.1 via the Gemini API or through the Flow tool, targeting professional video workflows.
24
Kling 2.6
Kuaishou Technology
Kling 2.6 is an advanced AI video generation model that produces fully immersive audio-visual content in a single pass. Unlike earlier AI video tools that generated silent visuals, Kling 2.6 creates synchronized visuals, natural voiceovers, sound effects, and ambient audio together. The model supports both text-to-audio-visual and image-to-audio-visual workflows for fast content creation. Kling 2.6 automatically aligns sound, rhythm, emotion, and camera movement to deliver a cohesive viewing experience. Native Audio allows creators to control voices, sound effects, and atmosphere without external editing. The platform is designed to be accessible for beginners while offering creative depth for advanced users. Kling 2.6 transforms AI video from basic visuals into fully realized, story-driven media.
25
Hailuo AI
Hailuo AI
Hailuo AI represents a pioneering venture into the realm of AI-driven video content creation. This model allows users to generate six-second video clips from textual descriptions, operating at a resolution of 1280x720 with a frame rate of 25 fps. It's designed to democratize video production, enabling creators to visualize their ideas without extensive technical knowledge or equipment. Hailuo AI showcases capabilities in rendering human movement with notable naturalness, alongside handling cinematic camera movements, which sets it apart in the competitive landscape of AI video generators.
26
Ideogram AI
Ideogram AI
Ideogram AI is a text-to-image AI generator. Ideogram's technology is based on a type of neural network called a diffusion model. Diffusion models are trained on a large dataset of images, and they can then generate new images that are similar to the images in the dataset. Unlike some other generative AI models, diffusion models can also be used to generate images in a specific style.