Alternatives to SAM 3D

Compare SAM 3D alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to SAM 3D in 2026. Compare features, ratings, user reviews, pricing, and more from SAM 3D competitors and alternatives in order to make an informed decision for your business.

  • 1
    Seed3D

    ByteDance

    Seed3D 1.0 is a foundation-model pipeline that takes a single input image and generates a simulation-ready 3D asset, including closed manifold geometry, UV-mapped textures, and physically-based rendering material maps, designed for immediate integration into physics engines and embodied-AI simulators. It uses a hybrid architecture combining a 3D variational autoencoder for latent geometry encoding, and a diffusion-transformer stack to generate detailed 3D shapes, followed by multi-view texture synthesis, PBR material estimation, and UV texture completion. The geometry branch produces watertight meshes with fine structural details (e.g., thin protrusions, holes, text), while the texture/material branch yields multi-view consistent albedo, metallic, and roughness maps at high resolution, enabling realistic appearance under varied lighting. Assets generated by Seed3D 1.0 require minimal cleanup or manual tuning.
  • 2
    ReconstructMe

    ReconstructMe

ReconstructMe’s usage concept is similar to that of an ordinary video camera – simply move around the object to be modelled in 3D. However, instead of a video stream you get a complete 3D model in real time. Scanning with ReconstructMe scales from smaller objects such as human faces up to entire rooms and runs on commodity computer hardware; read more about its features and hardware requirements. ReconstructMe is capable of capturing and processing the color information of the object being scanned, as long as the sensor provides the necessary color stream, and it can be integrated into your application using a powerful SDK.
    Starting Price: $279 one-time payment
  • 3
    Qwen-Image

    Alibaba

    Qwen-Image is a multimodal diffusion transformer (MMDiT) foundation model offering state-of-the-art image generation, text rendering, editing, and understanding. It excels at complex text integration, seamlessly embedding alphabetic and logographic scripts into visuals with typographic fidelity, and supports diverse artistic styles from photorealism to impressionism, anime, and minimalist design. Beyond creation, it enables advanced image editing operations such as style transfer, object insertion or removal, detail enhancement, in-image text editing, and human pose manipulation through intuitive prompts. Its built-in vision understanding tasks, including object detection, semantic segmentation, depth and edge estimation, novel view synthesis, and super-resolution, extend its capabilities into intelligent visual comprehension. Qwen-Image is accessible via popular libraries like Hugging Face Diffusers and integrates prompt-enhancement tools for multilingual support.
    Starting Price: Free
  • 4
    OmniHuman-1

    ByteDance

    OmniHuman-1 is a cutting-edge AI framework developed by ByteDance that generates realistic human videos from a single image and motion signals, such as audio or video. The platform utilizes multimodal motion conditioning to create lifelike avatars with accurate gestures, lip-syncing, and expressions that align with speech or music. OmniHuman-1 can work with a range of inputs, including portraits, half-body, and full-body images, and is capable of producing high-quality video content even from weak signals like audio-only input. The model's versatility extends beyond human figures, enabling the animation of cartoons, animals, and even objects, making it suitable for various creative applications like virtual influencers, education, and entertainment. OmniHuman-1 offers a revolutionary way to bring static images to life, with realistic results across different video formats and aspect ratios.
  • 5
    Imverse LiveMaker
    Use LiveMaker™ to make photorealistic 3D scenes for virtual reality experiences, volumetric videos, movie previsualization, video games, immersive training, virtual showrooms, and much more! LiveMaker™ is the first software that enables you to build 3D models from inside of virtual reality. It’s easy to use, and requires no special programming skills. Using proprietary voxel technology, LiveMaker™ lets you import 360° photos and reconstruct their geometry, retexture occlusions, create new objects, and relight the entire scene. It also allows you to import and integrate external media and assets, static or dynamic, low or high quality, so you can design your virtual scene without limitations. You can use LiveMaker™ to create complete environments or for quick visual prototyping, and the 3D models created with LiveMaker™ can be easily exported and used in other tools depending on your needs and workflow.
  • 6
    alwaysAI

    alwaysAI

alwaysAI provides developers with a simple and flexible way to build, train, and deploy computer vision applications to a wide variety of IoT devices. Select from a catalog of deep learning models or upload your own. Use our flexible and customizable APIs to quickly enable core computer vision services. Quickly prototype, test, and iterate with a variety of camera-enabled ARM-32, ARM-64, and x86 devices. Identify objects in an image by name or classification. Identify and count objects appearing in a real-time video feed. Follow the same object across a series of frames. Find faces or full bodies in a scene to count or track. Locate and define borders around separate objects. Separate key objects in an image from background visuals. Estimate human body poses, detect falls, and recognize emotions. Use our model training toolkit to train an object detection model to identify virtually any object, creating a model tailored to your specific use case.
  • 7
    Parallel Domain Replica Sim
    Parallel Domain Replica Sim enables the creation of high-fidelity, fully annotated, simulation-ready environments from users’ own captured data (photos, videos, scans). With PD Replica, you can generate near-pixel-perfect reconstructions of real-world scenes, transforming them into virtual environments that preserve visual detail and realism. PD Sim provides a Python API through which perception, machine learning, and autonomy teams can configure and run large-scale test scenarios and simulate sensor inputs (camera, lidar, radar, etc.) in either open- or closed-loop mode. These simulated sensor feeds come with full annotations, so developers can test their perception systems under a wide variety of conditions, lighting, weather, object configurations, and edge cases, without needing to collect real-world data for every scenario.
  • 8
    3D House Planner

    3D House Planner

3D House Planner is a professional home design web application that allows you to design houses and apartments. No installation is required; it is accessible through your browser and absolutely free for all. You can import or export 3D models for personal or commercial purposes. There are countless possibilities. Browse our catalog and select from thousands of objects to furnish and decorate the interior and exterior of your home with furniture, decorative accessories, electric devices, and household appliances. We also have a texture library with a wide range of high-quality textures; most contain albedo, normal, ambient occlusion, metalness, and roughness maps. Apart from home design, you can import your own 3D objects, change the appearance and position of objects, make videos, take snapshots, and more.
  • 9
    Imagen 3

    Google

    Imagen 3 is the next evolution of Google's cutting-edge text-to-image AI generation technology. Building on the strengths of its predecessors, Imagen 3 offers significant advancements in image fidelity, resolution, and semantic alignment with user prompts. By employing enhanced diffusion models and more sophisticated natural language understanding, it can produce hyper-realistic, high-resolution images with intricate textures, vivid colors, and precise object interactions. Imagen 3 also introduces better handling of complex prompts, including abstract concepts and multi-object scenes, while reducing artifacts and improving coherence. With its powerful capabilities, Imagen 3 is poised to revolutionize creative industries, from advertising and design to gaming and entertainment, by providing artists, developers, and creators with an intuitive tool for visual storytelling and ideation.
  • 10
    HunyuanWorld
    HunyuanWorld-1.0 is an open source AI framework and generative model developed by Tencent Hunyuan that creates immersive, explorable, and interactive 3D worlds from text prompts or image inputs by combining the strengths of 2D and 3D generation techniques into a unified pipeline. At its core, the project features a semantically layered 3D mesh representation that uses 360° panoramic world proxies to decompose and reconstruct scenes with geometric consistency and semantic awareness, enabling the creation of diverse, coherent environments that can be navigated and interacted with. Unlike traditional 3D generation methods that struggle with either limited diversity or inefficient data representations, HunyuanWorld-1.0 integrates panoramic proxy generation, hierarchical 3D reconstruction, and semantic layering to balance high visual quality and structural integrity while enabling exportable meshes compatible with common graphics workflows.
    Starting Price: Free
  • 11
    FLUX.2 [max]

    Black Forest Labs

    FLUX.2 [max] is the flagship image-generation and editing model in the FLUX.2 family from Black Forest Labs that delivers top-tier photorealistic output with professional-grade quality and unmatched consistency across styles, objects, characters, and scenes. It supports grounded generation that can incorporate real-time contextual information, enabling visuals that reflect current trends, environments, and detailed prompt intent while maintaining coherence and structure. It excels at producing marketplace-ready product photos, cinematic visuals, logo and brand assets, and high-fidelity creative imagery with precise control over colors, lighting, composition, and textures, and it preserves identity even through complex edits and multi-reference inputs. FLUX.2 [max] handles detailed features such as character proportions, facial expressions, typography, and spatial reasoning with high stability, making it suitable for iterative creative workflows.
  • 12
    Mudbox

    Autodesk

    3D digital painting and sculpting software. Create beautiful characters and environments with Mudbox. Sculpt and paint highly detailed 3D geometry and textures. Mudbox® 3D digital sculpting and texture painting software gives you an intuitive, tactile toolset. Create highly detailed 3D characters and environments using an intuitive set of digital tools based on real sculpting techniques. Paint directly on your 3D assets across multiple channels. Add resolution to a mesh only in areas that need it with an artist-friendly, camera-based workflow. Create clean, production-quality meshes from scanned, imported, or sculpted data. Bake normal, displacement, and ambient occlusion maps. Get effective, brush-based workflows for polygons and textures. Bring assets from Maya into Mudbox to add detailed geometry. Send characters from Maya LT to Mudbox for sculpting and texturing. Then transfer your model back to Maya LT. Take your 3D assets and environments from first draft to final frame.
    Starting Price: $7 per month
  • 13
    Veo 3.1

    Google

Veo 3.1 builds on the capabilities of the previous model to enable longer and more versatile AI-generated videos. With this version, users can create multi-shot clips guided by multiple prompts, generate sequences from three reference images, and use frames in video workflows that transition between a start and end image, both with native, synchronized audio. The scene extension feature extends the final second of a clip with up to a full minute of newly generated visuals and sound. Veo 3.1 supports editing of lighting and shadow parameters to improve realism and scene consistency, and offers advanced object removal that reconstructs backgrounds to remove unwanted items from generated footage. These enhancements make Veo 3.1 sharper in prompt adherence, more cinematic in presentation, and broader in scale compared to shorter-clip models. Developers can access Veo 3.1 via the Gemini API or through the tool Flow, targeting professional video workflows.
  • 14
    NVIDIA Picasso
NVIDIA Picasso is a cloud service for building generative AI–powered visual applications. Enterprises, software creators, and service providers can run inference on their models, train NVIDIA Edify foundation models on proprietary data, or start from pre-trained models to generate image, video, and 3D content from text prompts. The Picasso service is fully optimized for GPUs and streamlines training, optimization, and inference on NVIDIA DGX Cloud. Organizations and developers can train NVIDIA’s Edify models on their proprietary data or get started with models pre-trained with our premier partners. Expert denoising network to generate photorealistic 4K images. Temporal layers and novel video denoiser generate high-fidelity videos with temporal consistency. A novel optimization framework for generating 3D objects and meshes with high-quality geometry. Cloud service for building and deploying generative AI-powered image, video, and 3D applications.
  • 15
    BodyPaint 3D
Maxon's BodyPaint 3D is the ultimate tool for creating high-end textures and unique sculptures. Wave good-bye to UV seams, inaccurate texturing and constant back-and-forth switching to your 2D image editor. Say hello to hassle-free texturing that lets you quickly paint highly detailed textures directly on your 3D objects. BodyPaint 3D also offers a comprehensive set of sculpting tools that let you turn a simple object into a detailed work of art. When you use BodyPaint 3D to paint complete materials onto your 3D models, you’ll immediately see how the texture fits with the contour of the model, how the bump or displacement react to lighting, and how the transparency and reflection interact with the environment. There’s no need to waste time transitioning textures between environments; you’ll always see an accurate depiction of the texture so you can concentrate on making it look great.
    Starting Price: $22 per month
  • 16
    Molmo
    Molmo is a family of open, state-of-the-art multimodal AI models developed by the Allen Institute for AI (Ai2). These models are designed to bridge the gap between open and proprietary systems, achieving competitive performance across a wide range of academic benchmarks and human evaluations. Unlike many existing multimodal models that rely heavily on synthetic data from proprietary systems, Molmo is trained entirely on open data, ensuring transparency and reproducibility. A key innovation in Molmo's development is the introduction of PixMo, a novel dataset comprising highly detailed image captions collected from human annotators using speech-based descriptions, as well as 2D pointing data that enables the models to answer questions using both natural language and non-verbal cues. This allows Molmo to interact with its environment in more nuanced ways, such as pointing to objects within images, thereby enhancing its applicability in fields like robotics and augmented reality.
  • 17
    OptiTrack Motive
Motive + OptiTrack cameras deliver the best-performing real-time human and object tracking available today. Vastly improved skeletal tracking precision. Robust, accurate bone tracking, even during heavy occlusion of markers. “Solver” in human motion tracking terms refers to the programmatic process of estimating the pose (6 DoF) of each bone, deduced from the actually measured markers, at each frame of measurement. A precision solver, like that developed for Motive 3.0, accurately defines the skeleton movement of the tracked subject(s), which yields higher confidence and more nuanced performance capture for character animation. A robust solver will also perform precision marker labeling and skeletal tracking even when many markers are hidden from cameras or lost, providing more reliable tracking data and vastly reduced editing time across all applications. Motive processes OptiTrack camera data to deliver global 3D positions, marker IDs, and rotational data.
    Starting Price: $999 one-time payment
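The solver idea described in this entry (recovering a 6-DoF rigid pose from measured marker positions) can be illustrated with the classic Kabsch algorithm. This is a minimal, self-contained sketch, not OptiTrack's proprietary solver, and it assumes known marker correspondences and NumPy:

```python
import numpy as np

def rigid_pose_from_markers(model_pts, measured_pts):
    """Estimate the 6-DoF rigid transform (rotation R, translation t)
    mapping model-space marker positions onto measured positions,
    via the Kabsch algorithm (SVD of the cross-covariance matrix)."""
    cm = model_pts.mean(axis=0)                   # model centroid
    cd = measured_pts.mean(axis=0)                # measured centroid
    H = (model_pts - cm).T @ (measured_pts - cd)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cm
    return R, t

# Demo: four markers on a rigid "bone", rotated 90 degrees about Z and shifted.
markers = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
R_true = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
measured = markers @ R_true.T + np.array([0.5, 0.0, 0.0])
R_est, t_est = rigid_pose_from_markers(markers, measured)
```

Production solvers additionally handle marker labeling, occlusion, and skeletal constraints; this sketch shows only the core per-bone pose fit.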
  • 18
    ZenCtrl

    Fotographer AI

    ZenCtrl is an open source AI image generation toolkit developed by Fotographer AI, designed to produce high-quality, multi-view, and diverse-scene outputs from a single image without any training. It enables precise regeneration of objects and subjects from any angle and background, offering real-time element regeneration that provides both stability and flexibility in creative workflows. ZenCtrl allows users to regenerate subjects from any angle, swap backgrounds or clothing with just a click, and start generating results immediately without the need for additional training. By leveraging advanced image processing techniques, it ensures high accuracy without the need for extensive training data. The model's architecture is composed of lightweight sub-models, each fine-tuned on task-specific data to excel at a single job, resulting in a lean system that delivers sharper, more controllable results.
    Starting Price: Free
  • 19
    SeedEdit

    ByteDance

SeedEdit is an advanced AI image-editing model developed by the ByteDance Seed team that enables users to revise an existing image using natural-language text prompts while preserving unedited regions with high fidelity. It accepts an input image plus a text description of the change (such as style conversion, object removal or replacement, background swap, lighting shift, or text change), and produces a seamlessly edited result that maintains structural integrity, resolution, and identity of the original content. The model leverages a diffusion-based architecture trained via a meta-information embedding pipeline and joint loss (combining diffusion and reward losses) to balance image reconstruction and re-generation, resulting in strong editing controllability, detail retention, and prompt adherence. The latest version (SeedEdit 3.0) supports high-resolution edits (up to 4K), delivers fast inference (often around 10-15 seconds), and handles multi-round sequential edits.
  • 20
    Gemini 2.5 Flash Image
Gemini 2.5 Flash Image is Google’s latest state-of-the-art image generation and editing model, now accessible via the Gemini API, Google AI Studio’s build mode, and Vertex AI. It enables powerful creative control by allowing users to blend multiple input images into a single visual, maintain consistent characters or products across edits for rich storytelling, and apply precise, natural-language-based transformations, such as removing objects, changing poses, adjusting colors, or altering backgrounds. The model is backed by Gemini’s deep world knowledge, enabling it to understand and reinterpret scenes or diagrams in context, which unlocks dynamic use cases like educational tutors or scene-aware editing assistants. Demonstrated through customizable template apps in AI Studio (including photo editors, multi-image fusers, and interactive tools), the model supports rapid prototyping and remixing via prompts or UI.
  • 21
    Gemini 3 Pro Image
Gemini 3 Pro Image is a high-capability, multimodal image-generation and editing system that enables users to create, transform, and refine visuals through natural-language prompts or by combining multiple input images. It supports consistent character and object appearance across edits, precise local transformations (such as background blur, object removal, style transfers, or pose changes), and native world-knowledge understanding to ensure context-aware outcomes. It supports multi-image fusion, merging several photo inputs into a cohesive new image, and emphasizes design workflow features such as template-based outputs, brand-asset consistency, and repeated character or person-style appearances across scenes. It includes digital watermarking to tag AI-generated imagery and is available through the Gemini API, Google AI Studio, and Vertex AI platforms.
  • 22
    Symage

    Symage

Symage is a synthetic data platform that generates custom, photorealistic image datasets with automated pixel-perfect labeling to support training and improving AI and computer vision models. Using physics-based rendering and simulation rather than generative AI, it produces high-fidelity synthetic images that mirror real-world conditions and handle diverse scenarios, lighting, camera angles, object motion, and edge cases with controlled precision, which helps eliminate data bias, reduce manual labeling, and dramatically cut data preparation time by up to 90%. Designed to give teams the right data for model training rather than relying on limited real datasets, Symage lets users tailor environments and variables to match specific use cases, ensuring datasets are balanced, scalable, and accurately labeled at every pixel. It is built on decades of expertise in robotics, AI, machine learning, and simulation, offering a way to overcome data scarcity and boost model accuracy.
  • 23
    Mistral OCR 3

    Mistral AI

    Mistral OCR 3 is the third-generation optical character recognition model from Mistral AI designed to achieve a new frontier in accuracy and efficiency for document processing by extracting text, embedded images, and structure from a wide range of documents with exceptional fidelity. It delivers breakthrough performance with a 74% overall win rate over the previous generation on forms, scanned documents, complex tables, and handwriting, outperforming both enterprise document processing solutions and AI-native OCR tools. OCR 3 supports output in clean text, Markdown, or structured JSON with HTML table reconstruction to preserve layout, enabling downstream systems and workflows to understand both content and structure. It powers the Document AI Playground in Mistral AI Studio for drag-and-drop parsing of PDFs and images and integrates via API for developers to automate document extraction workflows.
    Starting Price: $14.99 per month
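Because OCR 3 can emit structured JSON alongside Markdown, downstream workflows typically walk the response rather than scrape raw text. A minimal sketch of consuming such a result, assuming a hypothetical per-page response shape; the field names here are illustrative only and may not match Mistral's actual API schema:

```python
# Hypothetical OCR response: a list of pages, each carrying an index and
# the Markdown extracted for that page (tables preserved as Markdown/HTML).
ocr_response = {
    "pages": [
        {"index": 0, "markdown": "# Invoice 001\n\n| Item | Qty |\n|------|-----|\n| Widget | 2 |"},
        {"index": 1, "markdown": "Thank you for your business."},
    ]
}

def document_markdown(response):
    """Concatenate per-page markdown into one document string, in page order."""
    pages = sorted(response["pages"], key=lambda p: p["index"])
    return "\n\n".join(p["markdown"] for p in pages)

full_text = document_markdown(ocr_response)
```

The same traversal pattern applies whether the response comes from the API or is exported from the Document AI Playground.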
  • 24
    Ultralytics

    Ultralytics

    Ultralytics offers a full-stack vision-AI platform built around its flagship YOLO model suite that enables teams to train, validate, and deploy computer-vision models with minimal friction. The platform allows you to drag and drop datasets, select from pre-built templates or fine-tune custom models, then export to a wide variety of formats for cloud, edge or mobile deployment. With support for tasks including object detection, instance segmentation, image classification, pose estimation and oriented bounding-box detection, Ultralytics’ models deliver high accuracy and efficiency and are optimized for both embedded devices and large-scale inference. The product also includes Ultralytics HUB, a web-based tool where users can upload their images/videos, train models online, preview results (even on a phone), collaborate with team members, and deploy via an inference API.
  • 25
    ActiveCube

    Virtalis

Professional interactive 3D visualization system, designed to transform how organizations build and engage. Wrap your teams in a human-scale virtual space where they effortlessly and naturally interact with the scenario and with each other. Thanks to the high-resolution 3D images that surround the user, the ActiveCube achieves a high level of immersion without the isolation of HMDs. Being able to see the real world helps to reduce the nausea often experienced with HMDs. Get stronger insight and appreciation of your data reinforced by natural, real-time tracking and human-scale interaction with virtual and real-world objects. See other users, read body language and use other devices as you would normally for a more comfortable working environment. ActiveCubes can be configured with two or more walls whose images surround the user. Virtalis has the expertise necessary to design and deliver such complex systems seamlessly, as attested by satisfied Fortune 500 customers.
  • 26
    Marble

    World Labs

Marble is an experimental AI model internally tested by World Labs, a variant and extension of their Large World Model technology. It is a web service that turns a single 2D image into a navigable spatial environment. Marble offers two generation modes: a smaller, fast model for rough previews that’s quick to iterate on, and a larger, high-fidelity model that takes longer (around ten minutes) but produces a significantly more convincing result. The value proposition is instant, photogrammetry-like image-to-world creation without a full capture rig, turning a single shot into an explorable space for memory capture, mood boards, archviz previews, or creative experiments.
  • 27
    InstructGPT
InstructGPT is a family of language models from OpenAI, built on GPT-3 and fine-tuned with reinforcement learning from human feedback (RLHF) to follow natural-language instructions. By training on human-ranked model outputs, InstructGPT produces responses that are more helpful, truthful, and less toxic than the base GPT-3 models, and it became the foundation for OpenAI's instruction-following API models. It is effective across domains such as writing assistance, question answering, summarization, and education; for example, it can help students learn by providing descriptive explanations of processes or events.
    Starting Price: $0.0200 per 1000 tokens
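At the listed rate, API cost scales linearly with token count. A quick sketch of estimating spend, using only the rate quoted in this listing (actual OpenAI pricing varies by model and changes over time):

```python
RATE_PER_1K_TOKENS = 0.02  # $0.0200 per 1,000 tokens, per the listing above

def estimate_cost(total_tokens: int) -> float:
    """Estimated charge in USD for a given prompt + completion token count."""
    return total_tokens / 1000 * RATE_PER_1K_TOKENS

print(f"${estimate_cost(150_000):.2f}")  # 150k tokens -> $3.00
```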
  • 28
    Movmi

    Movmi

A highly capable tool for human body motion developers, Movmi provides a revolutionary solution for capturing humanoid motion from 2D media (images and video). Use media shots from any camera, from smartphones to professional cameras, in any lifestyle scene. Browse a collection of fully textured characters suited to every purpose: cartoon, fantasy, and CG projects. The Movmi Store offers full-body character animations of many poses and actions, and you can apply any animation to any Movmi character. It also contains a collection of 3D characters that are free of charge, so motion developers have the freedom to use them in their projects.
    Starting Price: Free
  • 29
    Magma

    Microsoft

    Magma is a cutting-edge multimodal foundation model developed by Microsoft, designed to understand and act in both digital and physical environments. The model excels at interpreting visual and textual inputs, allowing it to perform tasks such as interacting with user interfaces or manipulating real-world objects. Magma builds on the foundation models paradigm by leveraging diverse datasets to improve its ability to generalize to new tasks and environments. It represents a significant leap toward developing AI agents capable of handling a broad range of general-purpose tasks, bridging the gap between digital and physical actions.
  • 30
    VGSTUDIO

    Volume Graphics

    VGSTUDIO is the ideal choice for visual quality inspection in industrial applications, e.g., in the electronics industry, but also for the visualization of data in fields of academic research such as archaeology, geology, and life sciences. VGSTUDIO covers the entire workflow, from the precise reconstruction of three-dimensional volume data sets using the images taken by your CT scanner to visualization (in 3D and 2D) and the creation of impressive animations. 3D visualization of even very large CT data sets, with almost no limit on data volume. Real-time ray tracing for a photo-realistic look. Combined visualization of voxel and mesh data, including textured meshes. Arbitrary orientation of 2D slices, 2D slice rotation view around a customizable axis. Gray-value classification of a data set, and a wide variety of 3D clipping options. Unrolling of objects or leveling of freeform surfaces in a 2D view. Combination of consecutive slices into a single 2D view.
  • 31
    Act-Two

    Runway AI

Act-Two enables animation of any character by transferring movements, expressions, and speech from a driving performance video onto a static image or reference video of your character. By selecting the Gen‑4 Video model and then the Act‑Two icon in Runway’s web interface, you supply two inputs: a performance video of an actor enacting your desired scene and a character input (either a single image or a video clip), and optionally enable gesture control to map hand and body movements onto character images. Act‑Two automatically adds environmental and camera motion to still images, supports a range of angles, non‑human subjects, and artistic styles, and retains original scene dynamics when using character videos (though with facial rather than full‑body gesture mapping). Users can adjust facial expressiveness on a sliding scale to balance natural motion with character consistency, preview results in real time, and generate high‑resolution clips up to 30 seconds long.
    Starting Price: $12 per month
  • 32
    Frost 3D Universal
Frost 3D software allows you to develop scientific models of permafrost thermal regimes under the thermal influence of pipelines, production wells, hydraulic constructions, etc., taking into account the thermal stabilization of the ground. The software package is based on ten years of experience in the fields of programming, computational geometry, numerical methods, 3D visualization, and parallelization of computational algorithms. Creation of a 3D computational domain with surface topography and soil lithology; 3D reconstruction of pipelines, boreholes, basements, and foundations of buildings; import of 3D objects including Wavefront (OBJ), StereoLitho (STL), 3D Studio Max (3DS) and Frost 3D Objects (F3O); library of thermophysical properties of the ground, building elements, climatic factors and the parameters of cooling units; specification of thermal and hydrological properties of 3D objects and heat transfer parameters on the surfaces of objects.
  • 33
    FindFace

    NtechLab

The NtechLab platform processes video and recognizes human faces, bodies and actions, as well as cars and plate numbers. AI-powered technology enables record-breaking accuracy and high speed of recognition. The multi-object and analytical capabilities of FindFace Multi unlock new scenarios for responding to the challenges of the public sector and business. FindFace Multi quickly and accurately recognizes faces, human bodies, cars, and license plate numbers in a live video stream or in a video archive. Searching for faces, bodies, and vehicles in a database or in an archive is available both by a photo sample and by specific features, for example, by age, clothes color, or vehicle model. NtechLab developers are constantly improving recognition algorithms, increasing their performance and accuracy. With FindFace Multi it takes less than a second to detect a face in a video stream, recognize it, and search for a match in a database with billions of images.
  • 34
    Photo Eraser

    Photo Eraser

    Toscanapps

    Powered by advanced AI technology, Photo Eraser removes unwanted objects from your pictures and seamlessly reconstructs the background to give you that perfect shot you've always desired. No more distractions in your photos. With Photo Eraser's cutting-edge erase elements function, you can effortlessly eliminate any unwanted object, person, or background clutter from your images. The app's AI capabilities ensure that the area previously occupied by the removed item is filled in with an accurate and natural-looking background, making the edit invisible. Photo Eraser comes equipped with a range of intuitive tools designed to speed up the editing process, ensuring you achieve professional-quality results in just a few taps. The AI detection feature automatically identifies objects and people that you may want to remove. This intelligent detection capability saves you time and effort.
    Starting Price: Free
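The app's AI inpainting model is proprietary; as a baseline illustration of what "filling in the background" means, the sketch below repeatedly replaces masked pixels with the average of their known neighbors (invented grayscale data):

```python
# Toy sketch, not Photo Eraser's model: naive hole filling by averaging
# known neighboring pixels, iterating inward until the hole is covered.

def fill_hole(img, mask, rounds=10):
    """img: 2D list of grayscale floats; mask: True where a pixel was removed."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    mask = [row[:] for row in mask]
    for _ in range(rounds):
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                nbrs = [img[j][i]
                        for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= j < h and 0 <= i < w and not mask[j][i]]
                if nbrs:
                    img[y][x] = sum(nbrs) / len(nbrs)
                    mask[y][x] = False
    return img

# Uniform 3x3 patch with the center pixel "erased".
photo = [[10.0] * 3 for _ in range(3)]
hole = [[False] * 3 for _ in range(3)]
hole[1][1] = True
restored = fill_hole(photo, hole)
```

Learned inpainting models synthesize plausible texture rather than averaging, which is why their results look natural on complex backgrounds where this baseline would smear.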
  • 35
    Mocha Pro

    Mocha Pro

    Boris FX

    Mocha Pro is the world-renowned software for planar tracking, rotoscoping, and object removal. Essential to visual effects and post-production workflows, Mocha has been recognized with prestigious Academy and Emmy Awards for its contribution to the film and television industry. Mocha Pro has recently been used on global hits including The Mandalorian, Stranger Things, Avengers: Endgame, and many more. The next evolution of Mocha: PowerMesh adds a powerful new sub-planar tracking engine for VFX, roto, and stabilization. Warped surface tracking and roto that sticks. Track complex organic surfaces through occlusions and blur using Mocha’s intuitive layer-based interface; it is simple to use and faster than most optical-flow-based techniques. Apply to source files for realistic match moves, convert to AE Nulls to drive motion graphics, render a mesh-warped stabilize/reverse-stabilize plate for compositing, or export dense tracking data to host applications.
    Starting Price: $27.75 per month
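Planar tracking rests on standard projective geometry: the motion of a tracked plane between frames is a 3x3 homography applied to points in homogeneous coordinates. A minimal sketch of applying one (illustrative math only, not Mocha's code; the matrix is invented):

```python
# Standard homography application: map an image point (x, y) through a
# 3x3 matrix H in homogeneous coordinates, then divide by w.

def apply_homography(H, pt):
    x, y = pt
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xs / w, ys / w)

# A pure translation by (5, -2) expressed as a homography.
H = [[1, 0, 5],
     [0, 1, -2],
     [0, 0, 1]]
moved = apply_homography(H, (10, 10))
```

A planar tracker estimates such an H per frame from pixel data; inserts, roto shapes, and removals then ride the plane by reapplying it.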
  • 36
    Wan2.5

    Wan2.5

    Alibaba

    Wan2.5-Preview introduces a next-generation multimodal architecture designed to redefine visual generation across text, images, audio, and video. Its unified framework enables seamless multimodal inputs and outputs, powering deeper alignment through joint training across all media types. With advanced RLHF tuning, the model delivers superior video realism, expressive motion dynamics, and improved adherence to human preferences. Wan2.5 also excels in synchronized audio-video generation, supporting multi-voice output, sound effects, and cinematic-grade visuals. On the image side, it offers exceptional instruction following, creative design capabilities, and pixel-accurate editing for complex transformations. Together, these features make Wan2.5-Preview a breakthrough platform for high-fidelity content creation and multimodal storytelling.
    Starting Price: Free
  • 37
    Shap-E

    Shap-E

    OpenAI

    This is the official code and model release for Shap-E, which generates 3D objects conditioned on text or images. Sample a 3D model conditioned on a text prompt or on a synthetic view image; to get the best result, remove the background from the input image. You can also load 3D models or a trimesh, create a batch of multiview renders and a point cloud, encode them into a latent, and render it back. For this to work, install Blender version 3.3.1 or higher.
    Starting Price: Free
  • 38
    GPT-Image-1
    OpenAI's Image Generation API, powered by the gpt-image-1 model, enables developers and businesses to integrate high-quality, professional-grade image generation directly into their tools and platforms. This model offers versatility, allowing it to create images across diverse styles, faithfully follow custom guidelines, leverage world knowledge, and accurately render text, unlocking countless practical applications across multiple domains. Leading enterprises and startups across industries, including creative tools, ecommerce, education, enterprise software, and gaming, are already using image generation in their products and experiences. It gives creators the choice and flexibility to experiment with different aesthetic styles. Users can generate and edit images from simple prompts, adjusting styles, adding or removing objects, expanding backgrounds, and more.
    Starting Price: $0.19 per image
  • 39
    SAM Audio
    SAM Audio is a next-generation AI model for detailed audio segmentation and editing. It lets users isolate specific sounds from complex audio mixtures using intuitive prompts that mimic how people think about sound. You can type descriptive text (like “remove dog barking” or “keep vocals only”), click on objects in a video to pull their associated audio, or mark specific time spans where target sounds occur — all in one unified system. SAM Audio is available for experimentation and integration through Meta’s Segment Anything Playground platform, where users can upload their own audio or video files and instantly try SAM Audio’s capabilities. It’s also downloadable for use in custom audio and research workflows. Unlike traditional audio tools that focus on single, narrow tasks, SAM Audio supports multiple kinds of prompts and real-world sound environments with high accuracy.
    Starting Price: Free
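Of the three prompt types described, time-span marking is the simplest to illustrate. A toy sketch (not SAM Audio's model; sample values and rate are invented) that keeps only the samples inside user-marked spans and silences the rest:

```python
# Toy span-based "segmentation", NOT SAM Audio's model: zero out every
# sample that falls outside the user-marked (start, end) time spans.

def keep_spans(samples, rate, spans):
    """spans: list of (start_sec, end_sec) to keep; everything else -> 0."""
    out = [0.0] * len(samples)
    for start, end in spans:
        lo, hi = int(start * rate), min(int(end * rate), len(samples))
        out[lo:hi] = samples[lo:hi]
    return out

audio = [1.0] * 8                      # 8 samples at 4 Hz = 2 s of "sound"
kept = keep_spans(audio, rate=4, spans=[(0.5, 1.0)])
```

The actual model does far more: it separates overlapping sources within a span rather than gating by time, and it accepts text and visual prompts through the same interface.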
  • 40
    openMVG

    openMVG

    openMVG

    Extend awareness of the power of 3D reconstruction from images and photogrammetry by developing a C++ framework. Simplify reproducible research with easy-to-read and accurate implementations of state-of-the-art and "classic" algorithms. OpenMVG is designed to be easy to read, learn, modify, and use. Thanks to its strict test-driven development and samples, the library allows you to build trusted larger systems. OpenMVG provides an end-to-end 3D-reconstruction-from-images framework composed of libraries, binaries, and pipelines. The libraries provide easy access to features such as image manipulation, feature description and matching, feature tracking, camera models, multiple-view geometry, robust estimation, structure-from-motion algorithms, etc. The binaries solve the unit tasks a pipeline requires: scene initialization, feature detection and matching, and structure-from-motion reconstruction.
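As a taste of the multiple-view geometry such libraries cover, here is a minimal textbook relation (not openMVG's API; the camera numbers are invented): for two rectified cameras with focal length f and baseline b, depth follows directly from the disparity of a matched feature.

```python
# Classic rectified-stereo relation: Z = f * b / d, where f is the focal
# length in pixels, b the camera baseline in meters, d the disparity.

def depth_from_disparity(f_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        raise ValueError("point at infinity or bad match")
    return f_px * baseline_m / disparity_px

# A feature seen at x=320 px in the left image and x=300 px in the right
# gives a disparity of 20 px.
z = depth_from_disparity(f_px=800.0, baseline_m=0.1, disparity_px=20.0)
```

Full structure-from-motion generalizes this to arbitrary camera poses, estimating both the cameras and the 3D points from feature matches.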
  • 41
    SURE Aerial
    nFrames SURE software delivers an efficient solution for dense image-based surface reconstruction for mapping, surveying, geo-information, and research organizations. SURE derives precise point clouds, DSMs, true orthophotos, and textured meshes from small-, medium-, and large-frame images. This advanced solution is designed for applications including countrywide mapping, monitoring projects that use manned aircraft and UAVs, cadaster, infrastructure planning, and 3D modeling. SURE Aerial is specifically designed for aerial image datasets captured with large-frame nadir cameras, oblique cameras, and hybrid systems with additional LiDAR sensors. Without limitation in image resolution, it empowers the production of 3D meshes, true orthophotos, point clouds, and digital surface models on common workstation hardware and in cluster environments. Simple to set up and operate, SURE Aerial is compliant with mapping industry standards and accessible for web streaming technologies.
  • 42
    Z-Image

    Z-Image

    Z-Image

    Z-Image is an open source image generation foundation model family developed by Alibaba’s Tongyi-MAI team that uses a Scalable Single-Stream Diffusion Transformer architecture to generate photorealistic and creative images from text prompts with only 6 billion parameters, making it more efficient than many larger models while still delivering competitive quality and instruction following. It includes multiple variants: Z-Image-Turbo, a distilled version optimized for ultra-fast inference with as few as eight function evaluations and sub-second generation on appropriate GPUs; Z-Image, the full foundation model suited for high-fidelity creative generation and fine-tuning; Z-Image-Omni-Base, a versatile base checkpoint for community-driven development; and Z-Image-Edit, tuned for image-to-image editing tasks with strong instruction adherence.
    Starting Price: Free
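To make the "eight function evaluations" figure concrete in a purely conceptual way (this is not Z-Image's sampler): a distilled sampler budgets a fixed, small number of denoiser calls, each moving the sample a large fraction of the remaining way. A toy deterministic analogue:

```python
# Conceptual toy only, NOT Z-Image's sampler: a fixed-budget iterative
# scheme where each "denoiser" call covers 1/(steps - i) of the remaining
# distance, so the sample lands on target in exactly `steps` evaluations.

def few_step_sample(x0, denoise, steps=8):
    x = x0
    for i in range(steps):
        # one network function evaluation (NFE) per loop iteration
        x = x + (denoise(x) - x) / (steps - i)
    return x

# With a trivial "denoiser" that always points at 1.0:
result = few_step_sample(0.0, lambda x: 1.0, steps=8)
```

Real distillation trains the network so that a handful of such large steps approximates the trajectory a many-step diffusion sampler would take.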
  • 43
    Marey

    Marey

    Moonvalley

    Marey is Moonvalley’s foundational AI video model engineered for world-class cinematography, offering filmmakers precision, consistency, and fidelity across every frame. It is the first commercially safe video model, trained exclusively on licensed, high-resolution footage to eliminate legal gray areas and safeguard intellectual property. Designed in collaboration with AI researchers and professional directors, Marey mirrors real production workflows to deliver production-grade output free of visual noise and ready for final delivery. Its creative control suite includes Camera Control, transforming 2D scenes into manipulable 3D environments for cinematic moves; Motion Transfer, applying timing and energy from reference clips to new subjects; Trajectory Control, drawing exact paths for object movement without prompts or rerolls; Keyframing, generating smooth transitions between reference images on a timeline; and Reference, defining the appearance and interaction of individual elements.
    Starting Price: $14.99 per month
  • 44
    Gemini 3 Deep Think
    The most advanced model from Google DeepMind, Gemini 3, sets a new bar for model intelligence by delivering state-of-the-art reasoning and multimodal understanding across text, image, and video. It surpasses its predecessor on key AI benchmarks and excels at deeper problems such as scientific reasoning, complex coding, spatial logic, and visual-/video-based understanding. The new “Deep Think” mode pushes the boundaries even further, offering enhanced reasoning for very challenging tasks, outperforming Gemini 3 Pro on benchmarks like Humanity’s Last Exam and ARC-AGI. Gemini 3 is now available across Google’s ecosystem, enabling users to learn, build, and plan at new levels of sophistication. With context windows up to one million tokens, more granular media-processing options, and specialized configurations for tool use, the model brings better precision, depth, and flexibility for real-world workflows.
  • 45
    Imagen3D

    Imagen3D

    Imagen3D

    Imagen3D is an AI-powered online tool that instantly converts photos into high-quality 3D models with industry-standard topology, watertight geometry, and realistic PBR texture maps, eliminating the need for manual modeling cleanup and delivering production-ready assets for rendering, animation, 3D printing, AR or VR, and game workflows in minutes. It uses advanced image-to-3D technology to preserve fine surface details from your source images and offers flexible quality options (Fast, Pro, Ultra) so you can balance speed versus detail, generating models often in under three minutes. It supports uploading single images or multiple views for enhanced reconstruction accuracy and outputs to universal formats such as GLB, OBJ, STL, GLTF, USDZ, and MP4 for seamless use in Blender, Unity, Unreal, Maya, web viewers, and more.
    Starting Price: $10 per month
  • 46
    Bifrost

    Bifrost

    Bifrost AI

    Quickly and easily generate diverse and realistic synthetic data and high-fidelity 3D worlds to enhance model performance. Bifrost's platform is the fastest way to generate the high-quality synthetic images that you need to improve ML performance and overcome real-world data limitations. Prototype and test up to 30x faster by circumventing costly and time-consuming real-world data collection and annotation. Generate data to account for rare scenarios underrepresented in real data, resulting in more balanced datasets. Manual annotation and labeling is an error-prone, resource-intensive process; synthetic data arrives pre-labeled and pixel-perfect. Real-world data can also inherit the biases of the conditions under which it was collected, and synthetic data can be generated to correct for these cases.
  • 47
    Qwen2.5-VL

    Qwen2.5-VL

    Alibaba

    Qwen2.5-VL is the latest vision-language model from the Qwen series, representing a significant advancement over its predecessor, Qwen2-VL. This model excels in visual understanding, capable of recognizing a wide array of objects, including text, charts, icons, graphics, and layouts within images. It functions as a visual agent, capable of reasoning and dynamically directing tools, enabling applications such as computer and phone usage. Qwen2.5-VL can comprehend videos exceeding one hour in length and can pinpoint relevant segments within them. Additionally, it accurately localizes objects in images by generating bounding boxes or points and provides stable JSON outputs for coordinates and attributes. The model also supports structured outputs for data like scanned invoices, forms, and tables, benefiting sectors such as finance and commerce. Available in base and instruct versions across 3B, 7B, and 72B sizes, Qwen2.5-VL is accessible through platforms like Hugging Face and ModelScope.
    Starting Price: Free
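The "stable JSON outputs" for localization mentioned above are straightforward to consume downstream. A hedged sketch: the exact `label`/`bbox_2d` schema used here is an assumption for illustration, not a guarantee of the model's format, and the detection string is invented.

```python
# Sketch of parsing a model's JSON grounding output; the schema
# (label + [x1, y1, x2, y2] under "bbox_2d") is an ASSUMPTION here.
import json

raw = '[{"label": "cat", "bbox_2d": [10, 20, 110, 220]}]'

def parse_detections(text):
    """Return (label, pixel_area) for each detected box."""
    out = []
    for det in json.loads(text):
        x1, y1, x2, y2 = det["bbox_2d"]
        out.append((det["label"], (x2 - x1) * (y2 - y1)))
    return out

dets = parse_detections(raw)
```

Structured output like this is what makes the model usable for invoices, forms, and tables: the consuming code stays a plain parser instead of a brittle text scraper.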
  • 48
    DEEPMOTION

    DEEPMOTION

    DEEPMOTION

    Say hello to a revolutionary solution for capturing and reconstructing full body motion. Animate 3D lets you turn videos into 3D animations for use in games, augmented/virtual reality, and other applications. Simply upload a video clip, select output formats and job settings, and RUN! It's that simple. Animate 3D lets you create animations from video clips in seconds, drastically reducing development time and costs. And with pioneering features such as Physics Simulation, Foot Locking, Slow Motion handling and now full body motion combined with Face Tracking you have more control and flexibility to create high-fidelity 3D animations. Upload custom FBX or GLB characters, or create new models directly through Animate 3D, and our AI will automatically retarget animations onto your custom characters. Plus with an interactive animation previewer you can verify your 3D animation results immediately before downloading and copying into your solution.
    Starting Price: $12 per month
  • 49
    RecFusion

    RecFusion

    RecFusion

    With RecFusion you can create 3D models of people, pets, furniture and many other objects, even your motorcycle! All you need is a depth-sensor like the Microsoft Kinect or the Asus Xtion. Just move the sensor around the object and you can see the model building up on your screen in real-time and in color. Use the built-in post-processing functions to prepare your models for 3D printing and publish your models on the web to show them to your friends. Download RecFusion now and start creating your own models today! For customers from any domain, ImFusion GmbH is offering custom solutions. Use the 3D-reconstruction as a third-party component in your software. Supports custom measurement and scanning applications. Registration of 3D data is available. Branded versions of the application are supported as well. RecFusion provides you with custom image processing and computer vision solutions.
    Starting Price: €145 one-time payment
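RecFusion's real-time engine is proprietary, but the "model building up on your screen" effect comes from incrementally fusing each depth frame into a shared volume. A toy per-voxel sketch of that idea (a running weighted average, the basis of TSDF-style integration; the measurements are invented):

```python
# Toy sketch, not RecFusion's engine: merge each new depth measurement
# into a running weighted average, as TSDF-style fusion does per voxel.

def fuse(value, weight, new_value, new_weight=1.0):
    total = weight + new_weight
    return (value * weight + new_value * new_weight) / total, total

v, w = 0.0, 0.0
for measurement in [1.02, 0.98, 1.01, 0.99]:   # noisy depths in meters
    v, w = fuse(v, w, measurement)
# The fused estimate converges toward the true surface as frames arrive,
# which is why the on-screen model visibly refines as you move the sensor.
```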
  • 50
    VideoPoet
    VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator. It contains a few simple components. An autoregressive language model learns across video, image, audio, and text modalities to autoregressively predict the next video or audio token in the sequence. A mixture of multimodal generative learning objectives are introduced into the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio. Furthermore, such tasks can be composed together for additional zero-shot capabilities. This simple recipe shows that language models can synthesize and edit videos with a high degree of temporal consistency.
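The core idea above, autoregressively predicting the next token in a mixed stream of modalities, can be shown with a drastically simplified stand-in (a bigram count table instead of an LLM; the token names are invented):

```python
# Vastly simplified sketch of VideoPoet's core idea: next-token
# prediction over a single flat vocabulary of modality-prefixed tokens,
# with a bigram count table standing in for the language model.
from collections import Counter, defaultdict

def train_bigram(sequences):
    table = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            table[a][b] += 1
    return table

def predict_next(table, token):
    """Return the most frequent successor seen in training."""
    return table[token].most_common(1)[0][0]

# Invented "tokenized clips": text, video-frame, and audio tokens share
# one vocabulary, so one model can continue any modality.
clips = [["txt:cat", "vid:f1", "vid:f2", "aud:meow"],
         ["txt:cat", "vid:f1", "vid:f2", "aud:meow"]]
model = train_bigram(clips)
```

The real system replaces the count table with a transformer and the toy tokens with learned video/audio codec tokens, but the shared-vocabulary, next-token framing is what lets one model do text-to-video, continuation, inpainting, and video-to-audio.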