11 projects for "image processing" with 2 filters applied:

  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Automate contact and company data extraction Icon
    Automate contact and company data extraction

    Build lead generation pipelines that pull emails, phone numbers, and company details from directories, maps, social platforms. Full API access.

    Generate leads at scale without building or maintaining scrapers. Use 10,000+ ready-made tools that handle authentication, pagination, and anti-bot protection. Pull data from business directories, social profiles, and public sources, then export to your CRM or database via API. Schedule recurring extractions, enrich existing datasets, and integrate with your workflows.
    Explore Apify Store
  • 1
    HunyuanDiT

    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    ...It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 3
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely extends their earlier 3D efforts (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or post-processing modules (e.g. mesh smoothing, texturing) to make the outputs more output-ready. Because 3D generation is hardware‐intensive, the repository likely also includes optimizations like quantization, pruning, or inference accelerations (e.g. using FlashMLA or DeepEP) to make the generation pipeline faster or more efficient. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Depth Pro

    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Total Network Visibility for Network Engineers and IT Managers Icon
    Total Network Visibility for Network Engineers and IT Managers

    Network monitoring and troubleshooting is hard. TotalView makes it easy.

    This means every device on your network, and every interface on every device is automatically analyzed for performance, errors, QoS, and configuration.
    Learn More
  • 5
    Step3-VL-10B

    Step3-VL-10B

    Multimodal model achieving SOTA performance

    ...It achieves this efficiency and strong performance through unified pre-training on a massive 1.2 trillion-token multimodal corpus that jointly optimizes a language-aligned perception encoder with a powerful decoder, creating deep synergy between image processing and text understanding.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GLM-4.5V

    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    MediaPipe Face Detection

    MediaPipe Face Detection

    Detect faces in an image

    The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate face detection while maintaining a small memory footprint. This model supports multiple face detections and is highly efficient, making it suitable for a variety of applications such as augmented reality, user authentication, and facial expression analysis.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    Ministral 3 3B Base 2512

    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    ...It supports dozens of languages, making it practical for multilingual, global, or distributed environments. With a large 256k token context window, it can handle long documents, extended inputs, or multi-step processing workflows even at its small size.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Turn more customers into advocates. Icon
    Turn more customers into advocates.

    Fight skyrocketing paid media costs by turning your customers into a primary vehicle for acquisition, awareness, and activation with Extole.

    The platform's advanced capabilities ensure companies get the most out of their referral programs. Leverage custom events, profiles, and attributes to enable dynamic, audience-specific referral experiences. Use first-party data to tailor customer segment messaging, rewards, and engagement strategies. Use our flexible APIs to build management capabilities and consumer experiences–headlessly or hybrid. We have all the tools you need to build scalable, secure, and high-performing referral programs.
    Learn More
  • 10
    Ministral 3 8B Instruct 2512

    Ministral 3 8B Instruct 2512

    Compact 8B multimodal instruct model optimized for edge deployment

    Ministral 3 8B Instruct 2512 is a balanced, efficient model in the Ministral 3 family, offering strong multimodal capabilities within a compact footprint. It combines an 8.4B-parameter language model with a 0.4B vision encoder, enabling both text reasoning and image understanding. This FP8 instruct-fine-tuned variant is optimized for chat, instruction following, and structured outputs, making it ideal for daily assistant tasks and lightweight agentic workflows. Designed for edge deployment,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Ministral 3 14B Instruct 2512

    Ministral 3 14B Instruct 2512

    Efficient 14B multimodal instruct model with edge deployment and FP8

    Ministral 3 14B Instruct 2512 is the largest model in the Ministral 3 family, delivering frontier performance comparable to much larger systems while remaining optimized for edge-level deployment. It combines a 13.5B-parameter language model with a 0.4B-parameter vision encoder, enabling strong multimodal understanding in both text and image tasks. This FP8 instruct-tuned variant is designed specifically for chat, instruction following, and agentic workflows with robust system-prompt...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next