Showing 15 open source projects for "fusion"

  • 1
    UForm

    Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

    ...Early-fusion models encode both modalities jointly so they can take into account fine-grained features. Usually, these models are used for re-ranking relatively small retrieval results. Mid-fusion models are the golden midpoint between the previous two types. Mid-fusion models consist of two parts – unimodal and multimodal.
    Downloads: 0 This Week
  • 2
    ViMax

    Director, Screenwriter, Producer, and Video Generator All-in-One

    ...The system aims to bridge foundational vision backbones and generative language models through adapters and fusion layers that maximize both signal integration and reasoning depth, and includes utility pipelines for training, evaluation, and deployment.
    Downloads: 5 This Week
  • 3
    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    ...It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity reinforcement and modality-specific condition injection. It includes a text-image fusion module based on LLaVA for improved multimodal understanding, and it applies to single- and multi-subject scenarios, video editing/replacement, singing avatars, and similar tasks.
    Downloads: 1 This Week
  • 4
    MESHROOM

    3D reconstruction software

    ...The dense modeling of the scene is the result yielded by chaining two computer vision-based pipelines, “Structure-from-Motion” (SfM) and “Multi View Stereo” (MVS). Fusion of Multi-bracketing LDR images into HDR. Alignment of panorama images. Support for fisheye optics. Automatically estimate fisheye circle or manually edit it. Take advantage of motorized-head file. Easy to integrate in your Renderfarm System. Add specific rules to select the most suitable machines regarding CPU, RAM, GPU requirements of each Node.
    Downloads: 102 This Week
  • 5
    Advanced RAG Techniques

    Advanced techniques for RAG systems

    ...It includes hands-on Jupyter notebooks and runnable scripts that show how to implement ideas like optimizing chunk sizes, proposition chunking, HyDE/HyPE query transformations, fusion retrieval, reranking, and ensemble retrieval. There is also an evaluation section that demonstrates how to measure RAG performance and compare different configurations in a systematic way.
    Downloads: 0 This Week
  • 6
    Claw Compactor

    14-stage Fusion Pipeline for LLM token compression

    Claw Compactor is a utility designed to optimize and manage the context limitations inherent in AI agent systems, particularly those built on OpenClaw-like architectures. It addresses the challenge of finite context windows in language models by compressing or summarizing historical interactions while preserving essential information. The system works by transforming older conversation data into condensed representations that maintain continuity without exceeding token limits. This approach...
    Downloads: 1 This Week
  • 7
    Grounded-Segment-Anything

    Marrying Grounding DINO with Segment Anything & Stable Diffusion

    ...The core idea behind the project is to pair Grounding DINO — a zero-shot object detector that can locate objects described by natural language — with Segment Anything Model (SAM), which can produce detailed masks for objects once they are localized. This fusion lets users provide arbitrary text descriptions (e.g., “a cat, a bicycle, or a coffee mug”), have the detection model find relevant bounding boxes, and then use SAM to generate precise segmentation masks that isolate each object in the scene.
    Downloads: 0 This Week
  • 8
    VMZ (Video Model Zoo)

    VMZ: Model Zoo for Video Modeling

    The codebase was designed to help researchers and practitioners quickly reproduce FAIR’s results and leverage robust pre-trained backbones for downstream tasks. It also integrates Gradient Blending, an audio-visual modeling method that fuses modalities effectively (available in the Caffe2 implementation). Although VMZ is now archived and no longer actively maintained, it remains a valuable reference for understanding early large-scale video model training, transfer learning, and multimodal...
    Downloads: 0 This Week
  • 9
    Omnilingual ASR

    Omnilingual ASR: Open-Source Multilingual Speech Recognition

    Omnilingual-ASR is a research codebase exploring automatic speech recognition that generalizes across a very large number of languages using shared modeling and training recipes. It focuses on leveraging self-supervised audio pretraining and scalable fine-tuning so low-resource languages can benefit from high-resource data. The project provides data preparation pipelines, training scripts, decoding utilities, and evaluation tools so researchers can reproduce results and extend to new...
    Downloads: 0 This Week
  • 10
    DLRM

    An implementation of a deep learning recommendation model (DLRM)

    ...It includes data loaders for standard benchmarks (like Criteo), training scripts, evaluation tools, and capabilities like mixed precision, gradient compression, and memory fusion to maximize throughput.
    Downloads: 0 This Week
  • 11
    Make-A-Video - Pytorch (wip)

    Implementation of Make-A-Video, new SOTA text to video generator

    Implementation of Make-A-Video, the new SOTA text-to-video generator from Meta AI, in PyTorch. The authors combine pseudo-3D convolutions (axial convolutions) with temporal attention and show much better temporal fusion. Pseudo-3D convolutions are not a new concept; they have been explored before in other contexts, such as protein contact prediction, as "dimensional hybrid residual networks". The gist of the paper: take a SOTA text-to-image model (here they use DALL-E2, but the same learning points would easily apply to Imagen), make a few minor modifications for attention across time and other ways to reduce the compute cost, do frame interpolation correctly, and get a great video model out. ...
    Downloads: 1 This Week
  • 12
    NanoDet-Plus

    Lightweight anchor-free object detection model

    ...In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training. We also introduce a light feature pyramid called Ghost-PAN to enhance multi-layer feature fusion. These improvements boost the previous NanoDet's detection accuracy by 7 mAP on the COCO dataset. NanoDet provides multi-backend C++ demos for ncnn, OpenVINO, and MNN, as well as an Android demo based on the ncnn inference framework.
    Downloads: 19 This Week
  • 13
    Mask2Former

    Code release for "Masked-attention Mask Transformer

    Mask2Former is a unified segmentation architecture that handles semantic, instance, and panoptic segmentation with one model and one training recipe. Its core idea is to cast segmentation as mask classification: a transformer decoder predicts a set of mask queries, each with an associated class score, eliminating the need for task-specific heads. A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial...
    Downloads: 0 This Week
  • 14
    U-Net Fusion RFI

    U-Net for RFI Detection based on @jakeret's implementation

    ...This project will use the aoflagger program within the code, so you may need to ensure that any environment variables are set for aoflagger before use. cite: https://sourceforge.net/p/u-net-fusion-rfi/wiki/cite/
    Downloads: 0 This Week
  • 15
    ViAmI-Server

    Pattern recognition for ADL events

    This software uses computer vision algorithms to mine sequence data from telemonitoring data with CBRs. The proposed approach detects changes in behavior, which occur at radically different time scales, using sensor/video fusion and a CBR organized in two levels: low and high. The system continuously updates the database with the daily data.
    Downloads: 0 This Week