Showing 105 open source projects for "cross"

  • 1
    LingBot-VLA

    A Pragmatic VLA Foundation Model

    LingBot-VLA is an open-source Vision-Language-Action (VLA) foundational AI model designed to serve as a general “brain” for real-world robotic manipulation by grounding multimodal perception and language into actionable motions. It has been pretrained on tens of thousands of hours of real robotic interaction data across multiple robot platforms, which enables it to generalize well to diverse morphologies and tasks without needing extensive retraining on each new bot. The model aims to bridge...
    Downloads: 0 This Week
  • 2
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
  • 3
    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).
    Downloads: 1 This Week
  • 4
    Sapiens

    High-resolution models for human tasks

    ...It integrates sensory inputs such as vision, audio, and proprioception into a unified learning architecture that allows agents to understand and adapt to their surroundings dynamically. The project emphasizes long-horizon reasoning and cross-modal grounding—connecting language, perception, and action into a single agentic model capable of following abstract goals. It includes simulation environments, datasets, and benchmarks for testing grounded understanding, imitation learning, and decision-making. The system’s modular pipeline supports both imitation-based and reinforcement-based training strategies, allowing flexible experimentation with different embodiments and tasks.
    Downloads: 0 This Week
  • 5
    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). The model is designed to work with a small set of dependencies and to be accessible for developers who want offline TTS with customizable voice style, including options for streaming or non-streaming generation modes. ...
    Downloads: 0 This Week
  • 6
    HunyuanVideo-Avatar

    Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

    HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan for animating static avatar images into dynamic, emotion-controllable, and multi-character dialogue videos, conditioned on audio. It addresses challenges of motion realism, identity consistency, and emotional alignment. Innovations include a character image injection module, an Audio Emotion Module for transferring emotion cues, and a Face-Aware Audio Adapter to isolate audio effects on faces,...
    Downloads: 0 This Week
  • 7
    Continuous Claude v3

    Context management for Claude Code. Hooks maintain state via ledgers

    Continuous Claude v3 is a persistent, multi-agent development environment built around the Claude Code CLI that aims to overcome the limitations of standard LLM context windows. Rather than relying on a single session’s context, Continuous Claude uses mechanisms like ledgers, YAML handoffs, and a memory system to preserve and recall state across multiple sessions, ensuring that learned insights and plans are not lost when context compaction occurs. The project orchestrates many specialized...
    Downloads: 0 This Week
  • 8
    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each...
    Downloads: 0 This Week
  • 9
    OuteTTS

    Interface for OuteTTS models

    OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face...
    Downloads: 0 This Week
  • 10
    OpenaiBot

    Refactoring ChatBot+LLM, Gpt-3.5-turbo, ChatGPT Bot/Voice Assistant

    If you don't have the instant messaging platform you need or you want to develop a new application, you are welcome to contribute to this repository. You can develop a new Controller by using Event.py. Compatibility with multiple LLMs and integration with GPT and third-party systems is handled by our llm-kira project on GitHub. It can accurately limit billing, with limits and ID binding. Supports asynchronous operations and can handle multiple requests simultaneously. Allows for private and...
    Downloads: 0 This Week
  • 11
    mlforecast

    Scalable machine learning for time series forecasting

    mlforecast is a time-series forecasting framework built around machine-learning models, designed to make forecasting both efficient and scalable. It lets you apply any regressor that follows the typical scikit-learn API, for example, gradient-boosted trees or linear models, to time-series data by automating much of the messy feature engineering and data preparation. Instead of writing custom code to build lagged features, rolling statistics, and date-based predictors, mlforecast generates...
    Downloads: 0 This Week
  • 12
    Rhino

    On-device Speech-to-Intent engine powered by deep learning

    ...Design with no limits on top of a modular platform. Create use-case-specific voice AI models in seconds. Develop voice features with a few lines of code using intuitive and cross-platform SDKs. Deliver voice AI everywhere: on-device, mobile, web browsers, on-premise, or cloud. Measure adoption, learn, and iterate. Continuously re-design and re-train to optimize engagement. Building accurate, responsive, and private voice technology is difficult. We learned the hard way, so you don’t have to. Picovoice heavily invests in R&D to offer superior voice AI that surpasses even Big Tech in accuracy and efficiency. ...
    Downloads: 0 This Week
  • 13
    Aviary

    Ray Aviary - evaluate multiple LLMs easily

    Aviary is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs. It provides an extensive suite of pre-configured open source LLMs, with defaults that work out of the box, and supports Transformer models hosted on Hugging Face Hub or present on local disk. Aviary has native support for autoscaling and multi-node deployments thanks to Ray and Ray Serve. Aviary can scale to zero and create new model replicas (each composed of multiple GPU workers) in...
    Downloads: 0 This Week
  • 14
    MMDeploy

    OpenMMLab Model Deployment Framework

    ...All kinds of modules in the SDK can be extended, such as Transform for image processing, Net for Neural Network inference, Module for postprocessing and so on. Install and build your target backend. ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Please read getting_started for the basic usage of MMDeploy.
    Downloads: 0 This Week
  • 15
    Deface GUI - Face Anonymization Tool

    Graphical User Interface Face Anonymization Tool

    This application is a professional tool with a graphical user interface that anonymizes faces using the Deface Engine. It is cross-platform compatible (Linux and Windows). NOTE: To use it on Windows, first install Python. Then, if necessary, run “pip install deface”.
    Downloads: 12 This Week
  • 16
    CLIP-as-service

    Embed images and sentences into fixed-length vectors

    ...Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression. Smooth integration with neural search ecosystem including Jina and DocArray. Build cross-modal and multi-modal solutions in no time.
    Downloads: 0 This Week
  • 17
    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as...
    Downloads: 0 This Week
  • 18
    Shinkai: Local AI Agents

    Shinkai allows you to create advanced AI (local) agents effortlessly

    ...Key Features:
      - No-Code Agent Creation: Build specialized agents (trading bots, sentiment trackers, etc.) with simple descriptions
      - Multi-Agent Collaboration: Agents work together to solve complex problems
      - Crypto Integration: Built-in support for decentralized payments and transactions
      - Flexible AI Models: Choose from cloud models (GPT-4, Claude) or run locally
      - Universal Compatibility: Works with Model Context Protocol (MCP) for cross-platform integration
      - Local Security: Crypto keys and computations stay on your device
    Shinkai transforms AI from single-task tools into collaborative, autonomous systems that can operate in decentralized networks while maintaining privacy and security.
    Downloads: 5 This Week
  • 19
    CodinIT.dev

    Free, local, open-source AI app builder

    CodinIT.dev is a free, local, open source AI app builder that lets you go from idea to full-stack application entirely on your machine, no coding required, just chat with AI. You can build unlimited apps with real-time previews, instant undo, and responsive, frictionless workflows. Deep Supabase integration means you can create UI and backend logic in one cohesive environment, while the model-agnostic architecture lets you connect to any AI, whether cloud-based (Gemini 3 Pro, GPT-5,...
    Downloads: 3 This Week
  • 20
    Astronomical-Image-Refiner

    This repository contains the complete code and data for studying primo

    ... 🚀 Overview
    This repository provides:
      - Physics Validation Tests that compute predicted P–D entanglement observables.
      - Expected Numerical Results for cross‑checking model predictions.
      - Automated pipelines for running analyses on clusters (e.g., Abell 1689) and other astrophysical targets.
      - Jupyter notebooks for full end‑to‑end scientific workflows.
      - Data ingestion tools for MAST/HST/JWST FITS downloads.

    📂 Repository Structure
    Primordial-Photon-Dark-Photon-Entanglement/
    ├── Physics_Validation_Tests.py
    ├── Expected_Numerical_Results.py
    ├── utils/
    │   ├── preprocessing.py
    │   ├── fits_tools.py
    │   └── visualization.py
    ├── notebooks/
    │   ├── Photon_DarkPhoton_Cluster_
    Downloads: 0 This Week
  • 21
    PC_Workman_HCK

    AI-powered PC monitoring that explains what is happening instead of only showing numbers and spikes.

    ...Features:
      - Time travel monitoring: debug issues from hours ago
      - AI diagnostics with HCK_GPT
      - Custom fan curves with profiles
      - Floating always-on-top widget
      - 2D system map
      - Cross-GPU support (NVIDIA/AMD/Intel)
    Four complete rebuilds. 29 features killed. 24,000 lines of optimized code. No team. Solo Dev. BUILD-IN-PUBLIC. Free because good tools should be. Alpha v1.6.3: real tools built on real constraints.
    Downloads: 7 This Week
  • 22
    Zylthra

    Zylthra: A PyQt6 app to generate synthetic datasets with DataLLM.

    Welcome to Zylthra, a powerful Python-based desktop application built with PyQt6, designed to generate synthetic datasets using the DataLLM API from data.mostly.ai. This tool allows users to create custom datasets by defining columns, configuring generation parameters, and saving setups for reuse, all within a sleek, dark-themed interface.
    Downloads: 0 This Week
  • 23
    Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment.
    Downloads: 0 This Week
  • 24
    VALL-E X

    Open source implementation of Microsoft's VALL-E X zero-shot TTS model

    VALL-E X is an open-source implementation of Microsoft’s VALL-E X zero-shot text-to-speech model, focused on multilingual, cross-lingual voice cloning. It is capable of synthesizing speech in English, Chinese, and Japanese from text while mimicking the voice characteristics of a speaker given only a short 3–10 second prompt. The model attempts to match not just timbre, but also tone, pitch, emotion, and prosody of the reference audio, resulting in highly personalized output.
    Downloads: 1 This Week
  • 25
    CoTracker

    CoTracker is a model for tracking any point (pixel) on a video

    CoTracker is a learning-based point tracking system that jointly follows many user-specified points across a video, rather than tracking each point independently. By reasoning about all tracks together, it can maintain temporal consistency, handle mutual occlusions, and reduce identity swaps when trajectories cross. The model takes sparse point queries on one frame and predicts their sub-pixel locations and a visibility score for every subsequent frame, producing long, coherent trajectories. Its transformer-style architecture aggregates information both along time and across points, allowing it to recover tracks even after brief disappearances. ...
    Downloads: 0 This Week