Showing 105 open source projects for "cross"

  • 1
    Ultimate Vocal Remover (UVR5)

    GUI for a Vocal Remover that uses Deep Neural Networks

    This application uses state-of-the-art source separation models to remove vocals from audio files. UVR's core developers trained all of the models provided in this package (except for the Demucs v3 and v4 4-stem models).
    Downloads: 506 This Week
    See Project
  • 2
    GPT-SoVITS

    1 min voice data can also be used to train a good TTS model

    GPT‑SoVITS is a state-of-the-art voice conversion and TTS system that enables zero‑shot and few‑shot synthesis based on a short vocal sample (e.g., 5 seconds). It supports cross‑lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It is built on the VITS architecture, enhanced for few‑sample adaptation and real‑time usability.
    Downloads: 47 This Week
    See Project
  • 3
    OpenVoice

    Instant voice cloning by MIT and MyShell. Audio foundation model

    ...It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak naturally in others. Architecturally, OpenVoice separates “tone color” cloning from style control, which makes it easier to keep a consistent identity while flexibly changing prosody or language. The project provides open-weight models, inference code, and examples, making it suitable both for research and for building production voice experiences. ...
    Downloads: 22 This Week
    See Project
  • 4
    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is an open-source audio-video generation model developed by Lightricks. This repository packages Python inference code together with a LoRA trainer, so the model can be run and fine-tuned from Python.
    Downloads: 34 This Week
    See Project
  • 5
    Hunyuan3D-2.1

    From Images to High-Fidelity 3D Assets

    ...It supports both shape generation (mesh geometry) and texture generation modules. Physically Based Rendering (PBR) texture synthesis models realistic material effects, including reflections and subsurface scattering. Cross-platform support (macOS, Windows, Linux) is provided via Python and PyTorch, including diffusers-style APIs.
    Downloads: 17 This Week
    See Project
  • 6
    GPT4All

    Run Local LLMs on Any Device. Open-source

    GPT4All is an open-source project that allows users to run large language models (LLMs) locally on their desktops or laptops, eliminating the need for API calls or GPUs. The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This...
    Downloads: 197 This Week
    See Project
  • 7
    Cua

    Open-source infrastructure for Computer-Use Agents. Sandboxes

    Cua is open-source infrastructure for computer-use agents: sandboxes, SDKs, and benchmarks for training and evaluating AI agents that control full desktops across macOS, Linux, and Windows.
    Downloads: 7 This Week
    See Project
  • 8
    pyttsx3

    Offline Text To Speech synthesis for python

    ...It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 10 This Week
    See Project
  • 9
    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that help users train better models and apply them in practice. Based on PaddlePaddle, PaddleOCR is an ultra-lightweight OCR system with multilingual recognition, digit recognition, vertical text recognition, and long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes: ultra-lightweight ppocr_mobile series models, general...
    Downloads: 46 This Week
    See Project
  • 10
    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    ...It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like chatbots, podcasts, and applications where dynamic turn-taking between speakers is essential. FireRedTTS-2 supports multilingual output and speaker flexibility, enabling scenarios that involve language switching, cross-lingual voice cloning, and expressive dialogue generation that maintains consistency over longer utterances.
    Downloads: 4 This Week
    See Project
  • 11
    Fish Speech

    SOTA Open Source TTS

    ...Fish Speech emphasizes expressive and controllable voices: it supports a long list of emotion tags, tone markers, and special audio effect markers that can be embedded in the text to drive prosody and vocal style, from basic emotions to nuanced states like sarcastic, conciliative, or hysterical. The system is multilingual and cross-lingual, handling multiple languages in a single input without explicit phoneme markup, and is trained on large-scale datasets.
    Downloads: 9 This Week
    See Project
  • 12
    Spark TTS

    Spark-TTS Inference Code

    ...The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. Spark-TTS allows users to control speech characteristics like gender, pitch, and speaking rate to customize synthesized output and support virtual speaker creation.
    Downloads: 3 This Week
    See Project
  • 13
    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting

    Dolphin is a document image parsing model from ByteDance that follows an analyze-then-parse approach: it first produces a sequence of layout elements in reading order, then parses those elements in parallel, using them as heterogeneous anchors paired with task-specific prompts.
    Downloads: 0 This Week
    See Project
  • 14
    CodeGeeX

    CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

    CodeGeeX is a large-scale multilingual code generation model with 13 billion parameters, trained on 850B tokens across more than 20 programming languages. Developed with MindSpore and later made PyTorch-compatible, it is capable of multilingual code generation, cross-lingual code translation, code completion, summarization, and explanation. It has been benchmarked on HumanEval-X, a multilingual program synthesis benchmark introduced alongside the model, and achieves state-of-the-art performance compared to other open models like InCoder and CodeGen. CodeGeeX also powers IDE plugins for VS Code and JetBrains, offering features like code completion, translation, debugging, and annotation. ...
    Downloads: 5 This Week
    See Project
  • 15
    CogView4

    CogView4, CogView3-Plus and CogView3 (ECCV 2024)

    ...Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. It emphasizes bilingual usability, making it well-suited for cross-lingual multimodal applications. The model also supports fine-tuning and downstream customization, extending its applicability to creative content generation, human–computer interaction, and research on vision-language alignment.
    Downloads: 3 This Week
    See Project
  • 16
    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    ...While the internal architecture details are not fully documented publicly, the repo suggests that VL2 introduces enhancements over prior vision-language models (e.g. better scaling, cross-modal attention, more robust alignment) to improve grounding and multimodal understanding.
    Downloads: 4 This Week
    See Project
  • 17
    LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    ...Its design supports reuse beyond strict prefix matching and enables sharing across serving instances, improving efficiency under real multi-tenant traffic. The broader project includes examples, tests, a server component, and public posts describing cross-engine sharing and inter-GPU KV transfers. These capabilities aim to lower latency, cut GPU cycles, and stabilize performance for production workloads with overlapping prompts or retrieval-augmented contexts. The end result is a cache fabric for LLMs that complements engines rather than replacing them.
    Downloads: 2 This Week
    See Project
  • 18
    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    HeartMuLa is the open-source library and reference implementation for the HeartMuLa family of music foundation models, designed to support both music generation and music-related understanding tasks in a cohesive stack. At the center is HeartMuLa, a music language model that generates music conditioned on inputs like lyrics and tags, with multilingual support that broadens the range of lyric-driven use cases. The project also includes HeartCodec, a music codec optimized for high...
    Downloads: 25 This Week
    See Project
  • 19
    Hugging Face Skills

    Definitions for AI/ML tasks like dataset creation

    ...Each skill is a self-contained folder with structured metadata and guidance that tells an agent how to execute tasks such as dataset creation, model training, evaluation, or Hub operations. The project is designed to be interoperable across major agent ecosystems, including Claude Code, OpenAI Codex, Gemini CLI, and Cursor, making it a cross-platform building block for agent automation. By formalizing best practices and workflows, Skills helps transform general-purpose coding agents into domain-aware assistants that can execute complex ML pipelines with less manual prompting. The repository also includes ready-to-use skills for common Hugging Face operations and encourages teams to extend them with custom domain logic.
    Downloads: 1 This Week
    See Project
  • 20
    CodeGeeX2

    CodeGeeX2: A More Powerful Multilingual Code Generation Model

    ...Its backend powers the CodeGeeX IDE plugins for VS Code, JetBrains, and other editors, offering developers interactive AI assistance with features like infilling and cross-file completion.
    Downloads: 3 This Week
    See Project
  • 21
    NoneBot

    Asynchronous multi-platform robot framework written in Python

    ...Asynchronous-first development improves runtime efficiency. A simple, clear dependency injection system with built-in dependency functions reduces the amount of user code. NoneBot2 is a modern, cross-platform, and extensible Python chatbot framework. It is based on Python's type annotations and asynchronous features, and can provide convenient and flexible support for your needs. NoneBot2 is built on Python asyncio and offers a degree of compatibility with synchronous functions on top of the asynchronous mechanism. NoneBot2 provides an easy-to-use, interactive command-line tool, nb-cli, that makes it easier to get started with NoneBot2 for the first time. ...
    Downloads: 1 This Week
    See Project
  • 22
    SentenceTransformers

    Multilingual sentence & image embeddings with BERT

    SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared, e.g., with cosine similarity to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase...
    Downloads: 22 This Week
    See Project
  • 23
    runprompt

    Run LLM prompts from your shell

    ...The project emphasizes extensibility, letting users define custom actions, integrate with existing shell environments, and even leverage fuzzy matching or contextual prompts to narrow down options as you type. Designed to be cross-platform, RunPrompt works with standard shells on Windows, macOS, and Linux while honoring the user’s preferred environment and configurations.
    Downloads: 0 This Week
    See Project
  • 24
    Liger Kernel

    Efficient Triton Kernels for LLM Training

    Liger Kernel is a collection of Triton kernels developed by LinkedIn for efficient LLM training. It offers Hugging Face-compatible implementations of common operations such as RMSNorm, RoPE, SwiGLU, and cross-entropy, and is intended to raise multi-GPU training throughput and reduce memory usage.
    Downloads: 3 This Week
    See Project
  • 25
    ImageBind

    ImageBind: One Embedding Space to Bind Them All

    ...The model is trained using large-scale contrastive learning, leveraging diverse datasets from natural images, videos, audio clips, and sensor data. Once trained, it can perform cross-modal retrieval, zero-shot classification, and multimodal composition without additional fine-tuning.
    Downloads: 0 This Week
    See Project
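
The pyttsx3 entry above describes an engine object whose speaking rate and volume can be set from Python, and which can save synthesized output to a file. A minimal sketch of that usage follows. It assumes pyttsx3 is installed and a platform speech driver is available; the helper names `configure_engine` and `speak_and_save`, the sample text, and the output file name are illustrative and not part of the library.

```python
def configure_engine(engine, rate=150, volume=0.8):
    """Set speaking rate (words per minute) and volume on a pyttsx3-style engine."""
    volume = max(0.0, min(1.0, volume))  # pyttsx3 expects volume between 0.0 and 1.0
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    return engine


def speak_and_save(engine, text, path):
    """Queue the text for playback and for saving to a file, then run the queue."""
    engine.say(text)                 # queue an utterance for the speakers
    engine.save_to_file(text, path)  # queue the same text for file output
    engine.runAndWait()              # block until all queued commands finish


# Typical use, assuming pyttsx3 and a speech driver are installed:
#   import pyttsx3
#   engine = configure_engine(pyttsx3.init(), rate=170, volume=0.9)
#   speak_and_save(engine, "Hello from pyttsx3.", "hello.wav")  # hypothetical file name
```

Passing the engine in as a parameter keeps the helpers independent of any particular speech driver, which also makes them easy to exercise with a stand-in object.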
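
The SentenceTransformers entry above notes that embeddings can be compared with cosine similarity to find sentences with a similar meaning. The sketch below shows only that comparison step in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `most_similar` are illustrative helpers, not the library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    """Return the (label, score) pair from corpus with the highest similarity."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in corpus.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy vectors standing in for sentence embeddings (hypothetical values).
corpus = {
    "A cat sits on the mat.": [0.9, 0.1, 0.0],
    "Stock prices fell today.": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in embedding for "A kitten rests on a rug."

best_label, best_score = most_similar(query, corpus)
print(best_label, round(best_score, 3))
```

In practice the vectors would come from an embedding model and have hundreds of dimensions; the ranking logic stays the same.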