Showing 22 open source projects for "video audio frontend"

  • 1
    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ... be used to prepare raw data or improve existing training data to get more accurate ML models. The frontend part of the Label Studio app lives in the frontend/ folder and is written in React JSX. Multi-user labeling: sign up and log in, and each annotation you create is tied to your account. Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 16 This Week
  • 2
    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    ... InvokeAI offers an industry-leading web interface and an interactive command-line interface, and also serves as the foundation for multiple commercial products. This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.
    Downloads: 23 This Week
  • 3
    StoryTeller

    Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

    A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS). Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals. To develop locally, install dev dependencies and install pre-commit hooks. This will automatically trigger linting and code quality checks before each...
    Downloads: 8 This Week
  • 4
    PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model

    PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models. Through an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, including training, inference & testing modules, and the deployment process. Installation has a low barrier, and a CLI, server, and streaming server are available to quick-start your journey. We provide high...
    Downloads: 3 This Week
  • 5
    NÜWA - Pytorch

    Implementation of NÜWA, attention network for text to video synthesis

    Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in Pytorch. It also contains an extension into video and audio generation, using a dual decoder approach. It seems as though a diffusion-based method has taken the new throne for SOTA. However, I will continue on with NÜWA, extending it to use multi-headed codes + hierarchical causal transformer. I think that direction is untapped for improving on this line of work. In the paper, they also present a way...
    Downloads: 2 This Week
  • 6
    Amiga Memories

    A walk along memory lane

    Amiga Memories is a project (started & released in 2013) that aims to make video programmes that can be published on the internet. The images and sound produced by Amiga Memories are 100% automatically generated. The generator itself is implemented in Squirrel, the 3D rendering is done on GameStart 3D. An Amiga Memories video is mostly based on a narrative. The purpose of the script is to define the spoken and written content. The spoken text will be read by a voice synthesizer (Text To Speech...
    Downloads: 1 This Week
  • 7
    ChatGPT Telegram Bot

    A Telegram bot that integrates with OpenAI's official ChatGPT APIs

    A Telegram bot that integrates with OpenAI's official ChatGPT, DALL·E and Whisper APIs to provide answers. Ready to use with minimal configuration required.
    Downloads: 2 This Week
  • 8
    UForm

    Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

    UForm is a Multi-Modal Inference package, designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents, into a shared vector space! It comes with a set of homonymous pre-trained networks available on the HuggingFace portal and extends the transformers package to support Mid-fusion Models. Late-fusion models encode each modality independently, but into one shared vector space. Due to independent encoding, late-fusion models are good at capturing coarse-grained features...
    Downloads: 0 This Week
  • 9
    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data...
    Downloads: 0 This Week
  • 10
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video...
    Downloads: 0 This Week
  • 11
    txtai

    Build AI-powered semantic search applications

    ..., models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. The following applications are powered by txtai.
    Downloads: 0 This Week
  • 12
    Deep Lake

    Data Lake for Deep Learning. Build, manage, and query datasets

    Deep Lake (formerly known as Activeloop Hub) is a data lake for deep learning applications. Our open-source dataset format is optimized for rapid streaming and querying of data while training models at scale, and it includes a simple API for creating, storing, and collaborating on AI datasets of any size. It can be deployed locally or in the cloud, and it enables you to store all of your data in one place, ranging from simple annotations to large videos. Deep Lake is used by Google, Waymo,...
    Downloads: 0 This Week
  • 13
    Triton Inference Server

    The Triton Inference Server provides an optimized cloud

    ... real-time, batched, ensembles, and audio/video streaming. Provides a Backend API that allows adding custom backends and pre/post-processing operations. Model pipelines using Ensembling or Business Logic Scripting (BLS). HTTP/REST and gRPC inference protocols based on the community-developed KServe protocol. A C API and Java API allow Triton to link directly into your application for edge and other in-process use cases.
    Downloads: 0 This Week
  • 14
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multimodal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, WebSockets, HTTP...
    Downloads: 0 This Week
  • 15
    AI Atelier

    A version of the AI art creation software based on Disco Diffusion

    ... over a network, the complete source code of the modified version must be made available. Create 2D and 3D animations, not only still frames (from Disco Diffusion v5 and VQGAN Animations). Input audio and images for generation instead of just text. Simplify the tool setup process on Colab, and enable ‘one-click’ sharing of the generated link with other users. Experiment with the possibilities for multi-user access to the same link.
    Downloads: 10 This Week
  • 16
    SoundTranscriber

    SoundTranscriber can be used to generate automatic transcription / aut
    Downloads: 7 This Week
  • 17
    NWT - Pytorch (wip)

    Implementation of NWT, audio-to-video generation, in Pytorch

    Implementation of NWT, audio-to-video generation, in Pytorch. The paper proposes a new discrete latent representation named Memcodes, which can be succinctly described as a type of multi-head hard-attention to learned memory (codebook) key/values. They claim to need fewer codes and smaller codebook dimensions to achieve better reconstructions.
    Downloads: 0 This Week
  • 18
    Linux-Intelligent-Ocr-Solution

    Easy-OCR solution and Tesseract trainer for GNU/Linux

    Linux-intelligent-ocr-solution (Lios) is free and open source software for converting print into text using either a scanner or a camera. It can also produce text from scanned images from other sources such as a PDF, an image, a folder containing images, or a screenshot. The program is fully accessible to visually impaired users. A Tesseract Trainer GUI is also shipped with this package. Forum : https://groups.google.com/forum/#!forum/lios Video Tutorial : https://www.youtube.com/playlist?list...
    Downloads: 1 This Week
  • 19
    FastoCloud PRO

    IPTV/NVR/CCTV/Video cloud https://fastocloud.com

    IPTV/Video cloud. Features: Cross-platform (Linux, MacOSX, FreeBSD, Raspbian/Armbian); GPU/CPU Encode/Decode/Post Processing; Stream statistics; CCTV; Adaptive HLS streams; Load balancing; Temporary URLs; HLS push; EPG scanning; Subtitles to text conversions; AD insertion; Logo overlay; Video effects; Relays; Timeshifts; Catchups; Playlists; Restream/Transcode from online streaming services like Youtube, Twitch...
    Downloads: 1 This Week
  • 20
    Deepvoice3_pytorch

    PyTorch implementation of convolutional neural networks

    An open source implementation of Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.
    Downloads: 0 This Week
  • 21
    Lip Reading

    Cross Audio-Visual Recognition using 3D Architectures

    The input pipeline must be prepared by the users. This code aims to provide an implementation of Coupled 3D Convolutional Neural Networks for audio-visual matching. Lip-reading can be a specific application for this work. Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage...
    Downloads: 9 This Week
  • 22
    JAVT - Just Another Voice Transformer

    Just Another Speech Recognition and Text to Speech software.

    JAVT, or Just Another Voice Transformer (formerly called Just Another Video Transcriber), is speech recognition software that also supports text-to-speech and simple media conversion. JAVT lets you convert video files to an audio WAV file using ffmpeg, and then transcribe the audio file to text using either Microsoft SAPI or CMU Sphinx. You can also open a text file and have JAVT read it aloud through text-to-speech conversion.
    Downloads: 2 This Week