Showing 224 open source projects for "data quality"

  • 1
    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    ...Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, which is hundreds of times larger than existing systems. VALL-E exhibits emergent in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity. ...
    Downloads: 1 This Week
  • 2
    Bulk Image Optimizer and Converter

    Imagine having all your images well compressed and optimized :)

    Bulk Image Optimizer and Converter is a portable executable that lets users choose the output format (JPEG, PNG, or WebP), set the desired image quality, and remove EXIF data. The optimized images are saved in a separate folder named "optimized" within the input folder. The tool displays progress information, including the number of images processed, the average compression ratio, and the total space saved.
    Downloads: 0 This Week
  • 3
    Cinemagoer

    Python package to retrieve and manage data of the IMDb

    Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database about movies, people, characters, and companies. It is platform-independent and can retrieve data from both IMDb's web server and a local copy of the whole database.
    Downloads: 2 This Week
  • 4
    Data science blogs

    A curated list of data science blogs

    Data Science Blogs is a curated repository that aggregates a wide range of high-quality blogs and resources related to data science, machine learning, and analytics into a single organized collection. It serves as a discovery platform for practitioners, researchers, and learners who want to stay updated with industry trends, techniques, and insights without manually searching for reliable sources.
    Downloads: 0 This Week
  • 5
    SQLBucket

    Lightweight library to write, orchestrate and test your SQL ETL

    SQLBucket is a lightweight framework to help write, orchestrate and validate SQL data pipelines. It lets you set variables and introduces some control flow using the Jinja2 library. It also implements a very simple unit and integration test framework where you can validate the results of your ETL in the form of SQL checks. With SQLBucket, you can apply TDD principles when writing data pipelines. To start working, you need to instantiate your SQLBucket core...
    Downloads: 0 This Week
  • 6
    EnCodec

    State-of-the-art deep learning based audio codec

    ...Unlike traditional codecs (like MP3 or Opus), Encodec uses a learned quantizer and decoder to reconstruct complex waveforms with remarkable accuracy at bitrates as low as 1.5 kbps. It employs a convolutional encoder–decoder architecture trained with perceptual loss functions that optimize for human auditory quality rather than raw waveform distance. The model can operate in real time and supports variable bandwidths, bitrates, and multi-band audio. Encodec has applications in speech and music compression, generative modeling, and efficient data transmission for communication systems. The repository includes pretrained checkpoints, PyTorch inference code, and examples for integrating Encodec as a module in downstream generative or streaming systems.
    Downloads: 0 This Week
  • 7
    AllenNLP

    An open-source NLP research library, built on PyTorch

    AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional...
    Downloads: 0 This Week
  • 8
    Grow.dev

    A declarative website generator designed for high-quality websites

    Grow.dev is a static site generator optimized for building highly interactive, localized microsites. Grow.dev focuses on providing optimal workflows and developer ergonomics for creating projects that are highly maintainable in the long term. Grow.dev encourages a strong but simple separation of content and presentation and makes maintaining content in different locales and environments a snap.
    Downloads: 0 This Week
  • 9
    WaveRNN

    WaveRNN Vocoder + TTS

    WaveRNN is a PyTorch implementation of DeepMind’s WaveRNN vocoder, bundled with a Tacotron-style TTS front end to form a complete text-to-speech stack. As a vocoder, WaveRNN models raw audio with a compact recurrent neural network that can generate high-quality waveforms more efficiently than many traditional autoregressive models. The repository includes scripts and code for preprocessing datasets such as LJSpeech, training Tacotron to produce mel spectrograms, training WaveRNN on those spectrograms (with optional GTA data), and finally generating audio. A quick_start.py script allows users to immediately synthesize example sentences from a pretrained model and inspect both generated audio and attention plots. ...
    Downloads: 0 This Week
  • 10
    QuickPlot

    Simple user interface for gnuplot aimed at reflectometry data

    Graphical user interface for gnuplot to create publication-quality figures very quickly. It supports templates for fast formatting of graphics, different plot styles, insets, and axis and label options. One important feature is storing metadata in PNG and PDF files, which can be used to reload any graph saved with QuickPlot.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    MAE (Masked Autoencoders)

    PyTorch implementation of MAE

    MAE (Masked Autoencoders) is a self-supervised learning framework for visual representation learning using masked image modeling. It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision. The encoder processes only the visible patches, while a lightweight decoder reconstructs the...
    Downloads: 0 This Week
  • 12
    SciDAVis is a user-friendly data analysis and visualization program primarily aimed at high-quality plotting of scientific data. It strives to combine an intuitive, easy-to-use graphical user interface with powerful features such as Python scriptability.
    Downloads: 1,717 This Week
  • 13
    GANformer

    Generative Adversarial Transformers

    This is an implementation of the GANformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of...
    Downloads: 0 This Week
  • 14
    GiantMIDI-Piano

    Classical piano MIDI dataset

    GiantMIDI-Piano is a large-scale symbolic classical piano music dataset built by applying the piano_transcription system on a vast collection of piano performance recordings. The dataset contains thousands of piano works, spanning a large number of composers and styles, with each piece transcribed into high-precision MIDI files capturing note events, pedal usage, velocities, etc. It provides a resource for music information retrieval (MIR), symbolic music modeling, composer classification,...
    Downloads: 1 This Week
  • 15
    Easy Upscale

    A simple image upscaler application using EDSR, ESPCN, FSRCNN, etc.

    This application was made to fulfill an assignment for a Data Structures course. It upgrades/enhances image quality. The main theme is queues: circular queues are used to pool and store the list of images to be upscaled. The GUI is built manually with the tkinter library. The upscaling itself uses the OpenCV library with models obtained from open source.
    Downloads: 2 This Week
  • 16
    Argos Translate

    Open-source offline translation library written in Python

    ...This allows for translating between a wide variety of languages at the cost of some loss of translation quality.
    Downloads: 103 This Week
  • 17
    HistogramsApp

    Application that generates KDE-PDP plots from geochronological data

    HistogramsApp is a Python 3.6 application that generates KDE and PDP plots from geochronological data. HistogramsApp lets you interactively set up plot parameters such as the bandwidth and the peak detection sensitivity. To cite the application, please refer to: 1) https://www.tandfonline.com/doi/abs/10.1080/00206814.2021.1954556?journalCode=tigr20 Rodriguez-Corcho, A. F., Rojas-Agramonte, Y., Barrera-Gonzalez, J. A., Marroquin-Gomez, M. P., Bonilla-Correa, S., Izquierdo-Camacho, D.,...
    Downloads: 0 This Week
  • 18
    Parakeet

    PAddle PARAllel text-to-speech toolKIT

    PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN). Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on the PaddlePaddle dynamic graph and includes many influential TTS models. To make it easier to use existing TTS models directly and to develop new ones, Parakeet selects typical models and provides...
    Downloads: 3 This Week
  • 19
    Glazier

    A tool for automating the installation of Windows OS

    ...It streamlines the entire Windows imaging process by booting systems into the Windows Preinstallation Environment (WinPE), retrieving installation instructions from a web server, and automatically applying operating systems, software, and configurations. The tool is fully text-based and code-driven, with configurations written in YAML, allowing teams to leverage source control for versioning, collaboration, and quality assurance. By distributing installation data via HTTPS, Glazier ensures scalability and flexibility, supporting both simple local servers and large-scale cloud-based deployments. Its extensibility makes it easy for administrators to create custom actions using Python or PowerShell, enabling tailored automation for diverse enterprise environments. ...
    Downloads: 0 This Week
  • 20
    XLM (Cross-lingual Language Model)

    PyTorch original implementation of Cross-lingual Language Model

    XLM (Cross-lingual Language Model) is a family of multilingual pretraining methods that align representations across languages to enable strong zero-shot transfer. It popularized objectives like Masked Language Modeling (MLM) across many languages and Translation Language Modeling (TLM) that jointly trains on parallel sentence pairs to tighten cross-lingual alignment. Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence...
    Downloads: 0 This Week
  • 21
    OpenAI Glow

    Code for "Glow: Generative Flow with Invertible 1x1 Convolutions"

    Glow is an open source generative model released by OpenAI that demonstrates flow-based generative modeling techniques. Unlike models that rely on approximate inference, Glow uses invertible transformations to directly learn the data distribution, allowing for exact likelihood computation and efficient sampling. The model is capable of producing high-quality synthetic images while maintaining interpretable latent spaces that enable meaningful manipulation of generated outputs. Glow’s architecture is based on reversible layers and efficient flow operations, which allow large-scale training while keeping memory usage manageable. ...
    Downloads: 1 This Week
  • 22
    CC-Net

    Tools to download and cleanup Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
    Downloads: 0 This Week
  • 23
    Optimus

    Agile Data Preparation Workflows made easy with Pandas

    Easily write code to clean, transform, explore and visualize data using Python. Process data using a simple API that is easy for newcomers to use. More than 100 functions to handle strings and process dates, URLs and emails. Easily plot data of any size. Out-of-the-box functions to explore and fix data quality. Use the same code to process your data on your laptop or on a remote cluster of GPUs.
    Downloads: 0 This Week
  • 24
    LiVES

    LiVES is a Video Editing System. It is designed to be simple to use, y

    LiVES mixes realtime video performance and non-linear editing in one professional-quality application. It is designed to be simple to use, yet powerful. It is small in size, yet it has many advanced features. Using LiVES, you can start editing and making video right away, without having to worry about formats, frame sizes, or framerates. It is a very flexible tool which is used by both professional VJs and video editors - mix and switch clips from the keyboard, use dozens of realtime...
    Downloads: 10 This Week
  • 25

    Microarray assosiated motif analyzer

    Cis-element prediction tool from microarray data

    We developed a novel clustering-free method, microarray-associated motif analyzer (MAMA), to predict novel cis-acting elements based on weighted sequence similarities and gene expression profiles in microarray analyses. Simulation of gene expression was performed using a support vector machine and based on the presence of predicted motifs and motif pairs. The accuracy of simulated gene expression was used to evaluate the quality of prediction and to optimize the parameters used in this...
    Downloads: 0 This Week