Search Results for "talend data quality" - Page 2

Showing 193 open source projects for "talend data quality"

View related business solutions
  • 99.99% Uptime for MySQL and PostgreSQL on Google Cloud Icon
    99.99% Uptime for MySQL and PostgreSQL on Google Cloud

    Enterprise Plus edition delivers sub-second maintenance downtime and 2x read/write performance. Built for critical apps.

    Cloud SQL Enterprise Plus gives you a 99.99% availability SLA with near-zero downtime maintenance—typically under 10 seconds. Get 2x better read/write performance, intelligent data caching, and 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server with built-in vector search for gen AI apps. New customers get $300 in free credit.
    Try Cloud SQL Free
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 1
    Giskard

    Giskard

    Collaborative & Open-Source Quality Assurance for all AI models

    The testing framework dedicated to ML models, from tabular to LLMs. Giskard is an open-source testing framework dedicated to ML models, from tabular models to LLMs. Testing Machine Learning applications can be tedious. Since ML models depend on data, testing scenarios depend on the domain specificities and are often infinite. At Giskard, we believe that Machine Learning needs its own testing framework. Created by ML engineers for ML engineers, Giskard enables you to scan your model to find...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Matplotlib

    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 3
    Elementary

    Elementary

    Open-source data observability for analytics engineers

    Elementary is an open-source data observability solution for data & analytics engineers. Monitor your dbt project and data in minutes, and be the first to know of data issues. Gain immediate visibility, detect data issues, send actionable alerts, and understand the impact and root cause. Generate a data observability report, host it or share with your team. Monitoring of data quality metrics, freshness, volume and schema changes, including anomaly detection. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    DataProfiler

    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered tool for automatic data analysis and profiling, designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets. The DataProfiler is a Python library designed to make data analysis, monitoring, and sensitive data detection easy. Loading Data with a single command, the library automatically formats & loads files into a DataFrame. Profiling the Data, the library identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. This capability is grounded in a new data engine that automatically annotated over four million unique concepts, producing a massive open-vocabulary segmentation dataset and enabling the model to achieve 75–80% of human performance on the SA-CO benchmark, which itself spans 270K unique concepts.
    Downloads: 80 This Week
    Last Update:
    See Project
  • 6
    Toloka-Kit

    Toloka-Kit

    Toloka-Kit is a Python library for working with Toloka API

    ...There’s no need to validate JSON files and work with them directly. Support of both synchronous and asynchronous (via async/await) executions. Streaming support: build complex pipelines which send and receive data in real-time. For example, you can pass data between two related projects: one for data labeling, and another for its validation. AutoQuality feature which automatically finds the best fitting quality control rules for your project.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Datumaro

    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    dbt-re-data

    dbt-re-data

    re_data - fix data issues before your users & CEO would discover them

    re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project (together with underlaying data warehouse - Postgres, BigQuery, Snowflake, Redshift). Data transformations in re_data are implemented and exposed as models & macros in this dbt package. Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time, and see your past metadata. Set up...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Prophet

    Prophet

    Tool for producing high quality forecasts for time series data

    Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Prophet is used in many applications across Facebook for producing reliable forecasts for planning and goal...
    Downloads: 9 This Week
    Last Update:
    See Project
  • Cut Cloud Costs with Google Compute Engine Icon
    Cut Cloud Costs with Google Compute Engine

    Save up to 91% with Spot VMs and get automatic sustained-use discounts. One free VM per month, plus $300 in credits.

    Save on compute costs with Compute Engine. Reduce your batch jobs and workload bill 60-91% with Spot VMs. Compute Engine's committed use offers customers up to 70% savings through sustained use discounts. Plus, you get one free e2-micro VM monthly and $300 credit to start.
    Try Compute Engine
  • 10
    Grafana

    Grafana

    Leading open-source visualization and observability platform

    Grafana OSS is a leading open-source visualization and observability platform that lets you query, visualize, alert on, and explore your data—regardless of where it’s stored. With support for 100+ data source plugins (such as Prometheus, Loki, Elasticsearch, InfluxDB, SQL/NoSQL databases, OTel, and more), you can unify metrics, logs, traces, and other observability signals in one place. Grafana OSS empowers you to build dynamic, reusable dashboards with rich visualizations, template...
    Downloads: 33 This Week
    Last Update:
    See Project
  • 11
    Union Pandera

    Union Pandera

    Light-weight, flexible, expressive statistical data testing library

    ...Validate the functions that produce your data by automatically generating test cases for them. Integrate seamlessly with the Python ecosystem. Overcome the initial hurdle of defining a schema by inferring one from clean data, then refine it over time. Identify the critical points in your data pipeline, and validate data going in and out of them. Build confidence in the quality of your data by defining schemas for complex data objects.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Improved Diffusion

    Improved Diffusion

    Release for Improved Denoising Diffusion Probabilistic Models

    improved-diffusion is an open source implementation of diffusion probabilistic models created by OpenAI. These models, also known as score-based generative models, are a class of generative models that have shown strong performance in producing high-quality synthetic data such as images. The repository provides code for training and sampling diffusion models with improved techniques that enhance stability, efficiency, and output fidelity. It includes scripts for setting up training runs, generating samples, and reproducing results from OpenAI’s research on diffusion-based generation. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    Awesome-Quant

    Awesome-Quant

    A curated list of insanely awesome libraries, packages and resources

    awesome-quant is a curated list (“awesome list”) of libraries, packages, articles, and resources for quantitative finance (“quants”). It includes tools, frameworks, research papers, blogs, datasets, etc. It aims to help people working in algorithmic trading, quant investing, financial engineering, etc., find useful open source or educational resources. Licensed under typical “awesome” list standards.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    waitress

    waitress

    A WSGI server for Python 3

    ...They are still skipped/not processed, but if they contain invalid data we no longer continue in and return a 400 Bad Request. This stops potential HTTP desync/HTTP request smuggling.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    ...The training strategy for the HY-Motion series includes extensive pre-training on thousands of hours of varied motion data, fine-tuning on curated high-quality datasets, and reinforcement learning with human feedback, which improves both the plausibility and adaptability of generated motion sequences.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    refinery

    refinery

    Open-source choice to scale, assess and maintain natural language data

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    CO3D (Common Objects in 3D)

    CO3D (Common Objects in 3D)

    Tooling for the Common Objects In 3D dataset

    CO3Dv2 (Common Objects in 3D, version 2) is a large-scale 3D computer vision dataset and toolkit from Facebook Research designed for training and evaluating category-level 3D reconstruction methods using real-world data. It builds upon the original CO3Dv1 dataset, expanding both scale and quality—featuring 2× more sequences and 4× more frames, with improved image fidelity, more accurate segmentation masks, and enhanced annotations for object-centric 3D reconstruction. CO3Dv2 enables research in multi-view 3D reconstruction, novel view synthesis, and geometry-aware representation learning. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Scientific Visualization

    Scientific Visualization

    An open access book on scientific visualization using python

    The Scientific Visualization book is a freely available open-access textbook that introduces how to produce effective scientific visualizations using Python, focusing especially on leveraging the popular plotting library Matplotlib (and related tools). It goes beyond simple plotting tutorials and emphasizes design principles: how to choose colors, layout subplots, annotate graphs, and present data in a way that is both accurate and visually compelling. As such, it serves as a guide for researchers, data scientists, and academic authors who need to create publication-quality figures or explanatory graphics, rather than quick exploratory plots. It includes extensive examples that demonstrate best practices — for instance handling multiple subplots, combining line plots with scatter/density overlays, or rendering high-resolution vector graphics for print.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    DeepVariant

    DeepVariant

    DeepVariant is an analysis pipeline that uses a deep neural networks

    ...DeepTrio extends DeepVariant's functionality, allowing it to utilize the power of neural networks to predict genomic variants in trios or duos. See this page for more details and instructions on how to run DeepTrio. Out-of-the-box use for PCR-positive samples and low quality sequencing runs, and easy adjustments for different sequencing technologies and non-human species.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Matrix

    Matrix

    Multi-Agent daTa geneRation Infra and eXperimentation framework

    Matrix is a distributed, large-scale engine for multi-agent synthetic data generation and experiments: it provides the infrastructure to run thousands of “agentic” workflows concurrently (e.g. multiple LLMs interacting, reasoning, generating content, data-processing pipelines) by leveraging distributed computing (like Ray + cluster management). The idea is to treat data generation as a “data-to-data” transformation: each input item defines a task, and the runtime orchestrates asynchronous,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    mistletoe

    mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python

    mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable. Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also supports easy definitions of custom tokens. Parsing Markdown into an abstract syntax tree also allows us to swap out renderers for different output formats, without touching any of the core components.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    NeMo Curator

    NeMo Curator

    Scalable data pre processing and curation toolkit for LLMs

    NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use-cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and paramter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline expansion and accelerating model convergence through the preparation of high-quality tokens. At the core of the NeMo Curator is the DocumentDataset which serves as the the main dataset class. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Ling-V2

    Ling-V2

    Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI

    ...It introduces highly sparse architectures where only a fraction of the model’s parameters are activated per input token, enabling models like Ling-mini-2.0 to achieve reasoning and instruction-following capabilities on par with much larger dense models while remaining significantly more computationally efficient. Trained on more than 20 trillion tokens of high-quality data and enhanced through multi-stage supervised fine-tuning and reinforcement learning, Ling-V2’s models demonstrate strong general reasoning, mathematical problem-solving, coding understanding, and knowledge-intensive task performance.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Claude Code Plugins Directory

    Claude Code Plugins Directory

    Official, Anthropic-managed directory of high quality Claude Plugins

    Claude Code Plugins Directory repository provides a collection of plugins intended to extend Claude’s capabilities by turning the model into a specialized assistant tailored to specific workflows, teams, or organizational needs. These plugins define how Claude should access tools, retrieve data, and execute structured tasks so that outputs become more consistent and production-ready. The project emphasizes customizable automation by allowing developers to encode preferred workflows, domain...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Unstract

    Unstract

    No-code LLM Platform to launch APIs and ETL Pipelines

    Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to...
    Downloads: 1 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB