Showing 35 open source projects for "talend data quality"

  • 1
    Synthetic Data Kit

    Tool for generating high quality Synthetic datasets

    Synthetic Data Kit is a CLI-centric toolkit for generating high-quality synthetic datasets to fine-tune Llama models, with an emphasis on producing reasoning traces and QA pairs that line up with modern instruction-tuning formats. It ships an opinionated, modular workflow that covers ingesting heterogeneous sources (documents, transcripts), prompting models to create labeled examples, and exporting to fine-tuning schemas with minimal glue code.
    Downloads: 0 This Week
  • 2
    Synthetic Data Vault (SDV)

    Synthetic Data Generation for tabular, relational and time series data

    The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models. Additionally, it enables the testing of Machine Learning or other data dependent...
    Downloads: 0 This Week
  • 3
    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that is as good as or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....
    Downloads: 0 This Week
  • 4
    Dagster

    An orchestration platform for the development, production

    Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust multi-tenant, multi-tool engine that scales technically and organizationally. ...
    Downloads: 0 This Week
  • 5
    SDGym

    Benchmarking synthetic data generation methods

    ...You also customize the process to include your own work. Select any of the publicly available datasets from the SDV project, or input your own data. Choose from any of the SDV synthesizers and baselines. Or write your own custom machine learning model. In addition to performance and memory usage, you can also measure synthetic data quality and privacy through a variety of metrics. Install SDGym using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.
    Downloads: 0 This Week
  • 6
    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 18 This Week
  • 7
    Union Pandera

    Light-weight, flexible, expressive statistical data testing library

    ...Validate the functions that produce your data by automatically generating test cases for them. Integrate seamlessly with the Python ecosystem. Overcome the initial hurdle of defining a schema by inferring one from clean data, then refine it over time. Identify the critical points in your data pipeline, and validate data going in and out of them. Build confidence in the quality of your data by defining schemas for complex data objects.
    Downloads: 0 This Week
  • 8
    CO3D (Common Objects in 3D)

    Tooling for the Common Objects In 3D dataset

    CO3Dv2 (Common Objects in 3D, version 2) is a large-scale 3D computer vision dataset and toolkit from Facebook Research designed for training and evaluating category-level 3D reconstruction methods using real-world data. It builds upon the original CO3Dv1 dataset, expanding both scale and quality—featuring 2× more sequences and 4× more frames, with improved image fidelity, more accurate segmentation masks, and enhanced annotations for object-centric 3D reconstruction. CO3Dv2 enables research in multi-view 3D reconstruction, novel view synthesis, and geometry-aware representation learning. ...
    Downloads: 0 This Week
  • 9
    Claude Code Plugins Directory

    Official, Anthropic-managed directory of high quality Claude Plugins

    Claude Code Plugins Directory repository provides a collection of plugins intended to extend Claude’s capabilities by turning the model into a specialized assistant tailored to specific workflows, teams, or organizational needs. These plugins define how Claude should access tools, retrieve data, and execute structured tasks so that outputs become more consistent and production-ready. The project emphasizes customizable automation by allowing developers to encode preferred workflows, domain...
    Downloads: 2 This Week
  • 10
    Uncertainty Baselines

    High-quality implementations of standard and SOTA methods

    Uncertainty Baselines is a collection of strong, well-documented training pipelines that make it straightforward to evaluate predictive uncertainty in modern machine learning models. Rather than offering toy scripts, it provides end-to-end recipes—data input, model architectures, training loops, evaluation metrics, and logging—so results are comparable across runs and research groups. The library spans canonical modalities and tasks, from image classification and NLP to tabular problems,...
    Downloads: 0 This Week
  • 11
    PyScaffold

    Python project template generator with batteries included

    PyScaffold is a project generator for bootstrapping high-quality Python packages, ready to be shared on PyPI and installable via pip. It is easy to use and encourages the adoption of the best tools and practices of the Python ecosystem, helping you and your team to stay sane, happy and productive. The best part? It is stable and has been used by thousands of developers for over half a decade! Check out this demo project, which was set up using PyScaffold, and if you are still not convinced...
    Downloads: 0 This Week
  • 12
    sqlite-utils

    Python CLI utility and library for manipulating SQLite databases

    ...The project also embraces an ecosystem of plugins, so you can add custom SQL functions, extra commands, or UIs (including a terminal UI) via separate packages. Because it’s designed by someone who uses SQLite heavily in real projects, the tool includes many small quality-of-life features.
    Downloads: 0 This Week
  • 13
    TextTest is an application-independent tool for text-based functional testing. This means running a batch-mode binary in lots of different ways, and using the text output produced as a means of controlling the behaviour of that application.
    Downloads: 107 This Week
  • 14
    Grassroots DICOM

    Cross-platform DICOM implementation

    Grassroots DiCoM is a C++ library for DICOM medical files. It is accessible from Python, C#, Java and PHP. It supports RAW, JPEG, JPEG 2000, JPEG-LS, RLE and deflated transfer syntaxes. It comes with a fast scanner implementation to quickly scan hundreds of DICOM files. It supports SCU network operations (C-ECHO, C-FIND, C-STORE, C-MOVE). PS 3.3 & 3.6 are distributed as XML files. It also provides PS 3.15 certificates and a password-based mechanism to anonymize and de-identify DICOM datasets.
    Downloads: 120 This Week
  • 15
    Muse: Middleware Universal Scripting idE

    Automate: WebSphere; WebLogic; JBoss; Glassfish; Tomcat; Linux, WinRM

    Simplify... Aggregate... Automate... Simplify... *** OPEN SOURCE - GPL3/EPL. Use Python / Jython to automate WebSphere, WebLogic, JBoss, Glassfish and Tomcat Middleware Estates over JMX, both SSL and non-SSL + Linux SSH (agent-less) + WinRM Target all 5 servers, Linux and WinRM from the same workspace. Familiar Eclipse based Jython and Python Development IDE, pre-configured and ready to go. 4-Click Installer. Win x64, Linux WINE x64. Built-In JVM. Java 8/9/10, Amazon...
    Downloads: 1 This Week
  • 16
    fastMRI

    A large open dataset + tools to speed up MRI scans using ML

    fastMRI is a large-scale collaborative research project by Facebook AI Research (FAIR) and NYU Langone Health that explores how deep learning can accelerate magnetic resonance imaging (MRI) acquisition without compromising image quality. By enabling reconstruction of high-fidelity MR images from significantly fewer measurements, fastMRI aims to make MRI scanning faster, cheaper, and more accessible in clinical settings. The repository provides an open-source PyTorch framework with data loaders, subsampling utilities, reconstruction models, and evaluation metrics, supporting both research reproducibility and practical experimentation. ...
    Downloads: 1 This Week
  • 17
    Cinemagoer

    Python package to retrieve and manage data of the IMDb

    Cinemagoer is a Python package for retrieving and managing IMDb data about movies, people, characters and companies. Platform-independent, it can retrieve data from both IMDb's web server and a local copy of the whole database.
    Downloads: 5 This Week
  • 18
    Grow.dev

    A declarative website generator designed for high-quality websites

    Grow.dev is a static site generator optimized for building highly interactive, localized microsites. Grow.dev focuses on providing optimal workflows and developer ergonomics for creating projects that are highly maintainable in the long term. Grow.dev encourages a strong but simple separation of content and presentation and makes maintaining content in different locales and environments a snap.
    Downloads: 0 This Week
  • 19
    abu

    Abu quantitative trading system (stocks, options, futures, bitcoin)

    ...The above system combines hundreds of seed quantitative models, such as the financial time series loss model, deep pattern quality assessment model, long and short pattern combination evaluation model, long pattern stop-loss strategy model, short pattern covering strategy model, big data K-line pattern historical portfolio fitting model, trading position mentality model, dopamine quantification model, inertial residual resistance support model, long-short swap revenge probability model, strong and weak confrontation model, trend angle change rate model, etc.
    Downloads: 0 This Week
  • 20
    matplotlib
    Matplotlib is a Python library for making publication-quality plots using a syntax familiar to MATLAB users. Matplotlib uses NumPy for numerics. Output formats include PDF, PostScript, SVG, and PNG, as well as screen display. As of matplotlib version 1.5, we are no longer making file releases available on SourceForge. Please visit http://matplotlib.org/users/installing.html for help obtaining matplotlib.
    Downloads: 52 This Week
  • 21
    dotCODES_Source_Control_for_VS

    The dotCODES Source Control Maintenance Mainframe (SCM2)

    The dotCODES Source Control Maintenance Mainframe for Visual Studio is an administrator console application for developing dotCODES components. Built upon a Python foundation, the program is used to create data center routines (Unix packages) and maintain enterprise cloud services (CGI scripts/Apache) by means of building dotCODES runtimes and deploying them to and from the client server.
    Downloads: 0 This Week
  • 22
    MSCViewer

    A tool for visualization and analysis of logs as sequence diagrams

    MSCViewer is a tool intended for debugging control flows in concurrent, distributed systems. The tool loads logs generated by various entities in the system and visualizes events and interactions as a sequence diagram. The diagram is fully interactive: entities can be added to or removed from the diagram and shuffled; events can be filtered, searched, highlighted and annotated with comments. MSCViewer features integration with a Python interpreter which allows writing Python scripts...
    Downloads: 0 This Week
  • 23
    LightProfiler

    Profiler for Oracle extended SQL trace files

    LightProfiler is an application for performance analysis of Oracle databases. It generates a detailed resource profile for extended SQL trace files (10046 event), containing information about response time consumption (by events, by cursors, etc.), data file usage, error analysis (SQL, PL/SQL) and much more. It also contains tools for additional processing of trace files (extracting session data, splitting files) and for management of database sessions (disconnecting, tracing, monitor...
    Downloads: 0 This Week
  • 24
    Python XML Validator

    Validates XML against an XML Schema. Based on the Python lxml module.
    Downloads: 0 This Week
  • 25
    cutplace
    Cutplace validates tabular data (CSV, fixed format) according to an interface control document (ICD). The ICD acts as executable specification and can be described using popular spreadsheet applications (Calc, Excel).
    Downloads: 0 This Week