Python Data Profiling Tools

View 3987 business solutions

Browse free open source Python Data Profiling Tools and projects below. Use the toggles on the left to filter open source Python Data Profiling Tools by OS, license, language, programming language, and project status.

  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    Panda-Helper

    Panda-Helper

    Panda-Helper: Data profiling utility for Pandas DataFrames and Series

    Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    Metacrafter

    Metacrafter

    Metadata and data identification tool and Python library

    Python command line tool and Python engine to label table fields and fields in data files. It could help to find meaningful data in your tables and data files or to find Personal identifiable information (PII). Metacrafter is a rule-based tool that helps to label fields of the tables in databases. It scans table and finds person names, surnames, midnames, PII data, basic identifiers like UUID/GUID. These rules written as .yaml files and could be easily extended.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    Population Shift Monitoring

    Population Shift Monitoring

    Monitor the stability of a Pandas or Spark dataframe

    popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets. popmon creates histograms of features binned in time-slices, and compares the stability of the profiles and distributions of those histograms using statistical tests, both over time and with respect to a reference. It works with numerical, ordinal, categorical features, and the histograms can be higher-dimensional, e.g. it can also track correlations between any two features. popmon can automatically flag and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies, changing correlations, etc, using monitoring business rules. Advanced users can leverage popmon's modular data pipeline to customize their workflow. Visualization of the pipeline can be useful when debugging or for didactic purposes. There is a script included with the package that you can use.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    COBOL Data Definitions
    Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • 5
    Data Preprocessing Automate

    Data Preprocessing Automate

    Data Preprocessing Automation: A GUI for easy data cleaning & visualiz

    Data Preprocessing Automation is a Python-based GUI application designed to simplify and automate data preprocessing tasks. It allows users to upload Excel files, automatically handle missing values, remove duplicates, and detect and remove outliers using statistical methods. The application provides data visualization tools, including box plots for distribution analysis and scatter plots for exploring relationships between variables. Users can download the processed data for further analysis. Built with Tkinter, Pandas, Matplotlib, and Seaborn, it ensures an intuitive interface and efficient performance. Additionally, it features a custom logo, a clean UI with a green-blue theme, and options for licensing and public release. This tool is ideal for data analysts, researchers, and professionals looking to automate preprocessing without coding. 🚀
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    NYCOpenData-Profiling-Analysis

    NYCOpenData-Profiling-Analysis

    Open Data Profiling, Quality and Analysis on NYC OpenData dataset

    Open data often comes with little or no metadata. You will profile a large collection of open data sets and derive metadata that can be used for data discovery, querying, and identification of data quality problems. For each column, identify and summarize the semantic types present in the column. These can be generic types (e.g., city, state) or collection-specific types (NYU school names, NYC agency). For each semantic type T identified, enumerate all the values encountered for T in all columns present in the collection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Optimus

    Optimus

    Agile Data Preparation Workflows made easy with Pandas

    Easily write code to clean, transform, explore and visualize data using Python. Process using a simple API, making it easy to use for newcomers. More than 100 functions to handle strings, process dates, urls and emails. Easily plot data from any size. Out-of-box functions to explore and fix data quality. Use the same code to process your data in your laptop or in a remote cluster of GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Swiple

    Swiple

    Swiple enables you to easily observe, understand, validate data

    Swiple is an automated data monitoring platform that helps analytics and data engineering teams seamlessly monitor the quality of their data. With automated data analysis and profiling, scheduling and alerting, teams can resolve data quality issues before they impact mission critical resources. Experience hassle-free integration with Swiple's zero-infrastructure and zero-code setup. Seamlessly incorporate data quality checks into your existing workflows without any coding or infrastructure changes, allowing you to focus on what matters most - your data. Save engineers weeks of time generating data quality checks. Swiple analyzes your dataset and builds data quality checks based on what is observed in your data. You just pick the ones you want.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    odd-collector

    odd-collector

    Open-source metadata collector based on ODD Specification

    ODD Collector is a lightweight service that gathers metadata from all your data sources. Push-client is a provider which sends information directly to the central repository of the Platform. ODDRN (Open Data Discovery Resource Name) is a unique resource name that identifies entities such as data sources, data entities, dataset fields etc. It is used to build lineage and update metadata.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Powering the best of the internet | Fastly Icon
    Powering the best of the internet | Fastly

    Fastly's edge cloud platform delivers faster, safer, and more scalable sites and apps to customers.

    Ensure your websites, applications and services can effortlessly handle the demands of your users with Fastly. Fastly’s portfolio is designed to be highly performant, personalized and secure while seamlessly scaling to support your growth.
    Try for free
  • 10
    odd-collector-gcp

    odd-collector-gcp

    Open-source GCP metadata collector based on ODD Specification

    ODD Collector GCP is a lightweight service which gathers metadata from all your Google Cloud Platform data sources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    rimhistory

    rimhistory

    RimWorld game save data analyzer

    RimWorld game save data analyzer.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.