Python Data Profiling Tools

View 4069 business solutions

Browse free open source Python Data Profiling Tools and projects below. Use the toggles on the left to filter open source Python Data Profiling Tools by OS, license, language, programming language, and project status.

  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • Auth for GenAI | Auth0 Icon
    Auth for GenAI | Auth0

    Enable AI agents to securely access tools, workflows, and data with fine-grained control and just a few lines of code.

    Easily implement secure login experiences for AI Agents - from interactive chatbots to background workers with Auth0. Auth for GenAI is now available in Developer Preview
    Try free now
  • 1
    odd-collector-gcp

    odd-collector-gcp

    Open-source GCP metadata collector based on ODD Specification

    ODD Collector GCP is a lightweight service which gathers metadata from all your Google Cloud Platform data sources.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Metacrafter

    Metacrafter

    Metadata and data identification tool and Python library

    Python command line tool and Python engine to label table fields and fields in data files. It could help to find meaningful data in your tables and data files or to find Personal identifiable information (PII). Metacrafter is a rule-based tool that helps to label fields of the tables in databases. It scans table and finds person names, surnames, midnames, PII data, basic identifiers like UUID/GUID. These rules written as .yaml files and could be easily extended.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    NYCOpenData-Profiling-Analysis

    NYCOpenData-Profiling-Analysis

    Open Data Profiling, Quality and Analysis on NYC OpenData dataset

    Open data often comes with little or no metadata. You will profile a large collection of open data sets and derive metadata that can be used for data discovery, querying, and identification of data quality problems. For each column, identify and summarize the semantic types present in the column. These can be generic types (e.g., city, state) or collection-specific types (NYU school names, NYC agency). For each semantic type T identified, enumerate all the values encountered for T in all columns present in the collection.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Optimus

    Optimus

    Agile Data Preparation Workflows made easy with Pandas

    Easily write code to clean, transform, explore and visualize data using Python. Process using a simple API, making it easy to use for newcomers. More than 100 functions to handle strings, process dates, urls and emails. Easily plot data from any size. Out-of-box functions to explore and fix data quality. Use the same code to process your data in your laptop or in a remote cluster of GPUs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • 5
    Panda-Helper

    Panda-Helper

    Panda-Helper: Data profiling utility for Pandas DataFrames and Series

    Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    Population Shift Monitoring

    Population Shift Monitoring

    Monitor the stability of a Pandas or Spark dataframe

    popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets. popmon creates histograms of features binned in time-slices, and compares the stability of the profiles and distributions of those histograms using statistical tests, both over time and with respect to a reference. It works with numerical, ordinal, categorical features, and the histograms can be higher-dimensional, e.g. it can also track correlations between any two features. popmon can automatically flag and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies, changing correlations, etc, using monitoring business rules. Advanced users can leverage popmon's modular data pipeline to customize their workflow. Visualization of the pipeline can be useful when debugging or for didactic purposes. There is a script included with the package that you can use.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Swiple

    Swiple

    Swiple enables you to easily observe, understand, validate data

    Swiple is an automated data monitoring platform that helps analytics and data engineering teams seamlessly monitor the quality of their data. With automated data analysis and profiling, scheduling and alerting, teams can resolve data quality issues before they impact mission critical resources. Experience hassle-free integration with Swiple's zero-infrastructure and zero-code setup. Seamlessly incorporate data quality checks into your existing workflows without any coding or infrastructure changes, allowing you to focus on what matters most - your data. Save engineers weeks of time generating data quality checks. Swiple analyzes your dataset and builds data quality checks based on what is observed in your data. You just pick the ones you want.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    odd-collector

    odd-collector

    Open-source metadata collector based on ODD Specification

    ODD Collector is a lightweight service that gathers metadata from all your data sources. Push-client is a provider which sends information directly to the central repository of the Platform. ODDRN (Open Data Discovery Resource Name) is a unique resource name that identifies entities such as data sources, data entities, dataset fields etc. It is used to build lineage and update metadata.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    rimhistory

    rimhistory

    RimWorld game save data analyzer

    RimWorld game save data analyzer.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Sales CRM and Pipeline Management Software | Pipedrive Icon
    Sales CRM and Pipeline Management Software | Pipedrive

    The easy and effective CRM for closing deals

    Pipedrive’s simple interface empowers salespeople to streamline workflows and unite sales tasks in one workspace. Unlock instant sales insights with Pipedrive’s visual sales pipeline and fine-tune your strategy with robust reporting features and a personalized AI Sales Assistant.
    Try it for free
  • 10
    COBOL Data Definitions
    Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Data Preprocessing Automate

    Data Preprocessing Automate

    Data Preprocessing Automation: A GUI for easy data cleaning & visualiz

    Data Preprocessing Automation is a Python-based GUI application designed to simplify and automate data preprocessing tasks. It allows users to upload Excel files, automatically handle missing values, remove duplicates, and detect and remove outliers using statistical methods. The application provides data visualization tools, including box plots for distribution analysis and scatter plots for exploring relationships between variables. Users can download the processed data for further analysis. Built with Tkinter, Pandas, Matplotlib, and Seaborn, it ensures an intuitive interface and efficient performance. Additionally, it features a custom logo, a clean UI with a green-blue theme, and options for licensing and public release. This tool is ideal for data analysts, researchers, and professionals looking to automate preprocessing without coding. 🚀
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.