Data Profiling Tools for Linux

View 411 business solutions

Browse free open source Data Profiling tools and projects for Linux below. Use the toggles on the left to filter open source Data Profiling tools by OS, license, language, programming language, and project status.

  • SKUDONET Open Source Load Balancer Icon
    SKUDONET Open Source Load Balancer

    Take advantage of Open Source Load Balancer to elevate your business security and IT infrastructure with a custom ADC Solution.

    SKUDONET ADC, operates at the application layer, efficiently distributing network load and application load across multiple servers. This not only enhances the performance of your application but also ensures that your web servers can handle more traffic seamlessly.
  • Manage your IT department more effectively Icon
    Manage your IT department more effectively

    Streamline your business from end to end with ConnectWise PSA

    ConnectWise PSA (formerly Manage) allows you to stop working in separate systems, and helps you build a more profitable business. No more duplicate data entries, inefficient employees, manual invoices, and the inability to accurately track client service issues. Get a behind the scenes look into the award-winning PSA that automates processes for each area of business: sales, help desk, support, finance, and HR.
  • 1
    DataCleaner

    DataCleaner

    Data quality analysis, profiling, cleansing, duplicate detection +more

    DataCleaner is a data quality analysis application and a solution platform for DQ solutions. It's core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io
    Leader badge
    Downloads: 153 This Week
    Last Update:
    See Project
  • 2
    Open Source Data Quality and Profiling

    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data Quality includes profiling, filtering, governance, similarity check, data enrichment alteration, real time alerting, basket analysis, bubble chart Warehouse validation, single customer view etc. defined by Strategy. This tool is developing high performance integrated data management platform which will seamlessly do Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytic. It also had Hadoop ( Big data ) support to move files to/from Hadoop Grid, Create, Load and Profile Hive Tables. This project is also known as "Aggregate Profiler" Resful API for this project is getting built as (Beta Version) https://sourceforge.net/projects/restful-api-for-osdq/ apache spark based data quality is getting built at https://sourceforge.net/projects/apache-spark-osdq/
    Leader badge
    Downloads: 54 This Week
    Last Update:
    See Project
  • 3
    COBOL Data Definitions
    Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    DISTOD

    DISTOD

    Distributed discovery of bidirectional order dependencies

    The DISTOD data profiling algorithm is a distributed algorithm to discover bidirectional order dependencies (in set-based form) from relational data. DISTOD is based on the single-threaded FASTOD-BID algorithm [1], but DISTOD scales elastically to many machines outperforming FASTOD-BID by up to orders of magnitude. Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Cyber Risk Assessment and Management Platform Icon
    Cyber Risk Assessment and Management Platform

    ConnectWise Identify is a powerful cybersecurity risk assessment platform offering strategic cybersecurity assessments and recommendations.

    When it comes to cybersecurity, what your clients don’t know can really hurt them. And believe it or not, keep them safe starts with asking questions. With ConnectWise Identify Assessment, get access to risk assessment backed by the NIST Cybersecurity Framework to uncover risks across your client’s entire business, not just their networks. With a clearly defined, easy-to-read risk report in hand, you can start having meaningful security conversations that can get you on the path of keeping your clients protected from every angle. Choose from two assessment levels to cover every client’s need, from the Essentials to cover the basics to our Comprehensive Assessment to dive deeper to uncover additional risks. Our intuitive heat map shows you your client’s overall risk level and priority to address risks based on probability and financial impact. Each report includes remediation recommendations to help you create a revenue-generating action plan.
  • 5
    DQO Data Quality Operations Center

    DQO Data Quality Operations Center

    Data Quality Operations Center

    DQO is an DataOps friendly data quality monitoring tool with customizable data quality checks and data quality dashboards. DQO comes with around 100 predefined data quality checks which helps you monitor the quality of your data. Table and column-level checks which allows writing your own SQL queries. Daily and monthly date partition testing. Data segmentation by up to 9 different data streams. Build-in scheduling. Calculation of data quality KPIs which can be displayed on multiple built-in data quality dashboards.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Metacrafter

    Metacrafter

    Metadata and data identification tool and Python library

    Python command line tool and Python engine to label table fields and fields in data files. It could help to find meaningful data in your tables and data files or to find Personal identifiable information (PII). Metacrafter is a rule-based tool that helps to label fields of the tables in databases. It scans table and finds person names, surnames, midnames, PII data, basic identifiers like UUID/GUID. These rules written as .yaml files and could be easily extended.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    NYCOpenData-Profiling-Analysis

    NYCOpenData-Profiling-Analysis

    Open Data Profiling, Quality and Analysis on NYC OpenData dataset

    Open data often comes with little or no metadata. You will profile a large collection of open data sets and derive metadata that can be used for data discovery, querying, and identification of data quality problems. For each column, identify and summarize the semantic types present in the column. These can be generic types (e.g., city, state) or collection-specific types (NYU school names, NYC agency). For each semantic type T identified, enumerate all the values encountered for T in all columns present in the collection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Optimus

    Optimus

    Agile Data Preparation Workflows made easy with Pandas

    Easily write code to clean, transform, explore and visualize data using Python. Process using a simple API, making it easy to use for newcomers. More than 100 functions to handle strings, process dates, urls and emails. Easily plot data from any size. Out-of-box functions to explore and fix data quality. Use the same code to process your data in your laptop or in a remote cluster of GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Panda-Helper

    Panda-Helper

    Panda-Helper: Data profiling utility for Pandas DataFrames and Series

    Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Powerful small business accounting software Icon
    Powerful small business accounting software

    For small businesses looking for desktop accounting software

    With AccountEdge, business owners can organize, process, and report on their financial information so they can focus on their business. Features include: accounting, integrated payroll, sales and purchases, contact management, inventory tracking, time billing, and more.
  • 10
    Population Shift Monitoring

    Population Shift Monitoring

    Monitor the stability of a Pandas or Spark dataframe

    popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets. popmon creates histograms of features binned in time-slices, and compares the stability of the profiles and distributions of those histograms using statistical tests, both over time and with respect to a reference. It works with numerical, ordinal, categorical features, and the histograms can be higher-dimensional, e.g. it can also track correlations between any two features. popmon can automatically flag and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies, changing correlations, etc, using monitoring business rules. Advanced users can leverage popmon's modular data pipeline to customize their workflow. Visualization of the pipeline can be useful when debugging or for didactic purposes. There is a script included with the package that you can use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    RDS - JS - Examples

    RDS - JS - Examples

    TypeScript/JavaScript example code using the RDS API

    Rich Data Services (or RDS) is a suite of REST APIs designed by Metadata Technology North America (MTNA) to meet various needs for data engineers, managers, custodians, and consumers. RDS provides a range of services including data profiling, mapping, transformation, validation, ingestion, and dissemination. For more information about each of these APIs and how you can incorporate or consume them as part of your work flow please visit the MTNA website. RDS-JS-Examples is TypeScript/JavaScript repository for showcases and examples to demonstrate using the RDS API. Many of the examples will leverage the RDS JavaScript SDK to simplify and faciliate interacting with any given RDS API. By using this SDK you will add to your project the benefit of strong types and easy to use helper functions that directly reflect the RDS API.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Roomba

    Roomba

    A Node.js tool to examine the correctness of Open Data Metadata

    Linked Open Data (LOD) has emerged as one of the largest collection of interlinked datasets on the web. Benefiting from this mine of data requires the existence of descriptive information about each dataset in the accompanying metadata. Such meta information is currently very limited to few data portals where they are usually provided manually thus giving little or bad quality insights. To address this issue, we propose a scalable automatic approach for extracting, validating and generating descriptive linked dataset profiles. This approach applies several techniques to check the validity of the attached metadata as well as providing descriptive and statistical information of a certain dataset as well as a whole data portal. Using our framework on prominent data portals shows that the general state of the Linked Open Data needs attention as most of datasets suffer from bad quality metadata and lack additional informative metrics.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Semantic Type Detection

    Semantic Type Detection

    Metadata/data identification Java library

    Metadata/data identification Java library. Identifies Base Type (e.g. Boolean, Double, Long, String, LocalDate, LocalTime, ...) and Semantic Type information (e.g. Gender, Age, Color, Country, ...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support. Large set of built-in Semantic Types (extensible via JSON defined plugins). Extensive Profiling metrics (e.g. Min, Max, Distinct, signatures, …) Sufficiently fast to be used inline. See Speed notes below. Minimal false positives for Semantic type detection. See Performance notes below. Usable in either Streaming, Bulk or Record mode. Broad country/language support - including US, Canada, Mexico, Brazil, UK, Australia, much of Europe, Japan and China. Support for sharded analysis (i.e. Analysis results can be merged) Once stream is profiled then subsequent samples can be validated and/or new samples can be generated.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Swiple

    Swiple

    Swiple enables you to easily observe, understand, validate data

    Swiple is an automated data monitoring platform that helps analytics and data engineering teams seamlessly monitor the quality of their data. With automated data analysis and profiling, scheduling and alerting, teams can resolve data quality issues before they impact mission critical resources. Experience hassle-free integration with Swiple's zero-infrastructure and zero-code setup. Seamlessly incorporate data quality checks into your existing workflows without any coding or infrastructure changes, allowing you to focus on what matters most - your data. Save engineers weeks of time generating data quality checks. Swiple analyzes your dataset and builds data quality checks based on what is observed in your data. You just pick the ones you want.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    apache spark data pipeline osDQ

    apache spark data pipeline osDQ

    osDQ dedicated to create apache spark based data pipeline using JSON

    This is an offshoot project of open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ This sub project will create apache spark based data pipeline where JSON based metadata (file) will be used to run data processing , data pipeline , data quality and data preparation and data modeling features for big data. This uses java API of apache spark. It can run in local mode also. Get json example at https://github.com/arrahtech/osdq-spark How to run Unzip the zip file Windows : java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json Mac UNIX java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json For those on windows, you need to have hadoop distribtion unzipped on local drive and HADOOP_HOME set. Also copy winutils.exe from here into HADOOP_HOME\bin
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    odd-collector

    odd-collector

    Open-source metadata collector based on ODD Specification

    ODD Collector is a lightweight service that gathers metadata from all your data sources. Push-client is a provider which sends information directly to the central repository of the Platform. ODDRN (Open Data Discovery Resource Name) is a unique resource name that identifies entities such as data sources, data entities, dataset fields etc. It is used to build lineage and update metadata.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    odd-collector-gcp

    odd-collector-gcp

    Open-source GCP metadata collector based on ODD Specification

    ODD Collector GCP is a lightweight service which gathers metadata from all your Google Cloud Platform data sources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    rimhistory

    rimhistory

    RimWorld game save data analyzer

    RimWorld game save data analyzer.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next