NYCOpenData-Profiling-Analysis

Open data often comes with little or no metadata. You will profile a large collection of open data sets and derive metadata that can be used for data discovery, querying, and identification of data quality problems. For each column, identify and summarize the semantic types present in the column. These can be generic types (e.g., city, state) or collection-specific types (NYU school names, NYC agency). For each semantic type T identified, enumerate all the values encountered for T in all columns present in the collection.
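As a rough illustration of the semantic-profiling step described above, the following Python sketch labels each value with zero or more semantic types, counts the types per column, and collects every value seen for each type across the collection. The detector names (`zip_code`, `us_state`), the tiny state list, and the helpers `semantic_types` and `summarize` are assumptions made for this example only; they are not taken from the project's code.

```python
import re
from collections import defaultdict

# Hypothetical detectors for illustration; a real profiler would need many
# more types and far larger reference lists.
US_STATES = {"NY", "NJ", "CT", "CA"}  # deliberately tiny sample
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

DETECTORS = {
    "zip_code": lambda v: bool(ZIP_RE.match(v)),
    "us_state": lambda v: v.upper() in US_STATES,
}


def semantic_types(value):
    """Return the names of all detectors that accept a string value."""
    v = value.strip()
    return [name for name, test in DETECTORS.items() if v and test(v)]


def summarize(columns):
    """Profile a mapping of column name -> list of string values.

    Returns (per_column, values_by_type):
      per_column      maps each column to a count per semantic type
      values_by_type  maps each semantic type T to the set of values
                      labeled T in any column of the collection
    """
    per_column = {}
    values_by_type = defaultdict(set)
    for col, values in columns.items():
        counts = defaultdict(int)
        for value in values:
            for t in semantic_types(value):
                counts[t] += 1
                values_by_type[t].add(value.strip())
        per_column[col] = dict(counts)
    return per_column, dict(values_by_type)
```

A column can receive several types at once, which is why the per-column summary is a count per type rather than a single label.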

Features

Number of non-empty cells
Number of empty cells (i.e., cells with no data)
Top-5 most frequent value(s)
Data types (a column may contain values belonging to multiple types)
Semantic Profiling
Data Analysis
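
The per-column statistics listed above could be computed along the following lines. This is a minimal sketch, not the project's implementation: the function names `infer_type` and `profile_column`, the type labels `INTEGER`, `REAL`, and `TEXT`, and the regular expressions are assumptions chosen for the example, and a cell counts as empty here when it is `None` or contains only whitespace.

```python
import re
from collections import Counter

# Assumed patterns for two numeric types; everything else is treated as text.
INT_RE = re.compile(r"^[+-]?\d+$")
REAL_RE = re.compile(r"^[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?$")


def infer_type(value):
    """Classify one non-empty string value under an assumed type label."""
    if INT_RE.match(value):
        return "INTEGER"
    if REAL_RE.match(value):
        return "REAL"
    return "TEXT"


def profile_column(values):
    """Compute basic profile statistics for one column of raw cell values."""
    cleaned = [(v or "").strip() for v in values]
    non_empty = [v for v in cleaned if v]
    return {
        "non_empty": len(non_empty),
        "empty": len(cleaned) - len(non_empty),
        # Up to five (value, count) pairs, most frequent first.
        "top5": Counter(non_empty).most_common(5),
        # A column may mix types, so report a count per type.
        "types": dict(Counter(infer_type(v) for v in non_empty)),
    }
```

Reporting a count per type, rather than a single inferred type, keeps mixed columns visible, which is often where data quality problems show up.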

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Data Profiling Tool

Registered

2023-06-12
