This is a Java implementation of WhyLogs, with Apache Spark integration for large-scale datasets.

Understanding the properties of data as it moves through applications is essential to keeping an ML/AI pipeline stable and improving the user experience, whether the pipeline is built for production or experimentation. WhyLogs is an open source statistical logging library that lets data science and ML teams profile ML/AI pipelines and applications with little effort, producing log files that can be used for monitoring, alerting, analytics, and error analysis. WhyLogs calculates approximate statistics for datasets up to terabyte scale, making it easy to identify changes in the statistical properties of a model's inputs or outputs. Because the statistics are approximate, the library can run on minimal infrastructure and still process the entire dataset, instead of calculating statistics from a sample that may miss outliers and other anomalies.
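
To illustrate why approximate statistics let a library examine every record on modest hardware, here is a minimal sketch of one classic sketching technique, a K-Minimum-Values (KMV) distinct-count estimator. It is not taken from WhyLogs and does not use its API; the class name `KmvSketch`, the parameter `k`, and the hash mixing are assumptions chosen for illustration. Memory stays bounded by `k` no matter how many records are processed, and the relative error shrinks roughly as 1/√k.

```java
import java.util.TreeSet;

/**
 * Hypothetical illustration, not the WhyLogs API: a K-Minimum-Values (KMV)
 * sketch that estimates how many distinct values a stream contains.
 * It inspects every record but keeps at most k hash values in memory.
 */
final class KmvSketch {
    private final int k;
    // The k smallest normalized hash values seen so far, kept in sorted order.
    private final TreeSet<Double> minHashes = new TreeSet<>();

    KmvSketch(int k) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2");
        }
        this.k = k;
    }

    // SplitMix64 finalizer: scrambles the bits so hash values look uniformly distributed.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Processes one record in O(log k) time.
    void update(String value) {
        long h = mix(value.hashCode());
        // Use the top 53 bits as a uniform value in [0, 1).
        double u = (h >>> 11) * 0x1.0p-53;
        if (minHashes.size() < k) {
            minHashes.add(u); // duplicates are ignored by the set
        } else if (u < minHashes.last() && minHashes.add(u)) {
            minHashes.pollLast(); // evict the largest so only k values remain
        }
    }

    // Returns the distinct-count estimate.
    double estimate() {
        if (minHashes.size() < k) {
            // Fewer than k distinct hashes seen: the count is exact apart from hash collisions.
            return minHashes.size();
        }
        // The k-th smallest of n uniform values is close to k / n;
        // (k - 1) / U_k is the standard unbiased form of the resulting estimate.
        return (k - 1) / minHashes.last();
    }
}
```

For example, feeding 100,000 distinct strings into `new KmvSketch(1024)` typically yields an estimate within a few percent of the true count while storing at most 1,024 doubles, and repeated values never change the result.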

Features

  • WhyLogs provides complex statistics across different stages of your ML/AI pipelines and applications
  • WhyLogs scales with your system, from local development mode to live production systems in multi-node clusters, and works well with batch and streaming architectures
  • WhyLogs produces small, lightweight, mergeable outputs in a variety of formats, using sketching algorithms and summary statistics
  • To let data engineering pipelines and ML pipelines share a common framework for tracking data quality and drift, the WhyLogs library supports multiple languages and integrations
  • In addition to supporting traditional monitoring approaches, WhyLogs data can support advanced ML-focused analytics, error analysis, and data quality and data drift detection
  • Unified data instrumentation
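
The mergeability described above is what allows profiles computed on separate partitions, nodes, or time windows to be combined into one. The following hypothetical `NumericSummary` class (not part of WhyLogs or its API) is a minimal sketch of the idea for a single numeric column, assuming only count, mean, variance, minimum, and maximum are tracked: each partition is summarized with Welford's single-pass update, and two summaries are combined with the pairwise formula of Chan, Golub, and LeVeque.

```java
/**
 * Hypothetical illustration, not the WhyLogs API: a mergeable summary of one
 * numeric column. Summaries built on different partitions can be merged
 * into a summary of the combined data.
 */
final class NumericSummary {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Welford's single-pass update for one observation.
    void update(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    // Pairwise combination (Chan et al.); returns a new summary and leaves the inputs unchanged.
    static NumericSummary merge(NumericSummary a, NumericSummary b) {
        NumericSummary out = new NumericSummary();
        out.count = a.count + b.count;
        if (out.count == 0) {
            return out;
        }
        double delta = b.mean - a.mean;
        out.mean = a.mean + delta * b.count / out.count;
        out.m2 = a.m2 + b.m2 + delta * delta * ((double) a.count * b.count) / out.count;
        out.min = Math.min(a.min, b.min);
        out.max = Math.max(a.max, b.max);
        return out;
    }

    long count() { return count; }
    double mean() { return mean; }
    double variance() { return count > 1 ? m2 / (count - 1) : 0.0; } // sample variance
    double min() { return min; }
    double max() { return max; }
}
```

Merging the summaries of two partitions gives the same count, mean, variance, minimum, and maximum as a single pass over all the data, up to floating-point rounding, which is why such merges can be performed in any order across a cluster.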

Categories

Data Quality

License

Apache License V2.0

Additional Project Details

Programming Language

Java

Related Categories

Java Data Quality Tool

Registered

2023-06-12