Showing 8 open source projects for "duplicates"

View related business solutions
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    janitor

    janitor

    Simple tools for data cleaning in R

    janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Pandas Profiling

    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ...File sizes, creation dates, dimensions, indication of truncated images and existance of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missigness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, between others).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    CleanVision

    CleanVision

    Automatically find issues in image datasets

    CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Audit Data Analytics
    Audit Data Analytics, LLC's ADA software is an open source solution for performing operations on audit data, such as filtering specific records and summarizing data sets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    SafeUtils

    SafeUtils

    110+ developer tools as native MacOS, Linux & Windows desktop apps.

    Tools: https://safeutils.com/barcode-generator https://safeutils.com/color-picker https://safeutils.com/qr-code-generator https://safeutils.com/qr-code-scanner https://safeutils.com/word-counter https://safeutils.com/base-64-decoder https://safeutils.com/diff-checker https://safeutils.com/hex-to-ascii https://safeutils.com/json-formatter https://safeutils.com/lorem-ipsum-generator https://safeutils.com/random-generator https://safeutils.com/time-converter https://safeutils.com/...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    DataCleaner

    DataCleaner

    Data quality analysis, profiling, cleansing, duplicate detection +more

    DataCleaner is a data quality analysis application and a solution platform for DQ solutions. It's core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io
    Downloads: 16 This Week
    Last Update:
    See Project
  • 7
    Apache Kafka

    Apache Kafka

    Mirror of Apache Kafka

    ...The commit/offset model and retention policies support patterns from real-time processing to event sourcing and audit trails. Exactly-once processing semantics, idempotent producers, and transactions help prevent duplicates across complex dataflows. Kafka Streams and Kafka Connect extend the core: Streams provides a library for stateful stream processing within applications, while Connect standardizes integration with external systems. With horizontal scalability, strong ordering guarantees within partitions, and mature tooling, Kafka serves as the backbone for event-driven architectures across analytics, microservices, and data integration.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Toolsverse ETL Framework

    Toolsverse ETL Framework

    Open source Extract Transform Load engine written in Java

    ETL Framework is a standalone Extract Transform Load engine written in Java. It includes executables for all major platforms and can be easily integrated into other applications. Key Features: * embeddable, open source and free * fast and scalable * uses target database features to do transformations and loads * manual and automatic data mapping * data streaming * bulk data loads * data quality features using SQL, JavaScript? and regex * data transformations Requirements *...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB