Showing 10 open source projects for "duplicate linux"

  • 1
    janitor

    Simple tools for data cleaning in R

    janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.
    Downloads: 0 This Week
  • 2
    Zingg

    Scalable master data management and identity resolution

    Zingg is an open-source entity resolution and master data management platform for finding duplicate, related, or matching records across large datasets. It uses machine learning to learn how records should be compared, reducing the need for brittle hand-written matching rules. The project is designed for data engineering and analytics teams working on customer 360, supplier 360, deduplication, fuzzy matching, data quality, and golden record workflows. Zingg runs on Apache Spark and can scale...
    Downloads: 8 This Week
  • 3
    ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    ydata-profiling's primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame, and the analysis can be exported in different formats such as HTML and JSON.
    Downloads: 0 This Week
  • 4
    Discord.SortedSet

    Elixir SortedSet backed by a Rust-based NIF

    SortedSet NIF is a performant and reliable sorted set data structure for Elixir, implemented in Rust using the Rustler crate to take advantage of native performance while maintaining seamless integration with the BEAM ecosystem. It provides ordering and uniqueness guarantees, with all terms stored according to Elixir’s built-in sorting rules. Internally, it uses a vector of vectors layout rather than a single vector to minimize costly reallocations, allowing efficient bucket pointer copying...
    Downloads: 0 This Week
  • 5
    The Timeline Project

    Cross-platform app for displaying and navigating events on a timeline.

    The Timeline Project aims to create a free, cross-platform application for displaying and navigating events on a timeline.
    Downloads: 94 This Week
  • 6
    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
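The MinHash and Jaccard thresholding idea mentioned for text-dedup can be illustrated with a minimal, standard-library-only sketch. This is not text-dedup's API or implementation; the function names (`shingles`, `jaccard`, `minhash_signature`, `estimate_similarity`), the 3-word shingle size, and the 128 hash functions are all assumptions chosen for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity between two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items, num_perm=128):
    """Keep the minimum 64-bit hash of the set under each salted hash function."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
doc_c = "completely unrelated sentence about compiling kernels on linux machines"

sig_a = minhash_signature(shingles(doc_a))
sig_b = minhash_signature(shingles(doc_b))
sig_c = minhash_signature(shingles(doc_c))

# Treat pairs whose estimated similarity exceeds a threshold as near-duplicates.
THRESHOLD = 0.5  # assumed value for illustration only
is_near_duplicate = estimate_similarity(sig_a, sig_b) >= THRESHOLD
```

In a sketch like this, signatures are compared instead of full shingle sets, so each document costs a fixed number of integers regardless of its length. Large-scale tools usually add an indexing step on top so that not every pair has to be compared.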
  • 7
    DataCleaner

    Data quality analysis, profiling, cleansing, duplicate detection +more

    DataCleaner is a data quality analysis application and a platform for data quality (DQ) solutions. Its core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching, and merging. Website: http://datacleaner.github.io
    Downloads: 13 This Week
  • 8
    PAICE is a rapid bioinformatics pathway visualization tool for KEGG-compatible accessions derived from Illumina Solexa next-gen and Affymetrix datasets. It colors KEGG pathways while taking detection calls and duplicate gene copies into account.
    Downloads: 0 This Week
  • 9
    A simple little engine to do fuzzy name & address searching. Helps improve data quality and avoids duplicate data entry.
    Downloads: 0 This Week
  • 10
    PHP::Duploc helps you find duplicate code (a "bad smell" that prompts refactoring) in your PHP scripts. By looking for certain patterns in a "dotplot" graph, you can quickly get a visual overview of a large body of code.
    Downloads: 0 This Week