duplicate linux free download

Showing 10 open source projects for "duplicate linux"

1

janitor

Simple tools for data cleaning in R

janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.

Downloads: 0 This Week

Last Update: 2025-07-30
2

Zingg

Scalable master data management and identity resolution

Zingg is an open-source entity resolution and master data management platform for finding duplicate, related, or matching records across large datasets. It uses machine learning to learn how records should be compared, reducing the need for brittle hand-written matching rules. The project is designed for data engineering and analytics teams working on customer 360, supplier 360, deduplication, fuzzy matching, data quality, and golden record workflows. Zingg runs on Apache Spark and can scale...

Downloads: 8 This Week

Last Update: 3 days ago
3

ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

ydata-profiling's primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame, and the analysis can be exported in different formats such as HTML and JSON.

Downloads: 0 This Week

Last Update: 2026-04-22
4

Discord.SortedSet

Elixir SortedSet backed by a Rust-based NIF

SortedSet NIF is a performant and reliable sorted set data structure for Elixir, implemented in Rust using the Rustler crate to take advantage of native performance while maintaining seamless integration with the BEAM ecosystem. It provides ordering and uniqueness guarantees, with all terms stored according to Elixir’s built-in sorting rules. Internally, it uses a vector-of-vectors layout rather than a single vector to minimize costly reallocations, allowing efficient bucket pointer copying...

Downloads: 0 This Week

Last Update: 2026-05-15
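
To illustrate the bucketed layout described for Discord.SortedSet, here is a minimal Python sketch. It is not the library's Rust code or its Elixir API. The class name `BucketedSortedSet`, the bucket capacity of 4 and the split-in-half policy are hypothetical assumptions chosen for the example. Items live in an ordered list of small sorted lists, so an insert shifts elements within one bucket instead of the whole collection, and an overfull bucket is split rather than grown into one large array.

```python
from bisect import bisect_left


class BucketedSortedSet:
    """Sorted set kept as an ordered list of small sorted buckets.

    Hypothetical sketch: MAX_BUCKET and the split policy are assumptions,
    not details taken from Discord.SortedSet.
    """

    MAX_BUCKET = 4  # assumed capacity before a bucket is split

    def __init__(self):
        # Invariant: each bucket is sorted, and every item in bucket k
        # is smaller than every item in bucket k + 1.
        self.buckets = []

    def _bucket_index(self, item):
        # First bucket whose largest item is >= item; otherwise the last bucket.
        for k, bucket in enumerate(self.buckets):
            if item <= bucket[-1]:
                return k
        return len(self.buckets) - 1

    def add(self, item):
        """Insert item if absent; return True if it was added."""
        if not self.buckets:
            self.buckets.append([item])
            return True
        k = self._bucket_index(item)
        bucket = self.buckets[k]
        pos = bisect_left(bucket, item)
        if pos < len(bucket) and bucket[pos] == item:
            return False  # uniqueness: item already present
        bucket.insert(pos, item)  # shifts only this small bucket
        if len(bucket) > self.MAX_BUCKET:
            # Split the overfull bucket in two instead of growing one big list.
            half = len(bucket) // 2
            self.buckets[k:k + 1] = [bucket[:half], bucket[half:]]
        return True

    def __contains__(self, item):
        if not self.buckets:
            return False
        bucket = self.buckets[self._bucket_index(item)]
        pos = bisect_left(bucket, item)
        return pos < len(bucket) and bucket[pos] == item

    def __iter__(self):
        # Walking buckets in order yields items in sorted order.
        for bucket in self.buckets:
            yield from bucket

    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)


s = BucketedSortedSet()
for x in [5, 1, 9, 3, 7, 3, 2, 8]:
    s.add(x)
print(list(s))  # [1, 2, 3, 5, 7, 8, 9]
```

Keeping buckets small bounds the cost of each insert. A production structure would also merge underfull buckets on removal, which this sketch omits.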
5

The Timeline Project

Cross-platform app for displaying and navigating events on a timeline.

The Timeline Project aims to create a free, cross-platform application for displaying and navigating events on a timeline.

46 Reviews

Downloads: 94 This Week

Last Update: 2026-04-16
6

text-dedup

All-in-one text de-duplication

text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...

Downloads: 0 This Week

Last Update: 2025-04-08
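
The MinHash approach mentioned for text-dedup can be sketched with the Python standard library. This illustrates the technique only and is not text-dedup's API: the function names, the 3-word shingles, the 64 seeded hash functions and the 0.5 threshold are all assumptions. Each document becomes a set of shingles; for each hash function the signature keeps the minimum hash, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the two shingle sets.

```python
import hashlib
import re

NUM_HASHES = 64  # assumed signature length
SHINGLE_SIZE = 3  # assumed word n-gram size


def shingles(text, n=SHINGLE_SIZE):
    """Set of word n-grams from lowercased text (whole text if shorter than n)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def seeded_hash(seed, value):
    """64-bit hash; each seed plays the role of one random hash function."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(shingle_set, num_hashes=NUM_HASHES):
    """MinHash signature: the minimum hash of the set under each hash function."""
    return [min(seeded_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Share of agreeing signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    return len(set_a & set_b) / len(set_a | set_b)


def near_duplicates(docs, threshold=0.5):
    """Index pairs whose estimated similarity meets the threshold (all pairs)."""
    sigs = [minhash(shingles(d)) for d in docs]
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if estimated_jaccard(sigs[i], sigs[j]) >= threshold
    ]


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",
    "a completely different sentence about data cleaning tools",
]
print(exact_jaccard(shingles(docs[0]), shingles(docs[1])))  # 0.75
print(near_duplicates(docs))
```

The all-pairs loop is quadratic. Tools aimed at large corpora typically group signatures with locality-sensitive hashing (LSH) so that only likely candidates are compared.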
7

DataCleaner

Data quality analysis, profiling, cleansing, duplicate detection and more

DataCleaner is a data quality analysis application and a platform for building data quality (DQ) solutions. Its core is a strong, extensible data profiling engine, and through that extensibility it adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io

3 Reviews

Downloads: 13 This Week

Last Update: 2019-02-12
8

PAICE: Rapid pathway visualization

PAICE is a rapid bioinformatics pathway visualization tool for KEGG-compatible accessions derived from Illumina Solexa next-gen and Affymetrix datasets. It colors KEGG pathways while taking detection calls and duplicate gene copies into account.

Downloads: 0 This Week

Last Update: 2015-02-01
9

Joppelganger

A simple little engine for fuzzy name and address searching. It helps improve data quality and avoid duplicate data entry.

Downloads: 0 This Week

Last Update: 2013-03-21
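
Joppelganger's own matching algorithm is not described in this listing, so the following is a hedged Python sketch of fuzzy name and address matching that swaps in the standard library's `difflib.SequenceMatcher` similarity ratio. The normalization rules, the function names and the 0.85 threshold are assumptions for illustration. A new entry is compared against existing records, and close matches are flagged before they can be saved as duplicates.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, turn punctuation into spaces, collapse whitespace (assumed rules)."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def possible_duplicates(new_entry, existing, threshold=0.85):
    """Existing records at least `threshold` similar to new_entry, best first."""
    scored = [(similarity(new_entry, record), record) for record in existing]
    return [record for score, record in sorted(scored, reverse=True) if score >= threshold]


existing = ["John Smith, 12 Main Street", "Jane Doe, 4 Elm Road"]
print(possible_duplicates("Jon Smith, 12 Main St."), existing) if False else None
print(possible_duplicates("Jon Smith, 12 Main St.", existing))
# ['John Smith, 12 Main Street']
```

A character-level ratio tolerates typos and abbreviations such as "St." for "Street", but a real address matcher would usually also expand common abbreviations before comparing.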
10

PHP::Duploc

PHP::Duploc helps you find duplicate code (a "bad smell" that prompts a refactoring) in your PHP scripts. By looking for certain patterns in a "dotplot" graph, you can quickly get a visual overview of a large body of code.

Downloads: 0 This Week

Last Update: 2015-04-14
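
The dotplot idea behind PHP::Duploc can be sketched in a few lines of Python. This is not PHP::Duploc's code; the line normalization (collapsing whitespace and skipping blank lines), the function names and the minimum run length of 3 are assumptions. Each dot marks a pair of equal lines, and a diagonal run of dots corresponds to a block of consecutive lines that appears twice.

```python
def normalized_lines(source):
    """(line_number, text) pairs with whitespace collapsed and blank lines dropped."""
    result = []
    for number, line in enumerate(source.splitlines(), start=1):
        text = " ".join(line.split())
        if text:
            result.append((number, text))
    return result


def dotplot(lines_a, lines_b):
    """Cells (i, j) where line i of A equals line j of B: the dots of the plot."""
    positions = {}
    for j, (_, text) in enumerate(lines_b):
        positions.setdefault(text, []).append(j)
    return {(i, j) for i, (_, text) in enumerate(lines_a) for j in positions.get(text, [])}


def diagonal_runs(dots, min_len=3):
    """(i, j, length) for each diagonal run of at least min_len dots."""
    runs = []
    for i, j in sorted(dots):
        if (i - 1, j - 1) in dots:
            continue  # inside a run that started earlier
        length = 0
        while (i + length, j + length) in dots:
            length += 1
        if length >= min_len:
            runs.append((i, j, length))
    return runs


def duplicate_blocks(source, min_len=3):
    """Compare a file with itself; report (first_line, second_line, non-blank length)."""
    lines = normalized_lines(source)
    # Keep only the upper triangle: drops the trivial main diagonal and mirror copies.
    dots = {(i, j) for i, j in dotplot(lines, lines) if i < j}
    return [(lines[i][0], lines[j][0], n) for i, j, n in diagonal_runs(dots, min_len)]


php = """<?php
function a($x) {
    $y = $x * 2;
    return $y + 1;
}
function b($x) {
    $y = $x * 2;
    return $y + 1;
}
"""
print(duplicate_blocks(php))  # [(3, 7, 3)]
```

Comparing whole normalized lines keeps the sketch simple; it will miss copies where identifiers were renamed, which token-level comparison would catch.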