duplicate linux free download

Showing 11 open source projects for "duplicate linux"

View related business solutions

Data Management Clear Filters & Widen Search

Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.

Start Free
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

janitor

Simple tools for data cleaning in R

janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.

Downloads: 0 This Week

Last Update: 2025-07-30
See Project
2

Zingg

Scalable master data management and identity resolution

Zingg is an open-source entity resolution and master data management platform for finding duplicate, related, or matching records across large datasets. It uses machine learning to learn how records should be compared, reducing the need for brittle hand-written matching rules. The project is designed for data engineering and analytics teams working on customer 360, supplier 360, deduplication, fuzzy matching, data quality, and golden record workflows. Zingg runs on Apache Spark and can scale...

Downloads: 8 This Week

Last Update: 3 days ago
See Project
3

ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

ydata-profiling primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like pandas df.describe() function, that is so handy, ydata-profiling delivers an extended analysis of a DataFrame while allowing the data analysis to be exported in different formats such as html and json.

Downloads: 0 This Week

Last Update: 2026-04-22
See Project
4

Discord.SortedSet

Elixir SortedSet backed by a Rust-based NIF

SortedSet NIF is a performant and reliable sorted set data structure for Elixir, implemented in Rust using the Rustler crate to take advantage of native performance while maintaining seamless integration with the BEAM ecosystem. It provides ordering and uniqueness guarantees, with all terms stored according to Elixir’s built-in sorting rules. Internally, it uses a vector of vectors layout rather than a single vector to minimize costly reallocations, allowing efficient bucket pointer copying...

Downloads: 0 This Week

Last Update: 2026-05-15
See Project
Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
5

The Timeline Project

Cross-platform app for displaying and navigating events on a timeline.

The Timeline Project aims to create a free, cross-platform application for displaying and navigating events on a timeline.

46 Reviews

Downloads: 94 This Week

Last Update: 2026-04-16
See Project
6

text-dedup

All-in-one text de-duplication

text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...

Downloads: 0 This Week

Last Update: 2025-04-08
See Project
7

MarDRe

MapReduce-based tool to remove duplicate DNA reads

MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformatics to avoid the analysis of not necessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe (link above), which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems....

Downloads: 0 This Week

Last Update: 2019-01-23
See Project
8

DataCleaner

Data quality analysis, profiling, cleansing, duplicate detection +more

DataCleaner is a data quality analysis application and a solution platform for DQ solutions. It's core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io

3 Reviews

Downloads: 13 This Week

Last Update: 2019-02-12
See Project
9

PAICE: Rapid pathway visualization

PAICE is a rapid bioinformatics pathway visualization tool for KEGG-compatible accessions derived from Illumina Solexa next-gen and Affymetrix datasets. It colors KEGG pathways while appreciating detection-calls and duplicate gene copies.

Downloads: 0 This Week

Last Update: 2015-02-01
See Project
Secure File Transfer for Windows with Cerberus by Redwood
Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.

Try for Free
10

Joppelganger

A simple little engine to do fuzzy name & address searching. Helps improve data quality and avoids duplicate data entry.

Downloads: 0 This Week

Last Update: 2013-03-21
See Project
11

PHP::Duploc

PHP::Duploc helps you find duplicate code (a \"bad smell\", prompting a refactoring) in your PHP scripts. By looking for certain patterns in a \"dotplot\" graph, you can visually get a quick overview over a large body of code.

Downloads: 0 This Week

Last Update: 2015-04-14
See Project