duplicate free download

Showing 10 open source projects for "duplicate"

1

janitor

Simple tools for data cleaning in R

janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.

Downloads: 0 This Week

Last Update: 2025-07-30
2

Discord.SortedSet

Elixir SortedSet backed by a Rust-based NIF

SortedSet NIF is a performant and reliable sorted set data structure for Elixir, implemented in Rust using the Rustler crate to take advantage of native performance while maintaining seamless integration with the BEAM ecosystem. It provides ordering and uniqueness guarantees, with all terms stored according to Elixir’s built-in sorting rules. Internally, it uses a vector of vectors layout rather than a single vector to minimize costly reallocations, allowing efficient bucket pointer copying...

Downloads: 4 This Week

Last Update: 2026-02-17
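
The vector-of-vectors layout mentioned for Discord.SortedSet can be illustrated with a minimal sketch. This is not the project's Rust code or its Elixir API; the class name `BucketedSortedSet`, the `bucket_size` parameter, and the split-in-half policy are assumptions made for illustration, and Python's default ordering stands in for Elixir's term ordering.

```python
import bisect


class BucketedSortedSet:
    """Sorted set stored as a list of small sorted buckets (hypothetical sketch)."""

    def __init__(self, bucket_size=4):
        self.bucket_size = bucket_size  # assumed maximum bucket length
        self.buckets = [[]]

    def _find_bucket(self, item):
        # Binary search for the first bucket whose last element is >= item.
        lo, hi = 0, len(self.buckets) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.buckets[mid] and self.buckets[mid][-1] < item:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def add(self, item):
        """Insert item; return False if it was already present."""
        idx = self._find_bucket(item)
        bucket = self.buckets[idx]
        pos = bisect.bisect_left(bucket, item)
        if pos < len(bucket) and bucket[pos] == item:
            return False  # uniqueness guarantee
        bucket.insert(pos, item)
        if len(bucket) > self.bucket_size:
            # Split only the full bucket; the other buckets are not copied.
            mid = len(bucket) // 2
            self.buckets[idx:idx + 1] = [bucket[:mid], bucket[mid:]]
        return True

    def to_list(self):
        return [item for bucket in self.buckets for item in bucket]


s = BucketedSortedSet(bucket_size=2)
for value in [5, 1, 3, 1, 4, 2]:
    s.add(value)
print(s.to_list())
```

Because an insert touches one small bucket, growth never forces a reallocation of the whole collection, which is the motivation the description gives for the layout.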
3

ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

The primary goal of ydata-profiling is to provide a one-line exploratory data analysis (EDA) experience that is consistent and fast. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame and lets the analysis be exported in formats such as HTML and JSON.

Downloads: 0 This Week

Last Update: 2026-01-13
4

The Timeline Project

Cross-platform app for displaying and navigating events on a timeline.

The Timeline Project aims to create a free, cross-platform application for displaying and navigating events on a timeline.

46 Reviews

Downloads: 137 This Week

Last Update: 2025-08-09
5

text-dedup

All-in-one text de-duplication

text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible deduplication strategies, making it ideal for cleaning web-scraped data, language model training datasets, or document archives.

Downloads: 1 This Week

Last Update: 2025-04-08
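
The MinHash and Jaccard-threshold idea in the text-dedup description can be sketched with the standard library alone. This sketch does not use text-dedup's own API; the function names, the 3-word shingle size, the 128 hash functions, and the 0.7 threshold are all assumptions chosen for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=128):
    """For each of num_perm salted hash functions, keep the minimum hash value."""
    if not items:
        raise ValueError("cannot build a signature for an empty shingle set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def deduplicate(docs, threshold=0.7, num_perm=128):
    """Return indices of documents kept; later near-duplicates are dropped."""
    kept, kept_sigs = [], []
    for idx, doc in enumerate(docs):
        sig = minhash_signature(shingles(doc), num_perm)
        if any(estimate_jaccard(sig, other) >= threshold for other in kept_sigs):
            continue
        kept.append(idx)
        kept_sigs.append(sig)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",
    "completely different text about data cleaning tools",
]
print(deduplicate(docs))
```

The pairwise comparison here is quadratic in the number of documents. Tools that target very large corpora typically add locality-sensitive hashing so that only likely matches are compared.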
6

MarDRe

MapReduce-based tool to remove duplicate DNA reads

MarDRe is a de novo MapReduce-based parallel tool that removes duplicate and near-duplicate DNA reads by clustering single-end and paired-end sequences from FASTQ/FASTA datasets. It lets bioinformaticians skip the analysis of unnecessary reads, which shortens subsequent processing of the dataset. MarDRe is the Big Data counterpart of ParDRe, which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. ...

Downloads: 0 This Week

Last Update: 2019-01-23
7

DataCleaner

Data quality analysis, profiling, cleansing, duplicate detection +more

DataCleaner is a data quality analysis application and a solution platform for DQ solutions. Its core is a strong, extensible data profiling engine, to which it adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io

3 Reviews

Downloads: 6 This Week

Last Update: 2019-02-12
8

PAICE: Rapid pathway visualization

PAICE is a rapid bioinformatics pathway visualization tool for KEGG-compatible accessions derived from Illumina Solexa next-gen and Affymetrix datasets. It colors KEGG pathways while taking detection calls and duplicate gene copies into account.

Downloads: 0 This Week

Last Update: 2015-02-01
9

Joppelganger

A simple little engine to do fuzzy name & address searching. Helps improve data quality and avoids duplicate data entry.

Downloads: 0 This Week

Last Update: 2013-03-21
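
Fuzzy name searching of the kind Joppelganger describes can be sketched with Python's standard `difflib` module. The listing does not say which matching algorithm Joppelganger uses, so this is only a stand-in technique; the function names, the normalization rules, and the 0.85 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_possible_duplicates(new_name, existing, threshold=0.85):
    """Return (name, score) pairs at or above the assumed threshold, best first."""
    scored = [(name, similarity(new_name, name)) for name in existing]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


existing_names = ["John Smith", "Maria Garcia"]
for name, score in find_possible_duplicates("Jon  Smith", existing_names):
    print(f"{name}: {score:.2f}")
```

Running a check like this before saving a new record is one way to flag a likely duplicate entry while the user is still typing it.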
10

PHP::Duploc

PHP::Duploc helps you find duplicate code (a "bad smell", prompting a refactoring) in your PHP scripts. By looking for certain patterns in a "dotplot" graph, you can quickly get a visual overview of a large body of code.

Downloads: 0 This Week

Last Update: 2015-04-14
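
The dotplot idea behind PHP::Duploc can be shown with a small sketch: mark cell (i, j) when line i equals line j, then look for diagonal runs away from the main diagonal, which correspond to repeated blocks. This is not PHP::Duploc's implementation; the function names, the whitespace-stripping normalization, and the `min_len` parameter are assumptions.

```python
def normalize(line):
    """Remove all whitespace so formatting differences do not hide a match."""
    return "".join(line.split())


def dotplot(lines):
    """Boolean matrix: cell [i][j] is True when non-empty lines i and j match."""
    norm = [normalize(line) for line in lines]
    n = len(norm)
    return [[bool(norm[i]) and norm[i] == norm[j] for j in range(n)] for i in range(n)]


def duplicated_runs(matrix, min_len=2):
    """Scan diagonals above the main one; return (first_start, second_start, length)."""
    n = len(matrix)
    runs = []
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if matrix[i][i + offset]:
                run += 1
                continue
            if run >= min_len:
                runs.append((i - run, i - run + offset, run))
            run = 0
        if run >= min_len:
            end = n - offset
            runs.append((end - run, end - run + offset, run))
    return runs


source = [
    "$a = 1;",
    "$b = 2;",
    "echo $a;",
    "$a = 1;",
    "$b  = 2;",
]
for first, second, length in duplicated_runs(dotplot(source)):
    print(f"lines {first}-{first + length - 1} repeat at {second}-{second + length - 1}")
```

Longer diagonals mean longer copied blocks, which is why such patterns stand out visually in a plotted dotplot.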