Showing 203 open source projects for "duplicates"

  • 1
    Czkawka

    Multi-functional app to find duplicates, empty folders, similar images

    Czkawka (Polish for “hiccup”) is a lightning‑fast, multi‑purpose file cleaning tool written in Rust. It helps users declutter storage by finding duplicate files, similar images or audio, empty folders, and unusually large files through CPU‑efficient multithreading. Available with both GUI (GTK‑based) and CLI versions for flexible usage.
    Downloads: 381 This Week
  • 2
    FDUPES

    FDUPES is a program for identifying or deleting duplicate files

    ...It works by scanning directories and subdirectories, identifying sets of files with identical content through size and hash comparisons, and then listing them together so users can examine duplicates. Once duplicates are identified, the tool offers interactive deletion options where users can choose which copy to keep and which to remove, or apply flags to automate deletion while preserving a single instance. Because it operates directly on file content rather than just filenames, fdupes can accurately detect true copies and guide cleaning operations in data cleanup or migration tasks. ...
    Downloads: 12 This Week
  • 3
    fastdup

    An unsupervised and free tool for image and video dataset analysis

    fastdup is a free tool designed to rapidly extract insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce data operations costs at scale.
    Downloads: 0 This Week
  • 4
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By scraping and reviewing regularly, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers.
    Downloads: 0 This Week
  • 5
    Kuroba Experimental

    Free and open source image board browser

    ...Ability to attach multiple media files to a reply, attach media files shared by external apps (even by some keyboards), attach remote media files by URL, etc. New image downloader that allows downloading images while the app is in the background, retrying failed image downloads, resolving duplicates, etc.
    Downloads: 20 This Week
  • 6
    janitor

    Simple tools for data cleaning in R

    janitor provides simple, convenient tools for data cleaning, formatting, and exploration in R. It is especially useful for cleaning messy data frames, removing duplicates, formatting column names, and producing frequency tables in a tidy workflow.
    Downloads: 0 This Week
  • 7
    PostgREST

    REST API for any Postgres database

    ...The structural constraints and permissions in the database determine the API endpoints and operations. Using PostgREST is an alternative to manual CRUD programming. Custom API servers suffer problems. Writing business logic often duplicates, ignores or hobbles database structure. Object-relational mapping is a leaky abstraction leading to slow imperative code. The PostgREST philosophy establishes a single declarative source of truth: the data itself. It’s easier to ask PostgreSQL to join data for you and let its query planner figure out the details than to loop through rows yourself. ...
    Downloads: 5 This Week
  • 8
    PhotoPrism

    AI-Powered Photos App for the Decentralized Web 🌈💎✨

    PhotoPrism® is an AI-Powered Photos App for the Decentralized Web. It makes use of the latest technologies to tag and find pictures automatically without getting in your way. You can run it at home, on a private server, or in the cloud. Our mission is to provide the most user- and privacy-friendly solution to keep your pictures organized and accessible. That's why PhotoPrism was built from the ground up to run wherever you need it, without compromising freedom, privacy, or functionality.
    Downloads: 20 This Week
  • 9
    AIOStreams

    One addon to rule them all

    AIOStreams is a “super-add-on” for Stremio that consolidates results from many add-ons and debrid services into a single, customizable feed. Instead of juggling multiple add-ons, you point AIOStreams at them; it then queries, merges, de-duplicates, and re-ranks everything according to your rules. The project includes a powerful filtering and sorting engine—think conditions on resolution, codec, provider, cached status, seed count, and more—so power users can shape results precisely. It also provides a template-based formatter to control how streams are labeled in the UI, complete with live preview for quick iteration. ...
    Downloads: 8 This Week
  • 10
    SortPhotos

    SortPhotos is a Python script that organizes photos and videos

    ...SortPhotos includes options for copying versus moving files, recursive searches, silent or test modes, and customizable start times for when a “day” begins. It also prevents duplicate files by comparing content, with an option to keep duplicates if needed. With support for automation through launch agents or cron jobs, SortPhotos is well-suited for photographers, archivists, and anyone looking to streamline large personal or professional media collections.
    Downloads: 4 This Week
  • 11
    Aves

    Aves is a gallery and metadata explorer app, built for Android

    Aves is a feature-rich gallery app focused on organizing, browsing, and viewing photos and videos with precision. It indexes media on device storage and surfaces powerful ways to explore content, such as by folders, albums, dates, tags, and embedded metadata. The viewer prioritizes smooth gestures (pinch-to-zoom, pan, fling) and a clean, immersive interface that keeps controls out of the way until needed. Aves emphasizes metadata awareness, reading details like EXIF and other tags to enable...
    Downloads: 4 This Week
  • 12
    websocket for Go

    Minimal and idiomatic WebSocket library for Go

    ...RFC 7692 permessage-deflate compression. Compile to Wasm. Transparent message buffer reuse with wsjson and wspb subpackages. Gorilla writes directly to a net.Conn and so duplicates features of net/http.Client. Gorilla requires registering a pong callback before sending a Ping. Compare godoc of nhooyr.io/websocket with gorilla/websocket side by side. Will enable easy HTTP/2 support in the future.
    Downloads: 0 This Week
  • 13
    EPLB

    Expert Parallelism Load Balancer

    ...In EP, different “experts” are mapped to different GPUs or nodes, so load imbalance becomes a performance bottleneck if certain experts are invoked much more often. EPLB solves this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out computational load. It uses policies like hierarchical load balancing (grouped experts placed at node and then GPU level) and global load balancing depending on configuration. The logic is implemented in eplb.py and supports predicting placements given estimated expert usage weights. EPLB aims to reduce hot-spotting and ensure more uniform usage of compute resources in large MoE deployments.
    Downloads: 0 This Week
  • 14
    gitignore.io

    Create useful .gitignore files for your project

    ...You can access it from a clean web UI, a simple REST API, or the command line, making it easy to script into new-project scaffolds and automation. The generator accepts multiple technologies in one request, normalizes duplicates, and orders rules sensibly so the result is readable and effective. Templates are versioned and updated over time as tools evolve, helping teams avoid accidentally committing build artifacts, credentials, caches, and other noisy files. The repository includes documentation, example invocations, and contribution guidelines so users can add or refine templates.
    Downloads: 0 This Week
  • 15
    MusicBrainz Server

    Server for the MusicBrainz project (website, API, database tools)

    MusicBrainz Server is the web application powering the MusicBrainz open music metadata database. It handles artist, album, track, and recording entities and allows users to submit edits, vote, merge duplicates, and browse relationships like releases or linked ISRCs. The server supports full-text search, advanced filtering, web APIs, data dumps, and integration with other services like Cover Art Archive and Discogs. Its architecture ensures referential integrity and versioned change tracking so edits can be audited or reverted, and moderation workflows manage conflicting contributions. ...
    Downloads: 0 This Week
  • 16
    CleanVision

    Automatically find issues in image datasets

    CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. ...
    Downloads: 1 This Week
  • 17
    Node Modules Inspector

    Interactive UI for local node modules inspection

    This is a tool (CLI + interactive UI) for inspecting the node_modules directory of a JavaScript/TypeScript project, created by Anthony Fu. It supports projects using npm, pnpm or bun. The idea is to help developers visualise the dependency graph, see which dependencies are installed, their sizes, types (ESM vs CJS), origins (catalog vs registry), and filter or build a static report of a project’s dependency tree. The project includes a web UI version that you can try at node-modules.dev, and...
    Downloads: 0 This Week
  • 18
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ...File sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others).
    Downloads: 0 This Week
  • 19
    pass import

    A pass extension for importing data from most existing password managers

    A pass extension for importing data from most existing password managers. Password management should be simple and follow Unix philosophy. With pass, each password lives inside of a gpg encrypted file whose filename is the title of the website or resource that requires the password. These encrypted files may be organized into meaningful folder hierarchies, copied from computer to computer, and, in general, manipulated using standard command line file management utilities.
    Downloads: 0 This Week
  • 20

    anti-copy-paster

    A plugin for IntelliJ IDEA for just-in-time code duplicates extraction

    The plugin monitors the copying and pasting that takes place inside the IDE. As soon as a code fragment is pasted, the plugin checks whether it introduces code duplication. If it does, the plugin calculates a set of code metrics for the fragment and compares them against the currently selected metric thresholds. If the chosen thresholds are surpassed, the plugin suggests that the developer perform the Extract Method refactoring and applies the refactoring if necessary.
    Downloads: 8 This Week
  • 21
    RemoveDuplicate

    Software that helps you remove duplicate files

    RemoveDuplicate is a Windows application designed to help users find and remove duplicate files within folders. It uses MD5 hashing to identify identical files and provides a user-friendly interface to manage and delete duplicates while keeping one copy of each file.
    Downloads: 5 This Week
  • 22
    Katalog

    Catalog and Search files from permanent or removable drives

    Katalog is a desktop application for managing catalogs of disks and files. Create catalogs from different sources or devices; search files even when the devices are disconnected, and find duplicates or differences; organize your collection of catalogs, storage devices, and virtual storage devices, and get statistics. Data is stored in CSV (tab-separated) files for full control by the user. Available in multiple languages, open source, and cross-platform (Linux Plasma and Windows 64 installer or portable). ...
    Downloads: 204 This Week
  • 23

    rsync-backup-ras

    Make backups using rsync

    ...It creates simple copies of the backed-up directory trees that can be restored using traditional tools like cp, scp and rsync. It can back up to local file systems or to remote ones accessible via ssh. It de-duplicates data and can create multiple aged copies that can be pruned in several ways to fit within space limitations.
    Downloads: 19 This Week
  • 24
    Audit Data Analytics
    Audit Data Analytics, LLC's ADA software is an open source solution for performing operations on audit data, such as filtering specific records and summarizing data sets.
    Downloads: 2 This Week
  • 25
    Tartube

    Download videos/channels/playlists from YouTube and many other sites

    Tartube is a GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders. It is written in Python 3 / Gtk 3 and runs on MS Windows, Linux, BSD and MacOS.
    Downloads: 1,065 This Week