Showing 24 open source projects for "duplicates"

  • 1
    fastdup

    An unsupervised and free tool for image and video dataset analysis

    fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images & labels and reduce your data operations costs at an unparalleled scale.
    Downloads: 1 This Week
  • 2
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    JobFunnel is an automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command-line arguments. By scraping and reviewing regularly, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from the available providers.
    Downloads: 0 This Week
  • 3
    EPLB

    Expert Parallelism Load Balancer

    ...In EP, different “experts” are mapped to different GPUs or nodes, so load imbalance becomes a performance bottleneck if certain experts are invoked much more often than others. EPLB addresses this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out the computational load. Depending on configuration, it uses policies such as hierarchical load balancing (expert groups are placed first at the node level, then at the GPU level) and global load balancing. The logic is implemented in eplb.py and supports predicting placements from estimated expert-usage weights. EPLB aims to reduce hot-spotting and ensure more uniform use of compute resources in large MoE deployments.
    Downloads: 0 This Week
  • 4
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    ...File sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata. Mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint). Comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others).
    Downloads: 1 This Week
  • 5
    SortPhotos

    SortPhotos is a Python script that organizes photos and videos

    ...SortPhotos includes options for copying versus moving files, recursive searches, silent or test modes, and customizable start times for when a “day” begins. It also prevents duplicate files by comparing content, with an option to keep duplicates if needed. With support for automation through launch agents or cron jobs, SortPhotos is well-suited for photographers, archivists, and anyone looking to streamline large personal or professional media collections.
    Downloads: 0 This Week
  • 6
    CleanVision

    Automatically find issues in image datasets

    CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. ...
    Downloads: 0 This Week
  • 7
    Tartube

    Download videos/channels/playlists from YouTube and many other sites

    Tartube is a GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders. It is written in Python 3 / Gtk 3 and runs on MS Windows, Linux, BSD and MacOS.
    Downloads: 1,301 This Week
  • 8

    ar-mercurial

    fork of Mercurial SCM

    Fork of Mercurial SCM (https://mercurial.selenic.com/) with these additions:
    - resolution of revision-duplicate collisions
    - sparse fixes:
      - purge cleans up all out-of-sparse files (issue5626)
      - merge only changes files inside the sparse checkout (issue6521)
      - dirstate is refreshed when the sparse configuration changes
      - the `share` command supports sparse
      - subrepos outside the sparse checkout are ignored
    - enhancements to the `debugrevlog` command
    - feature #6745: a VFS failure on a name the OS cannot represent now just drops the conflicting file
    Downloads: 0 This Week
  • 9
    Data Preprocessing Automate

    Data Preprocessing Automation: A GUI for easy data cleaning & visualization

    Data Preprocessing Automation is a Python-based GUI application designed to simplify and automate data preprocessing tasks. It allows users to upload Excel files, automatically handle missing values, remove duplicates, and detect and remove outliers using statistical methods. The application provides data visualization tools, including box plots for distribution analysis and scatter plots for exploring relationships between variables. Users can download the processed data for further analysis. Built with Tkinter, Pandas, Matplotlib, and Seaborn, it ensures an intuitive interface and efficient performance. ...
    Downloads: 0 This Week
  • 10
    pass import

    A pass extension for importing data from most existing password managers

    A pass extension for importing data from most existing password managers. Password management should be simple and follow Unix philosophy. With pass, each password lives inside of a gpg encrypted file whose filename is the title of the website or resource that requires the password. These encrypted files may be organized into meaningful folder hierarchies, copied from computer to computer, and, in general, manipulated using standard command line file management utilities.
    Downloads: 0 This Week
  • 11
    ZX Pokemaster

    Tool for sorting/renaming files for ZX Spectrum

    This project is CLOSED. Do not expect any updates from me. I DO NOT do programming projects anymore. The source code is available at github.com/ladyeklipse/ZX-Pokemaster, and you're welcome to fork it and take over development. ZX Pokemaster sorts and renames files and manages cheats for the ZX Spectrum. Files are sorted and renamed based on MD5 hashes (if available). ZX Pokemaster incorporates the AllTipshopPokes database, which contains all known multiface pokes, scraped directly from www.the-tipshop.co.uk...
    Downloads: 18 This Week
  • 12
    Slam Mirror Bot

    Aria/qBittorrent Telegram Mirror/Leech Bot

    Slam Mirror Bot is a multipurpose Telegram Bot written in Python for mirroring files on the Internet to our beloved Google Drive. Based on python-aria-mirror-bot.
    Downloads: 0 This Week
  • 13

    tootbot.py

    A small Python 3.x script to replicate tweets on a Mastodon account

    ...Forked from https://github.com/cquest/tootbot and specialized in RSS feeds, in particular from Nitter (https://nitter.net, https://github.com/zedeus/nitter). It gets the tweets from the RSS feeds available at https://nitter.net, then cleans up the content:
    - Twitter tracking links (t.co) are dereferenced
    - Twitter-hosted pictures are retrieved and uploaded to Mastodon
    - tweets from the RSS sources are joined based on the domain name to avoid duplicates
    It can also toot RSS/Atom feeds (see cron-example.sh). A SQLite database is used to keep track of tweets that have been tooted. The script is simply called by a cron job and can run on any server.
    Downloads: 0 This Week
  • 14
    IL Music Library Deduplicator

    Easily find and remove duplicates from a music collection

    ...It can distinguish between similar songs, offering you a list with the best matches highlighted in green. Audio files with lower similarity are listed as well, in case there are duplicates with a lower bitrate; these are highlighted in yellow and red. You can select them by color if you wish to delete them, or add them to an exclusion list if you wish to keep those duplicates or skip them in future scans. The first scan may take a while depending on your PC and the number of audio files, but subsequent scans only process newly added files. ...
    Downloads: 0 This Week
  • 15

    Music Merge Manager

    Identify duplicate songs based on track metadata (ID3, etc)

    ...Toggles let you determine how specific the matching criteria must be: track number and song name? Artist, album, and track length? All of the above? Ultimately this program will let you delete duplicates from either the left or the right tree, or copy non-duplicated files from one tree to the other. However, until the matching code becomes much more reliable, the only action allowed is to export the list of files suspected to be duplicated in both trees. (Prerequisites: uses the `Mutagen` library to process ID3 data and `wx` for the UI.) ...
    Downloads: 0 This Week
  • 16
    MusicPlayer

    Music player - endlessly plays your music

    This music player is meant to be simple and is centered around an infinite intelligent queue (some other players call this PartyShuffle or DJ mode). You can add songs to it manually, but if you don't, or if the queue gets too short, it automatically and intelligently fills it with further songs. The intelligent queue's choices are currently based on:
    * song ratings
    * context-based choices, e.g. related songs are more likely
    Other features of this player:
    * open source, simplified BSD...
    Downloads: 0 This Week
  • 17

    Hierarchical cluster engine HCE

    Hierarchical Cluster Engine project

    The main idea of this project is to implement a solution that can be used to:
    - construct a custom network mesh or distributed network cluster structure with several types of relations between nodes
    - formalize data-flow processing from the upper-level central source node down to lower nodes and back
    - formalize the handling of management requests from multiple source points
    - support native reduction of results from multiple nodes (aggregation, duplicate elimination, sorting and so on)
    - internally support a powerful full-text search engine and data storage
    - provide both transactionless and transactional request processing
    - support flexible run-time changes to the cluster infrastructure
    - offer bindings in many languages for client-side integration APIs
    all in one product built in C++...
    Downloads: 0 This Week
  • 18
    This is the official homepage of PyURLSnooper, a program written to help users locate the URLs of audio and video files so that they can be recorded. It is an OS-independent equivalent of URL Snooper from http://www.donationcoder.com/ (similar to https://sourceforge.net/projects/mediasniffer/). It can be used in combination with RTMPDump (http://rtmpdump.mplayerhq.hu/) to capture streams.
    Downloads: 0 This Week
  • 19
    DeployAgility changes the build and deployment paradigm. Instead of enabling a "Tower of Babel" in which each development team duplicates efforts to create its own automated build and deployment process, DeployAgility inverts this practice.
    Downloads: 0 This Week
  • 20
    Muton is a musical garbage collector. It helps you rename (copy, move) files by tags and find duplicates; you can also intersect your own collection with a friend's collection, copy the diff... and more!
    Downloads: 0 This Week
  • 21
    Terz, the incredible AutoTagger, is an automated audio file tagger and file renamer. It uses information provided by MusicBrainz. By using context information, Terz achieves very good results. Other features: show incomplete albums, search for duplicates ...
    Downloads: 0 This Week
  • 22
    PyBookmark manipulates bookmark files. It can sync files (no server required), merge, sort, remove duplicates, and check links. Its library, pybookmarklib, provides access to these operations, data structures, and a parser for further extensibility.
    Downloads: 0 This Week
  • 23
    This is an experimental project meant to help you keep track of many files across multiple devices, finding duplicates and logging them.
    Downloads: 0 This Week
  • 24

    Video DeDup

    Find duplicate videos by content

    ...Then many options and a caching mechanism help achieve acceptable performance. At the end, all duplicates are copied into an analysed folder. To finish, you'll have to look through the analysed folder to make your decisions: remove duplicates, flag some images as irrelevant, or flag pairs of videos as NOT dupes.
    Downloads: 0 This Week
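JobFunnel's entry above says repeated runs populate a single master CSV with no duplicates. The Python sketch below shows one way such a merge step can work; it is not JobFunnel's code. It assumes each scraped posting is a dict carrying a unique `job_id`, and the column names in `FIELDS` are hypothetical, so JobFunnel's real schema and de-duplication key may differ.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column layout; JobFunnel's real master CSV may use different columns.
FIELDS = ["job_id", "title", "company", "link"]


def merge_into_master(master_csv: str, scraped: Iterable[Dict[str, str]]) -> str:
    """Append scraped postings whose job_id is not already in the master CSV text."""
    existing = csv.DictReader(io.StringIO(master_csv)) if master_csv else []
    # Normalise existing rows to the known columns so DictWriter accepts them.
    rows: List[Dict[str, str]] = [{f: row.get(f, "") for f in FIELDS} for row in existing]
    seen = {row["job_id"] for row in rows}
    for job in scraped:
        if job["job_id"] in seen:
            continue  # already recorded in an earlier run or earlier in this batch
        seen.add(job["job_id"])
        rows.append({f: job.get(f, "") for f in FIELDS})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the function works on CSV text, a caller would read the master file, pass its contents in, and write the returned text back. Keying on an ID rather than the whole row means a re-scraped posting whose other fields changed slightly is still treated as a duplicate.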
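The EPLB entry describes duplicating heavily used experts and spreading the copies across GPUs. A simplified Python sketch of that idea follows, assuming per-expert load estimates are given, that each replica takes an equal share of its expert's load, and that every GPU has the same number of slots. It uses a plain greedy heuristic with hypothetical function names; it is not the algorithm or API in eplb.py, and it omits the hierarchical node-level step.

```python
from typing import List, Tuple


def replicate_experts(weights: List[float], num_slots: int) -> List[int]:
    """Give every expert one replica, then hand each spare slot to the expert
    whose load per replica is currently the highest."""
    n = len(weights)
    if num_slots < n:
        raise ValueError("need at least one slot per expert")
    counts = [1] * n
    for _ in range(num_slots - n):
        hottest = max(range(n), key=lambda e: weights[e] / counts[e])
        counts[hottest] += 1
    return counts


def place_replicas(
    weights: List[float], counts: List[int], num_gpus: int
) -> List[List[Tuple[int, float]]]:
    """Greedy placement: heaviest replica first, onto the least-loaded GPU that
    still has a free slot, so every GPU ends up with the same number of replicas."""
    total = sum(counts)
    if total % num_gpus:
        raise ValueError("replica count must divide evenly across GPUs")
    per_gpu = total // num_gpus
    # Each replica is assumed to receive an equal share of its expert's load.
    replicas = sorted(
        ((weights[e] / counts[e], e) for e in range(len(weights)) for _ in range(counts[e])),
        reverse=True,
    )
    gpus: List[List[Tuple[int, float]]] = [[] for _ in range(num_gpus)]
    loads = [0.0] * num_gpus
    for load, expert in replicas:
        open_gpus = [g for g in range(num_gpus) if len(gpus[g]) < per_gpu]
        target = min(open_gpus, key=lambda g: loads[g])
        gpus[target].append((expert, load))
        loads[target] += load
    return gpus
```

With weights [90, 10, 10, 10], six slots and two GPUs, `replicate_experts` returns [3, 1, 1, 1] and the two GPUs end up with loads of 70 and 50. Without the two extra replicas of expert 0 (four slots, two per GPU), the same greedy placement gives 100 and 20, which is the hot-spotting the entry describes.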
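SortPhotos is described as preventing duplicate files by comparing content. The entry does not say how, so the following is a generic Python sketch of content-based duplicate detection rather than SortPhotos' implementation: files are grouped by size first (a cheap filter), and only same-size files are hashed with SHA-256. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks so large videos need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Return {digest: paths} for every group of byte-identical files under root."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    groups: Dict[str, List[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an identical twin
        for p in same_size:
            groups[file_digest(p)].append(p)
    return {d: sorted(ps) for d, ps in groups.items() if len(ps) > 1}
```

Hash collisions are possible in principle, so a cautious tool would confirm a match with a byte-by-byte comparison (for example `filecmp.cmp(a, b, shallow=False)`) before deleting anything.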
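tootbot.py is said to keep track of already-tooted tweets in a SQLite database. A minimal sketch of that bookkeeping with Python's built-in `sqlite3` module follows; the table name `posted`, the helper names, and the use of the entry URL as key are assumptions, not tootbot's actual schema.

```python
import sqlite3
from typing import Iterable, List


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tracking database with a one-column table of posted IDs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS posted (entry_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def mark_if_new(conn: sqlite3.Connection, entry_id: str) -> bool:
    """Record entry_id; return True only if it was not already in the table."""
    before = conn.total_changes
    # The PRIMARY KEY makes a repeated ID a no-op instead of an error.
    conn.execute("INSERT OR IGNORE INTO posted (entry_id) VALUES (?)", (entry_id,))
    conn.commit()
    return conn.total_changes > before


def entries_to_post(conn: sqlite3.Connection, entry_ids: Iterable[str]) -> List[str]:
    """Filter a feed down to entries that have never been seen before."""
    return [e for e in entry_ids if mark_if_new(conn, e)]
```

This version marks an entry as soon as it is selected, so an entry whose toot then fails would be skipped on the next run. Recording the ID only after a successful post avoids that, at the cost of a possible double post if the script crashes in between.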
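Music Merge Manager lets toggles decide which metadata fields must match. One way to model that is to build a comparison key from only the selected fields and look tracks up in a dictionary, as in the Python sketch below. It assumes the tags have already been read (the project uses Mutagen for that) into plain dicts with hypothetical field names such as `artist`, `album`, `tracknumber` and `title`; it is not the project's matching code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A track is a dict of already-extracted tags (field names are hypothetical).
Track = Dict[str, object]


def match_key(track: Track, fields: Tuple[str, ...]) -> Tuple[object, ...]:
    """Key built from the selected fields only; text is trimmed and case-folded."""
    key = []
    for name in fields:
        value = track.get(name)
        key.append(value.strip().casefold() if isinstance(value, str) else value)
    return tuple(key)


def suspected_duplicates(
    left: Iterable[Track], right: Iterable[Track], fields: Tuple[str, ...]
) -> List[Tuple[Track, Track]]:
    """Pair every left-tree track with each right-tree track sharing its key."""
    index: Dict[Tuple[object, ...], List[Track]] = defaultdict(list)
    for track in right:
        index[match_key(track, fields)].append(track)
    pairs = []
    for track in left:
        for other in index.get(match_key(track, fields), []):
            pairs.append((track, other))
    return pairs
```

Exact equality is brittle for numeric fields such as track length; rounding lengths to whole seconds before building the key, or comparing within a tolerance, would catch more real duplicates.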