Showing 33 open source projects for "python web crawler"

  • 1
    ArchiveBox

    Open source self-hosted web archiving

    ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline. Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a centralized service, but saved URLs have to be public, and they can't save every type of content. ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data...
    Downloads: 3 This Week
  • 2
    Rockstor

    BTRFS based NAS and private cloud storage solution

    ..." are powered by a Docker-based application hosting framework, and new ones can easily be added. These Rock-ons, combined with advanced NAS features, turn Rockstor into a private cloud storage solution accessible from anywhere, giving users complete control of cost, ownership, privacy and data security. The Rockstor UI is written in JavaScript, making it simple to manage everything from your web browser. The backend is written in Python and exposes RESTful APIs to easily extend functionality!
    Downloads: 26 This Week
  • 3
    Plum Cave

    A cloud backup solution that employs advanced cryptography

    A cloud backup solution that employs the "ChaCha20 + Serpent-256 CBC + HMAC-SHA3-512" authenticated encryption scheme for data encryption and ML-KEM-1024 for quantum-resistant key exchange. Check it out at https://plum-cave.netlify.app/ GitHub page: https://github.com/Northstrix/plum-cave
    Downloads: 0 This Week
  • 4
    migrid

    A grid middleware with minimal user and resource requirements

    [This project moved to GitHub and is no longer maintained here] Minimum intrusion Grid (MiG) is an attempt to design a new platform for Grid computing which is driven by a stand-alone approach to Grid, rather than integration with existing systems. The goal of the MiG project is to provide Grid infrastructure where the requirements on users and resources alike are as small as possible (minimum intrusion). MiG strives for minimum intrusion but will seek to provide a feature rich and...
    Downloads: 0 This Week
  • 5

    Delayter

    Utility to queue files for deferred deletion, days/weeks/months later

    ... not remember whether they are important or not. There are safeguards to protect against accidental deletion. Scheduled deletions can be viewed and also retracted. Files will not be deleted if they have been modified since scheduled for deletion. The documentation at the Delayter web site is well-written and extensive. Please take a minute to read the "Overview" section.
    Downloads: 0 This Week
  • 6
    Configuration Backup (ConfiBack)

    Project for backing up network device configuration

    Using this project you can back up the configuration of network devices such as switches and routers and track changes to it.
    Downloads: 13 This Week
  • 7
    A set of tools (command line and GUI) to provide a complete digital photo workflow for Unixes. EXIF headers are used as the central information repository, so users may change their software at any time without losing any data.
    Downloads: 0 This Week
  • 8
    diskover

    File system crawler and disk space usage software

    diskover is a file system crawler and disk space usage tool that uses Elasticsearch to index your file metadata. diskover crawls and indexes your files on a local computer or remote storage server over network mounts. diskover helps manage your storage by identifying old and unused files and gives better insight into data change ("hotfiles"), file duplication ("dupes"), and wasted space. It is designed to help deal with managing large amounts of data growth and provide detailed storage...
    Downloads: 0 This Week
  • 9
    mediaTUM is free software written in Python for archiving and retrieval of images, documents and other research data. It was originally developed in the framework of the DFG project IntegraTUM and is continuously expanded with new functionalities as required.
    Downloads: 0 This Week
  • 10
    angular-filemanager

    JavaScript file manager Material Design folder explorer

    A very smart file manager for managing your files in the browser, developed in AngularJS following Material Design styles by Jonas Sciangula Street. This project provides a web file manager interface, allowing you to create your own backend connector following the connector API. Example backend connectors are provided in several languages (PHP-FTP, PHP-local, Python, etc). Pick-files callback for third-party apps. Directory tree navigation. Copy, Move, Rename (Interactive...
    Downloads: 0 This Week
  • 11
    Cloud Export is a tool to automatically extract your data from web applications and save it to your local file system for backup purposes, but more extensive than Google Takeout. Plans are based on http://www.dataliberation.org.
    Downloads: 0 This Week
  • 12
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
    Downloads: 11 This Week
  • 13
    RiverGlass EssentialScanner is an open source web and file system crawler which indexes the text content of discovered files so they can be retrieved and analyzed. It provides simple scanner capabilities as part of larger enterprise search solutions.
    Downloads: 0 This Week
  • 14
    Sushi, huh? is an application for downloading GNU/Linux packages from another OS or Linux distribution for later offline installation. Intended for people who do not have an Internet connection.
    Downloads: 1 This Week
  • 15
    pyTarget
    Implements a powerful iSCSI target in Python that is easy to use on most popular systems. Features include multi-target and multi-connection/session support, CHAP authentication, header and data digests, ERL=2, VTL, etc...
    Downloads: 0 This Week
  • 16
    Backup and restore of files to web mail systems, FTP, and SFTP. Uses the free storage of Gmail, Hotmail, etc. Archives files, splits large files, encrypts, and uploads. Requires Python (tested with Python 2.5).
    Downloads: 0 This Week
  • 17
    A small Python script that allows administrators to place quotas on *nix accounts without much technical knowledge or root access. It is ideal for those who share and/or resell web hosting or other resources.
    Downloads: 0 This Week
  • 18
    Universal information crawler is a fast, precise, and reliable Internet crawler. Uicrawler is a program/automated script which browses the World Wide Web in a methodical, automated manner and creates an index of the documents that it accesses.
    Downloads: 0 This Week
  • 19
    Mutualized distant storage space management tool (using a distributed system).
    Downloads: 0 This Week
  • 20
    Intronet is a lightweight framework that lets users work with a remote Linux system and perform some administration tasks from a web browser. It is fully usable in browsers on mobile devices such as PDAs, modern cell phones, etc.
    Downloads: 0 This Week
  • 21
    Arrowbase is a collection of tools for backup purposes. Together they form a backup system that can be used on more than one operating system. This makes the project not only widely usable but portable as well.
    Downloads: 0 This Week
  • 22
    A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. The three major components (Crawler, Analyzer, and Indexer) can also be used separately.
    Downloads: 0 This Week
  • 23
    XSDB: XML is to DATA as HTML is to DOCUMENT. Publish and combine data as easily as HTML format and web browsers publish and view documents. Implementations in Python, JavaScript, Java, C#/.NET.
    Downloads: 0 This Week
  • 24
    A single-purpose ID3 tagger / file renamer which populates an album of songs with proper album/track information from Amazon Web Services (AWS).
    Downloads: 0 This Week
  • 25
    CAIRN is a modular copy and restore program for the imaging of a computer. It copies every file on a computer and figures out how to recreate it from scratch. It is primarily network oriented but is also flexible enough to boot from any possible method.
    Downloads: 0 This Week