Showing 1049 open source projects for "python web crawler"

  • 1
    MechanicalSoup

    A Python library for automating interaction with websites

    A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It doesn't do JavaScript. MechanicalSoup was created by M Hickford, who was a fond user of the Mechanize library. Unfortunately, Mechanize was incompatible with Python 3 until 2019 and its development stalled for several years. MechanicalSoup provides a similar API, built on Python giants Requests (for HTTP sessions...
    Downloads: 0 This Week
  • 2
    Uplink

    A Declarative HTTP Client for Python

    A Declarative HTTP Client for Python. Inspired by Retrofit. Uplink is in beta development. The public API is still evolving, but we expect most changes to be backward compatible at this point. Uplink turns your HTTP API into a Python class. Build an instance to interact with the web service. Then, executing an HTTP request is as simple as invoking a method. Use decorators and type hints to describe each HTTP request. JSON, URL-encoded, and multipart request body and file upload. URL parameter...
    Downloads: 0 This Week
  • 3
    PyFCM

    Python client for FCM - Firebase Cloud Messaging

    Python client for FCM - Firebase Cloud Messaging (Android, iOS and Web). Firebase Cloud Messaging (FCM) is the new version of GCM. It inherits the reliable and scalable GCM infrastructure, plus new features. GCM users are strongly recommended to upgrade to FCM. Using FCM, you can notify a client app that new email or other data is available to sync. You can send notifications to drive user reengagement and retention. For use cases such as instant messaging, a message can transfer a payload of up...
    Downloads: 0 This Week
  • 4
    FastHX

    FastAPI server-side rendering with built-in HTMX support.

    FastHX is a Python library that adds server-side rendering to FastAPI applications, with built-in HTMX support.
    Downloads: 0 This Week
  • 5
    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. Although we don't...
    Downloads: 0 This Week
  • 6
    Mezzanine

    CMS framework for Django

    Mezzanine is a powerful open source content management platform built using the Django framework. In many ways it is like many other content management tools, offering an intuitive interface for managing all of your content. But Mezzanine is different in that it provides most of its functionality by default. While other platforms rely heavily on modules or reusable applications, Mezzanine comes ready with all the functionality you need, making it the more efficient choice. Mezzanine has a...
    Downloads: 0 This Week
  • 7
    CapRover

    Scalable PaaS (automated Docker+nginx), aka Heroku on Steroids

    CapRover is an extremely easy-to-use app/database deployment & web server manager for your NodeJS, Python, PHP, ASP.NET, Ruby, MySQL, MongoDB, Postgres, WordPress (etc.) applications! It's blazingly fast and very robust as it uses Docker, Nginx, LetsEncrypt and NetData under the hood behind its simple-to-use interface. For a developer who does not like spending hours and days setting up a server, building tools, sending code to the server, building it, getting an SSL certificate...
    Downloads: 0 This Week
  • 8
    ConsoleMe

    A central control plane for AWS permissions and access

    ConsoleMe is a web service that makes AWS IAM permissions and credential management easier for end-users and cloud administrators. ConsoleMe provides numerous ways to log in to the AWS Console. An IAM Self-Service Wizard lets users request IAM permissions in plain English. Cross-account resource policies will be automatically generated and can be applied with a single click for certain resource types. Weep (ConsoleMe’s CLI) supports 5 different ways of serving AWS credentials locally. Cloud...
    Downloads: 0 This Week
  • 9
    Shynet

    Modern, privacy-friendly, and detailed web analytics

    Modern, privacy-friendly, and detailed web analytics that works without cookies or JS. There are a lot of web analytics tools. Unfortunately, most of them come with the following caveats: they require handing all of your visitors' info to a third-party company; they use cookies to track visitors across sessions, so you need to have those annoying cookie notices; they collect so much personal data that even the NSA is jealous; they are closed source and/or expensive, often with limited data...
    Downloads: 0 This Week
  • 10
    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in the sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model...
    Downloads: 0 This Week
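The regular-expression style of page classifier described for ACHE can be sketched in a few lines of Python. This is only an illustration of the idea, not ACHE's own code or configuration format (ACHE itself is not driven this way); the names `make_regex_classifier` and `is_relevant` and the sample pattern are hypothetical.

```python
import re


def make_regex_classifier(pattern: str):
    """Return a function that labels a page relevant when the pattern matches.

    Hypothetical helper: it illustrates a regex-based page classifier
    and is not part of ACHE.
    """
    compiled = re.compile(pattern, re.IGNORECASE)

    def is_relevant(page_text: str) -> bool:
        # A page counts as relevant if the pattern occurs anywhere in its text.
        return compiled.search(page_text) is not None

    return is_relevant


# Assumed example: keep only pages that mention the word "crawler".
is_relevant = make_regex_classifier(r"\bcrawler\b")

relevant_page = is_relevant("A focused Crawler collects on-topic pages.")
irrelevant_page = is_relevant("This page is about gardening.")
```

A focused crawler would call such a function on each fetched page and follow links only from pages it marks as relevant.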
  • 11
    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 1 This Week
  • 12
    DNS Crawler

    A Bulk Domain Assessment Tool

    DNS Crawler is a lightweight, Python-based utility designed for efficient batch processing and assessment of internet domain names. It reads from a list of domains, one per line, formatted as "domain_name <tab> or ; optional_comment", and generates a detailed, Excel-compatible CSV report with columns including: DOMAIN (domain name); REG (registrar); SOA, NS, MX, TXT, SPF, DMARC, MS, A, PTR (common DNS records for comprehensive domain analysis); NOTE (optional comments from the original input file)...
    Downloads: 0 This Week
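As a rough illustration of the input format described for DNS Crawler (a domain name, optionally followed by a tab or semicolon and a comment), a line parser might look like the sketch below. It is not taken from the project; `DomainEntry` and `parse_domain_line` are hypothetical names, and treating blank lines as skippable is an assumption.

```python
import re
from typing import NamedTuple, Optional


class DomainEntry(NamedTuple):
    domain: str
    comment: Optional[str]


def parse_domain_line(line: str) -> Optional[DomainEntry]:
    """Parse one input line; return None for blank lines (an assumption)."""
    line = line.strip()
    if not line:
        return None
    # Split once on the first tab or semicolon; the rest is the comment.
    parts = re.split(r"[\t;]", line, maxsplit=1)
    domain = parts[0].strip()
    comment = parts[1].strip() if len(parts) > 1 else ""
    return DomainEntry(domain, comment or None)


entries = [
    parse_domain_line("example.com\tmain site"),
    parse_domain_line("example.org; backup domain"),
    parse_domain_line("example.net"),
]
```

Each parsed entry could then be looked up and written as one CSV row, with the comment carried into the NOTE column.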
  • 13
    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code for crawling webpages, distributing the results to a server, and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider Crawling for Article Writing Software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https...
    Downloads: 1 This Week
  • 14
    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 0 This Week
  • 15
    Web Shortcuts
    This tool is used to open websites/links by pressing one or more keys on the keyboard, acting as a true shortcut for web pages. When the shortcut keys are pressed, you will be directed to the site previously entered through the main browser set in the system (if the tool does not work after setting the shortcuts, try restarting it).
    Downloads: 1 This Week
  • 16
    HackTools

    The all-in-one Red Team extension for Web Pentesters

    The all-in-one Red Team browser extension for Web Pentesters. HackTools is a web extension that facilitates your web application penetration tests. It includes cheat sheets as well as all the tools used during a test, such as XSS payloads, reverse shells, and much more. With the extension you no longer need to search for payloads on different websites or in your local storage space; most of the tools are accessible in one click. HackTools is accessible either in pop-up mode or in a whole tab...
    Downloads: 2 This Week
  • 17
    Web Link Collector 1000

    Automatically collect all links from websites to a clean txt file

    Easily and automatically collect all your links into a neat txt list from a particular website or an entire section of a multi-page website network! Web Link Collector 1000 is a simple tool for gathering links from websites with minimal effort. It helps you collect resources for research, create reference lists, or save useful links without manual copying and pasting. Features: Two Collection Modes: Single page or multiple pages of specific website section, or even...
    Downloads: 2 This Week
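The general technique of collecting every link on a page into a plain text list can be sketched with the Python standard library alone. This is not how Web Link Collector 1000 is implemented (its internals are not described here); `LinkCollector`, `collect_links`, and the sample HTML are hypothetical, and the page is passed in as a string so the sketch does no network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect unique, absolute URLs from the href of every <a> tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                if url not in self.links:
                    self.links.append(url)


def collect_links(html: str, base_url: str) -> list:
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


sample_html = (
    '<a href="/docs">Docs</a> '
    '<a href="https://example.org/x">X</a> '
    '<a href="/docs">Docs again</a>'
)
links = collect_links(sample_html, "https://example.com/start")
link_list_text = "\n".join(links)  # one URL per line, ready to save as .txt
```

Writing `link_list_text` to a file gives the one-URL-per-line list; a multi-page mode would repeat the same step for each page in the section.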
  • 18
    Ascoos Web Extended Studio

    A portable web server suite for 64-bit Windows, for web development.

    The Ascoos Web Extended Studio is a special 64-bit freeware web server suite for web developers and designers, based on Apache, PHP, MariaDB, MongoDB, FileZilla, and others. It gives the user the option of running different versions of PHP and MariaDB. It is structured for easy upgrading: each new version of the Ascoos Web Extended Studio includes the latest versions of the individual programs without removing earlier versions. So, you have the opportunity for experiments...
    Downloads: 2 This Week
  • 19
    Proxy_Pool

    Python crawler proxy IP pool (proxy pool)

    The main function of the crawler proxy IP pool project is to periodically collect free proxies published on the Internet, verify them, and store them, and to periodically re-verify the stored proxies to ensure they remain available. It provides both an API and a CLI. You can also add proxy sources to increase the quality and quantity of the IPs in the proxy pool.
    Downloads: 0 This Week
  • 20
    Scrapyd

    A service daemon to run Scrapy spiders

    Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders. A common (and useful) convention to use for the version name is the revision number of the version control tool you’re using to track your Scrapy project code. For example: r23. The versions are not compared alphabetically but using a smarter algorithm (the same one packaging uses), so r10 compares greater than r9, for example. Scrapyd is an...
    Downloads: 1 This Week
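To see why an alphabetical comparison gets version names like r9 and r10 wrong, here is a minimal natural-ordering sketch. It is not Scrapyd's actual comparison code; `natural_key` is a hypothetical helper that compares digit runs as integers.

```python
import re


def natural_key(version: str) -> list:
    """Split a label into text and number chunks so digits compare numerically.

    Hypothetical helper for illustration; not the algorithm Scrapyd ships.
    """
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", version)
    return [int(part) if part.isdigit() else part for part in parts]


versions = ["r9", "r23", "r10"]

alphabetical = sorted(versions)               # "r10" sorts before "r9"
natural = sorted(versions, key=natural_key)   # r9 < r10 < r23
latest = natural[-1]
```

With the plain string sort, r9 would wrongly be treated as the newest version; with the natural key, r23 is.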
  • 21
    KemonoDownloader

    Kemono Downloader - A cross-platform Python app built with PyQt6

    Welcome to Kemono Downloader, a versatile Python-based desktop application built with PyQt6, designed to download content from Kemono.su. This tool enables users to archive individual posts or entire creator profiles from services like Patreon, Fanbox, and more, supporting a wide range of file types with customizable settings and advanced features.
    Downloads: 517 This Week
  • 22
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports a MySQL connection. A free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed. - Free web spider, parser, extractor, crawler - Extraction of emails, phone numbers, and custom text from the web - Export to Excel file - Data saved into Derby and MySQL databases - Written in Java, cross-platform. Also see Free email Sender...
    Downloads: 15 This Week
  • 23
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use as an image gallery, wallpaper, audio/music, video, document, and other media bulk downloader from supported websites. Also use to download sequential website urls that have a certain pattern (e.g. image01.png to image100.png). Also use app's built-in site crawler for advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password protected sites. Say goodbye to downloading one...
    Downloads: 273 This Week
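The sequential URL pattern mentioned above (image01.png to image100.png) is easy to generate programmatically. The sketch below only illustrates that idea in Python and has nothing to do with WFDownloader's own implementation; `sequential_urls`, the `{n}` placeholder, and the example.com address are all assumptions.

```python
def sequential_urls(template: str, start: int, stop: int, width: int) -> list:
    """Fill the {n} placeholder with zero-padded numbers from start to stop.

    Hypothetical helper; numbers longer than `width` digits are left unpadded.
    """
    return [
        template.format(n=str(number).zfill(width))
        for number in range(start, stop + 1)
    ]


# Assumed example URL; the pattern runs from image01.png to image100.png.
urls = sequential_urls("https://example.com/gallery/image{n}.png", 1, 100, width=2)

first_url = urls[0]
last_url = urls[-1]
```

A batch downloader would then fetch each URL in turn, skipping any that return an error.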
  • 24
    Endian Firewall Community
    ... for email traffic (POP and SMTP), content filtering of Web traffic and a "hassle free" VPN solution (based on both OpenVPN and IPsec).
    Downloads: 357 This Week
  • 25
    Eric Integrated Development Environment

    Python Development Environment with all batteries included

    Eric is a Python IDE written using PyQt and QScintilla. It provides various features such as any number of open editors, an integrated (remote) debugger, project management facilities, unit test, refactoring and much more.
    Downloads: 180 This Week