Showing 739 open source projects for "python web crawler"

View related business solutions
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    Wifipumpkin3

    Wifipumpkin3

    Powerful framework for rogue access point attack

    wifipumpkin3 is powerful framework for rogue access point attack, written in Python, that allow and offer to security researchers, red teamers and reverse engineers to mount a wireless network to conduct a man-in-the-middle attack.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 2
    AIOHTTP

    AIOHTTP

    Asynchronous HTTP client/server framework for asyncio and Python

    Asynchronous HTTP Client/Server for asyncio and Python. AIOHTTP supports both client and server side of HTTP protocol. A long awaited new feature is tracing client request life cycle to figure out when and why client request spends a time waiting for connection establishment, getting server response headers etc. Now it is possible by registering special signal handlers on every request processing stage. The main change is dropping yield from support and using async/await everywhere. Farewell...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 3
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 4
    Flask RESTX

    Flask RESTX

    Fully featured framework for fast, easy and documented API development

    Fork of Flask-RESTPlus fully featured framework for fast, easy and documented API development with Flask. Flask-RESTX is an extension for Flask that adds support for quickly building REST APIs. Flask-RESTX encourages best practices with minimal setup. If you are familiar with Flask, Flask-RESTX should be easy to pick up. It provides a coherent collection of decorators and tools to describe your API and expose its documentation properly using Swagger. With Flask-RESTX, you only import the api...
    Downloads: 9 This Week
    Last Update:
    See Project
  • Simply solve complex auth. Easy for devs to set up. Easy for non-devs to use. Icon
    Simply solve complex auth. Easy for devs to set up. Easy for non-devs to use.

    Transform user access with Frontegg CIAM: login box, SSO, MFA, multi-tenancy, and 99.99% uptime.

    Custom auth drains 25% of dev time and risks 62% more breaches, stalling enterprise deals. Frontegg platform delivers a simple login box, seamless authentication (SSO, MFA, passwordless), robust multi-tenancy, and a customizable Admin Portal. Integrate fast with the React SDK, meet compliance needs, and focus on innovation.
    Start for Free
  • 5
    StabilityMatrix

    StabilityMatrix

    Multi-Platform Package Manager for Stable Diffusion

    StabilityMatrix is a multi-platform package manager and inference UI for Stable Diffusion / AI image generation tools. It lets users install, update, and manage different Stable Diffusion based Web UIs (like Automatic1111, ComfyUI, SD-Next etc.), manage plugins/extensions, models, provide inference UI (generate images) with project/workspace support, drag & drop gallery, shared model directories etc. It is designed to be portable, embedded dependencies, cross-platform, with features like syntax...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 6
    Ulauncher

    Ulauncher

    Feature rich application Launcher for Linux

    ​ Type in an application name without worrying about spelling. Ulauncher will figure out what you meant. It also remembers your previous choices and automatically selects the best option for you. Ulauncher provides 4 themes built in. But if you need something different you can always create a custom color theme. Improve your workflow with customizable shortcuts and extensions. Create a shortcut for web search or your scripts or install a 3rd party extension.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    Metaflow

    Metaflow

    A framework for real-life data science

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    ... electron). Customize as you wish, simulate another browser environment. Perfect HTML5 support, friendly to various front-end libraries (support HTML5, and friendly to front framework). After turning off the cross-domain switch, you can use various cross-domain functions (support cross-domain). Headless mode, which greatly saves resources and is suitable for crawlers (headless mode, be suitable for Web Crawler).
    Downloads: 5 This Week
    Last Update:
    See Project
  • 9
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other features...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    NiceGUI

    NiceGUI

    Create web-based user interfaces with Python

    NiceGUI is a Python-based UI framework that enables developers to create interactive web applications using only Python code. It abstracts away the complexities of HTML, CSS, and JavaScript, allowing for rapid development of web interfaces directly from Python scripts. NiceGUI is suitable for building dashboards, control panels, and other web-based tools, especially in contexts like robotics and data visualization.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    TorBot

    TorBot

    Dark Web OSINT Tool

    Contributions to this project are always welcome. To add a new feature fork the dev branch and give a pull request when your new feature is tested and complete. If its a new module, it should be put inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using chmod +x...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    PyMySQL

    PyMySQL

    MySQL client library for Python

    PyMySQL is a 100% Python implementation of the MySQL client protocol, allowing Python applications to connect to MySQL and MariaDB databases without requiring binary extensions. It supports standard DB‑API 2.0 features, such as cursors, transactions, and parameterized queries. PyMySQL is versatile for web applications, scripts, and tools, offering compatibility with ORMs like SQLAlchemy and frameworks like Django.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 13
    OSRFramework

    OSRFramework

    OSRFramework, the Open Sources Research Framework is a AGPLv3+ project

    OSRFramework is a GNU AGPLv3+ set of libraries developed by i3visio to perform Open Source Intelligence collection tasks. They include references to a bunch of different applications related to username checking, DNS lookups, information leaks research, deep web search, regular expressions extraction and many others. At the same time, by means of ad-hoc Maltego transforms, OSRFramework provides a way of making these queries graphically as well as several interfaces to interact with like...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 14
    SimpleLogin

    SimpleLogin

    The SimpleLogin back-end

    With email aliases, you can be anonymous online and protect your inbox against spams and phishing. Open-source. Made and hosted in Europe. Receive and send emails anonymously. Next time a website asks for your email address, give an alias instead of your real email. Emails sent to an alias are instantly forwarded to your inbox without the sender knowing anything. Just hit "Reply" if you want to reply to a forwarded email: the reply is sent from your alias and your real email stays hidden....
    Downloads: 7 This Week
    Last Update:
    See Project
  • 15
    Whoogle Search

    Whoogle Search

    A self-hosted, ad-free, privacy-respecting metasearch engine

    Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile. Autocomplete/search suggestions. POST request search and suggestion queries (when possible). View images at full res without site redirect (currently mobile only). Light/Dark/System theme modes (with...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    pywebview

    pywebview

    Build GUI for your Python program with JavaScript, HTML, and CSS

    pywebview is a lightweight cross-platform wrapper around a webview component that allows to display HTML content in its own native GUI window. It gives you power of web technologies in your desktop application, hiding the fact that GUI is browser based. You can use pywebview either with a lightweight web framework like Flask or Bottle or on its own with a two way bridge between Python and DOM. pywebview uses native GUI for creating a web component window: WinForms on Windows, Cocoa on macOS...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    Venom

    Venom

    Venom is the most complete javascript library for Whatsapp

    Venom is a high-performance system developed with JavaScript to create a bot for WhatsApp, support for creating any interaction, such as customer service, media sending, sentence recognition based on artificial intelligence and all types of design architecture for WhatsApp. It's a high-performance alternative API to whatzapp, you can send, text messages, files, images, videos and more. Remember, the API was developed on a platform called RESTful Web services, providing interoperability between...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 18
    Indico

    Indico

    A feature-rich event management system

    The effortless open-source tool for event organization, archival, and collaboration. Event-organization workflow that fits lectures, meetings, workshops, and conferences. A feature-rich event management system, made @ CERN, the place where the Web was born. A powerful and flexible hierarchical content management system for events, a full-blown conference organization workflow with call for Abstracts and abstract reviewing modules; flexible registration form creation and configuration...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 19
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Flask App Builder

    Flask App Builder

    Simple and rapid application development framework

    Simple and rapid application development framework, built on top of Flask. includes detailed security, auto CRUD generation for your models, google charts and much more. Automatic permissions lookup, based on exposed methods. Inserts on the Database all the detailed permissions possible on your application. Public (no authentication needed) and Private permissions. Role-based permissions. Authentication support for OpenID, Database and LDAP. Support for self-user registration. Automatic,...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    Locust

    Locust

    Scalable open source load testing tool

    Locust is an open source user load testing tool written in Python. The idea behind Locust is to swarm your web site or other systems with attacks from simulated users during a test, with each user behavior defined by you using Python code. This swarming process is then monitored from a web UI in real-time, and will help identify any bottlenecks in your code before real users can come in. As it is completely event-based, Locust can have thousands or even millions of simultaneous users...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    AWS Toolkit for JetBrains

    AWS Toolkit for JetBrains

    A plugin for interacting with AWS from JetBrains IDEs

    The AWS Toolkit for JetBrains makes it easier to write applications built on Amazon Web Services. If you come across bugs with the toolkit or have feature requests, please raise an issue on our GitHub repository. See the user guide for how to get started, along with what features/services are supported. CodeWhisperer uses machine learning to generate code suggestions from the existing code and comments in your IDE. Supported languages include: Java, Python, and JavaScript. In addition...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    SeleniumBase

    SeleniumBase

    A framework for browser automation and testing with Selenium

    SeleniumBase automatically handles common WebDriver actions such as launching web browsers before tests, saving screenshots during failures, and closing web browsers after tests. SeleniumBase lets you customize test runs from the command line. SeleniumBase uses simple syntax for commands. pytest includes automatic test discovery. If you don't specify a specific file or folder to run, pytest will automatically search through all subdirectories for tests to run. No More Flaky Tests! SeleniumBase...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 24
    Douyin TikTok Download API

    Douyin TikTok Download API

    Douyin TikTok Download API

    Use the official interface to capture Douyin|TikTok data, support API calls, Web portals, and batch analysis. Fast, asynchronous, free, open source, ad-free, long-term maintenance. This project is based on PyWebIO , FastAPI , HTTPX , a fast and asynchronous Douyin / TikTok data crawling tool, and realizes online batch parsing and downloading of watermark-free videos or atlases through the web, data crawling API, and iOS shortcut instructions for watermark-free download and other functions. You...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    Wagtail

    Wagtail

    A Django content management system focused on flexibility & UX

    Wagtail is a powerful, open source content management system that’s focused on flexibility and user experience. Built on Django, Wagtail offers precise control and flexibility for designers, developers and editors. Designed by developers for developers, Wagtail plays nicely with everything else in your tech stack so you can do more and focus on perfecting your site. Designers will find Wagtail’s simple templating system ideal for building beautiful websites just the way they want, without...
    Downloads: 4 This Week
    Last Update:
    See Project
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.