Search Results for "python web crawler" - Page 7

Showing 2703 open source projects for "python web crawler"

  • 1
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...
    Downloads: 2 This Week
  • 2
    Flask-Limiter

    Rate Limiting extension for Flask

    Flask-Limiter provides rate-limiting features to Flask applications. It allows configuring various backends to persist the rate limits, a capability provided by the limits library. Test it out. The fast endpoint respects the default rate limit while the slow endpoint uses the decorated one. ping has no rate limit associated with it. By adding...
    Downloads: 2 This Week
  • 3
    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 2 This Week
  • 4
    AWS Toolkit for JetBrains

    A plugin for interacting with AWS from JetBrains IDEs

    The AWS Toolkit for JetBrains makes it easier to write applications built on Amazon Web Services. If you come across bugs with the toolkit or have feature requests, please raise an issue on our GitHub repository. See the user guide for how to get started, along with what features/services are supported. CodeWhisperer uses machine learning to generate code suggestions from the existing code and comments in your IDE. Supported languages include: Java, Python, and JavaScript. In addition...
    Downloads: 2 This Week
  • 5
    Kinto

    A generic JSON document store with sharing and synchronisation options

    Kinto is a minimalist JSON storage service with synchronization and sharing abilities. It is meant to be easy to use and easy to self-host. Kinto is used at Mozilla and released under the Apache v2 license. It’s hard for frontend developers to respect users' privacy when building applications that work offline, store data remotely and synchronize across devices. Existing solutions either rely on big corporations that crave user data or require a non-trivial amount of time and expertise to...
    Downloads: 2 This Week
  • 6
    Synapse

    Matrix reference homeserver

    Matrix is an ambitious new ecosystem for open federated Instant Messaging and VoIP. Everything in Matrix happens in a room. Rooms are distributed and do not exist on any single server. Rooms can be located using convenience aliases like #matrix:matrix.org or #test:localhost:8448. Synapse is currently in rapid development, but as of version 0.5 we believe it is sufficiently stable to be run as an internet-facing service for real usage! Create and manage fully distributed chat rooms with no...
    Downloads: 2 This Week
  • 7
    Confluent's .NET Client for Apache Kafka

    Confluent's Apache Kafka .NET client

    confluent-kafka-dotnet is Confluent's .NET client for Apache Kafka and the Confluent Platform. Confluent-kafka-dotnet is a lightweight wrapper around librdkafka, a finely tuned C client. There are a lot of details to get right when writing an Apache Kafka client. We get them right in one place (librdkafka) and leverage this work across all of our clients (also confluent-kafka-python and confluent-kafka-go). Confluent, founded by the creators of Kafka, is building a streaming platform...
    Downloads: 2 This Week
  • 8
    TorBot

    Dark Web OSINT Tool

    Contributions to this project are always welcome. To add a new feature, fork the dev branch and open a pull request when your new feature is tested and complete. If it's a new module, it should be put inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using chmod +x...
    Downloads: 0 This Week
  • 9
    Requests for PHP

    Requests for PHP is a humble HTTP request library

    Requests is a HTTP library written in PHP, for human beings. It is roughly based on the API from the excellent Requests Python library. Requests is ISC Licensed (similar to the new BSD license) and has no dependencies, except for PHP 5.6+. Despite PHP’s use as a language for the web, its tools for sending HTTP requests are severely lacking. cURL has an interesting API, to say the least, and you can’t always rely on it being available. Sockets provide only low-level access and require you...
    Downloads: 2 This Week
  • 10
    Lexbor

    Lexbor is the development of an open source HTML renderer library

    Lexbor is the development of a web browser engine available as a software library; it ships with a free license and has no extra dependencies. For us, speed is an absolute must-have. In our development process, we focus on the fastest parsing techniques for HTML, CSS, and fonts, the fastest data processing methods, and the fastest ways to serve content to end users. Whether you are building a backend that handles millions of HTML documents or a UI-heavy user app, your software’s response rate always...
    Downloads: 2 This Week
  • 11
    Venom

    Venom is the most complete JavaScript library for WhatsApp

    Venom is a high-performance system developed with JavaScript for creating WhatsApp bots, with support for any kind of interaction, such as customer service, media sending, sentence recognition based on artificial intelligence, and all types of design architecture for WhatsApp. It's a high-performance alternative API to WhatsApp: you can send text messages, files, images, videos and more. Remember, the API was developed on a platform called RESTful Web services, providing interoperability between...
    Downloads: 2 This Week
  • 12
    MechanicalSoup

    A Python library for automating interaction with websites

    A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It doesn't do JavaScript. MechanicalSoup was created by M Hickford, who was a fond user of the Mechanize library. Unfortunately, Mechanize was incompatible with Python 3 until 2019 and its development stalled for several years. MechanicalSoup provides a similar API, built on Python giants Requests (for HTTP sessions...
    Downloads: 1 This Week
  • 13
    Textual

    Textual is a TUI (Text User Interface) framework for Python

    Textual is a Python framework for creating interactive applications that run in your terminal. Textual adds interactivity to Rich with a Python API inspired by modern web development. On modern terminal software (installed by default on most systems), Textual apps can use 16.7 million colors with mouse support and smooth flicker-free animation. A powerful layout engine and re-usable components make it possible to build apps that rival the desktop and web experience. Textual runs on Linux...
    Downloads: 1 This Week
  • 14
    DrissionPage

    Python based web automation tool. Powerful and elegant

    DrissionPage is a Python-based automation framework that blends the capabilities of Selenium for browser automation with Requests-HTML for fast, headless web data extraction. It enables seamless switching between browser-controlled and headless HTTP sessions within the same interface. Ideal for web scraping, testing, and automation, DrissionPage is lightweight and highly efficient, offering more flexibility than standard Selenium or Requests usage alone.
    Downloads: 1 This Week
  • 15
    redis-py

    Redis Python client

    redis-py is the official Python client for interacting with Redis, the in-memory data structure store. It supports all Redis commands and data types, making it easy to build caching, messaging, or real-time analytics features in Python applications. With both synchronous and asyncio support, redis-py is suited for modern Python projects and integrates smoothly into web frameworks, task queues, and backend services.
    Downloads: 1 This Week
  • 16
    Payloads All The Things

    A list of useful payloads and bypasses for Web Application Security

    A list of useful payloads and bypasses for Web Application Security. Feel free to improve with your payloads and techniques. The API key is a unique identifier that is used to authenticate requests associated with your project. Some developers might hardcode it or leave it on public shares.
    Downloads: 1 This Week
  • 17
    Browser Use MCP Server

    Browse the web, directly from Cursor etc.

    A browser automation server implementing the Model Context Protocol, designed to allow AI assistants to browse the web directly from applications like Cursor. It supports natural language commands for web navigation and interaction.
    Downloads: 1 This Week
  • 18
    Dulwich

    Pure-Python Git implementation

    Dulwich is a Python implementation of the Git file formats and protocols, which does not depend on Git itself. All functionality is available in pure Python. Optional C extensions can be built for improved performance. Dulwich takes its name from the area in London where the friendly Mr. and Mrs. Git once attended a cocktail party. Supported Python versions are Python 3.5 and later. Versions of Dulwich prior to 0.20 also supported Python 2.7. Supported platforms include Linux, Mac OS X...
    Downloads: 1 This Week
  • 19
    Django MarkdownX

    Comprehensive Markdown plugin built for Django

    Django MarkdownX is a comprehensive Markdown plugin built for Django, the renowned high-level Python web framework, with flexibility, extensibility, and ease-of-use at its core.
    Downloads: 1 This Week
  • 20
    socketify.py

    Bringing high-performance HTTP/HTTPS and WebSocket servers to PyPy3

    Socketify.py is a reliable, high-performance Python web framework for building large-scale app backends and microservices. It offers unprecedented WebSocket performance and a really fast HTTP server that can deliver encrypted TLS 1.3 messages quicker than most alternative servers can deliver even unencrypted, cleartext messages.
    Downloads: 1 This Week
  • 21
    Starlette

    The little ASGI framework that shines

    Starlette is a lightweight ASGI framework/toolkit, which is ideal for building async web services in Python. It is production-ready and gives you a lightweight, low-complexity HTTP web framework. WebSocket support. In-process background tasks. Startup and shutdown events. Test client built on httpx. CORS, GZip, Static Files, streaming responses. Session and Cookie support. 100% test coverage. 100% type annotated codebase. Few hard dependencies. Compatible with asyncio and trio backends.
    Downloads: 1 This Week
  • 22
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...
    Downloads: 1 This Week
  • 23
    ScrapydWeb

    Web app for Scrapyd cluster management

    Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization. Make sure that Scrapyd has been installed and started on all of your hosts. Start ScrapydWeb with the scrapydweb command (a config file for customizing settings is generated on the first startup). Add your Scrapyd servers; both string and tuple formats are supported, and you can attach basic auth for accessing the Scrapyd server, as well as a string for grouping or labeling. You can select any number...
    Downloads: 1 This Week
  • 24
    img2dataset

    Easily turn large sets of image URLs into an image dataset

    Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine. Also supports saving captions for URL+caption datasets. Opt-out directives: websites can pass the HTTP headers X-Robots-Tag: noai, X-Robots-Tag: noindex, X-Robots-Tag: noimageai and X-Robots-Tag: noimageindex. By default, img2dataset will ignore images with such headers.
    Downloads: 1 This Week
  • 25
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from the terminal by supplying URLs, the output filename of your choice and the paths to your Python scripts to the dude scrape command.
    Downloads: 1 This Week