Search Results for "python web crawler" - Page 6

Showing 2669 open source projects for "python web crawler"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Letterboxd Recommendations

    Letterboxd Recommendations

    Scraping publicly-accessible Letterboxd data for movie recommendations

    Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username. A user's "star" ratings are scraped from their Letterboxd profile and assigned numerical ratings from 1 to 10 (accounting for half stars). Their ratings are then combined with a sample of ratings from the top 4000 most active users on the site to create a collaborative filtering recommender model using singular value...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    JobFunnel

    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Flask-Limiter

    Flask-Limiter

    Rate Limiting extension for Flask

    Flask-Limiter provides rate-limiting features to flask applications. It allows configuring various backends to persist the rate limits, which is provided by the limits library. Sponsored by Zuplo - fully-managed API Gateway with rate limiting, authentication, and more. Add rate limiting to your API in minutes, try it at zuplo.com Test it out. The fast endpoint respects the default rate limit while the slow endpoint uses the decorated one. ping has no rate limit associated with it. By adding...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    h2oGPT

    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Secure remote access solution to your private network, in the cloud or on-prem. Icon
    Secure remote access solution to your private network, in the cloud or on-prem.

    Deliver secure remote access with OpenVPN.

    OpenVPN is here to bring simple, flexible, and cost-effective secure remote access to companies of all sizes, regardless of where their resources are located.
    Get started — no credit card required.
  • 5
    MedicalGPT

    MedicalGPT

    MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training

    MedicalGPT training medical GPT model with ChatGPT training pipeline, implementation of Pretraining, Supervised Finetuning, Reward Modeling and Reinforcement Learning. MedicalGPT trains large medical models, including secondary pre-training, supervised fine-tuning, reward modeling, and reinforcement learning training.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Feast

    Feast

    Feature Store for Machine Learning

    Feast (Feature Store) is an open source feature store for machine learning. Feast is the fastest path to manage existing infrastructure to productionize analytic data for model training and online inference. Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Kinto

    Kinto

    A generic JSON document store with sharing and synchronisation options

    Kinto is a minimalist JSON storage service with synchronization and sharing abilities. It is meant to be easy to use and easy to self-host. Kinto is used at Mozilla and released under the Apache v2 license. It’s hard for frontend developers to respect users' privacy when building applications that work offline, store data remotely and synchronize across devices. Existing solutions either rely on big corporations that crave user data or require a non-trivial amount of time and expertise to...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    TorBot

    TorBot

    Dark Web OSINT Tool

    Contributions to this project are always welcome. To add a new feature fork the dev branch and give a pull request when your new feature is tested and complete. If its a new module, it should be put inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using chmod +x...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    MechanicalSoup

    MechanicalSoup

    A Python library for automating interaction with websites

    A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It doesn't do JavaScript. MechanicalSoup was created by M Hickford, who was a fond user of the Mechanize library. Unfortunately, Mechanize was incompatible with Python 3 until 2019 and its development stalled for several years. MechanicalSoup provides a similar API, built on Python giants Requests (for HTTP sessions...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Uplink

    Uplink

    A Declarative HTTP Client for Python

    A Declarative HTTP Client for Python. Inspired by Retrofit. Uplink is in beta development. The public API is still evolving, but we expect most changes to be backward compatible at this point. Uplink turns your HTTP API into a Python class. Build an instance to interact with the web service. Then, executing an HTTP request is as simply as invoking a method. Use decorators and type hints to describe each HTTP request. JSON, URL-encoded, and multipart request body and file upload. URL parameter...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Textual

    Textual

    Textual is a TUI (Text User Interface) framework for Python

    Textual is a Python framework for creating interactive applications that run in your terminal. Textual adds interactivity to Rich with a Python API inspired by modern web development. On modern terminal software (installed by default on most systems), Textual apps can use 16.7 million colors with mouse support and smooth flicker-free animation. A powerful layout engine and re-usable components makes it possible to build apps that rival the desktop and web experience. Textual runs on Linux...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    uvicorn-gunicorn-fastapi

    uvicorn-gunicorn-fastapi

    Docker image with Uvicorn managed by Gunicorn

    Docker image with Uvicorn managed by Gunicorn for high-performance FastAPI web applications in Python with performance auto-tuning. Optionally with Alpine Linux.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Mercury

    Mercury

    Convert Python notebook to web app and share with non-technical users

    Turn Python notebooks to web applications with open-source Mercury framework. Hide code and add interactive widgets. Non-technical users can tweak widgets and execute notebook with new parameters. The core of Mercury is Open Source under AGPLv3. We provide Mercury Pro with additional features, dedicated support and friendly commercial license. Mercury is a perfect tool to convert Python notebook to interactive web application and share with non-programmers. You define interactive widgets...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Selectolax

    Selectolax

    Python binding to Modest and Lexbor engines

    A fast HTML5 parser with CSS selectors using Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. Most of the features between backends are almost identical, but there are still some differences. Currently, the Lexbor backend is in beta and missing some of the features. To use lexbor, just import the parser and use it in the similar way to the HTMLParser.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    PyFCM

    PyFCM

    Python client for FCM - Firebase Cloud Messaging

    Python client for FCM - Firebase Cloud Messaging (Android, iOS and Web) Firebase Cloud Messaging (FCM) is the new version of GCM. It inherits the reliable and scalable GCM infrastructure, plus new features. GCM users are strongly recommended to upgrade to FCM. Using FCM, you can notify a client app that new email or other data is available to sync. You can send notifications to drive user reengagement and retention. For use cases such as instant messaging, a message can transfer a payload of up...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Payloads All The Things

    Payloads All The Things

    A list of useful payloads and bypass for Web Application Security

    A list of useful payloads and bypasses for Web Application Security. Feel free to improve with your payloads and techniques. The API key is a unique identifier that is used to authenticate requests associated with your project. Some developers might hardcode them or leave it on public shares.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Dulwich

    Dulwich

    Pure-Python Git implementation

    Dulwich is a Python implementation of the Git file formats and protocols, which does not depend on Git itself. All functionality is available in pure Python. Optional C extensions can be built for improved performance. Dulwich takes its name from the area in London where the friendly Mr. and Mrs. Git once attended a cocktail party. Supported Python versions are Python 3.5 and later. Versions of Dulwich prior to 0.20 also supported Python 2.7. Supported platforms include Linux, Mac OS X...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Django MarkdownX

    Django MarkdownX

    Comprehensive Markdown plugin built for Django

    Django MarkdownX is a comprehensive Markdown plugin built for Django, the renowned high-level Python web framework, with flexibility, extensibility, and ease-of-use at its core.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    socketify.py

    socketify.py

    Bringing Http/Https and WebSockets High Performance servers for PyPy3

    Socketify.py is a reliable, high-performance Python web framework for building large-scale app backends and microservices. With no precedents websocket performance and a really fast HTTP server that can delivery encrypted TLS 1.3 quicker than most alternative servers can do even unencrypted, cleartext messaging.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Starlette

    Starlette

    The little ASGI framework that shines

    Starlette is a lightweight ASGI framework/toolkit, which is ideal for building async web services in Python. It is production-ready and gives you a lightweight, low-complexity HTTP web framework. WebSocket support. In-process background tasks. Startup and shutdown events. Test client built on httpx. CORS, GZip, Static Files, streaming responses. Session and Cookie support. 100% test coverage. 100% type annotated codebase. Few hard dependencies. Compatible with asyncio and trio backends.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    ScrapydWeb

    ScrapydWeb

    Web app for Scrapyd cluster management

    Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization. Make sure that Scrapyd has been installed and started on all of your hosts. Start ScrapydWeb via command scrapydweb. (a config file would be generated for customizing settings on the first startup.) Add your Scrapyd servers, both formats of string and tuple are supported, you can attach basic auth for accessing the Scrapyd server, as well as a string for grouping or labeling. You can select any number...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    img2dataset

    img2dataset

    Easily turn large sets of image urls to an image dataset

    Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. Also supports saving captions for url+caption datasets. Opt-out directives: Websites can pass the http headers X-Robots-Tag: noai, X-Robots-Tag: noindex , X-Robots-Tag: noimageai and X-Robots-Tag: noimageindex By default img2dataset will ignore images with such headers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Tarsier

    Tarsier

    Vision utilities for web interaction agents

    At Reworkd, we iterated on all these problems across tens of thousands of real web tasks to build a powerful perception system for web agents... Tarsier! In the video below, we use Tarsier to provide webpage perception for a minimalistic GPT-4 LangChain web agent. Tarsier visually tags interactable elements on a page via brackets + an ID e.g. [23]. In doing this, we provide a mapping between elements and IDs for an LLM to take actions upon (e.g. CLICK [23]). We define interactable elements...
    Downloads: 1 This Week
    Last Update:
    See Project
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.