Showing 79 open source projects for "websites"

  • 1
    MechanicalSoup

    A Python library for automating interaction with websites

    A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It doesn't do JavaScript. MechanicalSoup was created by M Hickford, who was a fond user of the Mechanize library. Unfortunately, Mechanize was incompatible with Python 3 until 2019 and its development stalled for several years.
    Downloads: 0 This Week
  • 2
    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 4 This Week
  • 3
    django CMS

    Easy-to-use and developer-friendly enterprise CMS powered by Django

    Create modern websites that content editors love. django CMS was originally conceived by web developers frustrated with the technical and security limitations of other systems. Its lightweight core makes it easy to integrate with other software and put to use immediately, while its ease of use makes it the go-to choice for content managers, content editors and website admins.
    Downloads: 2 This Week
  • 4
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications.
    Downloads: 36 This Week
  • 5
    crawler

    Collection of JS reverse engineering examples for web scraping study

    crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. ...
    Downloads: 0 This Week
  • 6
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers.
    Downloads: 0 This Week
  • 7
    news-please

    Python tool for crawling and extracting structured data from news sites

    news-please is an open source news crawler and information extraction tool designed to collect and structure articles from online news websites. It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. ...
    Downloads: 3 This Week
  • 8
    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It requires Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries or dependencies, you can use the Docker image, which is published on GitHub Packages.
    Downloads: 0 This Week
  • 9
    newspaper4k

    Python library for scraping and analyzing online news articles easily

    Newspaper4k is a Python library designed for extracting, processing, and analyzing news articles from websites. It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. ...
    Downloads: 0 This Week
  • 10
    img2dataset

    Easily turn large sets of image URLs into an image dataset

    Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine. Also supports saving captions for URL+caption datasets. Opt-out directives: websites can pass the HTTP headers X-Robots-Tag: noai, X-Robots-Tag: noindex, X-Robots-Tag: noimageai and X-Robots-Tag: noimageindex. By default, img2dataset will ignore images with such headers.
    Downloads: 0 This Week
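The opt-out behaviour described for img2dataset can be illustrated with a small header check. This is a minimal sketch, not img2dataset's actual code: the function name `is_opted_out`, the `OPT_OUT_DIRECTIVES` set, and the plain-dict representation of response headers are assumptions made for illustration.

```python
# Hypothetical helper; the directive names come from the listing above.
OPT_OUT_DIRECTIVES = {"noai", "noindex", "noimageai", "noimageindex"}


def is_opted_out(headers):
    """Return True if an X-Robots-Tag header carries an opt-out directive.

    `headers` is assumed to be a mapping of header name to header value.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        # One header value may list several comma-separated directives.
        directives = {part.strip().lower() for part in value.split(",")}
        if directives & OPT_OUT_DIRECTIVES:
            return True
    return False
```

A downloader built along these lines would call `is_opted_out(response_headers)` before saving an image and skip the image when it returns `True`.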
  • 11
    OnionShare

    Securely and anonymously share files of any size

    OnionShare is an open source tool that allows you to securely and anonymously share files of any size, host websites, and chat with friends using the Tor network. There's no need for middlemen that could very well violate the privacy and security of the things you share online. With OnionShare, you can share files directly with just an address in Tor Browser. OnionShare works because it is accessible as a Tor Onion Service. All you need to do is open it and drag and drop the files you want to share into it, and start sharing. ...
    Downloads: 3 This Week
  • 12
    CommunityScrapers

    This is a public repository containing scrapers

    ...The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing users to enrich their media libraries with accurate and detailed information without manual entry. The project supports both automatic installation through in-app feeds and manual configuration for advanced use cases. Some scrapers require additional configuration such as API keys or cookies, highlighting its flexibility and adaptability to different sources.
    Downloads: 1 This Week
  • 13
    owllook

    Vertical novel search engine with unified reading and tracking tools

    Owllook is an open source vertical search engine designed for discovering and reading online novels from multiple sources. Instead of redirecting users to different sites, the system parses content from many novel platforms and presents it in a unified reading interface. It focuses on providing a simple and comfortable reading experience with features such as searching for books, following updates, bookmarking chapters, and maintaining a personal bookshelf. It aggregates results from...
    Downloads: 2 This Week
  • 14
    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu.
    Downloads: 1 This Week
  • 15
    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. ...
    Downloads: 1 This Week
  • 16
    Wagtail

    A Django content management system focused on flexibility & UX

    ...Designed by developers for developers, Wagtail plays nicely with everything else in your tech stack so you can do more and focus on perfecting your site. Designers will find Wagtail’s simple templating system ideal for building beautiful websites just the way they want, without any CMS constraints. Editors can create beautiful, modular streams of content that they can create once and publish everywhere. Simply put, it’s the CMS that makes everyone happy!
    Downloads: 1 This Week
  • 17
    changedetection.io

    The best free open source website change detection and restock service

    ...Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the browser steps configuration, add basic steps before performing change detection, such as logging into websites, adding a product to a cart, accepting cookie logins, entering dates, and refining searches. Monitor and track PDF file changes, and know when a PDF file has text changes. Know when your favourite product is on sale, or other special deals are announced before anyone else. Detect and monitor changes in JSON API responses.
    Downloads: 1 This Week
  • 18
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 0 This Week
  • 19
    Offline HTML Viewer

    Fast offline HTML viewer for opening local HTML files on Windows

    ...Typical use cases:
      - Open saved HTML files without using a web browser
      - View archived websites offline
      - Read documentation stored as HTML files
      - Quickly preview local HTML files
    Downloads: 45 This Week
  • 20
    WhakerKit

    A seamless toolkit to manage dynamic websites and shared documents

    WhakerKit is a versatile toolkit for building websites with both static and dynamic HTML pages, developed by Brigitte Bigi, CNRS. WhakerKit offers seamless management of public and authenticated access, and simplifies document sharing for collaborative environments. It is based on the following technologies:
      * python >= 3.9
      * (optional) PyJWT and ldap3 for authentication (install with pip)
      * WhakerPy >= 1.3: <https://whakerpy.sourceforge.io> (install with pip)
      * Whakerexa >= 0.7: <https://whakerexa.sourceforge.io> (download package and unzip)
      * HTML-5, CSS-4 and JS technologies
    Downloads: 0 This Week
  • 21
    YehDown

    Yeahdown: Easy-to-use video downloader for Windows

    Yeahdown is a straightforward, user-friendly Windows application designed to simplify downloading videos and audio from popular websites like YouTube and Vimeo. Perfect for non-technical users, it offers an intuitive interface and fast, reliable downloads. Key features include improved download speeds, support for multiple major video platforms, and real-time updates for new features. Tested on Windows 11.
    Downloads: 28 This Week
  • 22
    Web Link Collector 1000

    Automatically collect all links from websites to a clean txt file

    Easily and automatically collect all your links into a neat txt list from a particular website or an entire section of a multi-page website network! Web Link Collector 1000 is a simple tool for gathering links from websites with minimal effort. It helps you collect resources for research, create reference lists, or save useful links without manual copying and pasting.

    Features:
      - Two Collection Modes: Single page or multiple pages of specific website section, or even the entire domain!
      - Smart Filtering: Include only same-domain links or gather external links too
      - Duplicate Prevention: Automatically removes duplicate links
      - Website-Friendly: Uses respectful delays between requests
      - Custom File Naming: Save your collections with custom meaningful names
      - Modern Interface: Clean design with status updates
      - Link Normalization: Standardizes URLs for proper formatting
    Downloads: 0 This Week
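The link normalization, same-domain filtering, and duplicate prevention features listed for Web Link Collector 1000 can be sketched with Python's standard `urllib.parse` module. This is only an illustration under stated assumptions, not the tool's own code: the function names `normalize` and `collect`, the `same_domain_only` parameter, and the specific normalization rules (lowercased scheme and host, dropped fragment, empty path treated as `/`) are all hypothetical.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(url, base):
    """Resolve `url` against `base` and return a canonical form (assumed rules)."""
    parts = urlsplit(urljoin(base, url))
    path = parts.path or "/"  # treat an empty path as the site root
    # Lowercase scheme and host, keep the query, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def collect(links, base, same_domain_only=True):
    """Return unique, normalized http(s) links in first-seen order."""
    base_host = urlsplit(base).netloc.lower()
    seen = set()
    result = []
    for link in links:
        url = normalize(link, base)
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if same_domain_only and parts.netloc != base_host:
            continue  # skip external links when filtering is on
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Passing `same_domain_only=False` keeps external links as well, which mirrors the "gather external links too" option in the feature list.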
  • 23
    Web Shortcuts
    This tool opens websites or links when you press one or more keys on the keyboard, acting as a true shortcut for web pages. When the shortcut keys are pressed, the site you previously assigned opens in the system's default browser (if the tool does not work after you set the shortcuts, try restarting it).
    Downloads: 0 This Week
  • 24
    Proxy_Pool

    Python crawler proxy IP pool (proxy pool)

    The crawler proxy IP pool project periodically collects free proxies published on the Internet, verifies and stores them, and re-verifies the stored proxies regularly to ensure they remain available. It provides both an API and a CLI. You can also add proxy sources to increase the quality and quantity of IPs in the pool.
    Downloads: 0 This Week
  • 25
    S.I.P.E.R.

    Advanced website blocking and productivity tool

    A powerful, user-friendly website blocking and productivity application built with modern GTK 4 and Libadwaita. S.I.P.E.R. helps you maintain focus and productivity by blocking distracting websites with advanced features like Pomodoro focus mode, comprehensive statistics, and multi-language support.
    Downloads: 0 This Week