website scraper free download

Showing 16 open source projects for "website scraper"

1

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

...Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can easily be automated to run nightly with crontab. If you have a job website you'd like to write a scraper for, you are welcome to implement it; review the Base Scraper for implementation details. JobFunnel supports scraping jobs from the same job website across locales and domains. If you are interested in adding support, you may only need to define session headers and domain strings; see the Base Scraper for details.

Downloads: 0 This Week

Last Update: 2024-09-29
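
The "single spreadsheet with no duplicates" idea can be illustrated with a minimal sketch. This is not JobFunnel's code: the function names `merge_jobs` and `to_csv`, the choice of `url` as the deduplication key, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

def merge_jobs(master_rows, scraped_rows, key="url"):
    """Append scraped rows whose key is not already present in the master list."""
    seen = {row[key] for row in master_rows}
    merged = list(master_rows)
    for row in scraped_rows:
        if row[key] not in seen:
            seen.add(row[key])
            merged.append(row)
    return merged

def to_csv(rows, fieldnames):
    """Serialize rows to CSV text (written to memory here, not to a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical data: one job already in the master list, three freshly scraped.
master = [{"title": "Data Engineer", "url": "https://jobs.example/1"}]
scraped = [
    {"title": "Data Engineer", "url": "https://jobs.example/1"},
    {"title": "Backend Developer", "url": "https://jobs.example/2"},
    {"title": "Backend Developer", "url": "https://jobs.example/2"},
]
merged = merge_jobs(master, scraped)
print(to_csv(merged, ["title", "url"]))
```

Keying on a single field keeps the sketch short; a real tool would likely need a more robust notion of "same job" than an exact URL match.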
2

ScrapeGraphAI

Python scraper based on AI

Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 3 This Week

Last Update: 2026-07-08
3

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 3 This Week

Last Update: 2026-07-06
4

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides a framework and many ready-to-use building blocks, called steps, for building your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library supports and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those documents as well. A crawler...

Downloads: 2 This Week

Last Update: 2026-05-03
5

Logilocky AI Linux

Offline LLM, AI Development and Desktop Focused Linux Distribution

Logilocky AI Linux is a Linux distribution built on top of Debian. The goal of the distribution is to run offline AI models, provide a suitable environment for AI developers, make AI tools easily accessible, and offer an easy-to-use experience for Linux beginners. It also provides tuned kernel settings specifically optimized for AI workloads. The distribution has its own rich ecosystem, featuring numerous desktop applications built from scratch for Logilocky. This includes tools for daily...

1 Review

Downloads: 25 This Week

Last Update: 2026-06-14
6

Email Scraper and Validator

This is a simple desktop application built with Python and Tkinter that allows users to scrape email addresses from websites and validate them using an external API. It also provides features to save the scraped emails to a database, and export the data to various file formats. 1. Enter a list of website URLs or emails in the input field. 2. Click the Scrape button to scrape email addresses from the provided websites. 3. Click the Validate button to validate the scraped email...

Downloads: 0 This Week

Last Update: 2024-03-03
7

django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

...Since it simplifies things, DDS is not usable for all kinds of scrapers, but it is well suited to the relatively common case of regularly scraping a website with a list of updated items (e.g. news, events, etc.) and then digging into the detail page to scrape more information for each item. Django Dynamic Scraper tries to keep its data structure in the database as separate as possible from the models in your app, so it comes with its own Django model classes for defining scrapers, runtime information related to your scraper runs and classes.

Downloads: 0 This Week

Last Update: 2022-09-05
8

JAWS - Just Another Web Scraper

A simple web scraper using regular expressions or HTML Agility Pack

JAWS, or Just Another Web Scraper, is part of the data scraping software developed by SVbook, alongside JATI (Image to Text) and JAVT (Video to Text). JAWS offers an easy interface for scraping data from websites using regular expressions, text preprocessing, or HTML Agility Pack.

Downloads: 1 This Week

Last Update: 2018-03-30
9

IAD dispatch web scraper

A very simple web scraper for taxi dispatch data.

Introduction: The Dulles International Airport (IAD) near Washington, D.C. has a taxi service provided by the Washington Flyer. Taxi cabs are leased by drivers and rides are regulated using a queue system. Drivers enter a corral near the Arrival gate and wait for dispatchers to announce passengers. There is a website that displays useful information about the queue. The number of taxis waiting in queue, the wait time of the last vehicle out, and the number of taxis to exit the corral in...

Downloads: 0 This Week

Last Update: 2015-12-05
10

htmlparser

Products of the project: Java HTMLParser - VietSpider Web Data Extractor - Extractor VietSpider News. Click on "Show project details" to see more features of each product.

Downloads: 0 This Week

Last Update: 2015-06-24
11

python-web_excavator

General data mining API: write only the HTML parsing code.

A general web scraper that uses the requests library to communicate with the website. Scraper() contains a parser object, to which you can add parsing handles. ParseHandle() is the code that mines your data from an HTML source. Repo: https://github.com/crispycret/web_excavator

Downloads: 0 This Week

Last Update: 2014-12-15
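
The scraper, parser, and parse-handle arrangement described for python-web_excavator can be sketched roughly as below. This is an illustration of the pattern only, not the project's actual API: the classes `Parser` and `Scraper`, the methods `add_handle` and `scrape`, and the stub `fake_fetch` (which stands in for a requests call so the sketch runs offline) are all hypothetical.

```python
import re

class Parser:
    """Holds named parse handles; each handle is a function html -> value."""
    def __init__(self):
        self.handles = {}

    def add_handle(self, name, func):
        self.handles[name] = func

    def parse(self, html):
        # Run every registered handle against the same HTML source.
        return {name: func(html) for name, func in self.handles.items()}

class Scraper:
    """Fetches a page with an injected fetch function and runs the parser on it."""
    def __init__(self, fetch):
        self.fetch = fetch
        self.parser = Parser()

    def scrape(self, url):
        return self.parser.parse(self.fetch(url))

def fake_fetch(url):
    # Stub in place of an HTTP request; returns fixed HTML for any URL.
    return "<html><title>Demo</title><a href='/a'>A</a><a href='/b'>B</a></html>"

scraper = Scraper(fake_fetch)
scraper.parser.add_handle(
    "title", lambda html: re.search(r"<title>(.*?)</title>", html).group(1)
)
scraper.parser.add_handle("links", lambda html: re.findall(r"href='(.*?)'", html))
result = scraper.scrape("http://example.invalid/")
print(result)
```

With this split, the only site-specific code is the handles themselves, which matches the "write only the HTML parsing code" idea in the description.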
12

IP Proxy Scraper

IP Proxy Scraper lets you extract multiple proxies

This lightweight yet powerful application extracts IPs and ports from a list of specified websites. If you need multiple proxies, simply insert the desired website URLs, and with a single click your proxies are gathered and presented in the output window, ready to be copied and saved. IP Proxy Scraper is also available for Linux: https://sourceforge.net/projects/ipproxyscraperlinux/

Downloads: 0 This Week

Last Update: 2013-12-30
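
Extracting IPs and ports from page text, as described for IP Proxy Scraper, can be sketched with a regular expression. This is a minimal sketch and not the application's code: the pattern `PROXY_RE`, the function `extract_proxies`, and the sample text are assumptions, and only the `ip:port` notation for IPv4 is handled.

```python
import re

# Hypothetical pattern: dotted-quad IPv4 address, a colon, then a 1-5 digit port.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")

def extract_proxies(text):
    """Return unique (ip, port) pairs in order of first appearance."""
    seen = set()
    proxies = []
    for ip, port in PROXY_RE.findall(text):
        # Reject octets above 255 and ports outside 1-65535.
        if any(int(octet) > 255 for octet in ip.split(".")):
            continue
        if not 1 <= int(port) <= 65535:
            continue
        if (ip, port) not in seen:
            seen.add((ip, port))
            proxies.append((ip, int(port)))
    return proxies

# Sample text with one invalid address and one duplicate.
sample = "<td>10.0.0.1:8080</td> <td>999.1.1.1:80</td> 10.0.0.1:8080 192.168.1.5:3128"
print(extract_proxies(sample))
```

Sites that list the address and port in separate table cells would need an HTML parser rather than a single pattern.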
13

IP Proxy Scraper - Linux

Extracts multiple proxies from a list of websites

Lightweight and easy-to-use tool for extracting multiple proxies from a list of websites. IP Proxy Scraper is also available for Windows: https://sourceforge.net/projects/ipproxyscraper/

Downloads: 0 This Week

Last Update: 2016-11-25
14

National Lottery Scraper

National Lottery Scraper is a tool to connect to South Africa's National Lottery website (http://www.nationallottery.co.za/), download and display Lotto, Lotto Plus, and PowerBall results.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-25
15

WebScraper - Web Data Extraction

A simple-to-set-up web scraper written in Java. It uses a modified regex syntax to quickly write complex patterns for parsing data out of a website. It includes a GUI tool for testing your configuration scripts and can be fully automated from the command line.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24
16

Blackfire Player

Web Crawling, Web Testing, and Web Scraping application

Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses. Some Blackfire Player use cases: Crawl a website/API and check expectations -- aka Acceptance Tests; Scrape a website/API and extract values; Monitor a website; Test code with unit test integration (PHPUnit, Behat, Codeception, ...); Test code behavior from the outside thanks to the native Blackfire Profiler integration -- aka Unit Tests from the HTTP layer (tm). ...

Downloads: 0 This Week

Last Update: 2019-06-11