Page 5 | Best Open Source Windows Web Scrapers 2024

Spider

Spider is a web crawler written in Java. Given a regular expression, the spider searches the internet for web pages matching that expression and stores them in a MySQL database.

Downloads: 0 This Week

Last Update: 2014-08-09
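
Spider's description comes down to two steps: test each fetched page against a regular expression, then persist the pages that match. The following Python sketch shows that idea under stated assumptions; it is not Spider's Java code. Pages are passed in as a dictionary of URL to HTML instead of being downloaded, Python's built-in `sqlite3` stands in for MySQL, and the function name `store_matching_pages` and the `matches` table are hypothetical.

```python
import re
import sqlite3


def store_matching_pages(pages, pattern, conn):
    """Store every page whose HTML matches `pattern`; return the match count.

    `pages` maps URL -> HTML. A real crawler would download these pages; they
    are passed in here so the sketch runs offline. SQLite stands in for MySQL.
    """
    regex = re.compile(pattern)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches (url TEXT PRIMARY KEY, html TEXT)"
    )
    matched = 0
    for url, html in pages.items():
        if regex.search(html):
            # INSERT OR REPLACE keeps one row per URL if a page is re-crawled.
            conn.execute(
                "INSERT OR REPLACE INTO matches (url, html) VALUES (?, ?)",
                (url, html),
            )
            matched += 1
    conn.commit()
    return matched


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    pages = {
        "http://example.com/a": "<p>Contact: sales@example.com</p>",
        "http://example.com/b": "<p>No address here</p>",
    }
    print(store_matching_pages(pages, r"[\w.]+@[\w.]+", conn))  # 1
    print(conn.execute("SELECT url FROM matches").fetchall())
```

Keying the table on the URL means that re-crawling a page updates its stored copy instead of adding a duplicate row.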

SpotiScrape

Downloads: 0 This Week

Last Update: 2023-10-30

The VB Web Crawler

A VB web crawler, currently under construction, whose goal is to crawl and index the web, most likely through distributed computing over a network.

Downloads: 0 This Week

Last Update: 2016-07-24

Till

DataHen Till is a companion tool to your existing web scraper

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes to your scraper. It integrates with any scraper in 5 minutes. Web scraping is usually easy to get started with, especially on a small scale. However, as you try to scale it up, it gets exponentially more difficult. Scraping 10,000 records can easily be done with simple web scraper scripts in any programming language, but to scrape millions of pages you need to architect and build features into your scraping script that let it scale, stay maintainable, and avoid being blocked. Scaling to millions or even billions of records requires much more planning: it is not simply a matter of running your existing scraper script on a machine with more CPU and RAM.

Downloads: 0 This Week

Last Update: 2023-04-12

URL Web Crawler

It is basically a program for building your own search engine. It includes a web crawler, the complete website source code (in ASP, with PHP planned), and a MySQL database.

Downloads: 0 This Week

Last Update: 2015-05-23

Ulixee Hero

The web browser built for scraping

It's the first modern headless browser designed specifically for scraping rather than just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in Node.js, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering. Emulators make it easy to disguise your script as practically any browser.

Downloads: 0 This Week

Last Update: 2024-09-17

Universal Web Spider

UniWebSpider is a universal search system based on ActiveX, VC++, and MS SQL technologies. This implementation is an asymmetric, multithreaded system built on COM technologies. Its basic functions are managed through dispatch interfaces. Cr

Downloads: 0 This Week

Last Update: 2013-03-20

VIT Marks Display

A small program that accesses VIT marks of a specific student

A small project, written while learning Python and how to interface with the web, that retrieves the marks of a specific, valid VIT student using basic web scraping techniques.

Downloads: 0 This Week

Last Update: 2013-05-30

Wadsworth

Wadsworth is a Java-based web scripting engine. It uses user-defined XML scripts to define its actions. It can be used as a web testing tool, as a web scraper, or to automate any web actions you wish. It can also be invoked and controlled by another

Downloads: 0 This Week

Last Update: 2013-02-22

Web Text eXtraction and analysis Tools

Web Textual eXtraction Tools: a C++ parallel web crawler, noun phrase identification, multilingual part-of-speech tagging, Tarjan's algorithm, co-relationship mappings...

Downloads: 0 This Week

Last Update: 2014-06-03

Web-Scraper

A simple example C# project that scrapes information from a web page. It is a quick hack for a school project, written in one evening so that I don't have to type the same printers into Excel or Access for the twentieth time...

Downloads: 0 This Week

Last Update: 2013-04-24

WebCollector

WebCollector is an open source web crawler framework based on Java.

WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the web, so you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java

Downloads: 0 This Week

Last Update: 2015-06-04
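
WebCollector's own interfaces are not shown on this page, so the following is only a generic Python sketch of what a multi-threaded crawler does, not WebCollector's API. It crawls breadth-first, one level at a time, and fetches each level's URLs concurrently on a thread pool. The function `crawl_parallel` and the link-fetching callback `get_links` are hypothetical; in practice `get_links` would download a page and extract its links.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_parallel(seeds, get_links, max_workers=4, max_pages=100):
    """Breadth-first crawl that fetches each level's URLs on a thread pool.

    `get_links(url)` is a hypothetical fetch-and-parse function returning the
    URLs linked from `url`. Returns the set of visited URLs, which never
    exceeds `max_pages` entries.
    """
    visited = set()
    frontier = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier and len(visited) < max_pages:
            # Trim the level so the page budget is never exceeded.
            batch = frontier[: max_pages - len(visited)]
            visited.update(batch)
            next_frontier, queued = [], set()
            # pool.map runs get_links concurrently; results come back in order.
            for links in pool.map(get_links, batch):
                for link in links:
                    if link not in visited and link not in queued:
                        queued.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
    print(sorted(crawl_parallel(["a"], lambda u: graph.get(u, []))))
    # ['a', 'b', 'c', 'd']
```

Only the worker threads call `get_links`; the `visited` set and the frontier are updated on the main thread, so no locking is needed.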

WebExtractServer

WebExtractServer: use with WebExtractLte to view its data in a web browser

Browse data fetched by WebExtractLte directly in your browser. Designed to be used with Webscraper (webscraper.io), a third-party web scraper tool available as a plugin for Chrome and Firefox.

Downloads: 0 This Week

Last Update: 2019-04-29

WebMagic

A scalable web crawler framework for Java

WebMagic is a scalable crawler framework that covers the whole lifecycle of a crawler: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler, and because it is simple yet scalable, you can easily build a crawler on top of it. WebMagic has a simple, highly flexible core and a simple API for HTML extraction. It also provides annotations on POJOs for customizing a crawler, with no configuration needed. Other features include multithreading and support for distributed crawling. WebMagic is easy to integrate: add its dependencies to your pom.xml. WebMagic uses slf4j with the slf4j-log4j12 implementation; if you use your own slf4j implementation, exclude slf4j-log4j12. To build a crawler, you write a class that implements PageProcessor.

Downloads: 0 This Week

Last Update: 2023-12-05

WebNews Crawler

WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can perform site-specific extraction to keep only the actual news content, filtering out advertising and other cruft.

Downloads: 0 This Week

Last Update: 2013-04-23

WebSPHINX

WebSPHINX is a web crawler (robot, spider) Java class library, originally developed by Robert Miller of Carnegie Mellon University. Features include multithreading, tolerant HTML parsing, URL filtering and page classification, pattern matching, mirroring, and more.

2 Reviews

Downloads: 0 This Week

Last Update: 2015-11-12

WebScrapa - Web Scraping Program

Pulls information from the web undetected.

This program pulls information from the web and displays it in a text box, doing all of it with a smart bot. You can connect bots to this program to make it more practical.

Downloads: 0 This Week

Last Update: 2019-10-01

WebScraper - Web Data Extraction

An easy-to-set-up web scraper written in Java. It uses a modified regex syntax so you can quickly write complex patterns to parse data out of a website. It includes a GUI tool for testing your configuration scripts and can be fully automated through the command line.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24

WebSeoAnalyzer

A web crawler written in C# that analyzes PageRank, total links, and nofollow links, and ranks a site's best-positioned pages in Google.

1 Review

Downloads: 0 This Week

Last Update: 2014-07-01

Webtools 4 larbin

Larbin is a web crawler intended to fetch a large number of web pages; it should be able to fetch more than 100 million pages on a standard PC with much u/d. This set of PHP and Perl scripts, called webtools4larbin, can handle the output of Larbin and p

Downloads: 0 This Week

Last Update: 2013-03-21

X-RAY

The next web scraper. See through the <html> noise.

Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped. Start on one page and move to the next easily. The flow is predictable, following a breadth-first crawl through each of the pages. X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly. Swap in different scrapers depending on your needs. It currently supports HTTP and PhantomJS drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.

Downloads: 0 This Week

Last Update: 2021-10-05
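
X-ray itself is a Node.js library, and the sketch below does not use its API. It is a hypothetical Python illustration of four behaviors described above: following pagination links, capping the number of pages with a limit, waiting a fixed delay between requests, and streaming each page's results to a file as soon as they are scraped. The names `scrape_paginated` and `fetch_page` are assumptions; `fetch_page` stands in for downloading a page and extracting its items and its "next page" link.

```python
import io
import json
import time


def scrape_paginated(start_url, fetch_page, limit, delay_s, out):
    """Follow next-page links from `start_url`, scraping at most `limit` pages.

    `fetch_page(url)` is a hypothetical function returning (items, next_url),
    with next_url set to None on the last page. Each page's items are written
    to `out` as one JSON line right away, so a failure on a later page does
    not lose the pages already scraped. Returns the number of pages scraped.
    """
    url, pages = start_url, 0
    while url is not None and pages < limit:
        if pages > 0:
            time.sleep(delay_s)  # request delay between consecutive pages
        items, url = fetch_page(url)
        out.write(json.dumps(items) + "\n")
        out.flush()  # push this page's line out before the next request
        pages += 1
    return pages


if __name__ == "__main__":
    site = {"p1": (["a", "b"], "p2"), "p2": (["c"], "p3"), "p3": (["d"], None)}
    buf = io.StringIO()  # stands in for an open output file
    print(scrape_paginated("p1", site.__getitem__, limit=2, delay_s=0.1, out=buf))
    print(buf.getvalue(), end="")
```

Writing one JSON line per page (rather than one JSON array at the end) is what makes the partial output usable if the crawl stops early.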

Yoshibot Web Spider

A basic Perl web spider with grandiose aspirations. Supports XML log file output and resumable spidering sessions.

Downloads: 0 This Week

Last Update: 2013-03-12

YouTube video web scraper 2 [ISA]

YouTube video web scraper 2 [Improved.Simplified.Alternative]

'YouTube video web scraper 2' is a desktop application developed using Python 3.11.4 and other add-on libraries. It finds YouTube videos based on the user's request, displays them as a table, and exports the table to Excel. It is compatible only with Windows.

Downloads: 0 This Week

Last Update: 2023-07-22

arachnode.net

arachnode.net is an open source Web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages and is written in C# using SQL Server 2008. See http://arachnode.net for the LATEST.

1 Review

Downloads: 0 This Week

Last Update: 2014-06-25

bee-rain

bee-rain is a web crawler that harvests and indexes files over the network. You can see the results on the bee-rain website: http://bee-rain.internetcollaboratif.info/

1 Review

Downloads: 0 This Week

Last Update: 2013-04-18

Open Source Windows Web Scrapers - Page 5

Web Scrapers for Windows

Spider

SpotiScrape

The VB Web Crawler

Till

URL Web Crawler

Ulixee Hero

Universal Web Spider

VIT Marks Display

Wadsworth

Web Text eXtraction and analysis Tools

Web-Scraper

WebCollector

WebExtractServer

WebMagic

WebNews Crawler

WebSPHINX

WebScrapa - Web Scraping Program

WebScraper - Web Data Extraction

WebSeoAnalyzer

Webtools 4 larbin

X-RAY

Yoshibot Web Spider

YouTube video web scraper 2 [ISA]

arachnode.net

bee-rain
