web crawler source code free download

Showing 82 open source projects for "web crawler source code"

1

MemFree

Hybrid AI Search Engine & AI Page Generator

MemFree is an open source hybrid AI search engine and page-generation platform that lets users retrieve information from both personal knowledge bases and the public web through a single interface. It combines retrieval-augmented search with AI summarization to deliver concise answers instead of forcing users to sift manually through multiple sources.

Downloads: 0 This Week

Last Update: 2026-03-03
2

ahCrawler

A PHP search engine and web analytics tool for your website. License: GNU GPL 3.

ahCrawler is a toolset for implementing your own search on your website and for analyzing your web content. It can run on shared hosting. It consists of:
* a crawler (spider) and indexer
* search for your website(s)
* search statistics
* a website analyzer (HTTP headers, short titles and keywords, link checker, ...)
You need to install it on your own server, so all crawled data stays in your environment. You never know when an external web spider last updated your content. Trigger a rescan...

1 Review

Downloads: 0 This Week

Last Update: 2025-12-11
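
The crawler, indexer and search components listed for ahCrawler can be illustrated with a minimal inverted index. This is a Python sketch, not ahCrawler's PHP code; it assumes pages have already been fetched into a dictionary mapping URL to plain text, and the function names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on any non-alphanumeric character.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of URLs whose text contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND: return only URLs that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

A real site search would add ranking, stemming and persistent storage on top of this; the sketch only shows how crawled text becomes searchable.
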
3

Easyspider - Distributed Web Crawler

Easy Spider is a distributed Perl Web Crawler Project from 2006

Easy Spider is a distributed Perl web crawler project from 2006. It includes code for crawling webpages, distributing the results to a server and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider crawling for article-writing software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https://easyperlspider.sourceforge.io/ https://www.sebastianenger.com/ https://www.artikelschreiber.com/opensource/ It is fun to look at code written a few years ago and see how much one has improved. ...

1 Review

Downloads: 0 This Week

Last Update: 2025-03-16
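
Easy Spider's step of generating XML files from crawled pages could look roughly like the following Python sketch. It is not the project's Perl code; the record fields (`url`, `title`) and the element names `pages`, `page` and `title` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def pages_to_xml(pages: list[dict[str, str]]) -> str:
    # Build <pages><page url="..."><title>...</title></page>...</pages>.
    root = ET.Element("pages")
    for record in pages:
        page = ET.SubElement(root, "page", url=record["url"])
        title = ET.SubElement(page, "title")
        title.text = record.get("title", "")
    # ElementTree escapes &, < and > in text and attribute values on output.
    return ET.tostring(root, encoding="unicode")
```

Using a real XML library rather than string concatenation avoids malformed output when crawled titles or URLs contain characters such as `&`.
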
4

LXR Cross Referencer

A general-purpose source code indexer and cross-referencer that provides web-based browsing of source code with links to the definition and usage of any identifier. Supports multiple languages. Up-to-date information is available at http://lxr.sourceforge.net

14 Reviews

Downloads: 5 This Week

Last Update: 2023-07-17
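
To illustrate what a cross-referencer records, the sketch below indexes identifier definitions and usages in Python source with the standard `ast` module. LXR itself is a separate, multi-language tool; this example handles only Python, and `cross_reference` is a hypothetical helper name.

```python
import ast
from collections import defaultdict


def cross_reference(source: str) -> tuple[dict[str, list[int]], dict[str, list[int]]]:
    """Return (definitions, usages), each mapping identifier -> sorted line numbers."""
    tree = ast.parse(source)
    definitions: dict[str, list[int]] = defaultdict(list)
    usages: dict[str, list[int]] = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            definitions[node.name].append(node.lineno)
        elif isinstance(node, ast.Name):
            # Store contexts (assignment targets) count as definitions;
            # Load and Del contexts count as usages.
            target = definitions if isinstance(node.ctx, ast.Store) else usages
            target[node.id].append(node.lineno)
    return ({name: sorted(lines) for name, lines in definitions.items()},
            {name: sorted(lines) for name, lines in usages.items()})
```

Function parameters are `ast.arg` nodes and are not counted as definitions here; a fuller tool would also track scopes, imports and attributes.
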
5

ftserver-android

Self-hosted search engine with web service to share discoveries with

Full-text search engine for Android mobile, Windows desktop and Linux server. Use keywords to find related websites, dig into important information and search for answers. It has a built-in web server that you can use to share discoveries with people. The app's source code is included and can be freely distributed over the internet in unchanged or modified form. Check the file size after downloading the Android APK. Files: https://sourceforge.net/projects/ftserver-android/files/
The code repository includes:
* FTServer Android version source code (Android)
* FTServer Java server version source code (Linux, Windows)
* FTServer .NET server version source code (Linux, Windows)
Code repository: https://sourceforge.net/p/ftserver-android/code/

Downloads: 0 This Week

Last Update: 2023-07-07
6

C-squares

Concise spatial query and representation system (c-squares)

C-squares is an easily implemented method for storing, querying and displaying spatial data locations, based on a hierarchical, grid-based representation of the Earth's surface. Source code for encoding, decoding, mapping, etc. is provided via this site. Additional support is available by contacting the system developer, Tony.Rees@marinespecies.org; see also the c-squares home page at http://www.cmar.csiro.au/csquares/ .

Downloads: 0 This Week

Last Update: 2020-10-23
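
The idea of a hierarchical, grid-based location code can be sketched as recursive quadrant subdivision of the globe: each level splits the current cell into four and appends one digit, so points in the same cell share a code prefix. This is a simplified, hypothetical scheme (`grid_code`), not the actual c-squares notation, which uses its own digit layout; consult the c-squares documentation for the real format.

```python
def grid_code(lat: float, lon: float, levels: int) -> str:
    """Encode a point as one quadrant digit (1-4) per subdivision level."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(levels):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        upper = lat >= mid_lat  # point lies in the northern half of the cell
        right = lon >= mid_lon  # point lies in the eastern half of the cell
        # Quadrant digits within the current cell: 1 = SW, 2 = SE, 3 = NW, 4 = NE.
        digits.append(str(1 + int(right) + 2 * int(upper)))
        # Shrink the bounds to the chosen quadrant for the next level.
        south, north = (mid_lat, north) if upper else (south, mid_lat)
        west, east = (mid_lon, east) if right else (west, mid_lon)
    return "".join(digits)
```

Because nearby points share a prefix, a query such as "all records whose code starts with `411`" acts as a cheap spatial filter over a plain text column.
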
7

Bookmark manager

Bookmark manager web application

Web Page Application: https://shemeshg.github.io/desktop-search/ GitHub: https://github.com/shemeshg/desktop-search-code * Dropbox sync is performed manually (from the admin screen), not on an interval.

Downloads: 0 This Week

Last Update: 2020-07-25
8

X-RAY

The next web scraper, see through the <html> noise

Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...

Downloads: 0 This Week

Last Update: 2021-10-05
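
X-ray is a JavaScript library; the Python sketch below only illustrates the pagination pattern its description mentions, a request delay combined with a page limit. It does not use X-ray's API: `fetch` is an assumed callable that returns a page's scraped items together with the next page's URL (or `None` on the last page), and `paginate` is a hypothetical name.

```python
import time
from typing import Callable, Iterator, Optional

# A fetched page: (scraped items, URL of the next page or None).
Page = tuple[list[str], Optional[str]]


def paginate(start_url: str,
             fetch: Callable[[str], Page],
             limit: int,
             delay_seconds: float = 1.0) -> Iterator[str]:
    """Yield items from at most `limit` pages, pausing between requests."""
    url: Optional[str] = start_url
    pages_fetched = 0
    while url is not None and pages_fetched < limit:
        if pages_fetched > 0:
            time.sleep(delay_seconds)  # polite delay before each follow-up request
        items, url = fetch(url)
        pages_fetched += 1
        yield from items
```

Because `paginate` is a generator, items can be written to a file as each page arrives rather than after the whole crawl finishes.
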
9

panFMP

panFMP is a generic framework for making harvested XML metadata searchable through Apache Lucene without any additional RDBMS. Fields can be defined by XPath, allowing full-text queries on all types of fields, including numeric ranges. The code has moved to GitHub: https://github.com/pangaea-data-publisher/panfmp

Downloads: 1 This Week

Last Update: 2019-05-01
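
Defining search fields by XPath can be sketched with Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. This is not panFMP's implementation; the field names, paths and record layout are hypothetical, and the numeric-range check merely stands in for what a Lucene index would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical field definitions: field name -> ElementTree path expression.
FIELDS = {
    "title": "./title",
    "year": "./date/year",
}


def extract_fields(xml_text: str, fields: dict[str, str]) -> dict[str, list[str]]:
    """Collect the text of every element matched by each field's path."""
    root = ET.fromstring(xml_text)
    return {name: [el.text or "" for el in root.findall(path)]
            for name, path in fields.items()}


def in_numeric_range(values: list[str], low: float, high: float) -> bool:
    """True if any value parses as a number within [low, high]."""
    for value in values:
        try:
            if low <= float(value) <= high:
                return True
        except ValueError:
            continue  # non-numeric values never match a range query
    return False
```
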
10

OpenSearchServer Search Engine

An open source search engine with RESTFul API and crawlers

OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...

31 Reviews

Downloads: 11 This Week

Last Update: 2018-08-26
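
Facets and filters, two of the features listed for OpenSearchServer, reduce to counting and selecting documents by field value. A minimal Python sketch over an in-memory list of documents, assuming each document is a dictionary of field names to values; the helper names are hypothetical and unrelated to OpenSearchServer's actual API.

```python
from collections import Counter


def facet_counts(docs: list[dict[str, str]], field: str) -> Counter:
    """Count documents per value of `field`; documents lacking it are skipped."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filter(docs: list[dict[str, str]], field: str, value: str) -> list[dict[str, str]]:
    """Keep only documents whose `field` equals `value`."""
    return [doc for doc in docs if doc.get(field) == value]
```

Applying a filter first and then computing facets on the result gives the familiar "refine by" counts shown next to search results.
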
11

RainbowPortal

The Rainbow project is an open source initiative to build a comprehensive content management system using Microsoft's ASP.NET and C# technologies. It has ASP.NET 1.1 and ASP.NET 2.0 code bases.

2 Reviews

Downloads: 2 This Week

Last Update: 2018-01-09
12

OpenWebSpider

OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!

4 Reviews

Downloads: 4 This Week

Last Update: 2017-03-12
13

eXtensible Text Framework (XTF)

Framework for search and display of heterogeneous document collections.

NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.

Downloads: 3 This Week

Last Update: 2019-07-29
14

MetaGen

Meta Tag Generator. Lets you research SEO keywords, generate properly formatted, compliant meta tags and output them to an HTML or text file for insertion into a finished web project.

1 Review

Downloads: 0 This Week

Last Update: 2015-12-27
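
Generating meta tags safely mostly comes down to escaping attribute values. A minimal Python sketch using the standard `html.escape`; `meta_tags` is a hypothetical helper, not MetaGen's code, and it covers only the `description` and `keywords` tags.

```python
from html import escape


def meta_tags(description: str, keywords: list[str]) -> str:
    """Render <meta> description and keywords tags with escaped attribute values."""
    keyword_text = ", ".join(keywords)
    return "\n".join([
        # quote=True also escapes double quotes, which would otherwise end the attribute.
        f'<meta name="description" content="{escape(description, quote=True)}">',
        f'<meta name="keywords" content="{escape(keyword_text, quote=True)}">',
    ])
```
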
15

WebCollector

WebCollector is an open source web crawler framework based on Java.

WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the web; you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java

Downloads: 0 This Week

Last Update: 2015-06-04
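
A multi-threaded crawler of the kind WebCollector provides can be outlined as a breadth-first crawl whose frontier is fetched in parallel. This Python sketch is not WebCollector's Java API; `get_links` is an assumed callable that downloads a URL and returns the links found on it, and `max_pages` caps the total number of pages visited.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], list[str]],
          max_pages: int,
          threads: int = 4) -> set[str]:
    """Breadth-first crawl; each frontier batch is fetched concurrently."""
    visited: set[str] = set()
    frontier = list(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while frontier and len(visited) < max_pages:
            batch = frontier[: max_pages - len(visited)]
            visited.update(batch)
            next_frontier: list[str] = []
            queued: set[str] = set()
            # pool.map runs get_links on worker threads and yields results in input order.
            for links in pool.map(get_links, batch):
                for link in links:
                    if link not in visited and link not in queued:
                        queued.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited
```

Only `get_links` runs on worker threads; the visited set is updated on the calling thread, so no locking is needed. A production crawler would also honour robots.txt and per-host rate limits.
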
16

mindCMS

Small, fast and flexible Content Management System for PHP / MySQL

A very small, fast, compact and flexible Content Management System (CMS) for PHP web servers, with a reasonable set of functions. Easily maintain your web pages and online files in any web browser.

1 Review

Downloads: 0 This Week

Last Update: 2014-07-30
17

Zoozle Search & Download Suchmaschine

Zoozle 2008 - 2010 Webpage, Tools and SQL Files

Download search engine and directory for Rapidshare and Torrent: zoozle Download Suchmaschine (German for "download search engine"). All the files that ran the world-leading German download search engine in 2010, which had 500,000 unique visitors a day: all the tools you need to set up a clone. The code contains:
- PHP files for zoozle
- Perl crawler for gathering new content into the database, and all the other cool tools I have...

1 Review

Downloads: 0 This Week

Last Update: 2025-03-16
18

webStraktor

webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, extraction and storage of information available on the web, including images. The scripting language uses elements of the regular expression and XPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...

Downloads: 0 This Week

Last Update: 2014-04-25
19

phpShare&Search

Group file share with advanced text parsing capability for easy search

Originally created as a church resource-sharing system, phpShare&Search allows users to create accounts, share documents, search documents, and like or report documents. phpShare&Search's power comes from its advanced document parser, which extracts text from .PDF, .TXT, .DOC and .DOCX files, and from its community features of liking resources and reporting them as inappropriate or spam. Users can also subscribe to weekly updates of new content. Users may choose to download and...

Downloads: 1 This Week

Last Update: 2015-06-25
20

Open Travel Data

Framework (scripts, configuration, code) for building free and public services around travel and leisure data. The project makes extensive use of existing data sources such as Geonames and dbPedia, and adds some glue around them (e.g., links).

Downloads: 0 This Week

Last Update: 2016-10-02
21

Regular Expression web replication

Yet another web crawler? Yes, but this one uses the full power of regular expressions to accept or reject, examine or ignore, save or refuse pages. MIME types can also be used for all of this. Powerful and flexible.

Downloads: 0 This Week

Last Update: 2013-05-30
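
Regex- and MIME-based page selection of the kind described above can be sketched as a filter that a crawler consults before saving a page. The class name `PageFilter` and its accept/reject semantics are assumptions for illustration, not the project's actual configuration format.

```python
import re


class PageFilter:
    """Accept a page only if its URL matches some accept pattern, matches no
    reject pattern, and its MIME type is in the allowed set."""

    def __init__(self, accept: list[str], reject: list[str], mime_types: set[str]):
        self.accept = [re.compile(p) for p in accept]
        self.reject = [re.compile(p) for p in reject]
        self.mime_types = {m.lower() for m in mime_types}

    def allows(self, url: str, mime_type: str) -> bool:
        if not any(p.search(url) for p in self.accept):
            return False
        if any(p.search(url) for p in self.reject):
            return False
        # Drop parameters such as "; charset=utf-8" before comparing MIME types.
        base_type = mime_type.split(";", 1)[0].strip().lower()
        return base_type in self.mime_types
```

Checking reject patterns after accept patterns lets a broad accept rule (a whole site) coexist with narrow exclusions (archives, session URLs).
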
22

SeerSuite

SeerSuite is an application toolkit for digital libraries and search engines; i.e., CiteSeerX. CiteSeerX has moved to GitHub, please get the latest code from: https://github.com/SeerLabs/CiteSeerX

2 Reviews

Downloads: 0 This Week

Last Update: 2014-01-24
23

pro-search

PRO-Search is a crawler for FTP servers, SMB shares, HTTP, DC++ networks, ... with a powerful web search and navigation interface

1 Review

Downloads: 0 This Week

Last Update: 2013-04-17
24

OptimizeGoogle

===NOTICE=== After releasing a few updates, but far fewer than we wanted, we’ve decided to stop the OptimizeGoogle project. The reason is that there were not enough people on the team to keep it going. Google changes things every day, and it has become more and more frustrating to watch the functions break piece by piece. The code will remain GPL; perhaps another person or team is interested in picking this up. For now, thank you for all...

4 Reviews

Downloads: 0 This Week

Last Update: 2014-07-12
25

SlinkE Distributed Cloud Computing

SlinkE is a highly elastic distributed cloud computing environment. All source code is included in all of the products. Our goal in making it open source is to allow others to contribute to the project.

Downloads: 1 This Week

Last Update: 2015-08-06