html search engine free download

Showing 28 open source projects for "html search engine"

Filters: Web Scrapers, Linux

1

owllook

Vertical novel search engine with unified reading and tracking tools

Owllook is an open source vertical search engine designed for discovering and reading online novels from multiple sources. Instead of redirecting users to different sites, the system parses content from many novel platforms and presents it in a unified reading interface. It focuses on providing a simple and comfortable reading experience with features such as searching for books, following updates, bookmarking chapters, and maintaining a personal bookshelf.

Downloads: 1 This Week

Last Update: 6 days ago
See Project
2

fess

Open source enterprise search server for websites, files, and data

Fess is an open source enterprise search server designed to provide powerful full-text search capabilities across multiple data sources. It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can...

Downloads: 0 This Week

Last Update: 2026-06-25
See Project
3

Spider

High-performance Rust web crawler and scraper for large-scale data

...These capabilities make the project suitable for building search indexers, data extraction pipelines, & SEO analysis tools.

Downloads: 2 This Week

Last Update: 2026-03-31
See Project
4

diskover-community

Open source file indexing & storage analytics powered by Elasticsearch

...Diskover also helps identify outdated or unused files, duplicate data, and inefficient storage usage that can waste resources or increase operational costs. A Python-based indexing engine performs the scanning and indexing tasks.

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
5

FinalRecon

All-in-one Python web reconnaissance tool for fast target analysis

FinalRecon is an all-in-one web reconnaissance tool written in Python that helps security professionals gather information about a target website quickly and efficiently. It combines multiple reconnaissance techniques into a single command-line utility so users do not need to run several separate tools to collect similar data. FinalRecon focuses on providing a fast overview of a web target while maintaining accuracy in the collected results. It includes modules for gathering server...

Downloads: 1 This Week

Last Update: 1 hour ago
See Project
6

OpenSERP

Open-source SERP API for AI, SEO & automation - Google + 5 more

OpenSERP is a free self-hosted open-source SERP API & CLI with live browser-rendered search and content extraction for AI, SEO & automation. Use it as a search tool for LLMs, agents, and RAG pipelines, or as a scraper backend for SEO rank tracking across Google, Yandex, Baidu, and more. It is especially useful when your workflow needs RU/CN web coverage instead of another Google-only API.

Downloads: 3 This Week

Last Update: 3 days ago
See Project
7

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 1 This Week

Last Update: 2025-10-27
See Project
8

PHPScraper

A universal web-util for PHP

...All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways. Many common use cases are covered already. You can find prepared extractors for various HTML tags, including interesting attributes. You can filter and combine these to your needs. In some cases there is an option to get a simple or detailed version. PHPScraper can assist in collecting feeds such as RSS feeds, sitemap.xml-entries and static search indexes. This can be useful when deciding on the next page to crawl or building up a list of pages on a website.

Downloads: 0 This Week

Last Update: 2024-04-09
See Project
9

Rapid Reference

An extension that allows for hassle-free website citation/referencing.

Please do not distribute with the goal of selling my program.

How to attach to your Chrome/Edge/Brave etc. browser:
1. Download the extension (rapidreference.zip)
2. Extract
3. Go to chrome://extensions if on Chrome, or navigate to your extension management setting in your browser
4. Enable developer mode (usually top right)
5. Add unpacked extension
6. Choose the extracted extension's folder
7. There you go!

How to use:
1. Start a session in the panel of the...

Downloads: 0 This Week

Last Update: 2024-07-30
See Project
10

go-dork

Fast Go-based CLI scanner for running automated search engine dorks

go-dork is an open source command-line tool designed to automate search engine dorking and reconnaissance tasks. Written in the Go programming language, it focuses on speed and efficiency when executing advanced search queries across multiple search engines. It allows users to run specialized queries, often referred to as “dorks,” to discover publicly exposed data, misconfigurations, or potentially vulnerable resources.

Downloads: 6 This Week

Last Update: 2026-03-11
See Project
11

JSSoup

JavaScript + BeautifulSoup = JSSoup

I'm a fan of the Python library BeautifulSoup. It's feature-rich and very easy to use. But when I was working on a small react-native project and tried to find an HTML parser library like BeautifulSoup, I failed. So I wanted to write an HTML parser library for JavaScript that is as easy to use as BeautifulSoup. JSSoup uses tautologistics/node-htmlparser as its HTML DOM parser and builds a BeautifulSoup-like API on top of it. JSSoup supports both node and react-native. JSSoup...

Downloads: 0 This Week

Last Update: 2023-04-10
See Project
12

gocrawl

Polite concurrent web crawler library for Go with flexible hooks

...Developers have full control over the crawling workflow, including which URLs are visited, inspected, and processed during execution. gocrawl integrates with HTML parsing tools so responses can be inspected and queried in a structured way while crawling. Instead of implementing a full search indexing pipeline, the library provides the core crawling engine and extension hooks.

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
13

GoogleScraper

Python tool for scraping search engine results from many providers

GoogleScraper is a Python-based tool designed to automatically collect and process search engine results from multiple providers. It enables developers and researchers to programmatically query search engines and extract useful information such as links, titles, and result descriptions. GoogleScraper supports several major search engines and can be used to gather structured datasets from search result pages for further analysis.

Downloads: 1 This Week

Last Update: 6 days ago
See Project
14

X-RAY

The next web scraper, see through the <html> noise

Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...

Downloads: 0 This Week

Last Update: 2021-10-05
See Project
15

WeChatSogou

Python library to crawl and retrieve data from WeChat accounts

WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
16

Save For Offline

Android app for saving webpages for offline reading

...A search of saved pages (By title only for now). User-agent change allows the saving of either desktop or mobile versions of pages. Nice UI for both phones and tablets, with various choices for layout and appearance.

Downloads: 3 This Week

Last Update: 2023-04-12
See Project
17

sqliv

Massive SQL injection vulnerability scanner for automated web testing

...Written primarily in Python, the project focuses on discovering potentially vulnerable web pages by analyzing URLs that contain database query parameters. It can perform large-scale scanning by using search engine queries known as SQL injection dorks to collect candidate websites and then test them for vulnerabilities. In addition to bulk scanning, SQLiv supports targeted analysis of specific domains or individual URLs, allowing security researchers to focus on particular web applications. When a domain is supplied, the scanner can crawl the site to gather URLs with parameters and evaluate them for potential SQL injection weaknesses. ...

Downloads: 4 This Week

Last Update: 1 hour ago
See Project
18

OpenWebSpider

OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!

4 Reviews

Downloads: 2 This Week

Last Update: 2017-03-12
See Project
19

webStraktor

webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, extraction, and storage of information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...

Downloads: 1 This Week

Last Update: 2014-04-25
See Project
20

Constellio Enterprise Search engine

Open source Search Engine and Enterprise Search

Constellio is an enterprise search engine that allows companies to search all their organization's information through a single interface (Web, CRM, ERP, ECM, Mail, etc.). Constellio is based on Apache Solr and Google Search Appliance's connector. Constellio has a powerful web crawler.

Downloads: 0 This Week

Last Update: 2015-03-31
See Project
21

Web Crawler Security Tool

A web crawler oriented to information security.

Last update: Tue Mar 26 16:25 UTC 2012. Web Crawler Security is a Python-based tool that automatically crawls a web site. It is a web crawler oriented to help in penetration testing tasks. Its main task is to search and list all the links (pages and files) in a web site. The crawler was completely rewritten in v1.0, bringing many improvements: improved data visualization, an interactive option to download files, increased crawling speed, exports list of...

3 Reviews

Downloads: 0 This Week

Last Update: 2015-10-10
See Project
22

Sphider

Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back end database. It is a great tool for adding search functionality to your web site or building your custom search engine. Sphider is small, easy to set up and...

1 Review

Downloads: 0 This Week

Last Update: 2013-04-08
See Project
23

Broken url checker

...It can crawl any site and help find broken links. It also has an option to download a CSV report. The CSV file includes the URL, the parent page URL, and the status of the page [broken or ok]. It is very useful for search engine optimization.

Downloads: 0 This Week

Last Update: 2013-04-05
See Project
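
The CSV report described for Broken url checker (URL, parent page URL, status) can be illustrated with a minimal Python sketch. This is not the project's code: the function names `extract_links` and `build_report`, the column names, and the `is_ok` callback (a stand-in for whatever HTTP check a real checker would run) are all assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links found in one page's HTML."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def build_report(pages, is_ok):
    """Build CSV text with one row per link.

    pages: dict mapping a page URL to its HTML (already fetched).
    is_ok: hypothetical callable(url) -> bool; a real checker would
           issue an HTTP request here.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "parent_page_url", "status"])
    for parent, html in pages.items():
        for url in extract_links(parent, html):
            writer.writerow([url, parent, "ok" if is_ok(url) else "broken"])
    return out.getvalue()
```

Keeping the link check behind a callback means the report logic can be exercised without network access; a real run would pass a function that requests each URL and inspects the response status.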
24

WebNews Crawler

WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can perform site-specific extraction to pull out only the actual news content, filtering out advertising and other cruft.

Downloads: 0 This Week

Last Update: 2013-04-23
See Project
25

Nomad - Tiny Search Engine

Nomad is a tiny but efficient search engine and web crawler. It works very well for searching within a set of corporate websites on the internet, and/or an intranet's HTML documents or knowledge repositories.

Downloads: 0 This Week

Last Update: 2013-03-14
See Project