html xml free download

Showing 21 open source projects for "html xml"


Filters: Search Engines, Mac

1

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 1 This Week

Last Update: 2025-10-27
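As a rough illustration of the XML-based extraction approach described above (not WebHarvest's own configuration language), the Python sketch below pulls fields out of a page that has already been converted to well-formed XHTML, using the limited XPath subset in the standard xml.etree.ElementTree module. The sample markup, the item/title/price class names and the extract_items helper are hypothetical; real pages usually need an HTML cleaner before an XML parser will accept them.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be already cleaned into well-formed XHTML.
SAMPLE_XHTML = """<html><body>
  <div class="item"><a class="title" href="/p/1">First</a><span class="price">10</span></div>
  <div class="item"><a class="title" href="/p/2">Second</a><span class="price">12</span></div>
</body></html>"""


def extract_items(xhtml: str) -> list[dict]:
    """Collect title, link and price from every <div class="item">."""
    root = ET.fromstring(xhtml)
    items = []
    # ElementTree supports a small XPath subset, including [@attr='value'].
    for div in root.iterfind(".//div[@class='item']"):
        link = div.find("a[@class='title']")
        price = div.find("span[@class='price']")
        if link is None or price is None:
            continue  # skip entries missing a field instead of crashing
        items.append({"title": link.text, "href": link.get("href"), "price": int(price.text)})
    return items


for item in extract_items(SAMPLE_XHTML):
    print(item)
```

Declaring the wanted fields as path expressions, rather than hand-written string searches, is what makes this style of scraper easy to adapt when a page layout changes.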
2

OpenSearchServer Search Engine

An open-source search engine with a RESTful API and crawlers

OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document indexing (PDF, Office, etc.), web scraping, etc. OpenSearchServer runs on...

31 Reviews

Downloads: 0 This Week

Last Update: 2018-08-26
3

cpDetector

cpDetector is a proxy for codepage detection of documents. It delegates to multiple detector instances that try to detect the codepage using different techniques. A command-line executable is included that sorts documents by codepage.

Downloads: 22 This Week

Last Update: 2018-04-05
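The proxy idea in the cpDetector description can be sketched as a chain of detectors consulted in order, with the first confident answer winning. This is a Python illustration of the pattern only, not cpDetector's Java API; the CodepageProxy class, the three toy detectors and the windows-1252 fallback are assumptions chosen for the example.

```python
from typing import Callable, Optional, Sequence

# A detector takes raw bytes and returns a codepage name, or None if unsure.
Detector = Callable[[bytes], Optional[str]]


def detect_bom(data: bytes) -> Optional[str]:
    """Recognise a byte-order mark, the most reliable hint when present."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16"
    return None


def detect_ascii(data: bytes) -> Optional[str]:
    """Input that uses only 7-bit bytes is plain ASCII."""
    return "US-ASCII" if all(b < 0x80 for b in data) else None


def detect_utf8(data: bytes) -> Optional[str]:
    """Bytes that decode strictly as UTF-8 are very likely UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "UTF-8"


class CodepageProxy:
    """Delegates to detectors in order; the first non-None answer wins."""

    def __init__(self, detectors: Sequence[Detector], fallback: Optional[str] = None):
        self.detectors = list(detectors)
        self.fallback = fallback

    def detect(self, data: bytes) -> Optional[str]:
        for detector in self.detectors:
            result = detector(data)
            if result is not None:
                return result
        return self.fallback  # nobody was sure: use the configured default


proxy = CodepageProxy([detect_bom, detect_ascii, detect_utf8], fallback="windows-1252")
print(proxy.detect("héllo".encode("utf-8")))
print(proxy.detect(b"hello"))
print(proxy.detect("héllo".encode("latin-1")))
```

Ordering the detectors from most to least reliable keeps cheap, certain checks (such as a byte-order mark) from being overridden by weaker statistical guesses.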
4

SSEP - Site Search Engine PHP-Ajax

A free site search engine script built with PHP and Ajax.

A site search engine script that uses MySQL to store your website's indexed pages, adding search functionality to your website. It is built with PHP and JavaScript, and the search results are loaded via Ajax. The search system combines MySQL full-text search with SQL regular expressions, and weights words according to their location in the HTML elements, to determine the relevance of the search results. It can be included in any website.

3 Reviews

Downloads: 0 This Week

Last Update: 2017-03-25
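The element-based word weighting mentioned for SSEP can be illustrated with a short Python sketch. It covers only the weighting idea, not SSEP's MySQL full-text and regexp implementation; the ELEMENT_WEIGHTS values, the relevance helper and the two sample pages are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: words in these elements count more than body text.
ELEMENT_WEIGHTS = {"title": 5, "h1": 3, "h2": 2, "a": 1.5}
DEFAULT_WEIGHT = 1


class WeightedWordCounter(HTMLParser):
    """Sums a weight per word occurrence, based on the enclosing elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # The most important enclosing element decides the weight.
        weight = max((ELEMENT_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
                     default=DEFAULT_WEIGHT)
        for word in re.findall(r"\w+", data.lower()):
            self.scores[word] += weight


def relevance(html: str, query: str) -> float:
    """Score a page for a query as the sum of its weighted query-word counts."""
    counter = WeightedWordCounter()
    counter.feed(html)
    counter.close()
    return sum(counter.scores[word] for word in re.findall(r"\w+", query.lower()))


pages = {
    "a.html": "<html><head><title>PHP search</title></head><body><p>Ajax demo</p></body></html>",
    "b.html": "<html><body><p>A PHP tutorial that mentions search once.</p></body></html>",
}
ranked = sorted(pages, key=lambda name: relevance(pages[name], "php search"), reverse=True)
print(ranked)
```

Both pages contain the query words, but the one that has them in its title ranks first, which is the effect location-based weighting is meant to produce.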
5

eXtensible Text Framework (XTF)

Framework for search and display of heterogeneous document collections.

...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.

Downloads: 0 This Week

Last Update: 2019-07-29
6

HyperSQL

HyperSQL is like Doxygen plus Javadoc for SQL: it hypermaps SQL views, packages, procedures, and functions to HTML source code listings and shows every code location where they are used.

Downloads: 0 This Week

Last Update: 2016-09-19
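HyperSQL's "where used" listings amount to a cross-reference index from each SQL object name to the files and line numbers that mention it. A minimal Python sketch of such an index follows; the where_used helper and the sample sources are hypothetical, and a plain word-boundary regex also matches names inside comments or string literals, which a real tool would need to parse out.

```python
import re
from collections import defaultdict


def where_used(sources: dict[str, str], names: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Map each object name (lower-cased) to the (file, line) pairs that mention it."""
    # SQL identifiers are case-insensitive, so match without regard to case.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    index = defaultdict(list)
    for filename, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                index[match.group(1).lower()].append((filename, lineno))
    return dict(index)


SOURCES = {
    "billing.sql": "SELECT * FROM customer_view;\nCALL update_balance(42);",
    "report.sql": "select name from CUSTOMER_VIEW;",
}
for name, locations in where_used(SOURCES, ["customer_view", "update_balance"]).items():
    print(name, locations)
```

Each (file, line) pair is what an HTML generator would turn into a hyperlink to the corresponding source listing.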
7

CyberNeko HTML Parser

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.

17 Reviews

Downloads: 5 This Week

Last Update: 2015-04-17
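Tag balancing, the core of what NekoHTML is described as doing, means repairing sloppy HTML so that every element is properly closed and nested, after which standard XML tools can read it. The Python sketch below shows the basic idea with the standard html.parser module; it is not NekoHTML's Java API, the TagBalancer class is hypothetical, and unlike a real balancer it does not know HTML's implicit-close rules (for example, a new li closing the previous one).

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser

# HTML elements that never have content or an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Re-emits HTML with every element properly closed and nested."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the balanced output
        self.open_tags = []  # stack of elements not yet closed

    def handle_starttag(self, tag, attrs):
        attr_text = ""
        for name, value in attrs:
            quoted = escape(value or "", quote=True)
            attr_text += f' {name}="{quoted}"'
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")  # self-close void elements
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with no matching start: drop it
        # Close any elements still open inside this one, innermost first.
        while True:
            inner = self.open_tags.pop()
            self.out.append(f"</{inner}>")
            if inner == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def balanced(self) -> str:
        self.close()  # flush any buffered text
        while self.open_tags:  # close whatever is still open at end of input
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


balancer = TagBalancer()
balancer.feed("<p>Hello <b>world<br></p></i>")
result = balancer.balanced()
print(result)
root = ET.fromstring(result)  # the output is now well-formed XML
print(root.tag)
```

Here the unclosed b is closed before its parent p, the br is self-closed, and the stray closing i is dropped, so the result parses with an ordinary XML parser. Comments and doctypes are silently discarded in this sketch.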
8

regain

Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A tag library makes it easy to integrate search results into JSP-based web pages.

13 Reviews

Downloads: 9 This Week

Last Update: 2014-07-30
9

TestEl

TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.

1 Review

Downloads: 0 This Week

Last Update: 2014-06-09
10

Information Extracter

A utility to extract meta-information (properties/comments) out of various file-types; e.g. HTML, PDF, RTF & various Office documents; OGG/MP3 files and JPEG/PNG/GIF images, which can be presented in various output formats (HTML, XML, LaTeX & plain t

Downloads: 0 This Week

Last Update: 2013-04-08
11

Browser Search Box library

This library can be used to add your site to the browser search box. It generates the HTML, JavaScript and XML that tell browsers about your site, so they can add it to the list of search providers they offer.

Downloads: 0 This Week

Last Update: 2015-11-15
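The description does not name the XML format, but the widely used mechanism for this is an OpenSearch 1.1 description document plus a link rel="search" element in the page head; treat that choice as an assumption. The Python sketch below generates both with the standard library; the helper names, the example.com URLs and the /opensearch.xml path are hypothetical, and the JavaScript part mentioned above is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_description(short_name: str, description: str, url_template: str) -> str:
    """Build an OpenSearch 1.1 description document as an XML string."""
    root = ET.Element("OpenSearchDescription", xmlns=OPENSEARCH_NS)
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Description").text = description
    # {searchTerms} is the placeholder the browser replaces with the user's query.
    ET.SubElement(root, "Url", type="text/html", template=url_template)
    return ET.tostring(root, encoding="unicode")


def discovery_link(title: str, href: str) -> str:
    """<link> element that lets browsers find the description document."""
    return ('<link rel="search" type="application/opensearchdescription+xml" '
            f'title="{escape(title)}" href="{escape(href)}">')


print(opensearch_description("Example", "Search example.com",
                             "https://example.com/search?q={searchTerms}"))
print(discovery_link("Example", "/opensearch.xml"))
```

The site serves the first string at the path given in the link, and browsers that support OpenSearch then offer the site as a search provider.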
12

zSearch -- The easy search engine

zSearch is a simple Python-based crawler and search engine. Raw HTML pages are stored in bzip2 archives, the index is created using PyLucene, and Twisted provides the internal HTTP server. Results are sent back as XML over HTTP.

Downloads: 0 This Week

Last Update: 2016-07-24
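Two of the steps described for zSearch, keeping raw HTML in bzip2 archives and returning results as XML, can be sketched with Python's standard bz2 and xml.etree modules. The PyLucene indexing and Twisted server are not shown, and the helper names, file name and XML element names are hypothetical rather than zSearch's actual format.

```python
import bz2
import os
import tempfile
import xml.etree.ElementTree as ET


def store_page(path: str, html: str) -> None:
    """Write raw HTML to a bzip2-compressed file."""
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.write(html)


def load_page(path: str) -> str:
    """Read the raw HTML back, decompressing transparently."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return f.read()


def results_to_xml(query: str, hits: list[tuple[str, float]]) -> str:
    """Serialise ranked hits as a small XML document for an HTTP response."""
    root = ET.Element("results", query=query)
    for url, score in hits:
        ET.SubElement(root, "hit", url=url, score=f"{score:.2f}")
    return ET.tostring(root, encoding="unicode")


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "page.html.bz2")
    store_page(archive, "<html><body>Hello</body></html>")
    restored = load_page(archive)

print(restored)
print(results_to_xml("hello", [("http://example.com/", 0.87)]))
```

Keeping the raw pages compressed but separate from the index means the index can be rebuilt later without re-crawling.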
13

bluebery

bluebery is an easy-to-use SQL/PHP-based content manager. It provides PHP libraries and methods for your site's pages that make it very easy to access and print individual items, or iterate over items, stored through the bluebery web UI.

Downloads: 0 This Week

Last Update: 2014-07-06
14

JaWiki

JaWiki is a Java wiki with a file-based database for managing content. The content is stored as XML files in the file system, and an HTML front end lets users edit it in a browser. A standalone server is also included.

Downloads: 0 This Week

Last Update: 2015-08-06
15

webnavigator

The Navigator project aims to support automated gathering of dynamic information from third-party websites, using their web interfaces to post queries and gather replies. Navigator is written in Java and is OS-independent.

Downloads: 0 This Week

Last Update: 2013-03-21
16

JLinkCheck

JLinkCheck is an Ant task written in Java for checking links in websites. Rather than checking a single page, it crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.

Downloads: 0 This Week

Last Update: 2016-04-26
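Crawling "a whole site like a spider" is a breadth-first traversal of the link graph that records links whose targets do not exist. The Python sketch below shows that traversal and a small XML report; it is not JLinkCheck's Ant task, and to stay self-contained it walks a hypothetical in-memory SITE map instead of fetching pages over HTTP. The report's element and attribute names are also assumptions.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Hypothetical in-memory site: page path -> links found on that page.
# A real checker would fetch each page over HTTP and parse its <a href> links.
SITE = {
    "/index.html": ["/about.html", "/docs/intro.html"],
    "/about.html": ["/index.html", "/missing.html"],
    "/docs/intro.html": ["/docs/api.html", "/index.html"],
    "/docs/api.html": [],
}


def crawl(start, site):
    """Breadth-first spider: visit every reachable page once, note dead links."""
    seen = {start}
    queue = deque([start])
    broken = []  # (page containing the link, missing target)
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in site:
                broken.append((page, target))
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, broken


def xml_report(pages, broken):
    """Summarise the crawl as XML: page count plus one element per dead link."""
    root = ET.Element("linkcheck", pages=str(len(pages)))
    for page, target in broken:
        ET.SubElement(root, "broken", page=page, target=target)
    return ET.tostring(root, encoding="unicode")


pages, broken = crawl("/index.html", SITE)
print(xml_report(pages, broken))
```

The seen set is what keeps the spider from looping on pages that link back to each other, such as /index.html and /about.html here.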
17

Information Retrieval Toolkit

High-performance software for information retrieval research, with an emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangeable toolkit of functions.

1 Review

Downloads: 0 This Week

Last Update: 2013-02-21
18

Distributed ISBN portal

A distributed search portal over common sources of ISBN numbers, with permanent caching of results. Its goal is to provide a free, open-source interface for ISBN retrieval using HTML, SQL or XML, independent of any toolkit or software.

Downloads: 0 This Week

Last Update: 2013-07-14
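The "permanent caching of results" described for the ISBN portal follows a cache-aside pattern: check a local table first, and only on a miss ask the external sources in turn, storing whatever they return. The Python sketch below uses an in-memory SQLite table; the IsbnPortal class, the table layout and placeholder_source are hypothetical, and a real deployment would point db_path at a file so the cache survives restarts.

```python
import sqlite3
from typing import Callable, Optional, Sequence

# A source takes an ISBN and returns a title, or None if it does not know the book.
Source = Callable[[str], Optional[str]]


class IsbnPortal:
    """Ask several sources in turn and cache every answer permanently in SQL."""

    def __init__(self, sources: Sequence[Source], db_path: str = ":memory:"):
        self.sources = list(sources)
        self.db = sqlite3.connect(db_path)  # use a file path to persist across runs
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS isbn_cache (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )

    def lookup(self, isbn: str) -> Optional[str]:
        isbn = isbn.replace("-", "")  # treat hyphenated and bare forms alike
        row = self.db.execute("SELECT title FROM isbn_cache WHERE isbn = ?", (isbn,)).fetchone()
        if row is not None:
            return row[0]  # cache hit: no source is contacted
        for source in self.sources:
            title = source(isbn)
            if title is not None:
                with self.db:  # commits the insert
                    self.db.execute("INSERT INTO isbn_cache VALUES (?, ?)", (isbn, title))
                return title
        return None  # misses are not cached, so a later lookup can retry


calls = []


def placeholder_source(isbn: str) -> Optional[str]:
    """Hypothetical stand-in for a remote catalogue; records each call."""
    calls.append(isbn)
    return {"9780000000002": "Placeholder Title"}.get(isbn)


portal = IsbnPortal([placeholder_source])
print(portal.lookup("978-0-00-000000-2"))
print(portal.lookup("9780000000002"))
print(len(calls))
```

The second lookup is answered from the table, so the external source is contacted only once per ISBN.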
19

webExtractor

webExtractor is a Java application for extracting specific content from web-based HTML, XML, CSV, and free-form text. The extracted data can be used for data gathering and mining.

Downloads: 4 This Week

Last Update: 2014-06-26
20

Artlight

100% Java multithreaded search engine. The client and server communicate over TCP/IP. To index objects, it fetches documents over HTTP and parses HTML, PDF, XML and plain-text files. Artlight use...

Downloads: 0 This Week

Last Update: 2013-02-27
21

ICECrawler

ICECrawler is a WWW crawler and map generator intended to help in understanding and analyzing the links between websites and web documents.

Downloads: 0 This Week

Last Update: 2013-04-19
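Turning crawled page links into a map of links between websites mostly means collapsing each URL to its host and counting the cross-site edges. The Python sketch below does that and renders the result in Graphviz DOT syntax; the DOT output, the helper names and the example.test-style hosts are assumptions for illustration, not ICECrawler's actual output format.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_link_map(page_links):
    """Aggregate page-to-page links into counted site-to-site edges."""
    edges = Counter()
    for source, target in page_links:
        src_host = urlsplit(source).hostname
        dst_host = urlsplit(target).hostname
        # Keep only links that cross from one website to another.
        if src_host and dst_host and src_host != dst_host:
            edges[(src_host, dst_host)] += 1
    return edges


def to_dot(edges):
    """Render the site graph in Graphviz DOT syntax, edge labels = link counts."""
    lines = ["digraph sites {"]
    for (src, dst), count in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical crawl output: (page URL, linked URL) pairs.
LINKS = [
    ("http://a.example/index.html", "http://b.example/page1"),
    ("http://a.example/about.html", "http://b.example/page2"),
    ("http://a.example/index.html", "http://a.example/about.html"),
    ("http://b.example/page1", "http://c.example/"),
]
edges = host_link_map(LINKS)
print(to_dot(edges))
```

Internal links (within a.example) are dropped, and the two page-level links from a.example to b.example become a single edge labelled with its count.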