html xml free download

Showing 15 open source projects for "html xml"


Filters: Search Engines, Java

1

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It uses well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 1 This Week

Last Update: 2025-10-27
2

OpenSearchServer Search Engine

An open source search engine with a RESTful API and crawlers

OpenSearchServer is a powerful, enterprise-class search engine. Using the web user interface, the crawlers (web, file, database, etc.), and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document indexing (PDF, Office, etc.), web scraping, etc. OpenSearchServer runs on...

31 Reviews

Downloads: 2 This Week

Last Update: 2018-08-26
3

cpDetector

cpDetector is a proxy for codepage detection of documents. It delegates to multiple detector instances, each of which tries to detect the codepage using a different technique. A command-line executable is included that sorts documents by codepage.
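
The proxy idea can be sketched in plain Java. Everything below is hypothetical: `CharsetGuesser`, `BomGuesser`, `AsciiGuesser`, and `DelegatingGuesser` are illustrative names, not cpDetector's real classes, and the two detection techniques (a byte-order-mark check and a pure-ASCII check) are simple stand-ins for whatever techniques the project actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical interface for illustration only; this is not cpDetector's API.
interface CharsetGuesser {
    // Returns the detected charset, or Optional.empty() if this guesser cannot decide.
    Optional<Charset> detect(byte[] document);
}

// Recognizes a UTF-8, UTF-16BE, or UTF-16LE byte-order mark at the start of the document.
class BomGuesser implements CharsetGuesser {
    @Override
    public Optional<Charset> detect(byte[] d) {
        if (d.length >= 3 && (d[0] & 0xFF) == 0xEF && (d[1] & 0xFF) == 0xBB && (d[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (d.length >= 2 && (d[0] & 0xFF) == 0xFE && (d[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (d.length >= 2 && (d[0] & 0xFF) == 0xFF && (d[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }
}

// Reports US-ASCII when no byte has its high bit set.
class AsciiGuesser implements CharsetGuesser {
    @Override
    public Optional<Charset> detect(byte[] d) {
        for (byte b : d) {
            if ((b & 0x80) != 0) {
                return Optional.empty();
            }
        }
        return Optional.of(StandardCharsets.US_ASCII);
    }
}

// The proxy: asks each delegate in turn and returns the first definite answer.
class DelegatingGuesser implements CharsetGuesser {
    private final List<CharsetGuesser> delegates;

    DelegatingGuesser(List<CharsetGuesser> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public Optional<Charset> detect(byte[] document) {
        for (CharsetGuesser delegate : delegates) {
            Optional<Charset> result = delegate.detect(document);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }
}
```

Ordering the delegates from most specific (a byte-order mark is nearly conclusive) to most general lets cheap, reliable checks answer first; a delegate that cannot decide returns `Optional.empty()` and the proxy moves on to the next one.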

Downloads: 24 This Week

Last Update: 2018-04-05
4

eXtensible Text Framework (XTF)

Framework for search and display of heterogeneous document collections.

...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.

Downloads: 0 This Week

Last Update: 2019-07-29
5

CyberNeko HTML Parser

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
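
As a rough illustration of what a tag balancer does, the sketch below repairs a pre-tokenized stream: it drops end tags that close nothing and inserts the missing end tags for elements left open. It is a hypothetical minimal example (`TagBalancer` is an illustrative name), not NekoHTML's algorithm or API; it assumes tokens without attributes and ignores HTML-specific rules such as void elements (`<br>`) or the implicit closing of `<li>` and `<p>`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TagBalancer {
    // Balances a flat token stream: "<x>" opens x, "</x>" closes x, anything else is text.
    static List<String> balance(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open element names
        for (String tok : tokens) {
            if (tok.startsWith("</") && tok.endsWith(">")) {
                String name = tok.substring(2, tok.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag that closes nothing: drop it
                }
                // Close every element still open inside the one being closed, then the element itself.
                while (true) {
                    String top = open.pop();
                    out.add("</" + top + ">");
                    if (top.equals(name)) {
                        break;
                    }
                }
            } else if (tok.startsWith("<") && tok.endsWith(">")) {
                open.push(tok.substring(1, tok.length() - 1));
                out.add(tok);
            } else {
                out.add(tok); // text passes through unchanged
            }
        }
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }
}
```

For example, the tokens `<p>`, `<b>`, `hi`, `</p>`, `</i>` come out as `<p>`, `<b>`, `hi`, `</b>`, `</p>`: the unclosed `<b>` is closed before `</p>`, and the stray `</i>` is discarded.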

17 Reviews

Downloads: 2 This Week

Last Update: 2015-04-17
6

regain

Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, DOC(X), XLS(X), PPT(X), OpenOffice, PDF, RTF, MP3, MP4, Java). A tag library makes it easy to integrate search results into JSP-based web pages.
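
Full-text engines such as Lucene are built around an inverted index that maps each term to the documents containing it. The toy version below shows that data structure in plain Java. `TinyIndex` is a hypothetical name, and the code uses none of Lucene's API: no analyzers, scoring, or on-disk segments, just lowercase word splitting and boolean AND queries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndex {
    // Inverted index: term -> sorted set of ids of documents containing that term.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text on non-word characters and records docId under each lowercase term.
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns ids of documents that contain every query term (boolean AND).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect posting lists
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

Looking up a term is a single map access, and multi-term queries reduce to intersecting posting sets, which is why this structure scales far better than scanning every file per query.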

13 Reviews

Downloads: 11 This Week

Last Update: 2014-07-30
7

webStraktor

webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content over HTTP and extract relevant information. webStraktor provides a scripting language for collecting, extracting, and storing information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax; it has a small instruction set, and its syntax is easy to master. ...
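
Combining XPath and regular expressions typically means selecting nodes with XPath and then pulling values out of their text with a regex. The sketch below does this with the JDK's `javax.xml.xpath` and `java.util.regex` packages. It is not webStraktor's scripting language; `XPathRegexExtractor` is a hypothetical name, and the input is assumed to be well-formed XHTML (real-world HTML would first need a tag balancer or lenient parser).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRegexExtractor {
    // Selects nodes with an XPath expression, then keeps capture group 1 of the
    // first regex match found in each selected node's text content.
    static List<String> extract(String xhtml, String xpathExpr, Pattern pattern) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes;
        try {
            nodes = (NodeList) xpath.evaluate(
                    xpathExpr, new InputSource(new StringReader(xhtml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath expression", e);
        }
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Matcher m = pattern.matcher(nodes.item(i).getTextContent());
            if (m.find()) {
                values.add(m.group(1)); // nodes whose text does not match are skipped
            }
        }
        return values;
    }
}
```

Given a list with `<li class="item">Price: 12 EUR</li>` and `<li class="item">Price: 7 EUR</li>`, the expression `//li[@class='item']` with the pattern `(\d+) EUR` yields `12` and `7`. XPath narrows the search to the right elements, and the regex handles the free-form text inside them.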

Downloads: 0 This Week

Last Update: 2014-04-25
8

TestEl

TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and outputs its hits as XML.

1 Review

Downloads: 0 This Week

Last Update: 2014-06-09
9

RDF AutoPilot

Generates RDF and RDFS ontology documents automatically from HTML pages, given a set of rules.

Downloads: 0 This Week

Last Update: 2016-08-07
10

JaWiki

JaWiki is a Java wiki that manages its content with a file-based database: the content is stored as XML files in the file system. An HTML front end lets users edit the content through a browser. A standalone server is also included.

Downloads: 0 This Week

Last Update: 2015-08-06
11

webnavigator

The Navigator project aims to support automated gathering of dynamic information from third-party web sites, using their web interfaces to post queries and gather replies. Navigator is written in Java and is OS-independent.

Downloads: 0 This Week

Last Update: 2013-03-21
12

JLinkCheck

JLinkCheck is an Ant task written in Java for checking links in websites. Rather than checking a single page, it crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.
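
Spider-style link checking amounts to a breadth-first traversal that records every link whose target cannot be fetched. The sketch below is hypothetical (`LinkSpider` is not part of JLinkCheck): to stay self-contained it crawls an in-memory map from page URL to outgoing links instead of issuing HTTP requests, treats a URL missing from the map as broken, and does not produce the XML or (X)HTML report.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LinkSpider {
    // Breadth-first crawl from startUrl over a simulated site (page URL -> outgoing links).
    // Returns the sorted set of URLs that were linked to but do not exist in the site.
    static Set<String> findBrokenLinks(Map<String, List<String>> site, String startUrl) {
        Set<String> visited = new HashSet<>();
        Set<String> broken = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already crawled via another path
            }
            List<String> links = site.get(url);
            if (links == null) {
                broken.add(url); // stands in for an HTTP error such as 404
                continue;
            }
            for (String link : links) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return broken;
    }
}
```

The visited set is what keeps a whole-site crawl finite: pages that link back to each other (for example, every page linking to `/`) are fetched only once.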

Downloads: 0 This Week

Last Update: 2016-04-26
13

webExtractor

webExtractor is a Java application for extracting specific content from web-based HTML, XML, CSV, and free-form text. The extracted data can be used for data gathering and mining.

Downloads: 5 This Week

Last Update: 2014-06-26
14

Artlight

A 100% Java multithreaded search engine. The client and server communicate over TCP/IP. To index objects, it fetches documents over HTTP and parses HTML, PDF, XML, and plain-text files. Artlight use...

Downloads: 0 This Week

Last Update: 2013-02-27
15

REST Information Interchange Primitives

This is a collection of REST specifications, and implementations of those specs, for very low-level information sharing and workflow operations using REST actions over HTTP. Implementations are in various languages, mainly Java, Python, and Ruby.

Downloads: 0 This Week

Last Update: 2013-04-23