Showing 21 open source projects for "html xml"

  • 1
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 1 This Week
  • 2
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...
    Downloads: 0 This Week
  • 3
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage using different techniques. It ships with a command-line executable that can sort documents by codepage.
    Downloads: 22 This Week
  • 4
    SSEP - Site Search Engine PHP-Ajax

    A free site search engine script built with PHP and Ajax.

    A site search engine script that uses MySQL to store your website's indexed pages and adds search functionality to your web site. It is built with PHP and JavaScript, and the search results are loaded via Ajax. The search system combines MySQL full-text search with SQL regexp matching, and weights words according to their location in the HTML elements, to determine the relevance of the search results. It can be included in any web site.
    Downloads: 0 This Week
  • 5

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    ...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 0 This Week
  • 6
    HyperSQL is like a doxygen plus javadoc for SQL, hypermapping SQL views, packages, procedures, and functions to HTML source code listings and showing all code locations where these are used.
    Downloads: 0 This Week
  • 7
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 5 This Week
  • 8
    Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A TagLibrary eases integrating search results into your JSP-based web page.
    Downloads: 9 This Week
  • 9
    TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
    Downloads: 0 This Week
  • 10
    A utility to extract meta-information (properties/comments) out of various file-types; e.g. HTML, PDF, RTF & various Office documents; OGG/MP3 files and JPEG/PNG/GIF images, which can be presented in various output formats (HTML, XML, LaTeX & plain t
    Downloads: 0 This Week
  • 11
    This library can be used to add your site to the browser search box. It can generate HTML, JavaScript and XML to pass information to browsers so they can add a site to the list of search types that the browser can perform.
    Downloads: 0 This Week
  • 12
    zSearch is a simple Python-based crawler and search engine. Raw HTML is stored in bzip2 archives, the index is created using PyLucene, and Twisted is used to provide an internal HTTP server. Results are sent back as XML over HTTP.
    Downloads: 0 This Week
  • 13
    bluebery is an easy-to-use SQL/PHP-based content manager. It provides PHP libraries and methods for your site's pages that let you easily access and print individual items, or iterate over items, stored through the bluebery web UI.
    Downloads: 0 This Week
  • 14
    JaWiki is a Java wiki with a file-based database to manage the content. The content is stored in XML files in the file system. An HTML frontend lets users edit the content via a browser. A standalone server is also included.
    Downloads: 0 This Week
  • 15
    The Navigator project aims to support automated gathering of dynamic information from third-party web sites, using their web interfaces to post queries and gather replies. Navigator is written in Java and is OS-independent.
    Downloads: 0 This Week
  • 16
    JLinkCheck is an Ant task written in Java for checking links in websites. It does not check just a single page but crawls a whole site like a spider, generating a report in XML and (X)HTML. JReptator will be its successor with many more features
    Downloads: 0 This Week
  • 17
    High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangeable toolkit of functions.
    Downloads: 0 This Week
  • 18
    A distributed search portal of common sources of ISBN numbers, with permanent caching of results. It aims to provide a free, open-source interface for ISBN retrieval using HTML, SQL or XML, independent of any toolkits or software.
    Downloads: 0 This Week
  • 19
    webExtractor is a Java application for extracting specific content from web-based HTML, XML, CSV, and free-form text. The extracted data can be used for data gathering and mining purposes.
    Downloads: 4 This Week
  • 20
    100% Java multithreaded search engine. Communication between the client and server takes place over TCP/IP. To index objects, it obtains the documents over HTTP and parses HTML files, PDF files, XML files and plain-text files. Artlight use
    Downloads: 0 This Week
  • 21
    ICECrawler is a WWW crawler and map generator intended to help understand and analyze links between websites and web documents.
    Downloads: 0 This Week