Showing 59 open source projects for "java xml"

View related business solutions
  • Build on Google Cloud with $300 in Free Credit Icon
    Build on Google Cloud with $300 in Free Credit

    New to Google Cloud? Get $300 in free credit to explore Compute Engine, BigQuery, Cloud Run, Vertex AI, and 150+ other products.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query exabytes in BigQuery, or build AI apps with Vertex AI and Gemini. Once your credits are used, keep building with 20+ products with free monthly usage, including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. Sign up to start building right away.
    Start Free Trial
  • 99.99% Uptime for MySQL and PostgreSQL on Google Cloud Icon
    99.99% Uptime for MySQL and PostgreSQL on Google Cloud

    Enterprise Plus edition delivers sub-second maintenance downtime and 2x read/write performance. Built for critical apps.

    Cloud SQL Enterprise Plus gives you a 99.99% availability SLA with near-zero downtime maintenance—typically under 10 seconds. Get 2x better read/write performance, intelligent data caching, and 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server with built-in vector search for gen AI apps. New customers get $300 in free credit.
    Try Cloud SQL Free
  • 1
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 2
    ResCarta

    ResCarta

    Archive your personal history

    ResCarta Toolkit offers an open source solution to creating, storing, viewing, and searching digital collections. Applications in the toolkit let users create and edit metadata, convert data to open standard ResCarta format, index and host collections.
    Leader badge
    Downloads: 24 This Week
    Last Update:
    See Project
  • 3
    panFMP
    panFMP is a generic framework suitable for harvested XML metadata that is searchable through Apache Lucene without any additional RDBMS. Fields can be defined by XPath allowing for full text queries on all types of fields including numerical ranges. The code was moved to Github: https://github.com/pangaea-data-publisher/panfmp
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    OpenSearchServer Search Engine

    OpenSearchServer Search Engine

    An open source search engine with RESTFul API and crawlers

    OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Go from Data Warehouse to Data and AI platform with BigQuery Icon
    Go from Data Warehouse to Data and AI platform with BigQuery

    Build, train, and run ML models with simple SQL. Automate data prep, analysis, and predictions with built-in AI assistance from Gemini.

    BigQuery is more than a data warehouse—it's an autonomous data-to-AI platform. Use familiar SQL to train ML models, run time-series forecasts, and generate AI-powered insights with native Gemini integration. Built-in agents handle data engineering and data science workflows automatically. Get $300 in free credit, query 1 TB, and store 10 GB free monthly.
    Try BigQuery Free
  • 5
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 6

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogenous document collections.

    NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Digital Learning Sciences (DLS) is a mission-centered, not-for-profit organization dedicated to improving learning through the use of digital content and tools.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9

    ScraperEdit for XBMC

    XML bindings and a GUI for creating and editing XBMC Scrapers

    This program is an editor for creating XBMC Scrapers. It is similar to ScraperEditor, an other editor using ScraperXML, that runs under .Net environment. This program runs under Sun/Oracle's Java Runtime. HELP WANTED! I am looking for someone, who would help me writing documentation, like user's manual and on-line help. Also if someone want to help, translated language files are always welcome...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build AI Apps with Gemini 3 on Vertex AI Icon
    Build AI Apps with Gemini 3 on Vertex AI

    Access Google’s most capable multimodal models. Train, test, and deploy AI with 200+ foundation models on one platform.

    Vertex AI gives developers access to Gemini 3—Google’s most advanced reasoning and coding model—plus 200+ foundation models including Claude, Llama, and Gemma. Build generative AI apps with Vertex AI Studio, customize with fine-tuning, and deploy to production with enterprise-grade MLOps. New customers get $300 in free credits.
    Try Vertex AI Free
  • 10
    Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching files for plenty of formats (HTML,XML,doc(x),xls(x),ppt(x),oo,PDF,RTF,mp3,mp4,Java). A TagLibrary eases integrating search results in your JSP based web page.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 11
    IDRA (InDexing and Retrieving Automatically) is a tool which allows indexing a wide range of text (TXT, DOC, PDF) and image annotations files (XML), query-based searching, visualizing an index, saving it for re-usability, evaluation, etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, the extraction and the storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and xPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13

    Infofuze

    Data migration/conversion library based on STX and XSLT transformation

    Infofuze is a Java library and server application that can be used to transform and combine data from various sources into a specific XML or other text output format that can be stored or indexed.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    XML documents To Generated dynamic web application supporting CRUD actions. Credits to Ministry of Culture and Communication, France; UNESCO; Ecole Nationale des Chartes, France; PASS-TECH, France.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    A universal platform for resource discovery and description that shares XML meta-data over existing peer-to-peer (P2P) networks such as Gnutella and JXTA.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    DBPrism is a framework to generate dynamic XML from a database, it provides an high performance DBGenerator for Cocoon2. Also is a J2EE replacement for Oracle mod_plsql. This project also includes a Restlet-Oracle connector exam. and Lucene Domain In
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    NewsRack is a tool/service that attempts to automate news monitoring. Based on user-specified definitions and rules, NewsRack will enable automated downloading, classification, filing, and long-term archiving of news.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    jSEO -- Pluggable SEO (Search Engine Optimization) for dynamic JEE web applications
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    nxs crawler is a program to crawl the internet. The program generates random ip numbers and attempts to connect to the hosts. If the host will answer, the result will be saved in a xml file. After than the crawler will disconnect... Additionally you can
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    The Semantic Web implementation using native xml database as backend storage. A SPARQL java compiler to XQuery using Jena. There are XQuery scripts for native xml database Sedna(http://modis.ispras.ru/sedna/).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    The Java-Sitemapper is a Java API for building sitemap files to improve search indexing on Google, Yahoo!, MSN, and Ask.com. This project strives to implement the latest in search technology for use on the Java platform.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    A Java library as a wrapper for the Google Search Appliance's search protocol XML API. The XML API is publicly available at: http://code.google.com/gsa_apis/xml_reference.html The homepage and tutorial for this project is at: http://gsa-japi.sf.net
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    JxtASK is a P2P system that is aimed to search, download and share academic content hosted on websites that will join the JxtASK community. Joining is simple: siteadmins must generate(even automatically)a XML catalog which describes the files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Google(™) meets the Matrix. Red Piranha combines Lucene (Searching Ability), XML-RDF (ability to learn), Tomcat (for P2P Power) and Spring (Ease of use) to not only let you find anything, anywhere, but to actually understand what you are looking for.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.
Try Free →