Showing 49 open source projects for "java html parser"

  • 1
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 5 This Week
  • 2
    Agents-Flex

    Agents-Flex is an elegant LLM Application Framework like LangChain

    ... definitions, parsing, callbacks through LLMs, and executing local methods to obtain results. Agents-Flex offers Loader, Parser, and Splitter components for the Document. Each component has multiple implementations, making it easy to load data from the web, local files, databases, and various data types.
    Downloads: 5 This Week
  • 3
    BudouX

    Standalone, small, language-neutral

    Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning-powered line break organizer tool. It is standalone: it works with no dependency on third-party word segmenters such as the Google Cloud Natural Language API. It is small: it takes only around 15 KB including its machine learning model, so it is reasonable to use even on the client side. It is language-neutral: you can train a model for any language by feeding a dataset to BudouX’s training...
    Downloads: 0 This Week
  • 4
    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
  • 5
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...
    Downloads: 17 This Week
  • 6
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 6 This Week
  • 7
    MyBox

    Easy Tools of PDF, Image, File, Network, Data, and Medias

    javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.
    Downloads: 4 This Week
  • 8
    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. Pdf text...
    Downloads: 1 This Week
  • 9
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM is an electronic document management system and record management system, EDRMS (DMS, RMS, CMS). It provides a modern and flexible architecture that meets today's IT demands, based on open technology (Java, Tomcat, GWT, Lucene, Hibernate, Spring and jBPM), and is a powerful and scalable multiplatform application. OpenKM is a Web 2.0 application that works with Internet Explorer, Firefox, Safari and Opera. It can be configured with major DBMSs such as Oracle, PostgreSQL and MySQL, among others...
    Downloads: 500 This Week
  • 10
    html2canvas

    A JavaScript HTML screenshot renderer

    html2canvas is a JavaScript HTML renderer. The script provides you with the tools to take screenshots of webpages directly in the browser. The screenshot is based on the DOM, so it may not be 100% accurate to the real representation: it is not an actual screenshot, but one built from the available data and information of the page. The script renders the page as a canvas image by reading the DOM and the different styles of the featured elements...
    Downloads: 11 This Week
  • 11
    libpostal

    A C library for parsing/normalizing street addresses around the world

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services,...
    Downloads: 4 This Week
  • 12
    Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.
    Downloads: 0 This Week
  • 13
    Provides a GUI for the grammatical structure and relations (as parsed by the Stanford Parser) of any text. Contains a grammatical relation editor to modify, import, and export grammatical relation definitions (tregex patterns and features).
    Downloads: 3 This Week
  • 14
    perkun

    two experimental AI languages + zubr

    Two experimental AI languages: Perkun and its successor Wlodkowic. They attempt to maximize the expected value of the payoff function by appropriately choosing the actions (output variable values). The package also contains a tool called zubr, a Java code generator based on Perkun. Also take a look at my blog: http://pawel-biernacki.blogspot.fi/ For Windows users there is an installer: http://www.pawelbiernacki.net/perkun.msi
    Downloads: 0 This Week
  • 15
    MIT Deep Learning Book

    MIT Deep Learning Book in PDF format by Ian Goodfellow

    ... is the only comprehensive book on the subject. This is not available as PDF download. So, I have taken the prints of the HTML content and bound them into a flawless PDF version of the book, as suggested by the website itself. Printing seems to work best printing directly from the browser, using Chrome. Other browsers do not work as well.
    Downloads: 11 This Week
  • 16
    Command Line Parser GetPot

    Tool to parse the command line and configuration files.

    Powerful command line and configuration file parsing for C++, Python, Ruby and Java (others to come). This tool provides many features, such as separate treatment for options, variables, and flags, unrecognized object detection, prefixes and much more.
    Downloads: 2 This Week
  • 17
    Panzer Combat II

    Computer-assisted miniature tank game.

    ...://server.panzercombat.com/PCII_Web/move.htm Look at battle reports : http://www.flickr.com/photos/panzercombatii Or watch a demo : http://www.youtube.com/watch?v=WcjfV8Odtss 100% CLEAN : http://games.softpedia.com/progClean/Panzer-Combat-II-Clean-95530.html
    Downloads: 0 This Week
  • 18

    smartblob

    tiny code, html webcam game or plug brain in each blob, java server

    ... together. The physics and vision algorithm is half working in version 0.3.0. The gameplay is better experienced in 0.2, which is controlled with the mouse, so you don't have as much freedom of movement as with the webcam. This small file contains its own source code, including occamserver, a tiny general Java server I built, which I'll adjust to allow HTTP streaming connections to stay open for faster Ajax than a new web call each time. The reshaping and bouncing physics is by springs.
    Downloads: 0 This Week
  • 19

    Rootvole

    a text parsing library that matches text with concepts.

    For general processing of voice queries we developed a text parsing library named 'Rootvole' that can be used to match text with semantic concepts. The algorithm was implemented in Java and can be described as a form of a parsing expression grammar, where we generate the expressions to be detected beforehand by regular expressions and store them in a vocabulary. The central class is the parser class, which is instantiated as a series of vocabularies, simple text lists that describe tokens...
    Downloads: 0 This Week
  • 20
    Intelligent Keyword Miner

    Intelligent SEO keyword miner and predicting tool

    THIS IS A NETBEANS 8.02 PROJECT. ENGLISH ONLY. This program was made to help me with patent research. It simply generates search keywords based on your upvotes or downvotes of the input parameters. It can accept text or a URL (text takes precedence over the URL). If you input a URL, it goes to the page and learns its text from the HTML. This program is intelligent in that it predicts what you may want to search next, based on your personal trends. After searching the suggestions...
    Downloads: 0 This Week
  • 21
    Azul OS

    Azul OS version dev(Linux) IA

    ... 0.4.1. Available [changelog] software added: php5-mysql gcc-c++ php5-gd php5-ctype perl-HTML-Tagset php5-zip php5-curl kernel-source mysql-connector-java php5-pear php5-mcrypt php5-ftp devel_C_C++ gimp gedit recode libreoffice MozillaFirefox wireshark audacity nano This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. #Blog: http://azul0.wordpress.com/
    Downloads: 0 This Week
  • 22

    MSTParser

    MSTParser is a non-projective dependency parser that searches for maxi

    MSTParser is a non-projective dependency parser that searches for maximum spanning trees over directed graphs. Models of dependency structure are based on large-margin discriminative training methods. Projective parsing is also supported. mstparser 0.5.1 is now available via Maven Central. If you use Maven as your build tool, then you can add it as a dependency in your pom.xml file: <dependency> <groupId>net.sourceforge.mstparser</groupId> <artifactId>mstparser</artifactId...
    Downloads: 0 This Week
  • 23
    TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
    Downloads: 0 This Week
  • 24
    DGiovanni
    A multi-agent architecture for building interactive dramas. It uses Jason's BDI engine, with Jason's agent-oriented programming language used both for drama management and for authoring character behaviors.
    Downloads: 0 This Week
  • 25
    RDFaMaker is a Java application that enables users to insert and modify semantic content in XHTML pages using the RDFa extension.
    Downloads: 1 This Week