Showing 72 open source projects for "data formats html/xhtml tidy"

  • 1
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild: from pristine and validating to invalid tag soup, jsoup will create a sensible parse tree. The parser will make...
    Downloads: 2 This Week
  • 2
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 15 This Week
  • 3
    CSSBox

    Pure Java HTML / CSS rendering engine

    CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide complete information about the rendered page in a form suitable for further processing. However, it can also display the rendered document.
    Downloads: 15 This Week
  • 4
    XRichText

    An Android rich text class library that supports graphic & text mixing

    An Android rich text class library that supports mixing graphics and text, supports editing and previewing, and supports inserting and deleting pictures. It uses a ScrollView as the outermost layout containing a LinearLayout, filled with TextView and ImageView elements. When deleting, the TextView and ImageView are removed according to the position of the cursor, and the text is merged automatically. The generated data is a list collection, and the data format can be customized. Version V1.4 opens the image...
    Downloads: 0 This Week
  • 5

    ConcatPDF

    PDF Concatenation Tool

    ConcatPDF is a tool for concatenating PDF files. It can concatenate, extract, encrypt, decrypt, and configure PDF files, and convert image files to PDF. Both GUI and CUI versions are available. iText.NET is a port of iText to the .NET Framework using J#. This library allows you to generate PDF, (X)HTML, XML, and RTF files on the Microsoft .NET Framework, including ASP.NET.
    Downloads: 37 This Week
  • 6
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 284 This Week
  • 7
    Web Widget Toolkit (WTK): Server-side components for easily creating web-based user interfaces with complex navigation.
    Downloads: 0 This Week
  • 8
    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can set up a multi-threaded web crawler in a few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The shouldVisit function decides whether a given URL should be crawled. In the project's example, it rejects .css, .js and media files and only allows pages...
    Downloads: 1 This Week
  • 9
    JuniCoder is a Java project that uses Unicode as a base for decoding and encoding formats that invented workarounds to express characters not covered by ASCII. Decoders translate those workarounds to Unicode; encoders translate Unicode back into them.
    Downloads: 0 This Week
  • 10
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
  • 11

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 0 This Week
  • 12
    Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 7 This Week
  • 13
    Chunk, an HTML Template Engine for Java

    Clean, powerful templates for Java

    A powerful Java Template Engine, great for building HTML or XML docs. Chunk can handle many other needs and situations as well. In-tag filters & default values, multiple snippets per file, layered themes, macros, conditional includes, localization & more.
    Downloads: 0 This Week
  • 14

    wikihtml

    Converts wikitext documents into HTML documents

    This project is an application that converts wikitext documents into HTML documents. Wiki markup or wikitext is a markup language to write documents in wiki-based systems, such as web sites powered by MediaWiki.
    Downloads: 0 This Week
  • 15
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 20 This Week
  • 16
    Aspose Java for Liferay

    Provides export options for blogs, journals and dynamic lists

    This is a Liferay CMS / Portal plugin released by Aspose Pty Ltd. Aspose.Total Java for Liferay (a hook plugin app) exports web content and blogs created in HTML to MS Word, MS Excel and PDF file formats using the Aspose.Total Java APIs (Aspose.Words, Aspose.Cells and Aspose.PDF). The plugin can also export Dynamic Data Lists to the same formats using the same APIs.
    Downloads: 0 This Week
  • 17
    FM-Classic provides an easy way to get data from Java servlets into Web pages, and helps you keep graphic design separate from application logic. FM-Classic is a continuation of the FreeMarker 1.x code base.
    Downloads: 0 This Week
  • 18
    Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A TagLibrary eases integrating search results into your JSP-based web page.
    Downloads: 14 This Week
  • 19
    wingS
    wingS (wingS is net generation Swing) is a Java library for developing web-based AJAX applications in a way similar to developing Swing-based applications.
    Downloads: 0 This Week
  • 20

    WebDiff

    Graphical tool for visualizing changes in web pages

    WebDiff is a graphical tool for visualizing changes in web pages. It is written in Java and uses Eclipse's SWT toolkit. You can view changes between any two HTML files on your file system or a web server, distinguishing them in a manner of your choice. There is a plan to eventually support viewing changes between Git/Subversion/Mercurial clients.
    Downloads: 0 This Week
  • 21

    SwingBox

    Java Swing HTML / CSS rendering component

    SwingBox is a Java Swing component that displays (X)HTML documents with CSS support. It is designed as a JEditorPane replacement with considerably better rendering results. SwingBox is pure Java and uses the CSSBox rendering engine to render the documents.
    Downloads: 0 This Week
  • 22
    ImportCargo_airplane

    Air import cargo history lookup

    A program for looking up the cargo management number of air import cargo and viewing its cargo processing history.
    Downloads: 0 This Week
  • 23
    Special Characters HTMLizer

    Automatically converts special characters to their HTML codes.

    Especially aimed at German and Turkish special characters, this small tool only offers one JEditorPane for input (left) and output (right) each, and a button. When the button is clicked, the contents of the left editor field are processed and delivered to the right editor field.
    Downloads: 0 This Week
  • 24
    phpMyBitTorrent: BitTorrent Tracker written in PHP. Features include: hosting torrents from remote trackers, DHT, Compact Announce, alternate links (eD2K, Magnet), HTTP-Basic Authentication, Passkey Authentication, embedded HTML Editor, Mass-upload of to
    Downloads: 0 This Week
  • 25
    Calenco XML CMS
    Calenco is a collaborative Web platform that enables remote teams of writers, proofreaders, graphic designers, translators, etc. to produce XML documents together, such as user guides and security procedures.
    Downloads: 0 This Week
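
The crawler4j entry in the listing describes a shouldVisit function that rejects .css, .js and media files. The following is a minimal standalone sketch of that kind of filter in plain Java. It does not use crawler4j itself: the class name `UrlFilterSketch`, the list of skipped extensions, and the `allowedPrefix` parameter are all assumptions made for illustration, not the project's actual code.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Standalone sketch of a crawl filter in the spirit of the crawler4j entry.
 * Hypothetical names throughout; this is not the crawler4j API.
 */
public class UrlFilterSketch {

    // Assumed list of stylesheet, script, and media extensions to skip.
    private static final Pattern SKIPPED_EXTENSIONS = Pattern.compile(
            ".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$");

    /**
     * Returns true when the URL starts with the allowed prefix and does not
     * end in one of the skipped extensions. Comparison is case-insensitive.
     */
    public static boolean shouldVisit(String url, String allowedPrefix) {
        String href = url.toLowerCase(Locale.ROOT);
        return !SKIPPED_EXTENSIONS.matcher(href).matches()
                && href.startsWith(allowedPrefix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String prefix = "https://www.example.com/"; // hypothetical seed site
        System.out.println(shouldVisit("https://www.example.com/docs/index.html", prefix));
        System.out.println(shouldVisit("https://www.example.com/static/site.css", prefix));
        System.out.println(shouldVisit("https://other.example.org/page.html", prefix));
    }
}
```

In a real crawler, a check like this would typically live in the method that the crawler framework calls for each discovered link; URLs with query strings after the extension would need extra handling that this sketch omits.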
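
The Special Characters HTMLizer entry in the listing describes converting special characters, such as German and Turkish letters, to their HTML codes. As a rough illustration of the idea, the sketch below replaces every non-ASCII character with a numeric character reference. Whether the tool emits numeric or named entities is not stated in the listing, so the numeric form, the class name `HtmlEntitySketch`, and the method name are assumptions.

```java
/**
 * Sketch: replace non-ASCII characters with numeric HTML character
 * references. Hypothetical class; not the code of the listed tool.
 */
public class HtmlEntitySketch {

    /** Leaves ASCII untouched and encodes every other code point as &#N;. */
    public static String toNumericEntities(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(toNumericEntities("Gr\u00fc\u00dfe")); // German sample
        System.out.println(toNumericEntities("\u015feker"));      // Turkish sample
    }
}
```

Note that this sketch does not escape ASCII characters that are special in HTML, such as `&` and `<`; it only covers characters outside the ASCII range.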