Showing 17 open source projects for "scrape text from html"

  • 1
    OmegaT - multiplatform CAT tool

    The free computer aided translation (CAT) tool for professionals

    OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects.
    Downloads: 1,788 This Week
  • 2
    Writer2LaTeX and Writer2xhtml are a collection of converters from OpenDocument Format (ODF) to LaTeX/BibTeX, HTML+MathML and EPUB. They are delivered as a standalone Java library, as a command-line application, and as extensions for LibreOffice.
    Downloads: 43 This Week
  • 3
    Kisekae UltraKiss

    Kisekae UltraKiss is a full-featured integrated development environment

    UltraKiss is a computer program that implements the Kisekae Set system, KiSS, a Japanese graphics system originally developed to facilitate costume changes on virtual dolls. UltraKiss was developed to help artists build their KiSS sets. It is a full-featured viewer for all KiSS dolls, games, and visual applications. It is also a complete graphical development environment for creating KiSS applications. It fully implements the FKiSS event-driven programming language up to and including...
    Downloads: 8 This Week
  • 4

    RecordEditor

    Editor for fixed-width, CSV, and existing XML files.

    The RecordEditor is a data file editor for flat files (delimited and fixed field position). It supports Unix / PC / Legacy (e.g. Mainframe) file formats, both text and binary files. The editor uses a Record-Layout description to format the files. This is ideal for fixed-width (text or binary) files, COBOL data files, mainframe files, and complicated CSV files. COBOL copybooks can be used to format COBOL data files. As well as an editor, the following utilities are supplied * Formatted...
    Downloads: 43 This Week
  • 5
    XML Editor/Validator/Designer with CAMV

    CAM XML Editor for XML+JSON+Hibernate+SQL Open-XDX sponsored by Oracle

    ..., & OASIS modes) + JAXB bindings; Mindmap FreeMind or UML models(XMI); XML unit test & live SQL data; HTML docs + spreadsheets (NIEM IEPDs). Canonical component dictionaries from schema sets, SQL, JSON, ERwin XSD, or spreadsheets. The XML CAM templates (OASIS standard) store the exchange structure, content model, code lists, DBMappings, SQL lookups+business rules (XPath). Java CAMV XML/JSON validation engine is a complete exchange test framework [XMLUnit, TEAM(Schematron)]. Java/Eclipse +Saxon/XSL
    Downloads: 39 This Week
  • 6
    Web Book Downloader

    Download websites as e-books: PDF, TXT, EPUB.

    This application lets the user download chapters from a website in three ways: from a table of contents; from a range (first chapter address to last chapter address); or by crawling from the first chapter to chapter n. In the settings you can customize the language and the input (website) encoding; for simplicity, the output uses the same encoding. To add your own language, add a new class to the strings package, and new fields to the Settings class and the GUI menu (initialize method).
    Downloads: 5 This Week
  • 7
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest versions...
    Downloads: 511 This Week
  • 8
    Command-line/Ant-task/embeddable text file preprocessor. Macros, flow control, expressions. Recursive directory processing. Extensible in Java to display data from any data source (such as a database). Can generate complete homepages (trees of HTML files, images, etc.).
    Downloads: 11 This Week
  • 9
    NAT Braille

    A free universal Braille Transcriber

    NAT is a free universal Braille translator. It supports French Braille grade 1, mathematical Braille, Braille layout and reverse transcription. French Braille grade 2, music and other languages are currently under development.
    Downloads: 0 This Week
  • 10
    Asterix IDE
    ... a text editor. Asterix IDE offers superior support for Java and HTML5 developers, providing comprehensive editors and tools. Asterix IDE can be installed on all operating systems that support Java, from Windows to Linux to Mac OS systems. "Write Once, Run Anywhere" is as true for Asterix IDE as it is for your own applications, because Asterix IDE itself is written in Java, too!
    Downloads: 0 This Week
  • 11

    SutraReader

    Arranges a Sutra text in the traditional layout

    This is an application designed to arrange a Chinese or Japanese Sutra text in the traditional layout (from top to bottom, from right to left). The input can be any file (the application can pick out the relevant parts), and the output is the layout (arranged in one or more HTML files) and the content (a plain text file). Besides this, you can get statistics about the ideograms and exclude certain characters or ideograms from the content.
    Downloads: 0 This Week
  • 12
    PODR is a PHP mail-merge and conversion library mostly designed to parse and convert ODT templates to DOC/PDF. Templating is based on Savant; conversion uses a JODConverter web service. A filter is available to include images generated at runtime.
    Downloads: 1 This Week
  • 13
    Use Xilize to create XHTML pages or entire websites with just a plain-text editor. The markup is similar to Textile and extensible via BeanShell. Run as a jEdit plugin, from the command line, or embed in a Java program. Small, fast, easy-to-use.
    Downloads: 0 This Week
  • 14
    Strip out useless tags and other junk from HTML files. Shrink files, enhance readability of HTML source, promote privacy, and clean HTML exported from Microsoft Word (MS-Word). Run HTMLStrip as-is or customize it with your own regular expressions.
    Downloads: 0 This Week
  • 15
    EsTexte is a text-to-HTML converter based on an intuitive text format akin to various wiki formats and ASCII text files. Written in Java, it can be used from the command line or from other Java programs.
    Downloads: 0 This Week
  • 16
    WH2FO is a Java application that separates the content and the style of an HTML file generated by Word 2000. The conversion stores the content in an XML file and saves the style in an XSL Attribute file. WH2FO also gen
    Downloads: 0 This Week
  • 17
    A knowledge management system written in Java, running on JBoss 4.2.3 Server with RichFaces 3.3.0BETA4. Includes file conversion from HTML to PDF and a rich:editor component that needs no special syntax.
    Downloads: 0 This Week