Showing 66 open source projects for "scrape text from html"

  • 1
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 2 This Week
  • 2
    Karate

    Test automation made simple

    Karate is the only open-source tool to combine API test-automation, mocks, performance-testing and even UI automation into a single, unified framework. The BDD syntax popularized by Cucumber is language-neutral, and easy for even non-programmers. Assertions and HTML reports are built-in, and you can run tests in parallel for speed. There’s also a cross-platform stand-alone executable for teams not comfortable with Java. You don’t have to compile code. Just write tests in a simple, readable...
    Downloads: 19 This Week
  • 3
    Schema Spy

    SchemaSpy code home

    This is a new code repository for the SchemaSpy tool, initially created and maintained by John Currier. I personally believe that work on SchemaSpy should be continued and that many outstanding issues should be resolved. The last released version of SchemaSpy was in 2010, and I plan to change this. Installation is very simple because SchemaSpy is a single Java .jar application. To learn more, read the installation doc. When your environment is ready, you can start...
    Downloads: 1 This Week
  • 4
    OmegaT - multiplatform CAT tool

    The free computer aided translation (CAT) tool for professionals

    OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects.
    Downloads: 1,814 This Week
  • 5
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    .... Due to its technological architecture design, OpenKM meets the document management needs of businesses of all sizes (from SMEs to big corporations). Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. The most relevant function of OpenKM is the indexing of the most common types of files: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list, take a look at http://goo.gl/au8cQy
    Downloads: 1,130 This Week
  • 6
    iSphere

    The iSphere Project for and RDi 9.5.1.3+

    ... screen. That is where the iSphere Project comes into play, filling in those gaps. The iSphere library requires V7R1 or higher. For lower releases you can try to compile the library from an i Project by hand. Refer to the iSphere help for details. Available at SourceForge since December 3rd, 2013.
    Downloads: 260 This Week
  • 7
    Java Tablesaw

    Java dataframe and visualization library

    Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for working with machine learning libraries like Smile, Tribuo, H2O.ai, and DL4J. Import data from RDBMS, Excel, CSV, TSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.). Tablesaw supports...
    Downloads: 0 This Week
  • 8
    Writer2LaTeX and Writer2xhtml are a collection of converters from OpenDocument Format (ODF) to LaTeX/BibTeX, HTML+MathML and EPUB. It is delivered as a standalone Java library, as a command line application and as extensions for LibreOffice.
    Downloads: 58 This Week
  • 9

    RecordEditor

    Editor for Fixed Width, Csv and Existing Xml files.

    The RecordEditor is a data file editor for flat files (delimited and fixed field position). It supports Unix / PC / Legacy (e.g. Mainframe) file formats, both text and binary files. The editor uses a Record-Layout description to format the files. This is ideal for fixed width (text or binary) files, Cobol data files, Mainframe files and complicated CSV files. Cobol Copybooks can be used to format Cobol data files. As well as the editor, the following utilities are supplied: * Formatted...
    Downloads: 69 This Week
  • 10
    Kisekae UltraKiss

    Kisekae UltraKiss is a full featured integrated development environment

    UltraKiss is a computer program that implements the Kisekae Set system, KiSS, a Japanese graphics system originally developed to facilitate costume changes on virtual dolls. UltraKiss was developed to help artists build their KiSS sets. It is a full featured viewer for all KiSS dolls, games, and visual applications. It is also a complete graphical development environment for creating KiSS applications. It fully implements the FKiSS event driven programming language up to and including...
    Downloads: 2 This Week
  • 11
    TeXtidote

    Spelling, grammar and style checking on LaTeX documents

    If so, you probably know that the process is far from simple. Since LaTeX documents contain special commands and keywords (the so-called "markup") that are not part of the "real" text, you cannot run a grammar checker directly on these files: it cannot tell the difference between markup and text. The other option is to remove all this markup, leaving only the "clear" text; however, when a grammar tool points to a problem at a specific line in this clear text, it becomes hard to retrace...
    Downloads: 0 This Week
  • 12
    JQM Java Quine McCluskey

    JQM - Java Quine McCluskey for minimization of Boolean functions.

    ... and editing the truth table that can be saved and loaded. The results can be exported in HTML format. It generates the Karnaugh Map for educational purposes and the actual truth table from the obtained expressions even when multiple solutions for each function are found. This implementation supports PLC programming, so results can be presented in many forms including Structured Text (ST) and Ladder Diagram (LD) along with conventional Boolean expression.
    Downloads: 0 This Week
  • 13
    JDynamiTe, Dynamic Template in Java

    Dynamically generate documents from templates

    JDynamiTe is a tool which allows you to dynamically create documents in any format from "template" documents. And very few lines of code (or no line at all!) are needed to do that. Some typical usage domains of JDynamiTe are: - dynamic Web pages creation, - text document generation, - source code generation... In fact, it can be useful in any case where pre-defined documents (templates) have to be dynamically populated with data. The main benefit of JDynamiTe is to allow a true...
    Downloads: 0 This Week
  • 14
    Halimede

    Halimede Certificate Authority

    Halimede is a simple-to-use Certificate Authority. It supports multiple CAs (Certificate Authorities) from a single interface, with each CA stored within its own datastore instance. Halimede supports a large range of public key ciphers, including RSA, DSA, ECDSA (NIST/SEC/ANSI X9.62/Brainpool Curves), EdDSA (ED25519/ED448), GOST R34.10, DSTU 4145-2002 and numerous Post-Quantum Ciphers including Rainbow, SPHINCS-256, XMSS/XMSS-MT and qTESLA for X509 Certificate generation. Halimede...
    Downloads: 0 This Week
  • 15
    HTML Article Generator

    Quickly create custom webpages from your content

    HTML Article Generator is a tool for quickly generating webpages based on content you enter, including both text and images. These webpages can be customised to give a unique appearance, with a selection of 5 different themes. Other features include the ability to save the current values you have entered and restore these values after future changes have been made. Images can have caption text added to them and given alt text to improve accessibility. Each webpage can also be given...
    Downloads: 0 This Week
  • 16
    XML Editor/Validator/Designer with CAMV

    CAM XML Editor for XML+JSON+Hibernate+SQL Open-XDX sponsored by Oracle

    ..., & OASIS modes) + JAXB bindings; Mindmap FreeMind or UML models(XMI); XML unit test & live SQL data; HTML docs + spreadsheets (NIEM IEPDs). Canonical component dictionaries from schema sets, SQL, JSON, ERwin XSD, or spreadsheets. The XML CAM templates (OASIS standard) store the exchange structure, content model, code lists, DBMappings, SQL lookups+business rules (XPath). Java CAMV XML/JSON validation engine is a complete exchange test framework [XMLUnit, TEAM(Schematron)]. Java/Eclipse +Saxon/XSL
    Downloads: 39 This Week
  • 17
    SQLeo Visual Query Builder

    Helping users to quickly understand SQL queries

    SQLeo is a professional lightweight SQL query tool that lets you create or display complex SQL queries (from OBIEE, Microstrategy, SSRS, Cognos, Hyperion, Pentaho ...) and reverse engineer database models as DB designers do. This SQL GUI supports all JDBC drivers: Oracle, MySQL, MSSQL, Firebird, HSQLDB, H2, PostgreSQL, CsvJdbc, SQLite, UCanAccess, MonetDB ... (but the MySQL JDBC and CsvJdbc drivers are the only ones included in the package). Can be compared with: FreeQueryBuilder...
    Downloads: 41 This Week
  • 18
    JCppEdit v4.0

    Best IDE for Beginners

    JCppEdit is a free IDE that describes itself as the "best IDE for Beginners" and as a one-stop IDE for all your coding needs. Whether you need to finish your Java project, submit your first HTML web page, or code in C while running a Java program in a Java IDE, JCppEdit will help you achieve your goals easily. Exploring your project and detecting errors in your code is much easier. You will not waste time detecting errors before compiling code because you will get a real...
    Downloads: 3 This Week
  • 19

    FreeMarker template engine

    Generates text that depends on changing data (like dynamic HTML).

    FreeMarker is a template engine. That is, it provides an easy way to generate text (HTML, source code, configuration files, emails, etc.) that depends on changing data. It's designed to separate the rendering/formatting logic (like visual design, HTML issues, etc.) from the backing application logic and technical complexity. It has a flexible API so you can integrate it into your application the way that best fits it.
    Downloads: 31 This Week
  • 20
    Android File Search

    Show files on Android device

    A new version has been uploaded; the filename is "filesearch4.apk". This new version lets you delete and rename files and directories, and view text files when the file extension is one of: txt, xml, htm, html, xsl, xslt, text, ascii, ini, asp, aspx, java, kt, cs, js, jsp, php, bat, css, csv, kml, svg (feel free to ask for more file extensions; this should become a setting in the application in the future). The application starts browsing from the internal storage...
    Downloads: 1 This Week
  • 21
    Web Book Downloader

    Download websites as e-book: pdf, txt, epub.

    This application lets the user download chapters from a website in 3 ways: from the table of contents; from a range (first chapter address, last chapter address); or by crawling from the first chapter to chapter n. In the settings you can customize the language and the input (website) encoding; for simplicity, the output uses the same encoding. To add your language, add a new class to the strings package and new fields to the Settings class and the GUI menu (initialize method).
    Downloads: 12 This Week
  • 22
    LumberJack4Logs
    LumberJack4Logs is a viewer for log and trace files with the ability to extend the recognized data formats by adding text parser plugins.
    Downloads: 3 This Week
  • 23
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest versions...
    Downloads: 600 This Week
  • 24
    KAREL 3D

    Learning programming language for kids

    Karel-3D is a learning programming language for children. In the words of LightBot: "Get kids hooked on coding with minutes!" It derives from Karel 3D for the 8-bit microcomputer PMD 85-2 from 1986, a later 3D version of Karel the Robot first created in the Slovak Republic. The JavaScript variant consists of only one small HTML file, tested and working on all devices with a keyboard and full JavaScript support in the internet browser; alternatively, there is a pre-compiled JAVA V8 .jar file with webEngine...
    Downloads: 1 This Week
  • 25
    Command-line/Ant-task/embeddable text file preprocessor. Macros, flow control, expressions. Recursive directory processing. Extensible in Java to display data from any data source (such as a database). Can generate complete homepages (trees of HTML files, images, etc.)
    Downloads: 14 This Week