Showing 36 open source projects for "scrape text from html"

  • 1
    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Includes features like Create PDF from Images, HTML, Text files. Create a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate Page, Move Page, Printing, and Compress Page. Enhance images before the OCR operation for better OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry...
    Downloads: 25 This Week
  • 2
    Fidus Writer

    Fidus Writer is an online collaborative editor for academics

    Fidus Writer is an online collaborative editor especially made for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so that with the same text, you can later on publish it in multiple ways: On a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts that are adequate for the medium of choice.
    Downloads: 0 This Week
  • 3
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    .... Due to its technological architecture design, OpenKM meets the document management needs of businesses of all sizes (from SMEs to big corporations). Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. The most relevant function of OpenKM is the indexing of the most common types of files: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list, take a look at http://goo.gl/au8cQy
    Downloads: 1,042 This Week
  • 4
    CopyQ

    Clipboard manager with advanced features

    CopyQ is an advanced clipboard manager with searchable and editable history, support for image formats, command line control and more.
    Downloads: 140 This Week
  • 5
    Writer2LaTeX and Writer2xhtml are a collection of converters from OpenDocument Format (ODF) to LaTeX/BibTeX, HTML+MathML and EPUB. It is delivered as a standalone Java library, as a command line application and as extensions for LibreOffice.
    Downloads: 43 This Week
  • 6
    Office Search

    Desktop Full-Text Search inside text and Microsoft Office files.

    Search inside Microsoft Office (Word, Excel, PowerPoint), LibreOffice (Writer, Calc, Impress), Visio and text/ASCII files (RTF/TXT/CSV/MD/HTML etc.). For all other files, it uses fuzzy logic to check whether the file is text or binary. If it is text, it searches the contents of the file for a match. Works on Windows 7 or above. Requires .NET Framework 4.7 or above. Open source software developed in VB.NET 2019.
    Downloads: 42 This Week
  • 7
    FastReport Open Source

    Free Open Source Reporting tool for .NET

    Free Open Source Reporting tool for .NET Core/.NET Framework that helps your application generate document-like reports.
    Downloads: 27 This Week
  • 8
    Super PDF Editor Lite

    Create, Edit, Delete, Organize, Convert, Export, Secure & Sign.

    Super PDF Editor Lite is a robust and versatile PDF management software designed to streamline your document handling needs. Whether you're an individual, student, or professional, this software offers a comprehensive suite of tools to create, edit, and manage your PDFs with ease. Key Features: Extract Page: Easily extract specific pages from a PDF document. Split Page: Divide a single PDF page into multiple smaller pages. Rotate Page: Rotate pages to adjust their orientation. Merge Page...
    Downloads: 17 This Week
  • 9
    DocWire SDK

    Award-winning modern data processing in C++17/20

    DocWire SDK, a standout C++17/20 data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. The upcoming integration of C++17 and C++20 will bring advanced functionalities, particularly in areas like HTTP capabilities and web data extraction. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document...
    Downloads: 6 This Week
  • 10
    Java Tablesaw

    Java dataframe and visualization library

    Tablesaw is a dataframe and visualization library that supports loading, cleaning, transforming, filtering, and summarizing data. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and can be used to prepare data for working with machine learning libraries like Smile, Tribuo, H2O.ai, and DL4J. Import data from RDBMS, Excel, CSV, TSV, JSON, HTML, or fixed-width text files, whether they are local or remote (http, S3, etc.). Tablesaw supports...
    Downloads: 0 This Week
  • 11
    adx - addressbook.xml

    Minimalistic address book in web browser. No server or plugin needed.

    Minimalistic but full-featured addressbook in your web browser. adx is a standalone and portable web app (online and offline). FEATURES Contact Management, portable, small (~350KB), lightweight, contact tagging, geo mapping, web accounts, trigger phone/Skype calls, etc. EXPORT FUNCTIONALITY vCard (as file or QR code via offline generator) HOW IT WORKS Your address-book (XML file) is transformed in your web browser (via XSLT) to a full-featured web application (HTML...
    Downloads: 5 This Week
  • 12
    Pinasi(win32bit)

    Array Data Processing Application

    Pinasi v1.15. Pinasi is a data processing application used to input, process and output data. Examples of input are text, numbers, files, dates and others. Examples of processing are mathematical. Examples of output are tables, graphs, pivots, and others. Pinasi is licensed under CC BY-NC 4.0 and created with NW.js. NW.js is an app runtime based on Chromium and node.js. You can write native apps in HTML and JavaScript with NW.js. It also lets you call...
    Downloads: 0 This Week
  • 13
    Pinasi(win64bit)

    Array Data Processing Application

    Pinasi v1.15. Pinasi is a data processing application used to input, process and output data. Examples of input are text, numbers, files, dates and others. Examples of processing are mathematical. Examples of output are tables, graphs, pivots, and others. Pinasi is licensed under CC BY-NC 4.0 and created with NW.js. NW.js is an app runtime based on Chromium and node.js. You can write native apps in HTML and JavaScript with NW.js. It also lets you call...
    Downloads: 0 This Week
  • 14
    Tailwind Starter Kit

    Tailwind Starter Kit, a beautiful, free extension for TailwindCSS

    Tailwind Starter Kit is Free and Open Source. It does not change or add any CSS to what TailwindCSS already provides. It features multiple HTML elements and comes with dynamic components for ReactJS, Vue and Angular. Tailwind Starter Kit comes with a huge number of Fully Coded CSS components. This extension also comes with 3 sample pages. They are fully coded so you can start working instantly. We also feature many dynamic components for React, Vue and Angular. Putting together a page has...
    Downloads: 0 This Week
  • 15
    HTML Article Generator

    Quickly create custom webpages from your content

    HTML Article Generator is a tool for quickly generating webpages based on content you enter, including both text and images. These webpages can be customised to give a unique appearance, with a selection of 5 different themes. Other features include the ability to save the current values you have entered and restore these values after future changes have been made. Images can have caption text added to them and given alt text to improve accessibility. Each webpage can also be given...
    Downloads: 0 This Week
  • 16
    IPyPublish

    Workflow for creating and editing publication ready scientific reports

    A program for creating and editing publication-ready scientific reports and presentations, from one or more Jupyter Notebooks. Dynamically (and reproducibly) explore data, run code, and output the results. Dynamically edit and visualize the basic components of the document (text, math, figures, tables, references, citations, etc). Have precise control over what elements are output to the final document and how they are laid out and typeset. Also be able to output the same source document...
    Downloads: 0 This Week
  • 17
    Data Science at the Command Line

    Data science at the command line

    Command Line by Jeroen Janssens, published by O’Reilly Media in October 2021. Obtain, scrub, explore, and model data with Unix Power Tools. This repository contains the full text, data, and scripts used in the second edition of the book Data Science at the Command Line by Jeroen Janssens. This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command...
    Downloads: 0 This Week
  • 18
    Helpy

    A Modern Helpdesk Platform

    Helpy is a modern, self-hosted, on-premise customer support helpdesk platform designed from the ground up to give your customers a heroic customer service experience. Written in Ruby on Rails, Helpy seamlessly integrates support ticketing, Knowledgebase and a public community into one powerful solution. Helpy powers your helpcenter by providing a host of exceptional features, including multichannel ticketing, a full text searchable and SEO optimized Knowledgebase, community support forums...
    Downloads: 0 This Week
  • 19
    LimeReport

    Report generator for Qt Framework

    ... use SQL database or data passed from application using QAbstractTableModel interface. Besides, one can initialize variables, which are available as database request parameters. LimeReport's goal is to provide your application with a functionally abundant yet simple-to-use tool for report generation that can be used even by users inexperienced in IT.
    Downloads: 43 This Week
  • 20
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows...
    Downloads: 27 This Week
  • 21
    Cleaver

    30-second slideshows for hackers

    Cleaver is a one-stop shop for generating HTML presentations in record time. Using some spiced-up markdown, you can produce good-looking, interactive presentations with just a few lines of text. Cleaver supports several basic options that allow you to further customize the look and feel of your presentation, including author info, stylesheets, and custom templates. Cleaver has substantial theme support to give you more fine-grained control over your presentation, similar to options. Instead...
    Downloads: 0 This Week
  • 22
    JAWS - Just Another Web Scraper

    A simple web scraper using regular expressions or HTML Agility Pack

    JAWS, or Just Another Web Scraper, is part of the data scraping software developed by SVbook, alongside JATI (Image to Text) and JAVT (Video to Text). JAWS offers an easy interface to scrape data from a website using regular expressions, text preprocessing, or HTML Agility Pack.
    Downloads: 0 This Week
  • 23
    CuteReport

    Qt based report solution

    ..., yet simple to use for inexperienced users and report designers, reporting system. It is meant to be a product that combines eXaro ideas from the Qt world and FastReport functionality from the Delphi world, brings the best of them to the C++/Qt world, and then shares it with Python, Ruby, and Perl developers using bindings. CuteReport also has a commercial version. Read about it on the official web site.
    Downloads: 13 This Week
  • 24

    Convert HTML to PDF in .NET with C#

    Convert HTML to PDF in .NET with C# using EVO HTML to PDF for .NET

    EVO HTML to PDF Converter for .NET is a library that can be easily integrated and distributed in your ASP.NET and MVC web sites, desktop applications, Windows services and Azure cloud services to convert web pages, HTML strings and streams to PDF, to images or to SVG and to create nicely formatted and easily maintainable PDF reports and documents. The converter has full support for HTML5, CSS3, SVG, Canvas, Web Fonts and JavaScript. Does not require installation or any third party tools...
    Downloads: 0 This Week
  • 25

    PanDocElectron

    Graphical User Interface for PanDoc for Linux, Mac & Windows

    PanDoc Graphical User Interface implemented with Electron for Linux, Mac and Windows. It supports users in converting source documents into various other formats like docx, odt, html and reveal documentation. The zip files contain the full source code because PanDocElectron is written in HTML/JavaScript. Electron is used more or less as a browser that runs the HTML/JavaScript application. [Download PanDocElectron](https://sourceforge.net/p/pandocelectron/wiki/Home/) Extract the zip-file from...
    Downloads: 0 This Week