Showing 402 open source projects for "scrape text from html"

  • 1
    Jupyter Notebook

    Jupyter Interactive Notebook

    The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. The Jupyter notebook combines two components. A web application, which is a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output...
    Downloads: 1,535 This Week
  • 2
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 48 This Week
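
To make "write the rules to extract the data" concrete for the search topic (scraping text from HTML), here is a minimal sketch that uses only Python's standard library. It is not Scrapy's API; the names `TextExtractor` and `extract_text` are made up for this illustration, and the single "rule" is an assumption: keep visible text and skip the contents of `<script>` and `<style>`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside skipped tags and not pure whitespace.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A framework such as Scrapy adds crawling, scheduling and selector support on top of this kind of extraction step; the sketch covers only the extraction itself.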
  • 3
    CKEditor 5

    Powerful rich text editor framework with a modular architecture

    CKEditor 5 is a powerful rich text editor framework with a modular architecture, modern integrations, and features like collaborative editing. CKEditor 5 provides every type of WYSIWYG editing solution imaginable. From editors similar to Google Docs and Medium to Slack- or Twitter-like applications, all is possible within a single editing framework. Builds are ready-to-use solutions to common editing needs. Every build can be customized to include a completely custom set of features. Features...
    Downloads: 64 This Week
  • 4
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 4 This Week
  • 5
    Quill

    Your powerful rich text editor

    ... behavior and HTML across all platforms. It’s being used in numerous projects already, from small to large Fortune 500 ones. See how well Quill can fit into your own project!
    Downloads: 15 This Week
  • 6
    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Includes features like Create PDF from Images, HTML, Text files. Create a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate Page, Move Page, Printing, and Compress Page. Enhance images before the OCR operation for better OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry...
    Downloads: 25 This Week
  • 7
    Karate

    Test automation made simple

    Karate is the only open-source tool to combine API test-automation, mocks, performance-testing and even UI automation into a single, unified framework. The BDD syntax popularized by Cucumber is language-neutral, and easy for even non-programmers. Assertions and HTML reports are built-in, and you can run tests in parallel for speed. There’s also a cross-platform stand-alone executable for teams not comfortable with Java. You don’t have to compile code. Just write tests in a simple, readable...
    Downloads: 12 This Week
  • 8
    dirsearch

    Web path scanner

    An advanced command-line tool designed to brute force directories and files in webservers, AKA a web path scanner. A wordlist is a text file in which each line is a path. Unlike other tools, dirsearch only replaces the %EXT% keyword with extensions from the -e flag. For wordlists without %EXT% (like SecLists), the -f | --force-extensions switch is required to append extensions to every word in the wordlist, as well as the /. To use multiple wordlists, you can separate your wordlists with commas...
    Downloads: 9 This Week
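
The %EXT% rule described for dirsearch can be illustrated with a short sketch. This is not dirsearch's own code: `expand_wordlist` is a hypothetical helper, and two details are assumptions made for the example, namely that the original word is kept alongside its forced variants and that entries already ending in `/` are left alone.

```python
def expand_wordlist(words, extensions, force_extensions=False):
    """Expand wordlist entries into candidate paths.

    - Entries containing %EXT% yield one path per extension.
    - Other entries are kept as-is; with force_extensions they also get
      ".ext" variants and a trailing-slash variant (assumed behaviour).
    """
    paths = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # ignore blank lines in the wordlist
        if "%EXT%" in word:
            paths.extend(word.replace("%EXT%", ext) for ext in extensions)
            continue
        paths.append(word)
        if force_extensions and not word.endswith("/"):
            paths.extend(f"{word}.{ext}" for ext in extensions)
            paths.append(word + "/")
    return paths
```

For example, with extensions `["php", "html"]` the entry `index.%EXT%` becomes `index.php` and `index.html`, while `admin` is only expanded when `force_extensions=True`.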
  • 9
    Typed.js

    A JavaScript typing animation library

    Typed.js is a library that types. Enter in any string, and watch it type at the speed you've set, backspace what it's typed, and begin a new sentence for however many strings you've set. Rather than using the strings array to insert strings, you can place an HTML div on the page and read from it. This allows bots and search engines, as well as users with JavaScript disabled, to see your text on the page. You can pause in the middle of a string for a given amount of time by including an escape...
    Downloads: 8 This Week
  • 10
    Markdig

    A fast, powerful, CommonMark compliant, extensible Markdown processor

    ... behavior. Parses trivia (whitespace, newlines and other characters) to support lossless parse ⭢ render roundtrip. This enables changing markdown documents without introducing undesired trivia changes. Special attributes or attached HTML attributes (inspired from PHP Markdown Extra - Special Attributes). Diagrams extension whenever a fenced code block contains a special keyword, it will be converted to a div block with the content as-is.
    Downloads: 4 This Week
  • 11
    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. Pdf text...
    Downloads: 3 This Week
  • 12
    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text...
    Downloads: 4 This Week
  • 13
    Markdown Monster

    An extensible Markdown Editor, Viewer and Weblog Publisher for Windows

    ... and word counts keep your content streamlined. You can export Markdown to PDF or HTML on disk or copy Markdown selections as HTML to the clipboard. The HTML preview can display syntax-colored code snippets for most coding languages. Choose from light or dark app themes, and individual and fully customizable preview themes. Use the built-in folder browser to open, manage and drag files into content, use the document outline to quickly jump through content, or use our shell integration.
    Downloads: 3 This Week
  • 14
    MailCatcher

    Catches mail and serves it through a dream

    Catches mail and serves it through a dream. MailCatcher runs a super simple SMTP server that catches any message sent to it to display in a web interface. Run mailcatcher, set your favorite app to deliver to smtp://127.0.0.1:1025 instead of your default SMTP server, then check it out to see the mail that's arrived so far. Shows HTML, Plain Text and Source version of messages, as applicable. Rewrites HTML enabling display of embedded, inline images/etc and opens links in a new window. Command...
    Downloads: 3 This Week
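
Pointing an application at smtp://127.0.0.1:1025 can be sketched with Python's standard `smtplib` and `email` modules. The helper names `build_message` and `send_to_mailcatcher` are hypothetical, and the sketch assumes a MailCatcher instance is already listening on that address when the send function is called.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, text, html):
    """Build a message with a plain-text body and an HTML alternative."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)                      # text/plain part
    msg.add_alternative(html, subtype="html")  # text/html part
    return msg


def send_to_mailcatcher(msg, host="127.0.0.1", port=1025):
    """Deliver the message to a local SMTP catcher (assumed to be running)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

A message built this way carries both a plain-text and an HTML part, so it should exercise the HTML and Plain Text views mentioned in the description.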
  • 15
    bookdown

    Authoring Books and Technical Documents with R Markdown

    An open-source (GPL-3) R package to facilitate writing books and long-form articles/reports with R Markdown. Generate printer-ready books and ebooks from R Markdown documents. A markup language easier to learn than LaTeX, and to write elements such as section headers, lists, quotes, figures, tables, and citations. Multiple choices of output formats: PDF, LaTeX, HTML, EPUB, and Word. Possibility of including dynamic graphics and interactive applications (HTML widgets and Shiny apps). Support...
    Downloads: 2 This Week
  • 16
    Chroma

    A general purpose syntax highlighter in pure Go

    As Chroma has just been released, its API is still in flux. That said, the high-level interface should not change significantly. Chroma takes source code and other structured text and converts it into syntax-highlighted HTML, ANSI-coloured text, etc. Chroma is based heavily on Pygments and includes translators for Pygments lexers and styles. ABAP, ABNF, ActionScript, ActionScript 3, Ada, Angular2, ANTLR, ApacheConf, APL, AppleScript, Arduino, Awk. PacmanConf, Perl, PHP, PHTML, Pig, PkgConfig...
    Downloads: 2 This Week
  • 17
    Krajee

    An enhanced HTML 5 file input for Bootstrap 5.x/4.x/3.x

    An enhanced HTML 5 file input for Bootstrap 5.x, Bootstrap 4.x or Bootstrap 3.x with file preview for various files, multiple selection, and more. The plugin gives you a simple way to set up an advanced file picker/upload control built to work specifically with Bootstrap CSS3 styles. It enhances the file input functionality further by offering support to preview a wide variety of files, e.g. images, text, html, video, audio, flash, and objects. In addition, it includes AJAX based uploads...
    Downloads: 2 This Week
  • 18
    pagedown

    Paginate the HTML Output of R Markdown with CSS for Print

    Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome or Microsoft Edge) to generate PDF. No need to install LaTeX to get beautiful PDFs. This R package stands on the shoulders of two giants to support typesetting with CSS for R Markdown documents: Paged.js and ReLaXed (we only borrowed some CSS from the ReLaXed repo and didn't really use the Node package).
    Downloads: 0 This Week
  • 19
    Trix

    A rich text editor for everyday writing

    A rich text editor for everyday writing. Compose beautifully formatted text in your web application. Trix is an editor for writing messages, comments, articles, and lists—the simple documents most web apps are made of. It features a sophisticated document model, support for embedded attachments, and outputs terse and consistent HTML. Trix is an open-source project from Basecamp, the creators of Ruby on Rails. Millions of people trust their text to Basecamp, and we built Trix to give them...
    Downloads: 1 This Week
  • 20
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
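
The Flask-inspired idea of registering scraping rules with decorators can be sketched with the standard library alone. This illustrates the pattern only and does not reproduce Dude's actual API; `select`, `scrape` and the handler signature `(attrs, text)` are assumptions made for the example, and matching is by tag name only.

```python
from html.parser import HTMLParser

_RULES = {}  # tag name -> handler function registered via @select


def select(tag):
    """Register the decorated function as the handler for <tag> elements."""
    def decorator(func):
        _RULES[tag] = func
        return func
    return decorator


class _RuleRunner(HTMLParser):
    """Feed HTML through the registered rules and collect their results."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._open = []  # stack of (tag, attrs, text_parts) for matched tags

    def handle_starttag(self, tag, attrs):
        if tag in _RULES:
            self._open.append((tag, dict(attrs), []))

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for _, _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, attrs, parts = self._open.pop()
            self.results.append(_RULES[tag](attrs, "".join(parts).strip()))


def scrape(html):
    """Run all registered rules over an HTML string."""
    runner = _RuleRunner()
    runner.feed(html)
    runner.close()
    return runner.results


@select("a")
def link(attrs, text):
    # One rule: turn every <a> element into a small record.
    return {"url": attrs.get("href"), "text": text}
```

The appeal of the decorator style is that each rule is a small, independent function, and adding a rule does not require touching the code that walks the page.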
  • 21
    MultiMarkdown-6

    Lightweight markup processor to produce HTML, LaTeX, and more

    Lightweight markup processor to produce HTML, LaTeX, and more. MultiMarkdown is a superset of the Markdown lightweight markup syntax with support for additional output formats and features. Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
    Downloads: 0 This Week
  • 22
    Angular DataTables

    DataTables with Angular

    An Angular2+ library for building complex HTML tables using the DataTables jQuery plug-in. Implementation of the example on custom filtering with range search. The HTML element provides a Promise that returns the instance of the DataTable. Implementation of the example on individual column searching (text inputs). Sometimes, your DataTable options are stored or computed server-side. All you need to do is to return the expected result as a promise. You can use Angular Pipe to transform data...
    Downloads: 0 This Week
  • 23
    Laravel Response Cache

    Speed up a Laravel app by caching the entire response

    This Laravel package can cache an entire response. By default, it will cache all successful GET requests that return text-based content (such as HTML and JSON) for a week. This could potentially speed up the response quite considerably. The first time a request comes in, the package saves the response before sending it to the user. When the same request comes in again, it does not go through the entire application but just responds with the saved response.
    Downloads: 1 This Week
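
The caching behaviour described above (successful GET requests with text-based content, kept for a week) can be sketched as a tiny in-memory cache. The package itself is PHP; this Python sketch is not its implementation. `ResponseCache`, the `(status, content_type, body)` response tuple and the `app` callable are all hypothetical names chosen for the illustration.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


class ResponseCache:
    """Cache whole responses for successful, text-based GET requests."""

    def __init__(self, ttl=WEEK_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be tested
        self._store = {}      # url -> (expires_at, response)

    def handle(self, method, url, app):
        """Return a cached response if fresh, otherwise call app(method, url)."""
        if method != "GET":
            return app(method, url)  # only GET requests are cached
        now = self._clock()
        entry = self._store.get(url)
        if entry and entry[0] > now:
            return entry[1]          # cache hit: skip the application entirely
        response = app(method, url)
        status, content_type, _body = response
        is_text = content_type.startswith(("text/", "application/json"))
        if 200 <= status < 300 and is_text:
            self._store[url] = (now + self._ttl, response)
        return response
```

Keying on the URL alone is a simplification; a real response cache would also have to account for things like the logged-in user and query parameters.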
  • 24
    Schema Spy

    SchemaSpy code home

    This is a new code repository for the SchemaSpy tool, initially created and maintained by John Currier. I personally believe that work on SchemaSpy should be continued, and that many still-open issues should be resolved. The last released version of SchemaSpy was in 2010, and I plan to change this. Installation is very simple because SchemaSpy is a single Java .jar application. To learn more, read the installation doc. When your environment is ready, you can start...
    Downloads: 1 This Week
  • 25
    Mithril.js

    A JavaScript framework for building brilliant applications

    ... be indented more naturally than HTML for complex tags, and since its syntax is just JavaScript, it's possible to leverage a lot of JavaScript tooling ecosystem. Mithril is all about getting meaningful work done efficiently. Doing file uploads? The docs show you how. Authentication? Documented too. Exit animations? You got it. No extra libraries, no magic.
    Downloads: 1 This Week