Showing 332 open source projects for "scrape text from html"

  • 1
    Jupyter Notebook

    Jupyter Interactive Notebook

    The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. The Jupyter notebook combines two components. A web application, which is a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output...
    Downloads: 2,318 This Week
  • 2
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 24 This Week
  • 3
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 2 This Week
  • 4
    Quill

    Your powerful rich text editor

    ... behavior and HTML across all platforms. It’s being used in numerous projects already, from small to large Fortune 500 ones. See how well Quill can fit into your own project!
    Downloads: 23 This Week
  • 5
    Karate

    Test automation made simple

    Karate is the only open-source tool to combine API test-automation, mocks, performance-testing and even UI automation into a single, unified framework. The BDD syntax popularized by Cucumber is language-neutral, and easy for even non-programmers. Assertions and HTML reports are built-in, and you can run tests in parallel for speed. There’s also a cross-platform stand-alone executable for teams not comfortable with Java. You don’t have to compile code. Just write tests in a simple, readable...
    Downloads: 19 This Week
  • 6
    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lightning fast PDF reader, editor and batch processor. Includes features like Create PDF from Images, HTML, Text files. Create a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate page, Move Page, Printing, and Compress Page. Improve image enhancement before OCR operation for better OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry...
    Downloads: 20 This Week
  • 7
    Typed.js

    A JavaScript typing animation library

    Typed.js is a library that types. Enter in any string, and watch it type at the speed you've set, backspace what it's typed, and begin a new sentence for however many strings you've set. Rather than using the strings array to insert strings, you can place an HTML div on the page and read from it. This allows bots and search engines, as well as users with JavaScript disabled, to see your text on the page. You can pause in the middle of a string for a given amount of time by including an escape...
    Downloads: 10 This Week
  • 8
    Trix

    A rich text editor for everyday writing

    A rich text editor for everyday writing. Compose beautifully formatted text in your web application. Trix is an editor for writing messages, comments, articles, and lists—the simple documents most web apps are made of. It features a sophisticated document model, support for embedded attachments, and outputs terse and consistent HTML. Trix is an open-source project from Basecamp, the creators of Ruby on Rails. Millions of people trust their text to Basecamp, and we built Trix to give them...
    Downloads: 4 This Week
  • 9
    MailCatcher

    Catches mail and serves it through a dream

    Catches mail and serves it through a dream. MailCatcher runs a super simple SMTP server that catches any message sent to it to display in a web interface. Run mailcatcher, set your favorite app to deliver to smtp://127.0.0.1:1025 instead of your default SMTP server, then check it out to see the mail that's arrived so far. Shows HTML, Plain Text and Source version of messages, as applicable. Rewrites HTML enabling display of embedded, inline images/etc and opens links in a new window. Command...
    Downloads: 5 This Week
  • 10
    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text...
    Downloads: 3 This Week
  • 11
    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. Pdf text...
    Downloads: 2 This Week
  • 12
    Markdown Monster

    An extensible Markdown Editor, Viewer and Weblog Publisher for Windows

    ... and word counts keep your content streamlined. You can export Markdown to PDF or HTML on disk or copy Markdown selections as HTML to the clipboard. The HTML preview can display syntax-colored code snippets for most coding languages. Choose from light or dark app themes, and individual and fully customizable preview themes. Use the built-in folder browser to open, manage and drag files into content, use the document outline to quickly jump through content, or use our shell integration.
    Downloads: 2 This Week
  • 13
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...
    Downloads: 2 This Week
  • 14
    Markdig

    A fast, powerful, CommonMark compliant, extensible Markdown processor

    ... behavior. Parses trivia (whitespace, newlines and other characters) to support lossless parse ⭢ render roundtrip. This enables changing markdown documents without introducing undesired trivia changes. Special attributes or attached HTML attributes (inspired from PHP Markdown Extra - Special Attributes). Diagrams extension whenever a fenced code block contains a special keyword, it will be converted to a div block with the content as-is.
    Downloads: 2 This Week
  • 15
    Inkdown

    A WYSIWYG Markdown editor, improve reading and editing experience

    Inkdown (bluestone) is a Markdown reading, editing, and sharing tool. Almost fully compatible with the GitHub Flavored Markdown standard, while extending the Mermaid graphics and Katex formula, supporting light and dark styles, and somewhat different from other WYSIWYG editors, Inkdown does not pursue complete customization. Its core goal is comfortable reading, smooth editing of Markdown, and document sharing in the simplest way possible. As a document publisher, markdown source code mode...
    Downloads: 2 This Week
  • 16
    Angular DataTables

    DataTables with Angular

    An Angular2+ library for building complex HTML tables using the DataTables jQuery plug-in. Implementation of the example on custom filtering with range search. The HTML element provides a Promise that returns the instance of the DataTable. Implementation of the example on individual column searching (text inputs). Sometimes, your DataTable options are stored or computed server-side. All you need to do is to return the expected result as a promise. You can use Angular Pipe to transform data...
    Downloads: 1 This Week
  • 17
    Application Inspector

    A source code analyzer built for surfacing features of interest

    Microsoft Application Inspector is a software source code characterization tool that helps identify coding features of first or third party software components based on well-known library/API calls and is helpful in security and non-security use cases. It uses hundreds of rules and regex patterns to surface interesting characteristics of source code to aid in determining what the software is or what it does from what file operations it uses, encryption, shell operations, cloud APIs, frameworks...
    Downloads: 2 This Week
  • 18
    vimwiki

    Personal Wiki for Vim

    Vimwiki is a personal wiki for Vim, interlinked, plain text files written in a markup language. Organize notes and ideas and quickly create links between them, manage todo-lists, and write a diary. VimWiki is a personal wiki for Vim, a number of linked text files that have their own syntax highlighting. See the VimWiki Wiki for an example website built with VimWiki! Three markup syntaxes supported, Vimwiki's own syntax, Markdown, MediaWiki. Export everything to HTML, link to other wiki pages...
    Downloads: 1 This Week
  • 19
    Relative-Time Element

    Web component extensions to the standard <time> element

    Formats a timestamp as a localized string or as relative text that auto-updates in the user's browser. This allows the server to cache HTML fragments containing dates and lets the browser choose how to localize the displayed time according to the user's preferences. Every visitor is served the same markup from the server's cache. When it reaches the browser, the custom relative-time JavaScript localizes the element's text into the local timezone and formatting. Dates are displayed before months...
    Downloads: 1 This Week
  • 20
    Remarkable for Linux

    The Markdown Editor for Linux

    With Live Preview you can see your changes as you make them. There is no need to export first to check your syntax. This is accompanied by synchronized scrolling. Remarkable has Github Flavoured Markdown. This has a simple, easy-to-learn syntax with features like checklists, highlighting, links, images and more. Remarkable allows you to export your files to PDF and HTML from within the app. The HTML code is even prettified and PDFs have a TOC. You can style your markdown documents however you...
    Downloads: 1 This Week
  • 21
    MudBlazor

    Do more with Blazor, utilizing CSS and keeping Javascript to a minimum

    Trusted by thousands of users, from hobby developers to large enterprises. Use MudBlazor to rapidly build amazing web applications without leaving your loved C# language and toolchain. We bring together everything that's required to build amazing Blazor applications that scale from desktop to mobile. Apart from the library itself we also provide templates, a learning platform, theme manager, demo and example projects as well as an online code editor integrated with our documentation and issue...
    Downloads: 1 This Week
  • 22
    Fidus Writer

    Fidus Writer is an online collaborative editor for academics

    Fidus Writer is an online collaborative editor especially made for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so that with the same text, you can later on publish it in multiple ways: On a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts that are adequate for the medium of choice.
    Downloads: 1 This Week
  • 23
    pagedown

    Paginate the HTML Output of R Markdown with CSS for Print

    Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome or Microsoft Edge) to generate PDF. No need to install LaTeX to get beautiful PDFs. This R package stands on the shoulders of two giants to support typesetting with CSS for R Markdown documents: Paged.js and ReLaXed (we only borrowed some CSS from the ReLaXed repo and didn't really use the Node package).
    Downloads: 0 This Week
  • 24
    Schema Spy

    SchemaSpy code home

    This is a new code repository for the SchemaSpy tool, initially created and maintained by John Currier. I personally believe that work on SchemaSpy should be continued, and that many still-open issues should be resolved. The last released version of SchemaSpy was in 2010, and I plan to change this. Installation is very simple because SchemaSpy is a single Java .jar application. To learn more, read the installation doc. When your environment is ready, you can start...
    Downloads: 1 This Week
  • 25
    MultiMarkdown-6

    Lightweight markup processor to produce HTML, LaTeX, and more

    Lightweight markup processor to produce HTML, LaTeX, and more. MultiMarkdown is a superset of the Markdown lightweight markup syntax with support for additional output formats and features. Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
    Downloads: 0 This Week