Showing 30 open source projects for "pdf data mining"

  • 1
    Dompdf

    HTML to PDF converter for PHP

    dompdf is an HTML to PDF converter. At its heart, dompdf is (mostly) a CSS 2.1 compliant HTML layout and rendering engine written in PHP. It is a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes. PDF rendering is currently provided either by PDFLib or by a bundled version of the R&OS CPDF class written by Wayne Munro. (Some important changes have...
    Downloads: 111 This Week
  • 2
    DeckTape

    PDF exporter for HTML presentations

    DeckTape is a high-quality PDF exporter for HTML presentation frameworks. DeckTape is built on top of Puppeteer, which relies on Google Chrome for laying out and rendering Web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape currently supports the following presentation frameworks out of the box. DeckTape also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...
    Downloads: 8 This Week
  • 3
    Crowbook

    Converts books written in Markdown to HTML, LaTeX/PDF and EPUB

    Crowbook's aim is to allow you to write a book in Markdown without worrying about formatting or typography and let the program generate HTML, PDF and EPUB output for you. Its focus is novels and fiction, and the default settings should (hopefully) generate readable books with correct typography without requiring you to worry about it. To see what Crowbook's output looks like, you can read the Crowbook guide rendered in HTML, PDF or EPUB. Crowbook will parse this file and generate HTML, EPUB,...
    Downloads: 0 This Week
  • 4
    pagedown

    Paginate the HTML Output of R Markdown with CSS for Print

    Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome or Microsoft Edge) to generate PDF. No need to install LaTeX to get beautiful PDFs. This R package stands on the shoulders of two giants to support typesetting with CSS for R Markdown documents: Paged.js and ReLaXed (we only borrowed some CSS from the ReLaXed repo and didn't really use the Node package).
    Downloads: 1 This Week
  • 5
    Shower Presentation Template

    Shower HTML presentation engine

    Shower Presentation Template is the Shower HTML presentation engine. It is built on HTML, CSS and vanilla JavaScript and works in all modern browsers. Themes are separated from the engine, and presentations are fully keyboard accessible. Presentations can be printed to PDF, and the package includes the Ribbon and Material themes plus a core with plugins. You'll need Node.js installed on your computer. The latest stable versions of Chrome, Edge, Firefox, and Safari are supported.
    Downloads: 2 This Week
  • 6
    Ray Tracing in One Weekend Book Series

    The Ray Tracing in One Weekend series of books

    The Ray Tracing in One Weekend series of books are now available to the public for free online. They are now released under the CC0 license. This means that they are as close to public domain as we can get. (While that also frees you from the requirement of providing attribution, it would help the overall project if you could point back to this web site as a service to other users.) These books are formatted for printing directly from your browser, where you can also (on most browsers) save...
    Downloads: 7 This Week
  • 7
    Markdown Monster

    An extensible Markdown Editor, Viewer and Weblog Publisher for Windows

    Markdown Monster is a powerful, yet easy-to-use Markdown editor with syntax highlighting and sophisticated and fast edit features. A collapsible, synced, live preview lets you see your output as you type and scroll. Easily embed or paste images, links, tables and code using raw markup or our smart UI helpers to simplify many operations with a few keystrokes or a click or two. Paste images from the clipboard or drag and drop from Explorer or our built-in file browser. Inline spell-checking...
    Downloads: 4 This Week
  • 8
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 9
    docconv

    Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text

    A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text. See go help install for details on the installation location of the installed docd executable. Make sure that the full path to the executable is in your PATH environment variable. To add image support to the docconv library you first need to install and build gosseract. Now you can add -tags ocr to any go command when building/fetching/testing...
    Downloads: 2 This Week
  • 10
    html-pdf-chrome

    HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium

    HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium. This library is NOT meant to accept untrusted user input. Doing so may have serious security risks such as Server-Side Request Forgery (SSRF). If you run into CORS issues, try using the --disable-web-security Chrome flag, either when you start Chrome externally, or in options.chromeFlags. This option should only be used if you fully trust the code you are executing during a print job. It is strongly recommended that you...
    Downloads: 2 This Week
  • 11
    backslide

    CLI tool for making HTML presentations with Remark.js using Markdown

    CLI tool for making HTML presentations with Remark.js using Markdown. Use bs init to create a new presentation along with a template directory in the current directory. The template directory is needed for backslide to transform your Markdown files into HTML presentations. You can create as many Markdown presentations as you want in the directory; they will all be based on the same template. Use bs serve to start a development server with live reload. A page will automatically open in your...
    Downloads: 0 This Week
  • 12
    node-html-pdf

    HTML to PDF converter that uses phantomjs

    HTML to PDF converter that uses phantomjs. html-pdf can read the header or footer either from the header and footer config object or from the HTML source. You can either set a default header & footer or override it by appending a page number (1-based index) to the id="pageHeader" attribute of an HTML tag. You can use any combination of those tags. The library tries to find any element that contains the pageHeader or pageFooter id prefix. The full options object gets converted to...
    Downloads: 0 This Week
  • 13
    Myrtille

    A native HTML4 / HTML5 Remote Desktop Protocol and SSH client

    Myrtille provides simple and fast access to remote desktops, applications, and SSH servers through a web browser, without any plugin, extension or configuration. Technically, Myrtille is an HTTP(S) to RDP and SSH gateway. User input (keyboard, mouse, touchscreen) is forwarded from a web browser to an HTTP(S) gateway, then up to an RDP (or SSH) client which maintains a session with an RDP (or SSH) server. The resulting display updates (if any) are streamed back to the browser, from the...
    Downloads: 0 This Week
  • 14
    pdf2htmlEX

    Convert PDF to HTML without losing text or format

    pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies. It aims to provide an accurate rendering, while being optimized for Web display. Text, fonts and formats are natively preserved in HTML. Mathematical formulas, figures and images are also supported. pdf2htmlEX is also a publishing tool: almost 50 options make it flexible for many different use cases: PDF preview, book/magazine publishing, personal resume. pdf2htmlEX is optimized for modern web browsers such as Mozilla...
    Downloads: 29 This Week
  • 15
    posterdown

    Use RMarkdown to generate PDF Conference Posters via HTML

    Welcome to Posterdown! This is my attempt to provide a semi-smooth workflow for those who wish to take their RMarkdown skills to the conference world. Many creature comforts from RMarkdown are available in this package such as Markdown section notation, figure captioning, and even citations like this one (Allaire, Xie, McPherson, et al. 2018). The rest of this example poster will show how you can insert typical conference poster features into your own document. Posterdown was created as a...
    Downloads: 0 This Week
  • 16
    IdeoType is a book compiler that converts manuscript (XHTML) to book (PDF) on the fly.
    Downloads: 0 This Week
  • 17
    DinkToPdf

    C# .NET Core wrapper for wkhtmltopdf library that uses Webkit engine

    .NET Core P/Invoke wrapper for wkhtmltopdf library that uses Webkit engine to convert HTML pages to PDF. Copy the native library to the root folder of your project. From there .NET Core loads the native library when the native method is called with P/Invoke. You can find the latest version of the native library. Select the appropriate library for your OS and platform (64 or 32-bit). The library was not tested with IIS. The library was tested in console applications and with Kestrel web server...
    Downloads: 0 This Week
  • 18
    Shelk-test
    Open source program for creating tests that combines test authoring and test taking in one tool. It can be used by anyone who wants to quickly create a test and run it.
    Downloads: 0 This Week
  • 19
    xccdf2pdf renders XCCDF documents in PDF and other formats.
    Downloads: 0 This Week
  • 20
    APHID is an easy-to-install, easy-to-use DocBook environment. APHID transforms source documents (text or XML) into multiple output formats (HTML, PDF, HTML Help, etc.). APHID is a derivative work of eDE (http://www.e-novative.de).
    Downloads: 0 This Week
  • 21
    Converter from FB2 to PDF format. Useful for ebook readers with bad or missing FB2 support.
    Downloads: 3 This Week
  • 22
    openRiverbed - the PHP5 framework. Ajax, TinyMCE, plugins, XML-based configuration, template-based, XML2PDF PDF generation, multi-language support for application and content, encrypted sessions, test-driven, object-oriented development... Hardened by real projects.
    Downloads: 0 This Week
  • 23
    BTL is a template language that combines the power of JSTL and XSLT to produce documents in XML, HTML, XHTML, XSL-FO, PDF or other formats, based on JavaBean input.
    Downloads: 0 This Week
  • 24
    Visual xsltproc is a tool that helps you write XSLT files and debug them to find errors. It lets you write XML and generates XML output (with XML syntax highlighting and line numbers). If the result is XSL-FO, it generates the PDF with Apache FOP (Java). Built on Qt 4.2.
    Downloads: 0 This Week
  • 25
    dompdf - the PHP 5 HTML to PDF converter. dompdf is a (mostly) CSS compliant HTML rendering engine written in PHP. It supports external stylesheets, inline style tags, and the style attributes of individual HTML elements. Requires PHP 5.
    Downloads: 3 This Week
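The node-html-pdf entry above describes choosing a header per page: a default element with id="pageHeader", overridden on a given page by an element whose id has that page number appended. The Python sketch below illustrates only that lookup rule, under stated assumptions: the page-specific id is taken to be pageHeader-N (hyphen-separated, 1-based), and HeaderCollector and header_for_page are hypothetical names, not part of html-pdf's API. The real library performs its own lookup during rendering; this is not its implementation.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collect the text of every element whose id starts with `prefix`.

    Hypothetical helper for illustration only; it is not node-html-pdf code.
    Void tags such as <br> and self-closing <tag/> forms contribute no text.
    """

    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self, prefix: str) -> None:
        super().__init__()
        self.prefix = prefix
        self.found: dict[str, str] = {}      # element id -> collected text
        self._current_id: str | None = None  # id of the header being read
        self._depth = 0                      # open-tag depth inside it
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return  # void tags never get an end tag, so don't count them
        if self._current_id is not None:
            self._depth += 1  # nested element inside the current header
            return
        element_id = dict(attrs).get("id") or ""
        if element_id.startswith(self.prefix):
            self._current_id = element_id
            self._depth = 1
            self._chunks = []

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> opens and closes at once, so it never changes depth

    def handle_endtag(self, tag):
        if self._current_id is None or tag in self.VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:  # closing tag of the header element itself
            self.found[self._current_id] = "".join(self._chunks).strip()
            self._current_id = None

    def handle_data(self, data):
        if self._current_id is not None:
            self._chunks.append(data)


def header_for_page(found: dict[str, str], page: int,
                    prefix: str = "pageHeader") -> str | None:
    """Return the element for `page` (assumed id form: prefix-N, 1-based),
    falling back to the default element whose id is exactly `prefix`."""
    return found.get(f"{prefix}-{page}", found.get(prefix))


if __name__ == "__main__":
    html = """
    <div id="pageHeader">Default header</div>
    <div id="pageHeader-2">Header for <b>page 2</b></div>
    <p>Body text</p>
    """
    collector = HeaderCollector("pageHeader")
    collector.feed(html)
    collector.close()
    for page in (1, 2, 3):
        print(page, header_for_page(collector.found, page))
```

Running it prints the default header for pages 1 and 3 and "Header for page 2" for page 2. The same two functions cover footers by passing prefix="pageFooter".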