Showing 25 open source projects for "pdf data mining"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 1
    DeckTape

    DeckTape

    PDF exporter for HTML presentations

    DeckTape is a high-quality PDF exporter for HTML presentation frameworks. DeckTape is built on top of Puppeteer which relies on Google Chrome for laying out and rendering Web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape currently supports the following presentation frameworks out of the box. DeckTape also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    Crowbook

    Crowbook

    Converts books written in Markdown to HTML, LaTeX/PDF and EPUB

    Crowbook's aim is to allow you to write a book in Markdown without worrying about formatting or typography and let the program generate HTML, PDF and EPUB output for you. Its focus is novels and fiction, and the default settings should (hopefully) generate readable books with correct typography without requiring you to worry about it. To see what Crowbook's output looks like, you can read the Crowbook guide rendered in HTML, PDF or EPUB. Crowbook will parse this file and generate HTML, EPUB,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    pagedown

    pagedown

    Paginate the HTML Output of R Markdown with CSS for Print

    Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome or Microsoft Edge) to generate PDF. No need to install LaTeX to get beautiful PDFs. This R package stands on the shoulders of two giants to support typesetting with CSS for R Markdown documents: Paged.js and ReLaXed (we only borrowed some CSS from the ReLaXed repo and didn't really use the Node package).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Shower Presentation Template

    Shower Presentation Template

    Shower HTML presentation engine

    Shower Presentation Template is a shower HTML presentation engine. Built on HTML, CSS and vanilla JavaScript, works in all modern browsers. Themes are separated from engine, and comes with fully keyboard accessible. Printable to PDF and includes Ribbon and Material themes, and core with plugins. You’ll need Node.js installed on your computer. Latest stable versions of Chrome, Edge, Firefox, and Safari are supported.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Ray Tracing in One Weekend Book Series

    Ray Tracing in One Weekend Book Series

    The Ray Tracing in One Weekend series of books

    The Ray Tracing in One Weekend series of books are now available to the public for free online. They are now released under the CC0 license. This means that they are as close to public domain as we can get. (While that also frees you from the requirement of providing attribution, it would help the overall project if you could point back to this web site as a service to other users.) These books are formatted for printing directly from your browser, where you can also (on most browsers) save...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 6
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    html-pdf-chrome

    html-pdf-chrome

    HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium

    HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium. This library is NOT meant to accept untrusted user input. Doing so may have serious security risks such as Server-Side Request Forgery (SSRF). If you run into CORS issues, try using the --disable-web-security Chrome flag, either when you start Chrome externally, or in options.chromeFlags. This option should only be used if you fully trust the code you are executing during a print job. It is strongly recommended that you...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    backslide

    backslide

    CLI tool for making HTML presentations with Remark.js using Markdown

    CLI tool for making HTML presentations with Remark.js using Markdown. Use bs init to create a new presentation along with a template directory in the current directory. The template directory is needed for backslide to transform your Markdown files into HTML presentations. You can create as many markdown presentations as you want in the directory, they will all be based on the same template. Use bs serve to start a development server with live reload. A page will automatically open in your...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Myrtille

    Myrtille

    A native HTML4 / HTML5 Remote Desktop Protocol and SSH client

    Myrtille provides simple and fast access to remote desktops, applications, and SSH servers through a web browser, without any plugin, extension or configuration. Technically, Myrtille is an HTTP(S) to RDP and SSH gateway. User input (keyboard, mouse, touchscreen) is forwarded from a web browser to an HTTP(S) gateway, then up to an RDP (or SSH) client which maintains a session with an RDP (or SSH) server. The display resulting (or not) of such actions is streamed back to the browser, from the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 10
    pdf2htmlEX

    pdf2htmlEX

    Convert PDF to HTML without losing text or format

    pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies. It aims to provide an accurate rendering, while being optimized for Web display. Text, fonts and formats are natively preserved in HTML. Mathematical formulas, figures and images are also supported. pdf2htmlEX is also a publishing tool: almost 50 options make it flexible for many different use cases: PDF preview, book/magazine publishing, personal resume. pdf2htmlEX is optimized for modern web browsers such as Mozilla...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 11
    posterdown

    posterdown

    Use RMarkdown to generate PDF Conference Posters via HTML

    Welcome to Posterdown! This is my attempt to provide a semi-smooth workflow for those who wish to take their RMarkdown skills to the conference world. Many creature comforts from RMarkdown are available in this package such as Markdown section notation, figure captioning, and even citations like this one (Allaire, Xie, McPherson, et al. 2018). The rest of this example poster will show how you can insert typical conference poster features into your own document. Posterdown was created as a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    IdeoType is a book compiler that converts manuscript (XHTML) to book (PDF) on the fly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DinkToPdf

    DinkToPdf

    C# .NET Core wrapper for wkhtmltopdf library that uses Webkit engine

    .NET Core P/Invoke wrapper for wkhtmltopdf library that uses Webkit engine to convert HTML pages to PDF. Copy the native library to root folder of your project. From there .NET Core loads the native library when the native method is called with P/Invoke. You can find the latest version of the native library. Select the appropriate library for your OS and platform (64 or 32-bit). The library was not tested with IIS. The library was tested in console applications and with Kestrel web server...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    xccdf2pdf renders XCCDF documents in PDF and other formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    APHID is an easy-to-install, easy-to-use DocBook environment. APHID transforms source documents (text or XML) into multiple output formats (HTML, PDF, HTML Help, etc.). APHID is a derivative work of eDE (http://www.e-novative.de).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Converter from FB2 to PDF format. Useful for ebook readers with bad or missing FB2 support.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    openRiverbed - the PHP5 framework. Ajax, TinyMCE, Plugins, XML based configuration, template based, XML2PDF pdf generation, multi-language support for application and content, encrypted sessions, test-driven, oo developed... Hardened by real projects.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    BTL is a template language that combines power of JSTL and XSLT to produce documents in XML, HTML, XHTML, XSL-FO, PDF or other formats, based on the JavaBean input.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Visual xsltproc is a tool which help to write xslt file, and debug it to find errors. It writes xml, and generates xml (Syntax highlighting of XML & line Nr.). Finally if the result is XSL-FO it generates the pdf on Apache FOP java. Build on QT4.2.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    dompdf - the PHP 5 HTML to PDF converter. dompdf is a (mostly) CSS compliant HTML rendering engine written in PHP. It supports external stylesheets, inline style tags, and the style attributes of individual HTML elements. Requires PHP 5.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    The Nheengatu Project is a Java library that provides HTML markup abstraction allowing you to reutilize it to generate PDF files, OpenOffice documents, image files, etc. The goal of this project is to maximize the use of HTML markup procedures.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Connla is a Java library for creating data collections which can be exported to TXT, CSV, HTML, XHTML, XML, PDF and XLS formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Java based tool to convert HTML/DHTM to PDF document.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    This is a tool to convert pdf files to html/text files and extract images.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PDML2 is a fork (continuation) of the PDML project - it is an informal markup language written in PHP that is similar to HTML. It allows for the creation of complex PDF documents for use in command line or web applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next