Showing 30 open source projects for "pdf data mining"

View related business solutions
  • Error to trace to log to deploy. One click. No SSH. Icon
    Error to trace to log to deploy. One click. No SSH.

    Catch the cause before the pager goes off.

    AppSignal links every error to the trace, the trace to the log, the log to the deploy that shipped it.
    Free 30 days.
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 1
    TeXworks

    TeXworks

    A simple interface for working with TeX documents

    TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms particularly GNU/Linux and Windows and features a clean, simple interface accessible to casual and non-technical users.
    Downloads: 77 This Week
    Last Update:
    See Project
  • 2
    Crowbook

    Crowbook

    Converts books written in Markdown to HTML, LaTeX/PDF and EPUB

    Crowbook's aim is to allow you to write a book in Markdown without worrying about formatting or typography and let the program generate HTML, PDF and EPUB output for you. Its focus is novels and fiction, and the default settings should (hopefully) generate readable books with correct typography without requiring you to worry about it. To see what Crowbook's output looks like, you can read the Crowbook guide rendered in HTML, PDF or EPUB. Crowbook will parse this file and generate HTML, EPUB,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    pagedown

    pagedown

    Paginate the HTML Output of R Markdown with CSS for Print

    Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome or Microsoft Edge) to generate PDF. No need to install LaTeX to get beautiful PDFs. This R package stands on the shoulders of two giants to support typesetting with CSS for R Markdown documents: Paged.js and ReLaXed (we only borrowed some CSS from the ReLaXed repo and didn't really use the Node package).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical...
    Downloads: 16 This Week
    Last Update:
    See Project
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 5
    Tarjamento de Dados Pessoais e Sigilosos

    Tarjamento de Dados Pessoais e Sigilosos

    Ferramenta de Tarjamento de Dados Pessoais e Sigilosos

    TarjaPDF v2.0 Beta — Ferramenta de Tarjamento de Dados Pessoais e Sigilosos Proteja dados sensíveis em PDFs com segurança irreversível. Interface moderna com dark mode, marcação manual (texto, linha e área livre), detecção automática de CPF, RG, e-mail, telefone, nomes próprios e endereços. Escaneamento inteligente com análise preditiva: destaca dados pessoais para revisão antes de tarjar. Detecção de nomes via heurística e base oficial, com dicionário customizável. Relatório de...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 6
    Markdown Monster

    Markdown Monster

    An extensible Markdown Editor, Viewer and Weblog Publisher for Windows

    Markdown Monster is a powerful, yet easy-to-use Markdown editor with syntax highlighting and sophisticated and fast edit features. A collapsible, synced, live preview lets you see your output as you type and scroll. Easily embed or paste images, links, tables and code using raw markup or our smart UI helpers to simplify many operations with a few keystrokes or a click or two. Paste images from the clipboard or drag and drop from Explorer or our built-in file browser. Inline spell-checking...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    TeXstudio - A LaTeX Editor

    TeXstudio - A LaTeX Editor

    An integrated writing environment for creating LaTeX documents

    NOTE: Active development has moved to https://github.com/texstudio-org/texstudio Please post issues and feature requests there. TeXstudio is a fully featured LaTeX editor. Our goal is to make writing LaTeX documents as easy and comfortable as possible. Some of the outstanding features of TeXstudio are an integrated pdf viewer with (almost) word-level synchronization, live inline preview, advanced syntax-highlighting, live checking of references, citations, latex commands, spelling and...
    Leader badge
    Downloads: 345 This Week
    Last Update:
    See Project
  • 8
    Apache OpenOffice

    Apache OpenOffice

    The free and Open Source productivity suite

    Free alternative for Office productivity tools: Apache OpenOffice - formerly known as OpenOffice.org - is an open-source office productivity software suite containing word processor, spreadsheet, presentation, graphics, formula editor, and database management applications. OpenOffice is available in many languages, works on all common computers, stores data in ODF - the international open standard format - and is able to read and write files in other formats, included the format used by the most common office suite packages. OpenOffice is also able to export files in PDF format. OpenOffice has supported extensions, in a similar manner to Mozilla Firefox, making easy to add new functionality to an existing OpenOffice installation.
    Leader badge
    Downloads: 251,737 This Week
    Last Update:
    See Project
  • 9
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 10 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 10
    ant4docbook

    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    TextExtractor

    TextExtractor

    Extracts plain text from a variety of different file types

    TextExtractor extracts plain text from hundreds of different file types, storing the text extracted in suitably named text files. TextExtractor 1.10 works in six different modes :- Instant Mode - Just select any file and extract the text from it. Batch Mode - Select a group of files and extract the text from all of them in one go. Polling Mode - Watch a folder location, processing new files as they appear there. Hierarchical Mode - Extract Text from files in a directory...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12

    joy of text

    Editor with scripting language, security features & system interfaces.

    ...It is particularly useful for checking and cross-referencing between several source, intermediate and output files - a common requirement for CAD work. But jot's usefulness doesn't stop there. It's sophisticated search features can, for example, be used for interactive data mining or automating the extraction of numerical and textual data and reports from arrays of large text files. It's adaptable user interface, can be programmed to emulate emacs , vi UIs or mouse-driven systems - but who would want to do a thing like that? The display is highly configurable supporting popups, menus-event mouse callbacks etc. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    RTextDoc

    RTextDoc

    An editor for structured documents

    RTextDoc is an editor for structured text documents such as LaTeX, AsciiDoc, DocBook. RTextDoc has proofreading capabilities: on-the-fly spelling, instant grammar checking and built-in free dictionaries. RTextDoc has syntax highlighting, bracket matching, folding, document structure browser for sections and labels, bookmarks, manager for LaTeX symbols, an editor for mathematical equations,integrated BibTeX database manager and several tools to convert LaTeX to HTML and back. AsciiDoc...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    ConcatPDF

    PDF Concatenation Tool

    ConcatPDF is the tool to concatenate PDF files. It can concatenate, extract, encrypt, decrypt, configure PDF files, convert image files to PDF. GUI version and CUI version are both available. iText.NET is iText porting on .NET Framework by J#. This library allows you to generate PDF, (X)HTML, XML, RTF files on Microsoft.NET Framework including ASP.NET.
    Downloads: 33 This Week
    Last Update:
    See Project
  • 15
    Canorus

    Canorus

    Music score editor

    Canorus is a free cross-platform music score editor. It supports an unlimited number and length of staffs, polyphony, a MIDI playback of notes, chord markings, lyrics, import/export filters to formats like MIDI, MusicXML, ABC Music, MusiXTeX and LilyPond
    Downloads: 28 This Week
    Last Update:
    See Project
  • 16
    iText®, a JAVA PDF library

    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Leader badge
    Downloads: 123 This Week
    Last Update:
    See Project
  • 17
    FireTeX: LaTeX Editor and Compiler

    FireTeX: LaTeX Editor and Compiler

    Edit Your files LaTeX and tex

    FireTeX, web based LaTeX editor complete, is a powerful, intuitive and stocked with useful functions for exporting the results in three useful formats. An editor with LaTeX compiler, highlight code, advanced search / replace and filesystem API HTML5. ======== Android app available on Play Store > https://play.google.com/store/apps/details?id=com.ulmdesign.ulmtex ======== Update 30.06.2017 Windows 7 and later and macOS 10.9 and later are supported. == Browser Extensions == Add-on...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    A Swiss Army Knife GUI application for PDF documents: combine, split, rotate, reorder (n-up, booklet), watermark, edit bookmarks/fileinfo/pagetransition, compress, encrypt, decrypt, sign, repair, edit attachments and more.
    Leader badge
    Downloads: 96 This Week
    Last Update:
    See Project
  • 19

    Detexter

    Detexter is an app designed to extract text from PDF files.

    Detexter lets you extract text from multiple PDF files. Detexter uses the PDFBox library for its text extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    jPod is a rich PDF manipulation and rendering framework. A complete rendering library based on jPod is available here at "jPodRenderer". To see jPod & jPodRenderer at work, have a look at www.cabaret-solutions.com
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    Dive Log Book
    Dive Log Book is a easy to use logbook for scuba diving. It makes possible to log basic dive data, notes with photos, weather conditions and provides simple statistics. Data can be exported into TXT document, PDF document and interactive HTML page.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    A C++ library to read and write PDF files, plus a GUI editor.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    PDML is an informal markup language written in PHP that is similar to HTML. It allows for the creation of complex PDF documents and can also be used in conjunction with PHP, to define templates which can generate dynamic PDF documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Shared Questionnaire System
    Shared Questionnaire System(SQS) is a full-functional Optical Mark Reader(OMR) form processing system implemented in Java-Swing, XSL-FO and AJAX with straightforward GUIs. It is aimed at developing social platform to share knowledge about questionnaire.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    EnDecode is a tool to decode/decompress parts of a binary file like PDF. If using PDF then stream to endstream data can be decompressed. You select from start to end of the part you want to decode and it will decode the data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo