Showing 68 open source projects for "pdf data mining"

  • 1
    Scrapy

    A fast, high-level web crawling and web scraping framework

    ...It can be used for data mining, monitoring and automated testing.
    Downloads: 16 This Week
  • 2
    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    ...It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. Geziyor includes features for managing requests, handling cookies, respecting robots rules, and exporting collected data in multiple formats. ...
    Downloads: 8 This Week
  • 3
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. ...
    Downloads: 4 This Week
  • 4
    ProM is the comprehensive, extensible framework for process mining. Process Mining deals with the a-posteriori analysis of (business) processes using enactment logs.
    Downloads: 45 This Week
  • 5
    Ada PDF Writer

    A standalone, portable package for dynamically producing PDF documents

    PDF_Out is an Ada package for easily writing PDF files dynamically. It enables the automatic production of reports. The code is standalone and unconditionally portable; no external resource is needed. More information on... http://apdf.sf.net Alire crate: https://alire.ada.dev/crates/apdf Mirror: https://github.com/zertovitch/ada-pdf-writer
    Downloads: 21 This Week
  • 6
    Symfony Panther

    A browser testing and web crawling library for PHP and Symfony

    Symfony Panther is a browser testing and web scraping tool that allows developers to interact with websites programmatically. It uses headless Chrome or Firefox to automate browser tasks, making it suitable for end-to-end testing and data extraction. Panther integrates well with Symfony and PHPUnit, allowing developers to write comprehensive tests for web applications.
    Downloads: 2 This Week
  • 7
    Browserless

    Deploy headless browsers in Docker

    ...It lets developers connect existing Puppeteer and Playwright code to remote browser sessions over WebSocket, which helps move heavy browser work away from local machines or application servers. The project also provides REST APIs for common automation tasks such as screenshots, PDF generation, scraping, crawling, and content export. Browserless is useful for teams that need scalable browser execution for testing, data collection, rendering, or AI-agent browsing workflows. Its deployment model supports self-hosting, private infrastructure, queues, concurrency controls, and enterprise-oriented configuration. The project’s main value is turning browser automation into a managed service layer that can be reused across applications and workflows.
    Downloads: 5 This Week
  • 8
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 9
    Rista Web Browser

    Rista Web Browser 6.0.0.0 is free open-source software

    ...It contains the standard menu options of a common web browser, such as opening a new window, tab handling, printing, searching, viewing page source, deleting browsing data, bookmarks, history, and private browsing. Privacy policy: this browser is based on the Microsoft Edge WebView2 NuGet package, so it includes WebView2's own right-click options for saving images, PDF, audio, and video to a folder on the user's PC or laptop (a folder-read event), and sharing activity may occur. This is the only user-data privacy disclosure to note.
    Downloads: 5 This Week
  • 10
    BlueSpice free (Support archive)

    Our support forum has moved: community.bluespice.com

    This freely available open-source software turns MediaWiki, the popular software engine behind Wikipedia, into a fully-fledged enterprise wiki solution. Companies can keep MediaWiki's numerous advantages and automation capabilities; with BlueSpice, they can now work more comfortably, safely, and effectively. Compared with basic MediaWiki, BlueSpice provides, among others, the following enhancements: comfortable and sophisticated rights management capabilities, a visual editor...
    Downloads: 3 This Week
  • 11
    RY GeoIP 3

    User-friendly network & geolocation tools, APIs integration and more!

    RY GeoIP 3 is a powerful application that combines network and geolocation tools for comprehensive analysis. With its user-friendly interface and integration with Google Maps API and API Ninja DNS Lookups service, you can perform a wide range of operations, from geolocation lookups and ping tests to DNS analysis, traceroute, SSL certificate inspection, header data retrieval, and open port scanning. The ability to save data as PDFs and maps as images further enhances the utility of the...
    Downloads: 0 This Week
  • 12
    crawly

    High-level web crawling and scraping framework for Elixir apps

    Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content.
    Downloads: 3 This Week
  • 13
    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code for crawling webpages, distributing the results to a server, and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider Crawling for Article Writing...
    Downloads: 0 This Week
  • 14
    The Lemur Project

    Search engine and data mining applications and ClueWeb datasets.

    The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
    Downloads: 145 This Week
  • 15
    TemaTres: controlled vocabulary server

    Manage, Publish and Share Ontologies, Taxonomies, Thesauri, Glossaries

    Web application for managing formal representations of knowledge, thesauri, taxonomies, and multilingual vocabularies. For the latest version of the code: https://github.com/tematres/TemaTres-Vocabulary-Server
    Downloads: 8 This Week
  • 16
    PHP Pdf creation - R&OS
    MOVED TO GITHUB https://github.com/ole1986/pdf-php
    Downloads: 0 This Week
  • 17
    LymPHOS2

    LymPHOS2 Web-App

    ...Proteomics 2009, 9, 3741–3751. DOI: 10.1002/pmic.200800701 - Gallardo, Ó., Ovelleiro, D., Gay, M., Carrascal, M., Abian, J., A collection of open source applications for mass spectrometry data mining. Proteomics 2014, 20, 2275-2279. DOI: 10.1002/pmic.20140012
    Downloads: 0 This Week
  • 18
    XL-Parser

    XL-Parser is a tool for data extraction and analysis.

    XL-Parser provides a bunch of functions for data extraction and analysis. It also provides web log analysis features like a tool for detection of suspicious activities. More details and screenshots on http://le-tools.com.
    Downloads: 8 This Week
  • 19
    TCPDF - PHP class for PDF

    PHP class for PDF

    TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF Supports UTF-8, Unicode, RTL languages, XHTML, Javascript, digital signatures, barcodes and much more. IMPORTANT: This version will be soon marked as deprecated and replaced by a new version currently under development: https://github.com/tecnickcom/tc-lib-pdf
    Downloads: 166 This Week
  • 20
    PDF API HTML5 Web Apps

    Mini SDK JavaScript API library PDF web apps

    A compact library for modern web applications that quickly exports your HTML content to PDF using the well-known JavaScript library jsPDF, with special thanks to the canvg and html2canvas projects. Project documentation: http://ulmdevice.altervista.org/pdfapihtml5/#documentation Also available as a service for Angular 7+: http://ulmdevice.altervista.org/pdfjsapi/ Mobile Applications: http://bit.ly/1MrlgKk Opera add-on: http://bit.ly/1kkMhTa
    Downloads: 1 This Week
  • 21

    ConcatPDF

    PDF Concatenation Tool

    ConcatPDF is a tool for concatenating PDF files. It can concatenate, extract, encrypt, decrypt, and configure PDF files, and convert image files to PDF. GUI and CUI versions are both available. iText.NET is a port of iText to the .NET Framework via J#. This library allows you to generate PDF, (X)HTML, XML, and RTF files on the Microsoft .NET Framework, including ASP.NET.
    Downloads: 36 This Week
  • 22
    NASH OS

    Nash Operating System for Modern Ecommerce

    The all-built-in-one, automatic, ready-to-go out-of-box, easy-to-use state-of-the-art, and really awesome NASH OS! Over 25,000+ flexible features and controls and all scalable!! The most powerful solution ever built to instantly deliver new heights of online ecommerce enterprise to you.
    Downloads: 1 This Week
  • 23
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 135 This Week
  • 24
    FireTeX: LaTeX Editor and Compiler

    Edit your LaTeX and TeX files

    FireTeX is a complete web-based LaTeX editor: powerful, intuitive, and stocked with useful functions for exporting results in three formats. It includes a LaTeX compiler, code highlighting, advanced search/replace, and the HTML5 filesystem API. Android app available on the Play Store: https://play.google.com/store/apps/details?id=com.ulmdesign.ulmtex Update 30.06.2017: Windows 7 and later and macOS 10.9 and later are supported. Browser Extensions: Add-on...
    Downloads: 0 This Week
  • 25

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    ...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 1 This Week