Showing 29 open source projects for "pdf data mining"

  • 1
    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Features: clean API; fast (>1k requests/sec on a single core); manages request delays and maximum concurrency per domain; automatic cookie and session handling; sync/async/parallel scraping; distributed scraping; caching; automatic encoding of non-unicode responses. ...
    Downloads: 3 This Week
  • 2
    Holochain

    The current, performant & industrial strength version of Holochain

    Holochain is a post-blockchain framework for building agent-centric, distributed applications. Instead of using global consensus, Holochain enables each agent (user) to maintain their own local state while validating actions with a shared set of rules. This allows for scalable, secure, and resilient apps where data is owned and controlled by users. Ideal for social apps, cooperatives, and data sovereignty platforms, Holochain focuses on enabling collaboration without central servers or...
    Downloads: 3 This Week
  • 3
    Apache Sedona

    Cluster computing framework for processing large-scale geospatial data

    ...According to our benchmark and third-party research papers, Sedona has 50% less peak memory consumption than other Spark-based geospatial data systems for large-scale in-memory query processing. Sedona offers Scala, Java, Spatial SQL, Python, and R APIs and integrates them into underlying system kernels with care. You can easily create spatial analytics and data mining applications and run them in any cloud environment.
    Downloads: 0 This Week
  • 4
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 5
    stkpp

    C++ Statistical ToolKit

    ...For convenience, we offer the source packages on SourceForge. The library offers a dense set of (mostly) template classes in C++ and is suitable for projects ranging from small one-off projects to complete data mining application suites.
    Downloads: 0 This Week
  • 6
    LangChain Apps on Production with Jina

    LangChain Apps on Production with Jina & FastAPI

    Jina is an open-source framework for building scalable multi-modal AI apps in production. LangChain is another open-source framework for building applications powered by LLMs. langchain-serve helps you deploy your LangChain apps on Jina AI Cloud in a matter of seconds. You can benefit from the scalability and serverless architecture of the cloud without sacrificing the ease and convenience of local development. And if you prefer, you can also deploy your LangChain apps on your own...
    Downloads: 0 This Week
  • 7
    Laravel Report Generators

    Rapidly Generate Simple PDF, CSV, & Excel Report Package on Laravel

    Rapidly generate simple PDF, CSV, or Excel reports on Laravel. This package provides simple PDF, CSV, and Excel report generators to speed up your workflow. It also allows you to stream(), download(), or store() the report seamlessly.
    Downloads: 2 This Week
  • 8

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    ...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 0 This Week
  • 9
    A little helper for EVE Online, with database fetching and handling routines, API and raw web-based functions, and GUI and structure templates. Mining, Character, Business, Analyzer, Infos, Market, EveMath, Parsers, and more areas are touched.
    Downloads: 0 This Week
  • 10
    Aspose Java for Liferay

    Provides export options for blogs, journals and dynamic lists

    This is a Liferay CMS / Portal plugin released by Aspose Pty Ltd. Aspose.Total Java for Liferay (hook plugin app) provides options for exporting web content and blogs created in HTML to MS Word, MS Excel, and PDF file formats using the Aspose.Total Java APIs (Aspose.Words, Aspose.Cells, and Aspose.PDF). The plugin also provides options for exporting Dynamic Data Lists to the same formats using the same APIs.
    Downloads: 0 This Week
  • 11
    FOXopen
    FOXopen is a 4GL, feature-rich XML framework which facilitates the rapid development of web-based applications with sophisticated workflows. For more information and help, see http://www.foxopen.net/
    Downloads: 0 This Week
  • 12
    giServer

    giServer, the easy-to-use and extensible batch and integration server

    ...Instead of complex XML configuration files, an elaborate GUI for batch job management is included. Possible usage scenarios include automatic processing of incoming data files, Big Data applications, process automation, data mining/aggregation applications, automatic reporting, and processing and analysis of database records.
    Downloads: 0 This Week
  • 13
    jPod Renderer is based on the jPod library, also hosted here as "jpodlib". This is the long-awaited release of platform-specific rendering code for both AWT and SWT. To see jPod and jPod Renderer at work, have a look at www.cabaret-solutions.com
    Downloads: 1 This Week
  • 14
    Math tools in Python for tackling problems in operations research. Comes with a Django-based web interface that allows remote access to complex simulation tools.
    Downloads: 0 This Week
  • 15
    YAFF (Yet Another Factory Framework)

    A Powerful Java Factory Framework

    Downloads: 0 This Week
  • 16
    eLML - eLesson Markup Language
    eLML (eLesson Markup Language) is an XML framework for creating structured eLessons based on a pedagogical model. eLML consists of an XMLSchema and XSLT files to create XHTML, PDF, LaTeX, IMS CP and SCORM versions, standards supported by most LMS.
    Downloads: 0 This Week
  • 17
    Osezno PHP Framework
    Osezno PHP Framework is a framework written in PHP that lets you set up HTML templates, tabbed content, forms, and dynamic lists, all on an MVC pattern, and incorporates technologies such as Active Record and xajax.
    Downloads: 0 This Week
  • 18
    SynApp2
    SynApp2 builds feature-packed web applications and versatile PDF reports for MySQL and Oracle Database. The SynApp2 web application generator and MVC framework is written in PHP and JavaScript.
    Downloads: 0 This Week
  • 19
    Cross-platform/cross-language (C++, .NET/Mono, PHP...) application framework. Libraries: [ UTillyty.Omnibus ] [ . (general) .DA (Data Access) .Net (Networking) .UI (User Interface) .UI.WF (Windows Forms UI) ] TrinacriaPDF (C# PDF printing)
    Downloads: 0 This Week
  • 20
    The FSSearchIndex Framework project provides a framework that allows application developers to write their own content-based file search and indexing applications. It currently supports content extraction and indexing for text, Word, Excel, and PDF files.
    Downloads: 0 This Week
  • 21
    Contextor
    Contextor is a lightweight, simple-to-use Java-based library that helps developers and researchers work with the general concept of a resource; for example, resources can be text resources, web resources, images, and videos.
    Downloads: 0 This Week
  • 22
    Note from the author: I have built a much more powerful, fully featured CMS at: https://github.com/MacdonaldRobinson/FlexDotnetCMS Macs CMS is a flat-file (XML and SQLite) based AJAX content management system. It focuses mainly on the edit-in-place editing concept. It comes with a built-in blog with moderation support, user manager section, roles manager section, SEO / SEF URL
    Downloads: 1 This Week
  • 23
    The ProM Import Framework allows you to extract process enactment event logs from a set of information systems. These can be exported in the MXML format, which is the standard event log data format for Process Mining analysis techniques.
    Downloads: 1 This Week
  • 24
    openRiverbed - the PHP5 framework. Ajax, TinyMCE, Plugins, XML based configuration, template based, XML2PDF pdf generation, multi-language support for application and content, encrypted sessions, test-driven, oo developed... Hardened by real projects.
    Downloads: 0 This Week
  • 25
    ...Provides a flexible API to index, catalog, and search text-based information with great performance. Excellent for implementing custom search engines, research, text retrieval, data mining, and more.
    Downloads: 0 This Week