Showing 32 open source projects for "pdf indexing"

View related business solutions
  • Save hundreds of developer hours with components built for SaaS applications. Icon
    Save hundreds of developer hours with components built for SaaS applications.

    The #1 Embedded Analytics Solution for SaaS Teams.

    Whether you want full self-service analytics or simpler multi-tenant security, Qrvey’s embeddable components and scalable data management remove the guess work.
    Try Developer Playground
  • Bright Data - All in One Platform for Proxies and Web Scraping Icon
    Bright Data - All in One Platform for Proxies and Web Scraping

    Say goodbye to blocks, restrictions, and CAPTCHAs

    Bright Data offers the highest quality proxies with automated session management, IP rotation, and advanced web unlocking technology. Enjoy reliable, fast performance with easy integration, a user-friendly dashboard, and enterprise-grade scaling. Powered by ethically-sourced residential IPs for seamless web scraping.
    Get Started
  • 1
    pdf-extractor

    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    ... is extracted to a text file for different usages (e.g. indexing the text). This library is in it's most basic form a node.js wrapper for pdf.js. It has default renderers to generate a default output, but is easily extended to incorporate custom logic or to generate different output. It uses a node.js DOM and the node domstub from pdf.js do make pdf parsing available on node.js without a browser.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    DB-GPT

    DB-GPT

    Revolutionizing Database Interactions with Private LLM Technology

    DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    AnyTXT Searcher

    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a powerful file full-text search engine, a desktop search application for fast document retrieval. Just like a local disk Google search engine, much faster than Windows Search, it is your ideal desktop file content full-text search engine. It has a powerful document parsing engine built in, which extracts the text of commonly used file formats without installing any other software, and combines the built-in high-speed indexing system to store the metadata of the text...
    Leader badge
    Downloads: 2,147 This Week
    Last Update:
    See Project
  • 4
    OpenKM Document Management - DMS

    OpenKM Document Management - DMS

    Document Management System and Content Management System

    .... Due to its technological architecture design, OpenKM meets the document management needs of businesses of all sizes (from SMEs to big corporations). Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. The most relevant functions of OpenKM is the indexing of the most common types of files: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list take a look at http://goo.gl/au8cQy
    Leader badge
    Downloads: 934 This Week
    Last Update:
    See Project
  • Red Hat Ansible Automation Platform on Microsoft Azure Icon
    Red Hat Ansible Automation Platform on Microsoft Azure

    Red Hat Ansible Automation Platform on Azure allows you to quickly deploy, automate, and manage resources securely and at scale.

    Deploy Red Hat Ansible Automation Platform on Microsoft Azure for a strategic automation solution that allows you to orchestrate, govern and operationalize your Azure environment.
    Learn More
  • 5
    Hypernomicon

    Hypernomicon

    Hypertext-infused philosophy personal database software

    Hypernomicon is a personal productivity/database application for researchers that combines structured note-taking, mind-mapping, management of files (e.g., PDFs) and folders, and reference management into an integrated environment that organizes all of the above into semantic networks or hierarchies in terms of debates, positions, arguments, labels, terminology/concepts, and user-defined keywords by means of database relations and automatically generated hyperlinks (hence ‘Hyper’ in the...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 6
    PdfgrepGui

    PdfgrepGui

    This is a simple GUI for the command line tool grep and pdfgrep

    This program is a GUI for the command line tool grep and pdfgrep. Pdfgrep search text in multiple PDF files and grep can serach text in multiple text files. You can use regular expressions for the search (https://en.wikipedia.org/wiki/Regular_expression). This GUI and the command line tools work without indexing. The following options are used: -i (ignore case) and -F (fixed strings), -n (Print page number or output lines) and -H (Print the file name for each match) from the command line...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 7
    DocSearcher
    DocSearcher is a search tool for indexing and searching files on a personal computer. It uses API's to provide search functionality for common document formats. currently: Word, Excel, PDF, Libre/Open/StarOffice, RTF, Text, and HTML
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    File System Crawler for Elasticsearch

    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, MS Office. Local file system (or a mounted drive) crawling and indexing new files, updating existing ones, and removing old ones. Remote file system over SSH/FTP crawling. REST interface to let you “upload” your binary documents to elastic search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Paperless-ng

    Paperless-ng

    A supercharged version of paperless, scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Free CRM Software With Something for Everyone Icon
    Free CRM Software With Something for Everyone

    216,000+ customers in over 135 countries grow their businesses with HubSpot

    Think CRM software is just about contact management? Think again. HubSpot CRM has free tools for everyone on your team, and it’s 100% free. Here’s how our free CRM solution makes your job easier.
    Get free CRM
  • 10
    OpenSearchServer Search Engine

    OpenSearchServer Search Engine

    An open source search engine with RESTFul API and crawlers

    OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on Windows...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 11
    elibsrv

    elibsrv

    a light OPDS/HTML server indexing EPUB and PDF files

    elibsrv is a light, standalone OPDS server for Linux. It allows to generate an OPDS repository of EPUB and/or PDF files scanned from on-disk directories. It also provides a simple html interface for non-OPDS humans, which makes it a good fit for both OPDS-aware devices (like Android with FBReader or Aldiko) and browsers with EPUB/PDF capabilities (for ex. Firefox with the excellent EPUBReader plugin). It's worth noting that elibsrv is a complete solution - ie. it doesn't rely on third party...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Marcion

    Marcion

    The study environment of ancient languages (Coptic, Greek, Latin)

    Marcion is a software forming a study environment of ancient languages (esp. Coptic, Greek, Latin) and providing many tools and resources (dictionaties, grammars, texts). Although Marcion is focused on to study the gnosticism and early christianity, it is an universal library working with various file formats and allowing to collect, organize and backup texts of any kind. Overview of gnostic sources in Coptic language delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex...
    Leader badge
    Downloads: 24 This Week
    Last Update:
    See Project
  • 13
    IndexFile (IFile)

    IndexFile (IFile)

    IFile, PHP based framework for indexing and search in the documents

    ...); OpenOffice.org Calc (.ods); Adobe Portable Document Format (.pdf); Text file (.txt); Web page (.htm - .html)
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14

    Personalized Search Engine

    Personalized Search Engine for Your Files

    MySearchEngine (Personalized Search Engine) is a Java software to search files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup so that users can choose which directories to index or remove from an existing index and it can also suggest queries just like Google's "Did you mean" feature. The customization of indexing and query suggestion greatly improves search speed and make user experience more comfortable. eLibrary can...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15

    eLibrary

    Personalized Search Engine for Commonly Used Files

    eLibrary (electric library) is a Java software to search files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup so that users can choose which directories to index or remove from an existing index and it can also suggest queries just like Google's "Did you mean" feature. The customization of indexing and query suggestion greatly improves search speed and make user experience more comfortable. eLibrary can also extract...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Omega Base

    Omega Base

    Web-based knowledge base template.

    A Knowledge Base and document management system (DMS). With strong user management, security, and file indexing for search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    IDRA (InDexing and Retrieving Automatically) is a tool which allows indexing a wide range of text (TXT, DOC, PDF) and image annotations files (XML), query-based searching, visualizing an index, saving it for re-usability, evaluation, etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching files for plenty of formats (HTML,XML,doc(x),xls(x),ppt(x),oo,PDF,RTF,mp3,mp4,Java). A TagLibrary eases integrating search results in your JSP based web page.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 19
    HAWK - PDF Text Search Java Project

    HAWK - PDF Text Search Java Project

    No more support for this project - TAKE A LOOK AT FALCONSEARCH

    No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    PatentX - EPOScan extra utilities

    EPOScan ext folder utilities

    This is a software to operate some functions over the "ext" folder created by EPOScan(European Patent Office software for indexing and scanning patent document images) when the downloading option is selected. This folder is usually used by the ST33 software to convert the indexed images into ST33 standard.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Qercus
    Desktop free-form text database in which each record may contain an arbitrary collection of fields. Each field and record has its own style and colour. Efficient text searching - text is indexed as it is entered. Inspired by Blackwell Idealist.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22

    Web Crawler Security Tool

    A web crawler oriented to information security.

    ... files into a separated file (useful to crawl a site once, then download files and analyse them with FOCA), generate an output log in Common Log Format (CLF), manage basic authentication and more! Many of the old features has been reimplemented and the most interesting one is the capability of the crawler to search for directory indexing.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23

    edocias

    Electronic Document Index And Search

    EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    ANts P2P
    ANts P2P realizes a third generation P2P net. It protects your privacy while you are connected and makes you not trackable, hiding your identity (ip) and crypting everything you are sending/receiving from others.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 25
    Blaze - Appliance for Solr
    Indexing and Search Appliance Powered by Apache Solr. It's major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next