Showing 37 open source projects for "pdf indexing"

  • 1
    pdf-extractor

    Node.js module for rendering PDF pages to images, SVGs and HTML files

    ... is extracted to a text file for different uses (e.g. indexing the text). In its most basic form, this library is a Node.js wrapper for pdf.js. It has default renderers that generate a default output, but it is easily extended to incorporate custom logic or to generate different output. It uses a Node.js DOM and the node domstub from pdf.js to make PDF parsing available in Node.js without a browser.
    Downloads: 4 This Week
  • 2
    DB-GPT

    Revolutionizing Database Interactions with Private LLM Technology

    DB-GPT is an experimental open-source project that uses locally deployed GPT-style large models to interact with your data and environment. Because the models run locally, you can be assured that there is no risk of data leakage and that your data is 100% private and secure.
    Downloads: 2 This Week
  • 3
    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a full-text search engine for files: a desktop search application for fast document retrieval. It works like a local-disk Google search engine and is much faster than Windows Search. It has a built-in document parsing engine that extracts the text of commonly used file formats without installing any other software, combined with a built-in high-speed indexing system that stores the metadata of the text...
    Downloads: 2,493 This Week
  • 4
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    ... technological architecture design, OpenKM meets the document management needs of businesses of all sizes (from SMEs to big corporations). Thanks to its elegant and intuitive interface, OpenKM turns complex operations into easy tasks. One of OpenKM's most relevant functions is indexing the most common file types: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list, take a look at http://goo.gl/au8cQy
    Downloads: 803 This Week
  • 5
    PdfgrepGui

    A simple GUI for the command-line tools grep and pdfgrep

    This program is a GUI for the command-line tools grep and pdfgrep. pdfgrep searches text in multiple PDF files, and grep searches text in multiple text files. You can use regular expressions for the search (https://en.wikipedia.org/wiki/Regular_expression). Neither this GUI nor the command-line tools build an index; the files are scanned on every search. The following command-line options are used: -i (ignore case), -F (fixed strings), -n (print the page number [pdfgrep] or line number [grep] of each match) and -H (print the file name for each match)...
    Downloads: 49 This Week
  • 6
    myFilterWheel ASCOM DIY

    Modify a manual filter wheel by adding a stepper motor and an Arduino

    A project by Clive Stachon, Pete I, Paul P and Robert Brown to convert a manual 5-slot filter wheel to automatic operation using an Arduino Nano and a stepper motor. A Windows application, an ASCOM driver and Arduino firmware are provided. Updated to reflect a new PDF, firmware and applications based on contributions from Pete. The project supports 4-, 5-, 7- and 9-slot filter wheels.
    Downloads: 29 This Week
  • 7
    Hypernomicon

    Hypertext-infused philosophy personal database software

    Hypernomicon is a personal productivity/database application for researchers that combines structured note-taking, mind-mapping, management of files (e.g., PDFs) and folders, and reference management into an integrated environment that organizes all of the above into semantic networks or hierarchies in terms of debates, positions, arguments, labels, terminology/concepts, and user-defined keywords by means of database relations and automatically generated hyperlinks (hence ‘Hyper’ in the...
    Downloads: 20 This Week
  • 8
    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps index binary documents such as PDF, OpenOffice and MS Office files. Features include crawling a local file system (or a mounted drive) to index new files, update existing ones and remove old ones; crawling a remote file system over SSH/FTP; and a REST interface that lets you “upload” your binary documents to Elasticsearch.
    Downloads: 2 This Week
  • 9
    DocSearcher
    DocSearcher is a search tool for indexing and searching files on a personal computer. It uses APIs to provide search functionality for common document formats, currently Word, Excel, PDF, Libre/Open/StarOffice, RTF, text and HTML.
    Downloads: 6 This Week
  • 10
    Paperless-ng

    A supercharged version of Paperless: scan, index and archive documents

    Paperless is a simple Django application that runs in two parts: a Consumer (the part that does the indexing) and a Web server (the part that lets you search and download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of search, is tedious to index, is heavy and is prone to damage and loss. I wrote this to make “going paperless” easier. I do not have...
    Downloads: 2 This Week
  • 11
    OpenSearchServer Search Engine

    An open source search engine with a RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows...
    Downloads: 61 This Week
  • 12

    Object Oriented Streetmap

    C# class library for processing OpenStreetMap data

    This is a class library written in C# for processing OpenStreetMap XML file extracts into a SQLite database for routing with different vehicle types and restrictions. Before rating or contributing, please see the README file for a more complete summary and a list of to-dos.
    Downloads: 0 This Week
  • 13
    elibsrv

    a light OPDS/HTML server indexing EPUB and PDF files

    elibsrv is a lightweight, standalone OPDS server for Linux. It generates an OPDS repository of EPUB and/or PDF files scanned from on-disk directories. It also provides a simple HTML interface for non-OPDS humans, which makes it a good fit both for OPDS-aware devices (like Android with FBReader or Aldiko) and for browsers with EPUB/PDF capabilities (e.g. Firefox with the excellent EPUBReader plugin). It's worth noting that elibsrv is a complete solution, i.e. it doesn't rely on third party...
    Downloads: 0 This Week
  • 14
    Marcion

    A study environment for ancient languages (Coptic, Greek, Latin)

    Marcion is software that forms a study environment for ancient languages (especially Coptic, Greek and Latin) and provides many tools and resources (dictionaries, grammars, texts). Although Marcion focuses on the study of Gnosticism and early Christianity, it is a universal library that works with various file formats and lets you collect, organize and back up texts of any kind. Overview of Gnostic sources in the Coptic language delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex Tchacos...
    Downloads: 31 This Week
  • 15
    IndexFile (IFile)

    IFile: a PHP-based framework for indexing and searching documents

    ... (.ods); Adobe Portable Document Format (.pdf); Text file (.txt); Web page (.htm - .html)
    Downloads: 2 This Week
  • 16

    eLibrary

    Personalized Search Engine for Commonly Used Files

    eLibrary (electronic library) is Java software for searching files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup: users can choose which directories to index or remove from an existing index, and it can suggest queries just like Google's "Did you mean" feature. The customizable indexing and query suggestion greatly improve search speed and make the user experience more comfortable. eLibrary can also extract...
    Downloads: 0 This Week
  • 17

    Personalized Search Engine

    Personalized Search Engine for Your Files

    MySearchEngine (Personalized Search Engine) is Java software for searching files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup: users can choose which directories to index or remove from an existing index, and it can suggest queries just like Google's "Did you mean" feature. The customizable indexing and query suggestion greatly improve search speed and make the user experience more comfortable. eLibrary can...
    Downloads: 0 This Week
  • 18
    Omega Base

    Web-based knowledge base template.

    A knowledge base and document management system (DMS) with strong user management, security, and file indexing for search.
    Downloads: 0 This Week
  • 19
    IDRA
    IDRA (InDexing and Retrieving Automatically) is a tool for indexing a wide range of text files (TXT, DOC, PDF) and image annotation files (XML), with query-based searching, index visualization, saving an index for reuse, evaluation, etc.
    Downloads: 0 This Week
  • 20
    Regain
    Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching of files in many formats (HTML, XML, doc(x), xls(x), ppt(x), OpenOffice, PDF, RTF, MP3, MP4, Java). A tag library makes it easy to integrate search results into JSP-based web pages.
    Downloads: 4 This Week
  • 21
    HAWK - PDF Text Search Java Project

    No more support for this project - TAKE A LOOK AT FALCONSEARCH

    No more support for this project. Take a look at FalconSearch: https://sourceforge.net/projects/falcontextsearch/
    Downloads: 1 This Week
  • 22

    PatentX - EPOScan extra utilities

    EPOScan ext folder utilities

    This software performs several operations on the "ext" folder that EPOScan (European Patent Office software for indexing and scanning patent document images) creates when the download option is selected. This folder is usually used by the ST33 software to convert the indexed images into the ST33 standard.
    Downloads: 0 This Week
  • 23
    Qercus
    A desktop free-form text database in which each record may contain an arbitrary collection of fields. Each field and record has its own style and colour. Efficient text searching: text is indexed as it is entered. Inspired by Blackwell Idealist.
    Downloads: 0 This Week
  • 24

    Web Crawler Security Tool

    A web crawler oriented to information security.

    ... into a separate file (useful to crawl a site once, then download files and analyse them with FOCA), generate an output log in Common Log Format (CLF), manage basic authentication and more! Many of the old features have been reimplemented, and the most interesting one is the crawler's ability to detect directory indexing (exposed directory listings).
    Downloads: 0 This Week
  • 25

    edocias

    Electronic Document Index And Search

    EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.
    Downloads: 0 This Week
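Several of the tools listed above (AnyTXT Searcher, DocSearcher, IFile, Regain) search a prebuilt full-text index instead of rescanning files on every query. Their internals are not described here, so the following is only a minimal sketch of the inverted index that this style of search is usually built on, assuming the text has already been extracted from each document (for example from a PDF). The names `tokenize` and `InvertedIndex` are hypothetical.

```python
import re
from collections import defaultdict

# Words are runs of letters/digits/underscores; everything is lowercased so that
# "PDF" and "pdf" land in the same posting set.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split extracted document text into lowercase word tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


class InvertedIndex:
    """Hypothetical minimal inverted index: word -> set of document IDs.

    Real engines (e.g. Lucene, on which Regain is built) also store term
    positions and frequencies for phrase queries and ranking; this sketch
    only supports boolean AND queries.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index text that has already been extracted from a document."""
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return the IDs of documents that contain every word of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set, then narrow it down term by term.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("report.pdf", "Quarterly sales report for 2023")
index.add("notes.txt", "Sales meeting notes")
print(sorted(index.search("sales")))  # ['notes.txt', 'report.pdf']
print(index.search("SALES report"))   # {'report.pdf'}
```

A query here only intersects precomputed posting sets instead of re-reading every file, which is the key difference from index-free tools such as PdfgrepGui, where grep and pdfgrep scan all files on each search.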
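eLibrary and MySearchEngine both advertise query suggestions like Google's "Did you mean" feature without saying how they produce them. One simple way to get similar behaviour, assumed here rather than taken from either project, is to fuzzy-match each unknown query word against the index vocabulary with Python's standard `difflib.get_close_matches`. The function name `suggest_query` and the 0.75 similarity cutoff are illustrative choices.

```python
import difflib
from typing import Optional


def suggest_query(query: str, vocabulary: list[str],
                  cutoff: float = 0.75) -> Optional[str]:
    """Return a "Did you mean" suggestion, or None if no word needed fixing.

    Each query word that is not in the (lowercase) index vocabulary is
    replaced by its closest match, if one scores at least `cutoff` on
    difflib's similarity ratio; words with no close match are kept as typed.
    """
    known = set(vocabulary)
    corrected: list[str] = []
    changed = False
    for word in query.lower().split():
        if word in known:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


# The vocabulary would normally be the set of words stored in the index.
vocabulary = ["indexing", "search", "engine", "document"]
print(suggest_query("serach engine", vocabulary))  # search engine
print(suggest_query("search engine", vocabulary))  # None
```

Drawing candidates only from the index vocabulary means every suggestion is guaranteed to match at least one indexed document, which is what makes this kind of suggestion useful for a personal file index.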
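File System Crawler for Elasticsearch is described as indexing new files, updating existing ones and removing old ones. A minimal sketch of that bookkeeping follows, assuming changes are detected by comparing file modification times (FS Crawler's actual mechanism may differ). It compares the snapshot taken at the last crawl with a fresh scan of the directory tree. The names `SyncPlan`, `scan_directory` and `plan_sync` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class SyncPlan:
    """What a crawler run should do to bring the index up to date."""
    to_add: list[str]     # on disk, never indexed
    to_update: list[str]  # on disk, modified since it was last indexed
    to_remove: list[str]  # indexed, but no longer on disk


def scan_directory(root: str) -> dict[str, float]:
    """Recursively map every file path under root to its modification time."""
    snapshot: dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snapshot[path] = os.path.getmtime(path)
            except FileNotFoundError:
                # The file vanished between listing and stat; skip it.
                continue
    return snapshot


def plan_sync(indexed: dict[str, float], on_disk: dict[str, float]) -> SyncPlan:
    """Compare the last indexed snapshot with the current one (path -> mtime)."""
    return SyncPlan(
        to_add=sorted(p for p in on_disk if p not in indexed),
        to_update=sorted(p for p in on_disk
                         if p in indexed and on_disk[p] > indexed[p]),
        to_remove=sorted(p for p in indexed if p not in on_disk),
    )


previous = {"a.pdf": 1.0, "b.pdf": 2.0, "c.pdf": 3.0}
current = {"a.pdf": 1.0, "b.pdf": 5.0, "d.pdf": 4.0}
print(plan_sync(previous, current))
# SyncPlan(to_add=['d.pdf'], to_update=['b.pdf'], to_remove=['c.pdf'])
```

In a real crawler, the snapshot returned by `scan_directory` would be saved after each run and passed back as `indexed` on the next one. Relying on modification times alone misses files whose timestamps were preserved when copied, so a production crawler might also compare sizes or checksums.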