pdf indexing free download

Showing 32 open source projects for "pdf indexing"

1

pdf-extractor

Node.js module for rendering pdf pages to images, svgs and HTML files

... is extracted to a text file for different usages (e.g. indexing the text). This library is, in its most basic form, a Node.js wrapper for pdf.js. It has default renderers to generate a default output, but is easily extended to incorporate custom logic or to generate different output. It uses a Node.js DOM and the node domstub from pdf.js to make PDF parsing available on Node.js without a browser.

Downloads: 6 This Week

Last Update: 2023-03-23
2

DB-GPT

Revolutionizing Database Interactions with Private LLM Technology

DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.

Downloads: 0 This Week

Last Update: 2024-10-22
3

AnyTXT Searcher

A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

AnyTXT Searcher is a powerful file full-text search engine, a desktop search application for fast document retrieval. Just like a local disk Google search engine, much faster than Windows Search, it is your ideal desktop file content full-text search engine. It has a powerful document parsing engine built in, which extracts the text of commonly used file formats without installing any other software, and combines the built-in high-speed indexing system to store the metadata of the text...

12 Reviews

Downloads: 2,147 This Week

Last Update: 2024-09-12
4

OpenKM Document Management - DMS

Document Management System and Content Management System

.... Due to its technological architecture design, OpenKM meets the document management needs of businesses of all sizes (from SMEs to big corporations). Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. The most relevant function of OpenKM is the indexing of the most common file types: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list take a look at http://goo.gl/au8cQy

33 Reviews

Downloads: 934 This Week

Last Update: 2022-11-25
5

Hypernomicon

Hypertext-infused philosophy personal database software

Hypernomicon is a personal productivity/database application for researchers that combines structured note-taking, mind-mapping, management of files (e.g., PDFs) and folders, and reference management into an integrated environment that organizes all of the above into semantic networks or hierarchies in terms of debates, positions, arguments, labels, terminology/concepts, and user-defined keywords by means of database relations and automatically generated hyperlinks (hence ‘Hyper’ in the...

2 Reviews

Downloads: 16 This Week

Last Update: 2024-10-24
6

PdfgrepGui

This is a simple GUI for the command-line tools grep and pdfgrep

This program is a GUI for the command-line tools grep and pdfgrep. Pdfgrep searches text in multiple PDF files, and grep can search text in multiple text files. You can use regular expressions for the search (https://en.wikipedia.org/wiki/Regular_expression). This GUI and the command-line tools work without indexing. The following options are used: -i (ignore case), -F (fixed strings), -n (print page number or output lines), and -H (print the file name for each match) from the command line...

Downloads: 8 This Week

Last Update: 2024-06-01
7

DocSearcher

DocSearcher is a search tool for indexing and searching files on a personal computer. It uses APIs to provide search functionality for common document formats, currently Word, Excel, PDF, Libre/Open/StarOffice, RTF, Text, and HTML.

2 Reviews

Downloads: 3 This Week

Last Update: 2024-07-28
8

File System Crawler for Elasticsearch

Elasticsearch File System Crawler (FS Crawler)

This crawler helps to index binary documents such as PDF, Open Office, and MS Office files. It crawls a local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones. It can also crawl remote file systems over SSH/FTP, and it offers a REST interface to let you “upload” your binary documents to Elasticsearch.

Downloads: 0 This Week

Last Update: 2023-08-25
9

Paperless-ng

A supercharged version of paperless, scan, index and archive docs

Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have...

Downloads: 0 This Week

Last Update: 2022-03-04
10

OpenSearchServer Search Engine

An open source search engine with RESTful API and crawlers

OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows...

31 Reviews

Downloads: 21 This Week

Last Update: 2018-08-26
11

elibsrv

a light OPDS/HTML server indexing EPUB and PDF files

elibsrv is a light, standalone OPDS server for Linux. It can generate an OPDS repository of EPUB and/or PDF files scanned from on-disk directories. It also provides a simple HTML interface for non-OPDS humans, which makes it a good fit for both OPDS-aware devices (like Android with FBReader or Aldiko) and browsers with EPUB/PDF capabilities (e.g. Firefox with the excellent EPUBReader plugin). It's worth noting that elibsrv is a complete solution, i.e. it doesn't rely on third party...

Downloads: 0 This Week

Last Update: 2017-12-30
12

Marcion

The study environment of ancient languages (Coptic, Greek, Latin)

Marcion is software that provides a study environment for ancient languages (esp. Coptic, Greek, Latin) along with many tools and resources (dictionaries, grammars, texts). Although Marcion focuses on the study of Gnosticism and early Christianity, it is a universal library that works with various file formats and lets you collect, organize and back up texts of any kind. Overview of Gnostic sources in the Coptic language delivered with Marcion: Nag Hammadi Library; Berlin Codex; Codex...

4 Reviews

Downloads: 24 This Week

Last Update: 2020-07-11
13

IndexFile (IFile)

IFile, a PHP-based framework for indexing and searching documents

...); OpenOffice.org Calc (.ods); Adobe Portable Document Format (.pdf); Text file (.txt); Web page (.htm - .html)

Downloads: 1 This Week

Last Update: 2016-03-28
14

Personalized Search Engine

Personalized Search Engine for Your Files

MySearchEngine (Personalized Search Engine) is Java software for searching files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup, so that users can choose which directories to index or remove from an existing index, and it can also suggest queries just like Google's "Did you mean" feature. The customization of indexing and query suggestion greatly improves search speed and makes the user experience more comfortable. eLibrary can...

Downloads: 0 This Week

Last Update: 2015-11-19
15

eLibrary

Personalized Search Engine for Commonly Used Files

eLibrary (electric library) is Java software for searching files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup, so that users can choose which directories to index or remove from an existing index, and it can also suggest queries just like Google's "Did you mean" feature. The customization of indexing and query suggestion greatly improves search speed and makes the user experience more comfortable. eLibrary can also extract...

Downloads: 0 This Week

Last Update: 2015-11-19
16

Omega Base

Web-based knowledge base template.

A Knowledge Base and document management system (DMS). With strong user management, security, and file indexing for search.

Downloads: 0 This Week

Last Update: 2015-08-06
17

IDRA InDexing & Retrieving Automatically

IDRA (InDexing and Retrieving Automatically) is a tool for indexing a wide range of text files (TXT, DOC, PDF) and image annotation files (XML), with query-based searching, index visualization, saving an index for reuse, evaluation, etc.

Downloads: 0 This Week

Last Update: 2014-05-14
18

regain

Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A TagLibrary eases integrating search results into your JSP-based web page.

13 Reviews

Downloads: 6 This Week

Last Update: 2014-07-30
19

HAWK - PDF Text Search Java Project

No more support for this project - TAKE A LOOK AT FALCONSEARCH

No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"

Downloads: 0 This Week

Last Update: 2014-04-19
20

PatentX - EPOScan extra utilities

EPOScan ext folder utilities

This software performs some functions on the "ext" folder created by EPOScan (European Patent Office software for indexing and scanning patent document images) when the downloading option is selected. This folder is usually used by the ST33 software to convert the indexed images into the ST33 standard.

Downloads: 0 This Week

Last Update: 2015-04-26
21

Qercus

Desktop free-form text database in which each record may contain an arbitrary collection of fields. Each field and record has its own style and colour. Efficient text searching - text is indexed as it is entered. Inspired by Blackwell Idealist.

Downloads: 1 This Week

Last Update: 2016-10-23
22

Web Crawler Security Tool

A web crawler oriented to information security.

... files into a separate file (useful to crawl a site once, then download files and analyse them with FOCA), generate an output log in Common Log Format (CLF), manage basic authentication and more! Many of the old features have been reimplemented, and the most interesting one is the crawler's ability to search for directory indexing.

3 Reviews

Downloads: 1 This Week

Last Update: 2015-10-10
23

edocias

Electronic Document Index And Search

EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.

Downloads: 0 This Week

Last Update: 2015-07-10
24

ANts P2P

ANts P2P implements a third-generation P2P network. It protects your privacy while you are connected and makes you untrackable, hiding your identity (IP) and encrypting everything you send to or receive from others.

20 Reviews

Downloads: 3 This Week

Last Update: 2013-04-15
25

Blaze - Appliance for Solr

Indexing and Search Appliance Powered by Apache Solr. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling.

1 Review

Downloads: 0 This Week

Last Update: 2014-07-01