CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. CLucene is described as faster than Lucene because it is written in C++.
An open source search engine with RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
disk manager is a CD/DVD archiving tool. It stores the directory contents of any media so you can search them later. It is also designed as a file explorer, which makes it easy to find big files. The Windows version supports native file context menus.
Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (file systems, web sites, mail boxes, ...) and the file formats (documents, images, ...) occurring in these systems.
DuMP3 is a duplicate and similar file finder.
DuMP3 is a duplicate and similar file finder. It finds exact duplicate binaries by hash, similar text files by substring content, images (JPG, BMP, GIF, PNG, etc) by color and audio files (MP3, WAV, OGG, etc) by wave data. Future: fonts, video.
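Finding exact duplicate binaries by hash can be sketched as follows. This is a minimal illustration in Python, not DuMP3's actual code: the function names `file_digest` and `find_duplicates` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash.

    Returns a list of groups (sorted lists of paths); only groups
    holding more than one file are returned.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_hash[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Files with identical contents always share a digest, so each returned group is a set of candidate duplicates; a cautious tool might still compare the bytes before deleting anything.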
Puggle is a graphical desktop search engine written exclusively in Java. It provides full text and metadata search over files, folders, music, photos, web pages and more that are stored locally on your computer.
Digital Library Search Engine
SeerSuite is an application toolkit for digital libraries and search engines, e.g., CiteSeerX. CiteSeerX has moved to GitHub; please get the latest code from: https://github.com/SeerLabs/CiteSeerX
OpenEphyra is an open framework for question answering (QA). It retrieves answers to natural language questions from the Web and other sources. Visit http://www.ephyra.info/ for more details and information on joining this open research initiative.
Strigi is a desktop search engine.
JavaCat is an app for managing files on your different drives (CDs, HDDs...). It's especially useful for searching files on your exchangeable drives such as CDs and DVDs. Instead of digging through several discs, just use JavaCat to find the required file and
Wilma is a program for quickly finding text lurking in the files on your computer. It does this by creating an index of what words are in which files, which allows it to later find files containing a given word or set of words almost instantaneously.
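The word-to-files index described here is commonly called an inverted index. The sketch below shows the general idea in Python; it is not Wilma's implementation, and the names `build_index` and `search`, as well as the simple `\w+` tokenization, are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    documents is a dict of {name: text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Once the index is built, a query only touches the sets for its own words instead of rereading every file, which is why lookups can feel almost instantaneous.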
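Implements common containers in C, such as rbtree, hashtable, list, vector, deque, heap, and map, as well as timers, OS APIs, and an application development framework. Also implements a file database based on a B-tree indexing algorithm, providing power-failure protection and interfaces for transaction commit and rollback.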
simple BNF parser makes xml markup of matches
bnf2xml is a simple BNF parser that takes text as input, searches according to a BNF query file, and outputs the text marked up with XML labels that show context. bnf2xml is as simple to use as any text utility, e.g., awk(1) or grep(1). bnf2xml does not require a C API because it outputs simple XML labeling. The README is visible on the file download page. EXAMPLE: $ echo "hi" | bnf2xml patternfile <word><alph>h</alph><alph>i</alph></word> or <gas>hydrogen iodide</gas> The patternfile says how to find the needle in the haystack and what to show, i.e.: <alph> ::= a | b | c | d ... <word> ::= <alph>+ bnf2xml is a top-down recursive parser. Unlike bottom-up parsers like gcc(1) or some top-down parsers, bnf2xml is completely unambiguous and resolves ALL conflicts. On average it is slower at parsing C, and slower than sed(1) for simple searches. It is far easier than using flex/C to create a parser. Caveat: I do not suggest it's worthwhile to make a new gcc(1) using bnf2xml. bnf2xml is an nth BETA release, but there are no complaints yet.
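To illustrate the idea of a top-down recursive parser that emits XML labels, here is a small Python sketch for the two example rules `<alph>` and `<word>`. It is not bnf2xml's code: the rules are hard-coded instead of being read from a pattern file, `<alph>` is assumed to cover lowercase a to z, and passing unmatched characters through unchanged is an assumption of this example. The function names are hypothetical.

```python
import string


def parse_alph(text, pos):
    """<alph> ::= a | b | ... | z  (lowercase range assumed for this sketch).

    Returns (xml, new_pos) on a match, or None.
    """
    if pos < len(text) and text[pos] in string.ascii_lowercase:
        return "<alph>" + text[pos] + "</alph>", pos + 1
    return None


def parse_word(text, pos):
    """<word> ::= <alph>+

    Returns (xml, new_pos) on a match, or None.
    """
    parts = []
    while True:
        match = parse_alph(text, pos)
        if match is None:
            break
        xml, pos = match
        parts.append(xml)
    if not parts:
        return None
    return "<word>" + "".join(parts) + "</word>", pos


def mark_up(text):
    """Label every <word> found in text; copy other characters unchanged."""
    out = []
    pos = 0
    while pos < len(text):
        match = parse_word(text, pos)
        if match is None:
            out.append(text[pos])
            pos += 1
        else:
            xml, pos = match
            out.append(xml)
    return "".join(out)


print(mark_up("hi"))  # <word><alph>h</alph><alph>i</alph></word>
```

Each grammar rule becomes one function that calls the functions for the rules it refers to, which is what makes the parser top-down and recursive.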
rlocate is an implementation of the "locate" command that is always up-to-date
A simple-to-set-up web scraper written in Java. It uses modified regEx to quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line.
Password protected zip file cracker.
Setra is a cross-platform command-line utility used to brute-force password-protected zip files. It is written in the Python programming language.
Condiskcat - Console Disk Cataloger - Utility to catalog and search for files on removable media
A highly adaptable, configuration-driven search engine that uses a data source connector interface that allows for many different data sources to be accessed. Needs no recompilation. Ever.
This small C# console program (Mono or MS .NET 3.5 required) generates text or HTML output that lists directories and files. Duplicate directory or file names are marked in the HTML output. I use it to find files in a messy company network.
CD Maze is an easy to use CD-ROM/DVD-ROM catalog system for the GNOME/Unix/Linux-Desktop.
The goal of DynaQ is to develop an inquiry system to explore the personal information space, supporting you with the searching paradigm 'orienteering'. Think of DynaQ as a desktop search engine with enhanced usability for file, email and blog search.
Eventseer is a search engine for computer science conference and workshop events. It digests call-for-paper emails and extracts and indexes relevant information.
This software is designed to find files on a standalone PC or on other PCs on the network. It copies files without creating any folders and avoids duplicates. This software is free of charge, written in pure Java, and needs Java JRE 1.6.
This package contains different tools to add NLP capabilities to Lucene 4.x (it has been tested using Lucene versions from 4.6.x to 4.8.1). Although it was originally developed for German, it is mostly language-independent. It allows the user to lemmatize words to be indexed, to weight terms by their parts of speech (e.g. weighting nouns more heavily than pronouns), and to add synonyms taken from GermaNet or a list you provide to the search index, thereby increasing the recall of Lucene.
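Adding synonyms at index time raises recall because a document can then be found under terms it never literally contains. The Python sketch below shows only that general idea; it does not use Lucene or GermaNet, and the function name `expand_with_synonyms` and the plain dictionary of synonyms are assumptions made for the example.

```python
def expand_with_synonyms(tokens, synonyms):
    """Return the tokens plus any synonyms listed for them.

    synonyms is a dict of {term: [synonym, ...]}. Order is preserved
    and each term appears only once in the result.
    """
    expanded = []
    seen = set()
    for token in tokens:
        for term in [token] + synonyms.get(token, []):
            if term not in seen:
                seen.add(term)
                expanded.append(term)
    return expanded


# Hypothetical synonym list supplied by the user.
user_synonyms = {"auto": ["wagen", "pkw"]}
print(expand_with_synonyms(["auto", "schnell"], user_synonyms))
# ['auto', 'wagen', 'pkw', 'schnell']
```

Indexing the expanded list means a query for "wagen" would also match a document that only mentions "auto".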