An open source search engine with a RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
mairix is a tool for indexing and searching email messages stored in Maildir, MH, or mbox folders.
Puggle is a graphical desktop search engine written exclusively in Java. It provides full text and metadata search over files, folders, music, photos, web pages and more that are stored locally on your computer.
multi-encoding strings(1) replacement with language identification
Enhanced version of the standard Unix strings(1) program which uses language models for automatic language identification and character-set identification, supporting over 1400 languages, dozens of character encodings, and 4800+ language/encoding pairs.
A catalog application for various media types: CD, DVD, NetDrives, USB flash keys, etc. It can import data from the well-known Windows application WhereIsIt. In short, this is an attempt to make a WhereIsIt-like application for Linux.
Common Resource Grep
CRGREP searches for matching text in databases, various document formats, archives and other difficult-to-access resources. It is a command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image metadata, scanned documents, Maven dependencies and web resources. CRGREP will search resources within resources in any arbitrary combination or depth, such as text within a document within a zip archive, and so on. Here you will find binary downloads and discussion (https://sourceforge.net/p/crgrep/discussion/). The actual development and issue tracking can be found here: https://bitbucket.org/cryanfuse/crgrep
Lucene/Solr based search engine and workflow system
Important: This project has been moved to https://github.com/statsbiblioteket/summa/ Lucene (and Solr) based search engine with a very flexible setup and workflow system. It supports incremental updates, hierarchical faceting and index lookup with low memory overhead. Note: Although Summa is open source, the focus is on features used at Statsbiblioteket. No explicit resources have been allocated for support of external users.
JavaCat is an app for managing files on your different drives (CDs, HDDs...). It's especially useful for searching files on your exchangeable drives such as CDs and DVDs. Instead of digging through several discs, just use JavaCat to find the required file and
csart (Clever-Search-And-Replace-Text): search, find and replace text in named files or recursively in all directories (-r). If chosen, strings are replaced only if another key string occurs in the line (-w).
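The conditional replacement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not csart's code: the function name `replace_in_lines` and its parameters are hypothetical, and it works on a list of lines rather than on files.

```python
def replace_in_lines(lines, search, replacement, key=None):
    """Replace `search` with `replacement` in each line.

    If `key` is given, a line is changed only when it also contains `key`
    (an assumed reading of the -w behaviour described above).
    """
    result = []
    for line in lines:
        if key is None or key in line:
            line = line.replace(search, replacement)
        result.append(line)
    return result


if __name__ == "__main__":
    text = ["port = 80", "debug port = 80"]
    # Only the line containing the key string "debug" is rewritten.
    print(replace_in_lines(text, "80", "8080", key="debug"))
```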
Sgrep (sorted grep) is a much faster alternative to traditional Unix grep when searching large files, because sgrep searches sorted input files using a fast binary search to find matching lines.
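A minimal sketch of why sorted input helps, assuming a match means "the line begins with the search key" and that the lines are already sorted and held in memory. The function name `sorted_prefix_search` is hypothetical; the real tool works on files, not Python lists.

```python
from bisect import bisect_left


def sorted_prefix_search(sorted_lines, key):
    """Return all lines starting with `key` from an already sorted list.

    Binary search finds the first candidate in O(log n) comparisons;
    matching lines are contiguous in sorted order, so the scan stops at
    the first line that no longer starts with the key.
    """
    start = bisect_left(sorted_lines, key)
    matches = []
    for line in sorted_lines[start:]:
        if not line.startswith(key):
            break
        matches.append(line)
    return matches


if __name__ == "__main__":
    lines = ["apple", "banana", "band", "bandit", "cat"]
    print(sorted_prefix_search(lines, "ban"))
```

A plain grep has to read every line, while this approach only touches the matching lines plus the handful probed by the binary search.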
Universal information crawler (Uicrawler) is a fast, precise and reliable Internet crawler. Uicrawler is a program/automated script that browses the World Wide Web in a methodical, automated manner and creates an index of the documents it accesses.
This is a mass file renamer utilizing regular expression power.
This is a Python-based UI application that renames part of the names of a set of files in a directory, based on a provided regular expression. Because it uses regex groups, it gives a lot more flexibility than other renaming tools. Renaming is applied in two stages: the first stage shows what the names will look like after the rename; if you are happy with the change, you make it permanent by clicking Commit. This avoids the risk of files getting corrupted. For example, to rename all .py files to .pyw: Search for Regex: (.py)$ Rename using this regex: .pyw
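The two-stage preview/commit flow can be sketched as follows. This is not the application's code: `plan_renames` and `commit_renames` are hypothetical names, and the pattern escapes the dot (`\.py`) so that it matches only a literal ".py" suffix, which is an assumption about the intended behaviour.

```python
import os
import re


def plan_renames(names, pattern, replacement):
    """Preview stage: compute (old, new) pairs without touching the disk."""
    regex = re.compile(pattern)
    plan = []
    for name in names:
        new_name = regex.sub(replacement, name)
        if new_name != name:
            plan.append((name, new_name))
    return plan


def commit_renames(directory, plan):
    """Commit stage: apply a previously reviewed plan inside `directory`."""
    for old, new in plan:
        os.rename(os.path.join(directory, old), os.path.join(directory, new))


if __name__ == "__main__":
    files = ["tool.py", "notes.txt", "happy"]
    # Show what would change; nothing is renamed until commit_renames is called.
    for old, new in plan_renames(files, r"(\.py)$", ".pyw"):
        print(old, "->", new)
```

Keeping the plan as plain data means the user can inspect every change before anything is written.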
This software is designed to find files on a standalone PC or on other PCs on the network. It copies files without creating any folders and avoids duplicates. The software is free of charge, written in pure Java, and needs Java JRE 1.6.
Are you looking for an installed font? Then you're in the right place! JFontTools offers you a full system to search, analyse and compare fonts! Present fonts to others by generating HTML previews and printing an example page. Manage your installed fonts and deinstall unne
This library implements several locality sensitive hashing (LSH) based algorithms, including indexing data structures for high dimensional spaces and metric spaces, sketch constructions and set embedding algorithms.
The MARKet for Open Source
MARKOS will realize a prototype of a service and an interactive application providing an integrated view of the Open Source projects available on the web, focusing on functional, structural and licensing aspects of software code.
Narrows the search results produced by popular Internet search engines by allowing extra filtering conditions, such as certain words being present, certain words being excluded, and so on.
simple BNF parser makes xml markup of matches
bnf2xml is a simple BNF parser that takes text as input, searches it according to a BNF query file, and outputs the text marked up with XML labels that show context. bnf2xml is as simple to use as any text-processing binary, e.g. awk(1) or grep(1). bnf2xml does not require a C API because it outputs simple XML labeling. The README is visible on the file download page. EXAMPLE: $ echo "hi" | bnf2xml patternfile <word><alph>h</alph><alph>i</alph></word> or <gas>hydrogen iodide</gas> The patternfile says how to find the needle in the haystack and what to show, e.g.: <alph> ::= a | b | c | d ... <word> ::= <alph>+ bnf2xml is a top-down recursive parser. Unlike bottom-up parsers like gcc(1) or some top-down parsers, bnf2xml is completely unambiguous and resolves ALL conflicts. It is slower on average for parsing C, or than sed(1) for simple searches. It is far easier than using flex/C to create a parser. Caveat: I do not suggest it is worthwhile to make a new gcc(1) using bnf2xml. bnf2xml is an nth BETA release, but there are no complaints yet.
Tautomaton is a C++11 template library for deterministic (DFA) and non-deterministic finite automata (NFA). It supports regular expressions and efficient input matching of multiple regexps simultaneously. The library comes with a somewhat grep-like command-line tool for showcasing these features.
The Advanced Media Playlist System searches and indexes files based on extension. The DB entries can be appended to the integrated playlist. Player based on libvlc. Deps: SQLite, wxWidgets, Boost, libvlc, arudetools. IDE: Codelite/wxFormBuilder
A high-performance implementation of bloom filters, a lightweight duplicate detection algorithm.
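For readers unfamiliar with the data structure, here is a minimal Bloom filter sketch in Python. It is not this project's implementation: the class name, the default sizes and the use of salted SHA-256 digests as hash functions are all illustrative assumptions, chosen for clarity rather than speed.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every bit for this item is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    for word in ["alpha", "beta", "alpha"]:
        if word in seen:
            print("probable duplicate:", word)
        seen.add(word)
```

An item reported as absent is definitely new; an item reported as present may be a false positive, which is the trade-off that keeps the structure small.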
BullFrog is a search engine ranking program, written as a Mozilla Firefox extension. Simply enter one or more URLs and their corresponding keywords or key phrases, and BullFrog will see what position the URLs appear in Google.
CD Maze is an easy to use CD-ROM/DVD-ROM catalog system for the GNOME/Unix/Linux-Desktop.
Condiskcat - Console Disk Cataloger - Utility to catalog and search for files on removable media