Self-hosted search engine with web service to share discoveries with
...You can use keywords to find related websites, dig into important information, and search for answers. It has a built-in web server that you can use to share discoveries with other people. The app's source code is included and can be freely distributed over the internet in unchanged or changed form.
Check the file size after downloading the Android APK.
https://sourceforge.net/projects/ftserver-android/files/
The Code Repository includes
FTServer Android Version SourceCode (Android)
FTServer Java Server Version SourceCode (Linux Windows)
FTServer .NET Server Version SourceCode (Linux Windows)
https://sourceforge.net/p/ftserver-android/code/
panFMP is a generic framework suitable for harvested XML metadata that is searchable through Apache Lucene without any additional RDBMS. Fields can be defined by XPath allowing for full text queries on all types of fields including numerical ranges.
The code was moved to GitHub: https://github.com/pangaea-data-publisher/panfmp
cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage using different techniques. A command-line executable is included that sorts documents by codepage.
Framework (scripts, configuration, code) to build free and public services around travel and leisure data. The project makes extensive use of existing data sources such as Geonames and DBpedia, and adds some glue around them (e.g., links).
SeerSuite is an application toolkit for digital libraries and search engines, such as CiteSeerX.
CiteSeerX has moved to GitHub; please get the latest code from: https://github.com/SeerLabs/CiteSeerX
Web-as-corpus tools in Java.
* Simple Crawler (and also integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate code
* Language recognition
* Corpus builder
Jbox is a Java full-text search engine framework. It is not a complete application, but rather a code library and API that can easily be used for constructing a search engine.
The WhereIsNow Web Service Client Library project is a Java library used to query the WhereIsNow web services. You can freely embed it in your code to easily develop new clients and integrate the WhereIsNow features into your own applications.
High-performance faceted/parametric search implementation that handles various types of semi-structured data. Written in Java. We have moved to Google Code: http://code.google.com/p/browse-engine; this page is to be deprecated.
Dr. Michael Kay: "Saxon 8.7 is the first release to be released simultaneously by Saxonica on the Java and .NET platforms." MDP: Mission accomplished! Saxon for the .NET platform from Saxonica is now available and supported via http://saxon.sf.net
Group-CCS development: components, templates, tools, accessories, tutorials, modules, translations, documentation, code, scripts, and anything else that can improve the work of those who use the powerful development tool CCS (CodeCharge Studio).
This code supplies miniature pedagogical Java implementations of information retrieval, spidering, and text-processing software. It was initially developed for an introductory course on Intelligent Information Retrieval and Web Search at UT Austin.
IGLU is a Java class library designed to facilitate sharing of code among Artificial Intelligence/Information Retrieval researchers to illustrate how various problems can be solved in Java. It is developed and maintained by the IGLU Research Group.
TouchGraph provides a set of interfaces for graph visualization using force-based layout and focus+context techniques. For now only older code is available, but we are planning to release new versions as well.
Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page
Frosttie (FROnt-end SchemaTron Text Internet Engine) takes XHTML pages and processes them with various user-definable filters such as W3C's WAI, Section 508 (US) web usability compliance, ad removal, etc. It can be used with zKnowMan.
This project contains all the code for the eXploringXML column on WebReference.com at http://exploringxml.com .
Currently this is only an applet for parsing and displaying Rich Site Summary (RSS) files, but more Java code for XML will come