A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. The three major components (Crawler, Analyzer, and Indexer) can also be used separately.
JaWiki is a Java wiki with a file-based database for managing content.
The content is stored in XML files in the file system.
An HTML front end lets users edit the content in a browser.
A standalone server is also included.
Spidertron is a multithreaded web crawling API for web sites of moderate size (hundreds of thousands of pages) that lets you focus on processing the retrieved information rather than on the crawling itself.
JLinkCheck is an Ant task written in Java for checking links in websites. It does not just check a single page but crawls a whole site like a spider, generating a report in XML and (X)HTML. JReptator will be its successor, with many more features.
The Navigator project aims to support the automated gathering of dynamic information from third-party web sites, using their web interfaces to post queries and gather replies. Navigator is written in Java and is OS-independent.
Catalogo is a system for cataloguing resources on a web site. It allows semantic search of information on an intranet using metadata, RDF and ontology concepts. It provides a Catalog server (Java web applications) and a Catalog client (Firefox plug-in).
Toke is a Java web mining toolkit for exploring, indexing, and searching the web. Toke allows you to crawl public or private web sites in order to create web statistics, Pajek web graphs, Lucene indexes, and word frequency files for data clustering.
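To illustrate what a word frequency file for clustering might contain, here is a minimal sketch in plain Java. It is not Toke's code or API: the class name `WordFrequency` and the method `count` are hypothetical, and the tokenization rule (lower-case, split on anything that is not a letter or digit) is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Toke: counts word occurrences in page text.
public class WordFrequency {

    /** Returns a word -> count map, with words lower-cased and sorted alphabetically. */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumption: a "word" is any run of Unicode letters or digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one "word<TAB>count" line per word, as a frequency file might.
        count("The crawler feeds the indexer")
            .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

A real toolkit would probably also strip HTML markup and stop words before counting; both steps are left out here to keep the sketch short.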
This project aims to create a free and open catalog of music that is popular today, including links to audio files and websites found with our search engine, as well as statistics on genres and artist popularity.
Released under the GNU GPL.
Searchy is a distributed metainformation search engine whose main goal is to federate search systems and integrate information. It uses RDF as an abstract information model and may be used with Dublin Core, FOAF, vCard, etc.
Roosster.org is a personal "on-demand" search engine. This means, it indexes only items/entries/files/URLs you explicitly tell it to index and provides a full-text-search over indexed items.
Go to http://roosster.org/dev for details.
Thenali is a content management system project that aims to support the publication and maintenance of educational and career counselling information websites.
SmartCrawler is a Java-based, fully configurable, multi-threaded, and extensible crawler that can fetch and analyze the contents of a web site using dynamically pluggable filters.
iCalGrabber is a Java-based application for grabbing event information from web sites. The events are stored on the file system in Apple's iCal format. These .ics files can be read by iCal-compatible applications such as the Mozilla calendar.
Sperowider Website Archiving Suite is a set of Java applications, the primary purpose of which is to spider dynamic websites, and to create static distributable archives with a full text search index usable by an associated Java applet.
myDbSearcher is a search engine for MySQL databases. It is written in Java. It scans several tables on different databases. An XML-RPC server gives you access to the index.
Currently it runs on http://www.idowa.de/ueberblick/suche/index_html
JMdRdf is a tool that creates RDF/RSS.
1. You can generate RDF/RSS for your homepage from your HTML files without programming. JMdRdf automatically extracts information such as the title and description from the HTML.
2. You can paste RDF/RSS into your HTML.
A robust, feature-rich, multi-threaded CLI web spider written in Java, using Apache Commons HttpClient v3.0. ASpider downloads any files matching your given MIME types from a website. By default it also tries to match email addresses with a regular expression, logging all results using log4j.
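As a rough illustration of regular-expression email matching on fetched page text, the sketch below uses only `java.util.regex`. It is not ASpider's implementation: the class name `EmailMatcher`, the method `extract`, and the pattern itself are assumptions, and the pattern is deliberately simple (it does not cover every valid address).

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ASpider: pulls email-like strings out of text.
public class EmailMatcher {

    // Assumed pattern: local part, "@", domain labels, and a TLD of 2+ letters.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns the distinct matches in the order they first appear. */
    public static Set<String> extract(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "Contact <a href=\"mailto:dev@example.org\">dev@example.org</a>"
                    + " or admin@example.com.";
        // Duplicates collapse because the result is a set.
        for (String address : extract(html)) {
            System.out.println(address);
        }
    }
}
```

In a spider, each match would typically go to a logger rather than to standard output; plain printing is used here so the example has no dependencies.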
Lude is an XML-RPC Lucene Daemon written in Java. Clients in any environment can create indexes, add/update/delete documents, and query the index through a simple XML-RPC API.
HouseSpider is a Java applet that adds search capability to your web site. It can search in two ways: by spidering through your site or by searching a cached index file. It has full i18n (internationalization) support.
Dr. Michael Kay: "Saxon 8.7 is the first release to be released simultaneously by Saxonica on the Java and .NET platforms." MDP: Mission accomplished! Saxon for the .NET platform from Saxonica is now available and supported via http://saxon.sf.net