Provide a robust and efficient implementation of n-gram based classifiers to Java. N-Gram algorithms have shown to be surprisingly good at tasks like guessing the language/encoding from an arbitrary text file. And there are many more applications.
Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
Glue 2 is a Semantic Web Service discovery engine fully compatible with the WSMO meta-model and the WSML language that aims at solving polarization problems by using mediators.
EasyGIS simplifies GIS data management, sharing, and publishing. REST interfaces (json, html views). Lucene based FTS searches. Thematic maps, business cartography. Integration with external GIS data providers - Google, OSM.
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.
You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
jBingAPI is a java library to query the microsoft search engine bing (http://www.bing.com/) using their public api. jBingAPI just makes it a lot easier to communicate with this api.
HttpFinder is web content searching tool. It enables look for text content that matches given regular expression in html pages/scripts etc. All navigation is performed with use of other regexp which describes links to visit.
nxs crawler is a program to crawl the internet. The program generates random ip numbers and attempts to connect to the hosts. If the host will answer, the result will be saved in a xml file. After than the crawler will disconnect... Additionally you can
AIS - Associative Indexing Service, an application for storing bookmarks, memos, indexing of big (lifetime) archives for fast future access to the data by (personalized) keywords. In other words - it is an extension of human associative memory :)
This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.
Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
Java/Swish-e bridge. This application is built arround a simple API and a Web container to provide access to the search facility (via web-services) and management/indexing (wep app).
JeCARS (Java Extendable Contents And Rights System) is a RESTful webservice which delivers pluggable output formats, e.g. Atom feeds or HTML.
Third party applications can be plugged in.
A JCR (JSR-170) repository (Jackrabbit) is used for storage.
The Semantic Web implementation using native xml database as backend storage. A SPARQL java compiler to XQuery using Jena. There are XQuery scripts for native xml database Sedna(http://modis.ispras.ru/sedna/).
The WhereIsNow Web Service Client Library project is a java library used to query the WhereIsNow webservices. You can freely embed it in your code to easily develop new clients and integrate the WhereIsNow features in your own applications.
Fire.now is a Firefox plugin that automatically adds your documents to the WhereIsNow latest version discovery service. Everytime you upload a document somewhere, Fire.now integrates the WhereIsNow keys into the file and add it's url to WhereIsNow.
The implementation of Bee Hive @ Work algorithm that simulates the foraging behavior of honey bees in nature. The aim is to provide an extensible framework that can be used by researchers to simply create new applications of this algorithm.
NoMule is a program which lets you to download videos from online video communities like youtube, google video, dailymotion, myvideo or some porn sites like youporn and convert them into any format you want like mp3.It also has a commandline interface.
The Cornell Web Lab Collaboration Server is a suite of tools and services for GUI-based extraction, analysis and sharing of archived web data. See http://weblab.infosci.cornell.edu/ and http://www.cs.cornell.edu/~weigel for details about the project.
Narrows search result produced by popular Internet search engines, allowing to put extra filtering conditions, as certain words presented, certain words excluded, and so on.
RSS EXTRACTOR is a java library for generating RSS newsfeeds considering the RSS web feeds from multiple websites. It extracts the best of newsfeed entries and a produces a RSS file which is a fusion of newsfeed entries from several websites.
High performance faceted/parametric search implementation that handles various types of semi-structured data. Written in Java. * We have moved to Google code: http://code.google.com/p/browse-engine, this page is to be deprecated.
A fat client price checking tool. Similar in spirit to pricerunner and others except it checks prices at the source on demand. Supposed to save entering the same search criteria on multiple sites and then tabbing through to do a comparison.
Javen library is a framework for developing C++ application simply, with similar API to Java library. Hawk search engine is a software platform that used to build Vertical Search Product more easily for the Moderate Company or End Users.
The Java-Sitemapper is a Java API for building sitemap files to improve search indexing on Google, Yahoo!, MSN, and Ask.com. This project strives to implement the latest in search technology for use on the Java platform.