HttpFinder is a web content search tool. It looks for text that matches a given regular expression in HTML pages, scripts, etc. Navigation is driven by a second regular expression that describes which links to visit.
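The two-regex idea described for HttpFinder can be illustrated with a minimal Java sketch. This is not HttpFinder's actual code: the class name `RegexPageScan`, both method names, and the simple `href` extraction pattern are assumptions made for illustration, and the sketch works on an HTML string that has already been fetched rather than performing any network access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the HttpFinder source.
class RegexPageScan {
    // Pulls the value out of href="..." attributes. A simplification, not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every substring of the page that matches the content pattern.
    static List<String> findContent(String html, Pattern contentPattern) {
        List<String> hits = new ArrayList<>();
        Matcher m = contentPattern.matcher(html);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Returns the href values that the link pattern accepts in full,
    // i.e. the links a crawler of this kind would go on to visit.
    static List<String> linksToVisit(String html, Pattern linkPattern) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            if (linkPattern.matcher(url).matches()) {
                links.add(url);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/docs/a.html\">A</a> <a href=\"/img/x.png\">X</a> "
                + "error 404 and error 500";
        System.out.println(findContent(html, Pattern.compile("error \\d+")));
        System.out.println(linksToVisit(html, Pattern.compile(".*\\.html")));
    }
}
```

A real crawler would also resolve relative URLs, track visited pages, and fetch each accepted link before repeating the scan.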
The Cornell Web Lab Collaboration Server is a suite of tools and services for GUI-based extraction, analysis and sharing of archived web data. See http://weblab.infosci.cornell.edu/ and http://www.cs.cornell.edu/~weigel for details about the project.
WebWordCount crawls a website and counts the occurrences of words. It displays the words for each website. The number of pages to search on each website may be specified. The Java source has Java 1.4, Java 5, and Java 6 versions. Post updates to enhance.
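The counting step that a tool like WebWordCount performs on each page could look roughly like the following Java sketch. It is not the project's code: the class name `WordCountSketch`, the method `countWords`, the regex-based tag stripping, and the choice to lowercase words are all assumptions for illustration. Crawling is left out; the input is the HTML of a single page.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the WebWordCount source.
class WordCountSketch {
    // Counts word occurrences in one page of HTML.
    static Map<String, Integer> countWords(String html) {
        // Crude tag removal (an assumption; a real tool might use an HTML parser).
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        // TreeMap keeps the words in alphabetical order for stable output.
        Map<String, Integer> counts = new TreeMap<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String word : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("<p>Crawl the web, crawl the site</p>"));
    }
}
```

Per-site totals would follow by merging the maps returned for each crawled page, up to whatever page limit is configured.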
The complete suggestions framework for Java, supporting single- and multi-field suggest, a Java suggest box, client/server operation with Hessian or JSON-RPC, a GWT AJAX suggest box, and phonetic plugins. Proven high performance for data sets of more than 1 million.
WebPagesChanges provides a platform for complete, easy and highly accurate marking of updated information in web pages. With one click, the user can check for new information and see the update status from the colored marks on the information and on the pages that contain it.
A Java library that parses the latest freely available RDF files from DMOZ (Open Directory Project) and inserts them into any JDBC-compliant relational database (e.g. MySQL and PostgreSQL, with others to come such as Oracle, MS Access and SQLite).
SearchSite is intended to support out-of-the-box search for small to medium websites, bridging the gap between simple PHP/Perl scripts at one extreme and something like Nutch, which is intended to deal with millions of pages, at the other.
OpenSiteSearch is the new Open Source version of OCLC's original Java-based web application for building Z39.50 portals (i.e. virtual union catalogues). This project is specifically aimed at the library community.
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do site-specific extraction to pull out only the actual news content, filtering out the advertising and other cruft.
The project consists of two parts. The first is a J2ME app that collects information such as photo, position, speed and course from GPS and transfers it to the web server. The second is a web app that manages and displays the received data using GoogleMap.
A Java program that extracts postings and comments from http://www.livejournal.com (blog) into a DB so they can be viewed, classified and processed. LJ loader. Reusable components: a Perl-like but efficient web page scraper, a tree analyzer, and a concurrent scheduler.
JaWiki is a Java Wiki with a file-based database to manage the content.
The content is stored in XML files in the file system.
An HTML frontend allows users to edit the content via a browser.
A standalone server is also included.
Spidertron is a multithreaded web crawling API for web sites of moderate size (hundreds of thousands of pages) that allows you to focus not on the crawling but on processing the information retrieved.
Catalogo is a system for cataloguing resources on a web site. It allows semantic search of information on an intranet using metadata, RDF and ontology concepts. It provides a Catalog server (Java web applications) and a Catalog client (Firefox plug-in).
myDbSearcher is a search engine for MySQL databases. It is written in Java. It scans several tables on different databases. An XML-RPC server gives you access to the index.
Currently it runs on http://www.idowa.de/ueberblick/suche/index_html
DialogSearch is an experimental approach to web site searching, which uses the similarity between web pages to retrieve them. It is an alternative to hyperlink-based algorithms such as PageRank and HITS. BEWARE: This is only an experimental prototype.
Lucene Server is a Java server application for easily creating and managing Jakarta Lucene indexes. It is designed to help you integrate Lucene in distributed environments.
Swing-Search tool to effectively search among a list of strings and open a corresponding webpage in a browser.
It was originally designed to quickly search all titles of pages that are stored in a Wiki.
"girtools" is an implementation of Grid Information Retrieval (GIR). GIR is an emerging open standard for IR on the grid designed to allow dynamic, secure creation and searching of distributed information systems.
The goal of bookman is to implement a network-based service for managing and distributing bookmarks transparently from a central server to any bookman-enabled client software (currently focusing on Mozilla, IE and Opera).