Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform.
Geographic restrictions, eligibility, and terms apply.
Get started with Nexo.
Train ML Models With SQL You Already Know
BigQuery automates data prep, analysis, and predictions with built-in AI assistance.
Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
Spider web scritto in java che consente un utilizzo sia come applicazione stand alone, sia come core di altre applicazioni che sfruttino le sue funzionalità.
This project is a java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionalities are also available in a separate library.
It's a Java based Extract Transform Load(ETL) tool with following features -- 1. It can take data from any source to any destination, any thing you can think of - for example from a web crawler to a database or filesystem 2. It's multithreaded and
A Web crawler prototype designed to index pages of certain resource sharing platforms based on folksonomy tags. The results are displayed in an Excel spreadsheet.
A simple to set up web scraper written in Java. It uses modified regEx to quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line
Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
This project will provide a tool for users to get a better understanding of the content and structure of an existing website. It will do this by providing a customised web spider as well as extensions to the GUESS graph visualisation application.
WebNews Crawler is a specific web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site specific extraction to extract the actual news content only, filtering out the advertising and other cruft.
Aracnis is a Java based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, for example, screen-scraping and link integrity checking.
HtmlClient provides an SGML/HTML/XHTML parser and connection client making web-spidering as easy for developers as actually surfing the web with a premade browser. Based on Apache's HttpClient.
J-Obey is a Java Library/package, which allows people writing their own crawlers to have a stable Robots.txt parser, if you are writing a web crawler of some sort you can use J-Obey to take out the hassle of writing a Robots.txt parser/intrepreter.
Robust featureful multi-threaded CLI web spider using apache commons httpclient v3.0 written in java. ASpider downloads any files matching your given mime-types from a website. Tries to reg.exp. match emails by default, logging all results using log4j.
Web Crawler Engine: jsrCRAW is an intelligent Java engine Crawler for Internete Content Monitoring: read periodically the content of url, retrieve link, apply rules (Crawlet) alert user of changes.
WebLoupe is a java-based tool for analysis, interactive visualization (sitemap), and exploration of the information architecture and specific properties of local or publicly accessible websites. Based on web spider (or web crawler) technology.
Wadsworth is a java based web scripting engine. It uses user-defined XML scripts to define its actions. It can be used as a web testing tool, or as a web scraper, or to automate any web actions you wish. It can also be invoked and controlled by another
A Java implementation of a flexible and extensible web spider engine.
Optional modules allow functionality to be added (searching dead links, testing the performance and scalability of a site, creating a sitemap, etc ..
Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page
WebSPHINX is a web crawler (robot, spider) Java class library, originally developed by Robert Miller of Carnegie Mellon University. Multithreaded, tollerant HTML parsing, URL filtering and page classification, pattern matching, mirroring, and more.
Spider is web crawler written in the Java.Based on an Regular expression string the spider parses the internet for web pages matching this string and stores it in an MYSQL database.