Search Results for "web crawler spider" - Page 2

Showing 62 open source projects for "web crawler spider"

1

ItSucks

This project is a Java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.

3 Reviews

Downloads: 5 This Week

Last Update: 2013-04-29
2

Project AWESOME

A school project consisting of a crawler, a server and a search page.

Downloads: 0 This Week

Last Update: 2013-05-16
3

Simple Web Spider

Other spiders have a limited link depth, do not follow links in random order, or are combined with heavyweight indexing machinery. This spider has no link depth limit and randomly picks the next URL to check for new URLs.

1 Review

Downloads: 0 This Week

Last Update: 2012-12-04
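The randomized, depth-unlimited strategy described above can be sketched as a crawl frontier that removes a uniformly random pending URL instead of the oldest (breadth-first) or newest (depth-first) one. This is a hypothetical illustration, not Simple Web Spider's code: the class name `RandomFrontier` is invented, and page fetching is replaced by an in-memory link graph so the example runs offline.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical crawl frontier that picks the next URL at random, with no depth limit. */
class RandomFrontier {
    private final List<String> pending = new ArrayList<>();
    private final Set<String> seen = new HashSet<>(); // every URL ever queued
    private final Random random;

    RandomFrontier(long seed) {
        this.random = new Random(seed);
    }

    /** Queues a URL unless it has already been seen. */
    void add(String url) {
        if (seen.add(url)) {
            pending.add(url);
        }
    }

    boolean isEmpty() {
        return pending.isEmpty();
    }

    /** Removes and returns a uniformly random pending URL (swap with last, then remove). */
    String next() {
        int i = random.nextInt(pending.size());
        int last = pending.size() - 1;
        String url = pending.get(i);
        pending.set(i, pending.get(last));
        pending.remove(last);
        return url;
    }

    /** Crawls an in-memory link graph (stand-in for fetching) and returns the visit order. */
    static List<String> crawl(Map<String, List<String>> links, String seedUrl, long rngSeed) {
        RandomFrontier frontier = new RandomFrontier(rngSeed);
        frontier.add(seedUrl);
        List<String> visited = new ArrayList<>();
        while (!frontier.isEmpty()) {
            String url = frontier.next();
            visited.add(url);
            for (String outLink : links.getOrDefault(url, List.of())) {
                frontier.add(outLink);
            }
        }
        return visited;
    }
}
```

In a real spider, the `links.getOrDefault(...)` lookup would be replaced by downloading the page and extracting its links; the `seen` set is what keeps an unlimited-depth crawl from revisiting pages forever.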
4

Web Test Framework

An automated website testing framework. It includes a utility that spiders a site to determine its content, plus a variety of testing plugins that check the content for validity and accessibility. A report with the test results is then generated.

Downloads: 0 This Week

Last Update: 2013-04-24
5

MuSE-CIR

MuSE-CIR is a Multigram-based Search Engine and Collaborative Information Retrieval system. It is written in Java/JSP and supports any JDBC-connectable database, although it has been thoroughly tested only with Oracle XE and partially with MySQL; the JSP pages run on Apache Tomcat 5.5.

Downloads: 0 This Week

Last Update: 2013-05-22
6

JavaWAC

Web-as-corpus tools in Java:
* Simple crawler (with integration for Nutch and Heritrix)
* HTML cleaner to remove boilerplate code
* Language recognition
* Corpus builder

Downloads: 0 This Week

Last Update: 2013-04-19
7

nxs Crawler

nxs crawler is a program to crawl the internet. The program generates random IP addresses and attempts to connect to the hosts. If a host answers, the result is saved in an XML file. After that, the crawler disconnects... Additionally you can

Downloads: 0 This Week

Last Update: 2013-04-18
8

jSEO: Pluggable SEO for JEE

jSEO -- Pluggable SEO (Search Engine Optimization) for dynamic JEE web applications

1 Review

Downloads: 0 This Week

Last Update: 2014-03-04
9

AO-DAAC Crawler

Crawl a set of files, accumulating information on the temporal and spatial extent of the data in each file, for later search and retrieval.

Downloads: 0 This Week

Last Update: 2014-06-08
10

Java Sitemap Parser

The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol. This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.

Downloads: 0 This Week

Last Update: 2016-02-11
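To show what parsing a Sitemap involves, here is a minimal sketch that uses only the JDK's built-in DOM parser. It is not the API of the Java Sitemap Parser or of crawler-commons, and the class name `SitemapLocExtractor` is hypothetical. It collects every `<loc>` element in the sitemaps.org 0.9 namespace, which covers both `<urlset>` files and `<sitemapindex>` files; gzip handling, size limits and `<lastmod>` parsing are deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: extracts the <loc> URLs from a sitemaps.org 0.9 document. */
class SitemapLocExtractor {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    static List<String> extractLocs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps come from untrusted servers: refuse DOCTYPEs to block XXE attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // <loc> appears under <url> in a urlset and under <sitemap> in a sitemap index.
            NodeList locs = doc.getElementsByTagNameNS(SITEMAP_NS, "loc");
            List<String> urls = new ArrayList<>(locs.getLength());
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            // Wrap parser and I/O exceptions so callers need not handle checked exceptions.
            throw new IllegalArgumentException("Not a parsable sitemap", e);
        }
    }
}
```

A crawler would feed the returned URLs into its frontier; for a sitemap index, each returned URL is itself another sitemap to fetch and parse.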
11

Retriever: a light, extensible crawler

Retriever is a simple crawler packaged as a Java library that allows developers to collect and manipulate documents reachable through a variety of protocols (e.g. HTTP, SMB). You can easily crawl documents shared on a LAN, on the Web, and in many other sources.

Downloads: 0 This Week

Last Update: 2013-04-23
12

DeDuplicator (Heritrix add-on)

The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.

Downloads: 7 This Week

Last Update: 2013-04-02
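The DeDuplicator's actual matching strategy is not described here. As a hypothetical sketch of the general idea behind reducing duplicate data across snapshot crawls, a crawler can remember a content digest per URL from the previous snapshot and skip storing a response whose digest has not changed. The class `SnapshotDeduplicator`, its URL-keyed map and the choice of SHA-1 are all assumptions made for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical URL-keyed duplicate detector for repeated (snapshot) crawls. */
class SnapshotDeduplicator {
    // URL -> hex digest of the content stored for that URL in an earlier snapshot
    private final Map<String, String> knownDigests;

    SnapshotDeduplicator(Map<String, String> previousSnapshotDigests) {
        this.knownDigests = new HashMap<>(previousSnapshotDigests);
    }

    /** Lowercase hex SHA-1 of the content (SHA-1 chosen here only as an example digest). */
    static String sha1Hex(byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /**
     * Returns true if the fetched content is new or changed and should be stored;
     * returns false for an unchanged duplicate. Either way the digest is recorded.
     */
    boolean shouldStore(String url, byte[] content) {
        String digest = sha1Hex(content);
        String previous = knownDigests.put(url, digest);
        return !digest.equals(previous);
    }
}
```

Keying by URL is a simplification: identical content served under a different URL is still stored, which a digest-only index would catch at the cost of more bookkeeping.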
13

LogCrawler

LogCrawler is an Ant task for automatic testing of web applications. Using an HTTP crawler, it visits all pages of a website and checks the server log files for errors. Use it as a "smoke test" with a CI system such as CruiseControl.

Downloads: 1 This Week

Last Update: 2013-04-19
14

WebNews Crawler

WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can perform site-specific extraction to keep only the actual news content, filtering out advertising and other cruft.

Downloads: 0 This Week

Last Update: 2013-04-23
15

Sit Start

This project will provide a tool for users to get a better understanding of the content and structure of an existing website. It will do this by providing a customised web spider as well as extensions to the GUESS graph visualisation application.

Downloads: 0 This Week

Last Update: 2013-04-23
16

Course Crawler

Course Crawler is an application that compiles term-definition pairs from multiple web glossaries into a centralized, stable, and searchable location.

Downloads: 0 This Week

Last Update: 2013-03-11
17

Crawl-By-Example (Heritrix plugin)

Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of Google Summer of Code 2006 (GSoC06).

Downloads: 1 This Week

Last Update: 2014-12-14
18

NightCrawler

NightCrawler is a multithreaded web spider that uses MIME types to select which files to download.

Downloads: 0 This Week

Last Update: 2013-04-22
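Selecting downloads by MIME type usually comes down to comparing the server's `Content-Type` header against a list of wanted types. The sketch below is a hypothetical illustration, not NightCrawler's code: the class name `MimeTypeFilter` is invented, it trusts the server-supplied header (no content sniffing), and it supports simple `type/*` wildcards. The HTTP request itself is omitted.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical filter deciding whether a response's Content-Type is wanted. */
class MimeTypeFilter {
    private final List<String> acceptedPatterns; // e.g. "application/pdf", "image/*"

    MimeTypeFilter(List<String> acceptedPatterns) {
        this.acceptedPatterns = acceptedPatterns;
    }

    /** Strips parameters such as "; charset=UTF-8" and normalizes case. */
    static String mediaType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** True if the Content-Type header value matches any accepted pattern. */
    boolean accepts(String contentType) {
        if (contentType == null) {
            return false; // no header: treat as unknown and skip
        }
        String type = mediaType(contentType);
        for (String pattern : acceptedPatterns) {
            String p = pattern.toLowerCase(Locale.ROOT);
            boolean match = p.endsWith("/*")
                    ? type.startsWith(p.substring(0, p.length() - 1)) // "image/*" -> prefix "image/"
                    : type.equals(p);
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

A spider would typically issue a HEAD request (or read only the response headers) and download the body only when `accepts(...)` returns true, while still parsing `text/html` pages for links.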
19

J-Obey (Robots.txt Crawler Module)

J-Obey is a Java library that gives people writing their own crawlers a stable robots.txt parser. If you are writing a web crawler of some sort, you can use J-Obey to avoid the hassle of writing your own robots.txt parser/interpreter.

Downloads: 0 This Week

Last Update: 2015-08-05
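For readers wondering what such a parser has to do, here is a minimal sketch of robots.txt matching following the rules of RFC 9309 (Robots Exclusion Protocol): pick the groups naming the crawler (falling back to `User-agent: *`), then let the longest matching `Allow`/`Disallow` path win, with `Allow` winning ties. It is not J-Obey's API; the class name `RobotsTxtRules` is hypothetical, and wildcard patterns (`*` and `$` inside paths) are deliberately not supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal robots.txt matcher (prefix rules only, no path wildcards). */
class RobotsTxtRules {
    private final List<String> allows = new ArrayList<>();
    private final List<String> disallows = new ArrayList<>();

    /** Parses robotsTxt and keeps only the rules that apply to the given agent. */
    static RobotsTxtRules parse(String robotsTxt, String agent) {
        RobotsTxtRules specific = new RobotsTxtRules(); // rules from groups naming the agent
        RobotsTxtRules wildcard = new RobotsTxtRules(); // rules from "User-agent: *" groups
        boolean specificGroupFound = false;
        List<String> groupAgents = new ArrayList<>();
        boolean groupHasRules = false;

        for (String rawLine : robotsTxt.split("\r?\n")) {
            int hash = rawLine.indexOf('#'); // strip comments
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                // A User-agent line that follows rules starts a new group.
                if (groupHasRules) {
                    groupAgents = new ArrayList<>();
                    groupHasRules = false;
                }
                groupAgents.add(value);
                if (value.equalsIgnoreCase(agent)) {
                    specificGroupFound = true;
                }
            } else if (key.equals("allow") || key.equals("disallow")) {
                groupHasRules = true;
                if (value.isEmpty()) {
                    continue; // an empty rule matches nothing
                }
                for (String ua : groupAgents) {
                    RobotsTxtRules target = ua.equalsIgnoreCase(agent) ? specific
                            : ua.equals("*") ? wildcard : null;
                    if (target != null) {
                        (key.equals("allow") ? target.allows : target.disallows).add(value);
                    }
                }
            }
        }
        // Groups naming the agent replace the "*" group entirely.
        return specificGroupFound ? specific : wildcard;
    }

    /** Longest matching rule wins; on a tie the less restrictive Allow wins. */
    boolean isAllowed(String path) {
        if (path.equals("/robots.txt")) {
            return true; // robots.txt itself is always fetchable
        }
        return longestMatch(allows, path) >= longestMatch(disallows, path);
    }

    private static int longestMatch(List<String> prefixes, String path) {
        int best = -1; // -1 means no rule matched
        for (String prefix : prefixes) {
            if (path.startsWith(prefix) && prefix.length() > best) {
                best = prefix.length();
            }
        }
        return best;
    }
}
```

Usage: `RobotsTxtRules.parse(body, "MyBot").isAllowed("/tmp/x.html")`, where `body` is the downloaded robots.txt text. Caching the parsed rules per host avoids refetching robots.txt for every URL.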
20

isobel

A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer and Indexer) can also be used separately.

Downloads: 1 This Week

Last Update: 2013-03-22
21

RSS spider

RSS spider for collecting multiple RSS feeds in a single place, with search capabilities.

Downloads: 0 This Week

Last Update: 2016-07-23
22

JLinkCheck

JLinkCheck is an Ant task written in Java for checking links in websites. Rather than checking a single page, it crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.

Downloads: 0 This Week

Last Update: 2016-04-26
23

SmartCrawler

SmartCrawler is a Java-based, fully configurable, multi-threaded and extensible crawler that can fetch and analyze the contents of a website using dynamically pluggable filters.

Downloads: 0 This Week

Last Update: 2013-03-22
24

Sperowider

Sperowider Website Archiving Suite is a set of Java applications whose primary purpose is to spider dynamic websites and create static, distributable archives with a full-text search index usable by an associated Java applet.

Downloads: 0 This Week

Last Update: 2013-04-15
25

ASpider

A robust, feature-rich, multithreaded CLI web spider written in Java using Apache Commons HttpClient 3.0. ASpider downloads any files matching your given MIME types from a website. By default, it also tries to match email addresses with a regular expression, logging all results with log4j.

Downloads: 1 This Week

Last Update: 2013-03-08
© 2024 Slashdot Media. All Rights Reserved.