Showing 75 open source projects for "java crawler"

  • 1
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
    Downloads: 12 This Week
  • 2
    The “Media Crawler” is an extensible Eclipse RCP-based desktop application that crawls a given file system, extracts metadata from files, maps the metadata to internal schemas, and stores it in a database. This project is ANDS-funded.
    Downloads: 1 This Week
  • 3
    A web search engine and crawler written in Java/MySQL, with full-text and vertical search and a word segmentation system.
    Downloads: 0 This Week
  • 4
    RiverGlass EssentialScanner is an open source web and file system crawler which indexes the text content of discovered files so they can be retrieved and analyzed. It provides simple scanner capabilities as part of larger enterprise search solutions.
    Downloads: 0 This Week
  • 5
    A minimal Java web crawler.
    Downloads: 0 This Week
  • 6
    Ex-Crawler
    Ex-Crawler is divided into three subprojects (crawler daemon, distributed GUI client, and web search engine) which together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net
    Downloads: 0 This Week
  • 7
    An implementation of an agent-based regional crawler strategy. It gathers users' common needs and interests in a certain domain and crawls based on these interests, instead of crawling the web without any predefined order.
    Downloads: 0 This Week
  • 8
    A very basic crawler in Java.
    Downloads: 0 This Week
  • 9
    Folksonomy Web Crawler
    A Web crawler prototype designed to index pages of certain resource sharing platforms based on folksonomy tags. The results are displayed in an Excel spreadsheet.
    Downloads: 0 This Week
  • 10
    It's a Java-based Extract, Transform, Load (ETL) tool with the following features: 1. It can take data from any source to any destination, anything you can think of, for example from a web crawler to a database or filesystem. 2. It's multithreaded and
    Downloads: 0 This Week
  • 11
    ItSucks
    This project is a Java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.
    Downloads: 6 This Week
  • 12
    A school project consisting of a crawler, a server, and a search page.
    Downloads: 0 This Week
  • 13
    MuSE-CIR is a Multigram-based Search Engine and Collaborative Information Retrieval system. Written in Java/JSP, it supports any JDBC-connectable database (thoroughly tested only with OracleXE, and somewhat with MySQL), with JSP on Apache Tomcat 5.5.
    Downloads: 0 This Week
  • 14
    This is a simple web crawler for Facebook (TM) written in Java. The crawler surfs the public user pages (so you do not need to provide an account) to reconstruct the friendship graph for further study and analysis.
    Downloads: 0 This Week
  • 15
    nxs crawler is a program to crawl the internet. The program generates random IP addresses and attempts to connect to the hosts. If a host answers, the result is saved in an XML file. After that the crawler disconnects... Additionally you can
    Downloads: 0 This Week
  • 16
    Crawl a set of files, accumulating information on the temporal and spatial extent of the data in each file, for later search and retrieval.
    Downloads: 0 This Week
  • 17
    Web-as-corpus tools in Java:
      * Simple crawler (with integration for Nutch and Heritrix)
      * HTML cleaner to remove boilerplate code
      * Language recognition
      * Corpus builder
    Downloads: 0 This Week
  • 18
    jSEO -- Pluggable SEO (Search Engine Optimization) for dynamic JEE web applications
    Downloads: 0 This Week
  • 19
    A web crawler in Java.
    Downloads: 0 This Week
  • 20
    The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol. This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.
    Downloads: 0 This Week
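    To illustrate the kind of URL discovery the Sitemap Protocol enables, here is a minimal sketch that reads the `<loc>` entries from a sitemap using only the JDK's built-in DOM parser. It is not the Java Sitemap Parser's or crawler-commons' API; the class name `SitemapUrls` and the method `extractLocations` are hypothetical names chosen for this example, and the sample URLs are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of any listed project.
final class SitemapUrls {
    // XML namespace defined by the Sitemap Protocol (sitemaps.org).
    private static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Returns the trimmed text of every <loc> element found in the sitemap XML. */
    static List<String> extractLocations(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations so untrusted input cannot load external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)));

            NodeList locs = doc.getElementsByTagNameNS(SITEMAP_NS, "loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            // Wrap parser errors so callers do not have to handle checked exceptions.
            throw new IllegalArgumentException("Could not parse sitemap XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
              + "<url><loc>https://example.com/</loc></url>"
              + "<url><loc>https://example.com/about</loc></url>"
              + "</urlset>";
        // Prints the discovered URLs, one per line.
        extractLocations(xml).forEach(System.out::println);
    }
}
```

    A crawler would typically feed the returned URLs into its fetch queue. A sitemap index file also uses `<loc>` elements, so this sketch returns the child sitemap URLs in that case rather than following them.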
  • 21
    A Java game that was developed for a class project. The original intention was to make it similar to Secret of Mana, but it became more of a dungeon crawler. (8/15/09) Development slowed over the summer. We should be resuming development shortly.
    Downloads: 0 This Week
  • 22
    Retriever is a simple crawler packed as a Java library that allows developers to collect and manipulate documents reachable by a variety of protocols (e.g. http, smb). You'll easily crawl documents shared in a LAN, on the Web, and many other sources.
    Downloads: 0 This Week
  • 23
    The project aims to develop a system consisting of a crawler, a user interface, and a database that will allow users to obtain research papers in PDF format from any domain and analyze them.
    Downloads: 0 This Week
  • 24
    The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
    Downloads: 0 This Week
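    One common way to reduce duplicate data across snapshot crawls is to compare a content digest against the digest recorded for the same URL in an earlier crawl. The sketch below shows that general idea only; it is an assumption for illustration, not a description of how the DeDuplicator module or Heritrix actually works. The class `SnapshotDeduplicator` and its method `shouldStore` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of digest-based de-duplication; not the DeDuplicator's real API.
final class SnapshotDeduplicator {
    // Digest of the most recently seen content for each URL.
    private final Map<String, String> digestByUrl = new HashMap<>();

    /**
     * Records the digest of the fetched content and returns true if the content
     * is new or has changed since the last time this URL was seen.
     */
    boolean shouldStore(String url, byte[] content) {
        String digest = sha256Hex(content);
        String previous = digestByUrl.put(url, digest);
        return !digest.equals(previous);
    }

    private static String sha256Hex(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SnapshotDeduplicator dedup = new SnapshotDeduplicator();
        byte[] first = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] changed = "<html>hello again</html>".getBytes(StandardCharsets.UTF_8);

        System.out.println(dedup.shouldStore("http://example.com/", first));   // true: never seen
        System.out.println(dedup.shouldStore("http://example.com/", first));   // false: unchanged
        System.out.println(dedup.shouldStore("http://example.com/", changed)); // true: content changed
    }
}
```

    An in-memory map keeps the example short. A real series of snapshot crawls would need the digests persisted between runs.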
  • 25
    LogCrawler is an Ant task for automatic testing of web applications. Using an HTTP crawler, it visits all pages of a website and checks the server log files for errors. Use it as a smoke test with a CI system such as CruiseControl.
    Downloads: 0 This Week