Showing 75 open source projects for "java crawler"

  • 1
    Course Crawler is an application that compiles term-definition pairs from multiple web glossaries into a single, stable, and searchable location.
    Downloads: 0 This Week
  • 2
    WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS feeds and HTML pages. It can perform site-specific extraction that keeps only the actual news content, filtering out advertising and other cruft.
    Downloads: 0 This Week
  • 3
    Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 (Google Summer of Code 2006) program.
    Downloads: 0 This Week
  • 4
    GronoSpy is a WWW crawler that tries to extract knowledge from the data on grono.net, a community portal.
    Downloads: 0 This Week
  • 5
    J-Obey is a Java library that gives people writing their own crawlers a stable robots.txt parser. If you are writing a web crawler of some sort, you can use J-Obey to avoid the hassle of writing your own robots.txt parser/interpreter.
    Downloads: 0 This Week
  • 6
    JCrawler is a crawling/load-testing tool that is cookie-enabled and follows a human crawling pattern (hits per second).
    Downloads: 2 This Week
  • 7
    A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer, and Indexer) can also be used separately.
    Downloads: 0 This Week
  • 8
    SmartCrawler is a Java-based, fully configurable, multi-threaded, and extensible crawler that can fetch and analyze the contents of a website using dynamically pluggable filters.
    Downloads: 1 This Week
  • 9
    Web Crawler Engine: jsrCRAW is an intelligent Java crawler engine for Internet content monitoring. It periodically reads the content of a URL, retrieves links, applies rules (Crawlets), and alerts the user to changes.
    Downloads: 0 This Week
  • 10
    WebLoupe is a Java-based tool for the analysis, interactive visualization (sitemap), and exploration of the information architecture and specific properties of local or publicly accessible websites. It is based on web spider (web crawler) technology.
    Downloads: 0 This Week
  • 11
    Pödznsatch is an open and distributed hypergoogle of love. It is a semantic web application for social networking, word-of-mouth analysis, and profiling. The Pödznsatch architecture includes a bot crawler, an inference engine, and a query interface.
    Downloads: 0 This Week
  • 12
    WWW Universal Tester is a Java application designed to gather information about the WWW. It works as a spider (robot, crawler) and collects information about the sizes of files used on the web, the structure of links between pages, and so on.
    Downloads: 0 This Week
  • 13
    A new web crawler with a sophisticated search process specialized by language.
    Downloads: 0 This Week
  • 14
    LARM is a 100% Java search solution for end users of the Jakarta Lucene search engine framework. It contains methods for indexing files and database tables, and a crawler for indexing websites.
    Downloads: 0 This Week
  • 15
    A crawler to index and search the XML web.
    Downloads: 0 This Week
  • 16
    WebSPHINX is a web crawler (robot, spider) Java class library, originally developed by Robert Miller of Carnegie Mellon University. Features include multithreading, tolerant HTML parsing, URL filtering and page classification, pattern matching, mirroring, and more.
    Downloads: 0 This Week
  • 17
    Content engineering tools, including an XSLT-based site rendering system, an XSLT documentation generator, and a Swing-based site crawler. The tools may be downloaded and used separately, since there are no dependencies between them.
    Downloads: 0 This Week
  • 18
    An application to crawl public profiles on www.myspace.com.
    Downloads: 0 This Week
  • 19
    This project aims to be a base for specialized image crawlers. It can download images from a specific website and can be extended to crawl any website. All processes are multithreaded, and filters are supported.
    Downloads: 0 This Week
  • 20
    Java Twitter Crawler
    Downloads: 0 This Week
  • 21

    RedditCrawler

    Crawls the Reddit website to pull statistical info.

    Reddit Crawler crawls a list of subreddits and gets the number of users online. The project will be updated to gather more statistical info.
    Downloads: 0 This Week
  • 22

    Stegcrawler

    A web crawler to search the Internet for use of steganography

    A web crawler to search the Internet for use of steganography. Includes a MySQL database, and a Java based application to search for, test, and attempt to crack images that (may) use steganography. Created by the CIST 1450: Object Orientated Programming class at the University of Pittsburgh at Bradford. Class participants were: Josiah Bennett Dan Connor Lincoln Dorward Samuel Ficorilli Samuel Kleiner Bryan Nelson Rachel Rybicki Mark Saccucci Adam Schrot Daniel Taylor Steven...
    Downloads: 0 This Week
  • 23

    Luanium

    A Lua-based crawling scripting language that leverages Selenium

    I needed a way to crawl a site using commands. I would put the commands in a file or DB and use Selenium to interpret the HTML and JavaScript. Ideally, this would be a complete language with conditionals and looping. I'm a Java developer, and I needed the crawler to run in a Spring Boot application, so I decided to use a Lua interpreter in Java to build a crawling tool based on Selenium. The trick is to add the crawling commands to the Lua interpreter.
    Downloads: 0 This Week
  • 24
    Spider is a web crawler written in Java. Given a regular expression, the spider searches the Internet for web pages matching it and stores them in a MySQL database.
    Downloads: 0 This Week
  • 25
    studiMaps is a web-based application for visualizing and analyzing social networks. It consists of two software components: a web crawler for gathering data and the web-based application for visualization.
    Downloads: 0 This Week
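J-Obey (project 5 above) exists so that crawler authors need not write their own robots.txt parser. As a rough illustration of what such a parser has to do, here is a minimal Java sketch. It is not J-Obey's API; the names `RobotsRules`, `parse`, and `isAllowed` are hypothetical. It assumes a simplified subset of the format: `User-agent` groups with plain-prefix `Allow`/`Disallow` rules, where the longest matching prefix decides and `Allow` wins ties, and the `*` group is used when no group names the crawler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal robots.txt matcher for illustration; NOT the J-Obey API.
// Handles only User-agent, Allow and Disallow with plain path prefixes:
// no wildcards, no Crawl-delay, no Sitemap lines.
final class RobotsRules {
    private final List<String> allows = new ArrayList<>();
    private final List<String> disallows = new ArrayList<>();

    /** Returns the rules for `agent`, falling back to the `*` group if no group names it. */
    static RobotsRules parse(String robotsTxt, String agent) {
        String agentLower = agent.toLowerCase();
        RobotsRules specific = new RobotsRules();
        RobotsRules wildcard = new RobotsRules();
        boolean specificFound = false;
        List<String> groupAgents = new ArrayList<>();
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\r?\n")) {
            String line = rawLine.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) groupAgents.clear(); // consecutive User-agent lines share one group
                String ua = value.toLowerCase();
                groupAgents.add(ua);
                // Simplification: substring match against the crawler's name.
                if (!ua.equals("*") && !ua.isEmpty() && agentLower.contains(ua)) specificFound = true;
                lastWasAgent = true;
                continue;
            }
            lastWasAgent = false;
            boolean isAllow = field.equals("allow");
            if (!isAllow && !field.equals("disallow")) continue; // ignore other fields
            if (value.isEmpty()) continue; // an empty Disallow blocks nothing
            for (String ua : groupAgents) {
                RobotsRules target;
                if (ua.equals("*")) target = wildcard;
                else if (!ua.isEmpty() && agentLower.contains(ua)) target = specific;
                else continue;
                if (isAllow) target.allows.add(value); else target.disallows.add(value);
            }
        }
        return specificFound ? specific : wildcard;
    }

    /** Longest matching prefix wins; Allow wins ties; no matching rule means allowed. */
    boolean isAllowed(String path) {
        return longestMatch(allows, path) >= longestMatch(disallows, path);
    }

    private static int longestMatch(List<String> prefixes, String path) {
        int best = -1;
        for (String p : prefixes) {
            if (path.startsWith(p)) best = Math.max(best, p.length());
        }
        return best;
    }
}
```

For a file whose `*` group contains `Disallow: /private` and `Allow: /private/public`, `parse(body, "MyCrawler/1.0")` would reject `/private/data` but accept `/private/public/page`. A production parser would also need wildcard patterns, `Crawl-delay`, and proper product-token matching.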
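JCrawler (project 6) paces its requests by a hits-per-second rate. One simple way to hold a crawler to such a rate is to hand out evenly spaced time slots, as in this sketch. It illustrates the general technique under that assumption and is not JCrawler's implementation; `HitPacer` and its methods are hypothetical names. The current time is passed to `reserveDelayNanos` so the scheduling logic can be checked without actually sleeping.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical fixed-rate pacer: each hit reserves the next free slot, and
// slots are spaced 1/hitsPerSecond apart. A sketch, not JCrawler's code.
final class HitPacer {
    private final long intervalNanos;
    private boolean started = false;
    private long nextFreeNanos;

    HitPacer(double hitsPerSecond) {
        if (hitsPerSecond <= 0) throw new IllegalArgumentException("rate must be positive");
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / hitsPerSecond);
    }

    /** Reserves a slot and returns how long (ns) the caller must wait before hitting. */
    synchronized long reserveDelayNanos(long nowNanos) {
        // Overflow-safe comparison: is the next free slot already in the past?
        if (!started || nextFreeNanos - nowNanos <= 0) {
            started = true;
            nextFreeNanos = nowNanos + intervalNanos;
            return 0L;
        }
        long delay = nextFreeNanos - nowNanos;
        nextFreeNanos += intervalNanos;
        return delay;
    }

    /** Blocks the calling thread until its reserved slot arrives. */
    void acquire() throws InterruptedException {
        long delay = reserveDelayNanos(System.nanoTime());
        if (delay > 0) TimeUnit.NANOSECONDS.sleep(delay);
    }
}
```

With `new HitPacer(2.0)`, three reservations made at the same instant get delays of 0, 0.5 s, and 1 s. Worker threads would call `acquire()` before each request; a more human-like pattern could add random jitter to each interval.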
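SmartCrawler (project 8) advertises dynamically pluggable filters, and WebSPHINX (project 16) lists URL filtering. A common way to structure this in Java is a chain of predicates that every candidate URL must pass. The sketch below assumes that design; `FilterChain` is a hypothetical name and not either project's API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Predicate;

// Hypothetical pluggable URL-filter chain; names and semantics are assumptions,
// not the SmartCrawler or WebSPHINX API.
final class FilterChain {
    // CopyOnWriteArrayList lets filters be added or removed at runtime while
    // crawler worker threads iterate over the chain concurrently.
    private final List<Predicate<String>> filters = new CopyOnWriteArrayList<>();

    void add(Predicate<String> filter) { filters.add(filter); }

    boolean remove(Predicate<String> filter) { return filters.remove(filter); }

    /** A URL is accepted only if every registered filter accepts it (an empty chain accepts all). */
    boolean accepts(String url) {
        for (Predicate<String> f : filters) {
            if (!f.test(url)) return false;
        }
        return true;
    }
}
```

For example, adding `u -> !u.endsWith(".png")` and `u -> u.startsWith("http://example.com/")` would keep a crawl on one site and skip PNG files; removing the first filter later re-enables images without restarting the crawl.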
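jsrCRAW (project 9) periodically re-reads URLs and alerts the user to changes. A minimal way to detect a change is to keep a digest of each page's last content and compare it with the next fetch. The sketch below is a hypothetical illustration of that idea, not jsrCRAW's code; `ChangeDetector` is an invented name, and fetching and the alert itself are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical change detector: remembers a SHA-256 digest of the last content
// seen for each URL and reports whether new content differs. Not jsrCRAW's API.
final class ChangeDetector {
    private final Map<String, String> lastDigest = new HashMap<>();

    /** True if `content` differs from the last content recorded for `url`.
     *  The first observation of a URL is recorded but not reported as a change. */
    synchronized boolean hasChanged(String url, String content) {
        String digest = sha256Hex(content);
        String previous = lastDigest.put(url, digest);
        return previous != null && !previous.equals(digest);
    }

    private static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

To poll, a `ScheduledExecutorService` could run a task via `scheduleAtFixedRate` that fetches each monitored URL with whatever HTTP client is in use and raises an alert when `hasChanged` returns `true`. Hashing raw bytes also flags changes in ads or timestamps, so stripping such regions before hashing would reduce false alarms.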
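Spider (project 24) searches the Web for pages matching a regular expression and stores them in a MySQL database. The sketch below shows only the crawl-and-match part, as a breadth-first traversal; it is not the project's code, and `RegexSpider` is a hypothetical name. The page fetcher is injected as a `Function<String, String>` (returning `null` when a page cannot be fetched) so the example stays self-contained, links are found with a deliberately naive `href` regex that treats every value as an absolute URL, and matches are returned as a list instead of being written to MySQL.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical breadth-first spider that records URLs of pages whose HTML
// matches a user-supplied regular expression. Not the Spider project's code.
final class RegexSpider {
    // Naive link extraction: double-quoted href values, assumed to be absolute URLs.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> crawl(String seedUrl, Function<String, String> fetcher,
                              Pattern wanted, int maxPages) {
        List<String> matches = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seedUrl);
        seen.add(seedUrl);
        int fetched = 0;
        while (!frontier.isEmpty() && fetched < maxPages) {
            String url = frontier.poll();
            String html = fetcher.apply(url); // null means "could not fetch"
            fetched++;
            if (html == null) continue;
            if (wanted.matcher(html).find()) matches.add(url);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) frontier.add(link); // enqueue each URL only once
            }
        }
        return matches;
    }
}
```

For instance, `RegexSpider.crawl("http://example.com/", fetcher, Pattern.compile("java\\s+crawler"), 100)` would fetch at most 100 pages. A real spider would also resolve relative links, honor robots.txt, and use an HTML parser rather than a regex.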