Showing 15 open source projects for "apache"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Context for your AI agents Icon
    Context for your AI agents

    Crawl websites, sync to vector databases, and power RAG applications. Pre-built integrations for LLM pipelines and AI assistants.

    Build data pipelines that feed your AI models and agents without managing infrastructure. Crawl any website, transform content, and push directly to your preferred vector store. Use 10,000+ tools for RAG applications, AI assistants, and real-time knowledge bases. Monitor site changes, trigger workflows on new data, and keep your AIs fed with fresh, structured information. Cloud-native, API-first, and free to start until you need to scale.
    Try for free
  • 1
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    crawlee

    crawlee

    A web scraping and browser automation library for Node.js

    Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back. It keeps your proxies healthy by rotating them smartly with good fingerprints that...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Ferret

    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. ferret has a declarative query language that makes it easy to focus on the data that you need to get. ferret has the ability to scrape JS rendered pages, handle all page events, and emulate user interactions. the ferret was designed as a library from the ground up. it can be easily embedded into any Go application. ferret helps you to focus on the data you need using an easy-to-learn declarative language. ferret uses...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Free and Open Source HR Software Icon
    Free and Open Source HR Software

    OrangeHRM provides a world-class HRIS experience and offers everything you and your team need to be that HR hero you know that you are.

    Give your HR team the tools they need to streamline administrative tasks, support employees, and make informed decisions with the OrangeHRM free and open source HR software.
    Learn More
  • 5
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    miniblink is an open source, one file, small browser widget based on chromium. By using C interface, you can create a browser with just some line code. miniblink is an open source, single-file, and currently the smallest known chromium-based browser control. Through its exported pure C interface, a browser control can be created in a few lines of code. C++, C#, Delphi and other language calls (support C++, C#, Delphi language to call). Embedded Nodejs, support electron (with Nodejs, can run...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    ACHE Focused Crawler

    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    pyspider

    pyspider

    A powerful Spider(Web Crawler) system in Python

    pyspider is a powerful Spider(Web Crawler) system in Python. Components are connected by message queue. Every component, including message queue, is running in their own process/thread, and replaceable. That means, when process is slow, you can have many instances of processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast. benchmarking. Since pyspider has various components, you can just run pyspider to start a standalone and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    crawler4j

    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can setup a multi-threaded web crawler in few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. shouldVisit function decides whether the given URL should be crawled or not. In the above example, this example is not allowing .css, .js and media files and only allows pages...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Perl Web Scraping Project

    Perl Web Scraping Project

    Perl Web Scraping Project

    Web scraping (web harvesting or web data extraction) is data scraping used for extracting data from websites.[1] Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Smart Business Texting that Generates Pipeline Icon
    Smart Business Texting that Generates Pipeline

    Create and convert pipeline at scale through industry leading SMS campaigns, automation, and conversation management.

    TextUs is the leading text messaging service provider for businesses that want to engage in real-time conversations with customers, leads, employees and candidates. Text messaging is one of the most engaging ways to communicate with customers, candidates, employees and leads. 1:1, two-way messaging encourages response and engagement. Text messages help teams get 10x the response rate over phone and email. Business text messaging has become a more viable form of communication than traditional mediums. The TextUs user experience is intentionally designed to resemble the familiar SMS inbox, allowing users to easily manage contacts, conversations, and campaigns. Work right from your desktop with the TextUs web app or use the Chrome extension alongside your ATS or CRM. Leverage the mobile app for on-the-go sending and responding.
    Learn More
  • 10
    Constellio Enterprise Search engine

    Constellio Enterprise Search engine

    Open source Search Engine and Enterprise Search

    Constellio is an enterprise search engine that allows companies to search all their organization's information through a single interface (Web, CRM, ERP, ECM, Mail etc.). Constellio is Based on Apache Solr and Google Search Appliance's connector. Constellio has a powerful web crawler.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accesible content.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    A simple to set up web scraper written in Java. It uses modified regEx to quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    HtmlClient provides an SGML/HTML/XHTML parser and connection client making web-spidering as easy for developers as actually surfing the web with a premade browser. Based on Apache's HttpClient.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    DirIndexFaker is a PHP script designed to produce fake apache directory listings for the purpose of slowing down, and overloading with false positives the web spiders used by the RIAA, MPAA, and other Copyright Cartel members.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Robust featureful multi-threaded CLI web spider using apache commons httpclient v3.0 written in java. ASpider downloads any files matching your given mime-types from a website. Tries to reg.exp. match emails by default, logging all results using log4j.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next