Showing 7 open source projects for "extraction"

  • 1
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
  • 2
    Abot

    Fast and flexible C# framework for building customizable web crawlers

    Abot is an open source C# web crawler framework designed to help developers efficiently crawl and process web content. It focuses on speed, flexibility, and extensibility while handling the complex low-level tasks involved in web crawling. It manages essential components such as multithreading, HTTP requests, scheduling, and link parsing so developers can focus on processing the collected data. Abot follows a modular architecture that allows developers to customize nearly every stage of the...
    Downloads: 1 This Week
  • 3
    dboxShare

    More reliable enterprise file sharing and synchronization software

    dboxShare is easy-to-use, free, open-source enterprise file hosting server software built on .NET, used to build a safe and efficient cloud file storage and management platform. Users don't need to change their work habits: two-way file synchronization automatically uploads, downloads, and updates versions according to their permissions, providing a convenient and efficient way to share and collaborate. The system has the characteristics of simple...
    Downloads: 1 This Week
  • 4
    intoit-sra

    Find the exact position of a website or URL among search results

    IntoIT-SRA finds the position of your website within a search engine's results. It includes an exclusion list that helps you remove unwanted sites, as well as web data extraction, a personal spreadsheet loader for data filtering, a handy alarm for appointments, and a world time clock. This is a beta version. The project is looking for programmers who want to build a Linux-based version of IntoIT-SRA or a browser plug-in.
    Downloads: 0 This Week
  • 5
    PDF Clown

    General-Purpose PDF Library for Java and .NET

    PDF Clown is a general-purpose Java and .NET library for manipulating PDF files through multiple abstraction layers, rigorously adhering to the PDF 1.7 specification (ISO 32000-1). This project aims to provide universal access to PDF files (creation, reading, editing, rendering...) through an accurate and elegant object-oriented API. * Features: http://pdfclown.org/overview/features/ * Overview: http://pdfclown.org/overview/architecture/ * Website: http://pdfclown.org/ * Blog:...
    Downloads: 2 This Week
  • 6
    The Metadata Express is a web application for maintaining and browsing database documentation, a.k.a. metadata. It offers automatic extraction, a method of creating description links between objects, and an issue tracking system, and it is very easy to use.
    Downloads: 0 This Week
  • 7
    A toolkit for crawling information from web pages by combining different kinds of "actions". Actions are simple operations such as navigating to a specified URL or extracting text from the HTML. A graphical user interface is also available.
    Downloads: 2 This Week