retrieve data free download

Showing 32 open source projects for "retrieve data"

1

douyin

Open source Douyin crawler for collecting and downloading public data

DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It...

Downloads: 10 This Week

Last Update: 6 days ago
See Project
2

Weibo Crawler

Python crawler for collecting and downloading Sina Weibo user data

weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata.

Downloads: 3 This Week

Last Update: 3 days ago
See Project
3

YourInfo

Real-time browser fingerprinting demo with cross-browser tracking

YourInfo is a personal information management tool designed to let users securely store, structure, and retrieve their key data — such as contacts, credentials, personal notes, and preferences — while also enabling AI-assisted queries or reminders using that data. The platform prioritizes privacy by focusing on local storage or user-controlled databases, ensuring sensitive data stays under the user’s control rather than in third-party servers.

Downloads: 0 This Week

Last Update: 2026-02-03
See Project
4

CoreDNS

CoreDNS is a DNS server that chains plugins

Retrieve zone data from primaries, i.e., act as a secondary server (AXFR only) (secondary). Sign zone data on-the-fly (dnssec). Load balancing of responses (loadbalance). Allow for zone transfers, i.e., act as a primary server (file + transfer). Automatically load zone files from disk (auto).

Downloads: 2 This Week

Last Update: 2026-06-09
See Project
5

Spider

High-performance Rust web crawler and scraper for large-scale data

...It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...

Downloads: 1 This Week

Last Update: 2026-03-31
See Project
6

news-please

Python tool for crawling and extracting structured data from news sites

news-please is an open source news crawler and information extraction tool designed to collect and structure articles from online news websites. It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a...

Downloads: 2 This Week

Last Update: 3 days ago
See Project
7

Sanity

Rapidly configure content workspaces powered by structured content

...Instead of using predefined content templates, Sanity allows developers to define schemas in code that determine how content is structured and stored. The platform stores data in a real-time backend called the Content Lake, enabling collaborative editing and instant updates across connected applications. Because the system separates content management from presentation, developers can use any front-end framework to display the data. Sanity also includes APIs and query tools that allow developers to retrieve content dynamically and integrate it into websites, mobile apps, and other digital services.

Downloads: 3 This Week

Last Update: 2026-06-16
See Project
8

Python API for JMComic

Python crawler and API for downloading JMComic albums and images

JMComic-Crawler-Python is a Python library and crawler framework designed to programmatically access and download comic content from the JMComic platform. It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes. It supports both web-based and mobile API interfaces, enabling flexible interaction with the platform depending on the available endpoints. Its architecture includes components for configuration management, download orchestration, and client communication, allowing users to automate the retrieval of manga chapters or entire albums. ...

Downloads: 0 This Week

Last Update: 2026-06-14
See Project
9

autocrawler

Multiprocess Selenium crawler for downloading images by keywords

AutoCrawler is a Python-based image crawling tool designed to automatically download large numbers of images from search engines using automated browser interaction. It uses Selenium and a Chrome browser driver to navigate image search pages and collect image sources based on keywords provided by the user. AutoCrawler supports multiprocess and multithreaded downloading, which allows it to retrieve images faster by running several tasks simultaneously. Users provide search terms through a...

Downloads: 10 This Week

Last Update: 3 days ago
See Project
10

go-dork

Fast Go-based CLI scanner for running automated search engine dorks

...Written in the Go programming language, it focuses on speed and efficiency when executing advanced search queries across multiple search engines. It allows users to run specialized queries, often referred to as “dorks,” to discover publicly exposed data, misconfigurations, or potentially vulnerable resources. It supports several major search engines and enables users to switch between them depending on the target or query requirements. go-dork can retrieve results from multiple pages of search results and process them sequentially for broader coverage during scans. go-dork also supports custom HTTP headers and proxy configuration, which can help users work around restrictions such as captchas or filtering mechanisms. ...

Downloads: 3 This Week

Last Update: 2026-03-11
See Project
11

S3cmd

Command line tool for managing Amazon S3 and CloudFront services

S3cmd (s3cmd) is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command-line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc. S3cmd is written in Python. It's an open-source project available under GNU Public License v2...

Downloads: 1 This Week

Last Update: 2023-12-12
See Project
12

bilili

Command-line Bilibili video and danmaku downloader with batch support

bilili is a command-line tool designed to download videos and related content from the Bilibili video platform. It focuses on enabling users to retrieve user-uploaded videos as well as serialized content such as bangumi episodes directly from the terminal environment. It provides automated downloading capabilities that handle video streams and associated data efficiently while minimizing manual interaction. bilili supports retrieving both the video files and danmaku comments, which are the scrolling overlay comments commonly associated with the platform’s videos. ...

Downloads: 3 This Week

Last Update: 2026-03-11
See Project
13

Cinemagoer

Python package to retrieve and manage data of the IMDb

Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies. Platform-independent, it can retrieve data from both the IMDb's web server and a local copy of the whole db.

4 Reviews

Downloads: 1 This Week

Last Update: 2023-05-01
See Project
14

mlscraper

ML-based HTML scraper that learns extraction rules from examples

mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages.

Downloads: 5 This Week

Last Update: 2 days ago
See Project
15

Scylla

Intelligent proxy pool for collecting and managing public proxies

Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. ...

Downloads: 19 This Week

Last Update: 2026-03-10
See Project
16

WeChatSogou

Python library to crawl and retrieve data from WeChat accounts

WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
17

casync

Content-addressable data synchronization tool

A combination of the rsync algorithm and content-addressable storage. An efficient way to store and retrieve multiple related versions of large file systems or directory trees. An efficient way to deliver and update OS, VM, IoT and container images over the Internet in an HTTP and CDN friendly way. Let's take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk's contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used as a key for retrieving the full chunk data. ...

Downloads: 0 This Week

Last Update: 2023-05-12
See Project
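
The chunk-and-store idea in the casync description can be illustrated with a minimal Python sketch. It is not casync's own chunker or on-disk format: the gear-style rolling hash, the size limits, the use of SHA-256 and zlib, and the function names `split_chunks`, `store`, and `retrieve` are all assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Iterator, List

# Assumed parameters for this sketch (not casync's defaults).
MIN_SIZE = 1024          # never cut a chunk shorter than this
MAX_SIZE = 16384         # always cut once a chunk reaches this size
MASK = (1 << 12) - 1     # cut when the low 12 hash bits are zero

# One pseudo-random 32-bit value per byte value, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def split_chunks(data: bytes) -> Iterator[bytes]:
    """Yield variable-sized chunks whose boundaries depend on the content."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: old bytes are shifted out of the 32-bit state.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing partial chunk


def store(data: bytes, store_dir: str) -> List[str]:
    """Write each chunk to a compressed file named after its hash; return the index."""
    index = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        path = os.path.join(store_dir, digest)
        if not os.path.exists(path):  # identical chunks are stored only once
            with open(path, "wb") as f:
                f.write(zlib.compress(chunk))
        index.append(digest)
    return index


def retrieve(index: List[str], store_dir: str) -> bytes:
    """Rebuild the original stream by looking up each chunk by its hash."""
    parts = []
    for digest in index:
        with open(os.path.join(store_dir, digest), "rb") as f:
            parts.append(zlib.decompress(f.read()))
    return b"".join(parts)


if __name__ == "__main__":
    sample = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2000))
    with tempfile.TemporaryDirectory() as d:
        idx = store(sample, d)
        assert retrieve(idx, d) == sample
        print(len(idx), "chunks,", len(set(idx)), "unique")
```

Because boundaries are chosen from the data itself rather than from fixed offsets, two versions of a stream that differ by a small edit tend to share most chunks, so only the changed chunks need to be stored or transferred again.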
18

dwdwetter

Retrieves data from ftp.dwd.de and displays it

Retrieves weather info from DWD and displays it in a window, including animated GIF films and text display.

Downloads: 0 This Week

Last Update: 2012-08-24
See Project
19

LOGalyze CLI

Command line client interface for LOGalyze 4

Command line LOGalyze client. logalyze-cli is a powerful command line client for managing the LOGalyze engine. With the LOGalyze application log analyzer, you can collect log data from any device, then analyze, normalize and parse it.

1 Review

Downloads: 0 This Week

Last Update: 2017-07-19
See Project
20

GNU-BMEcat Generation Tool

GNU-BMEcat Generation Tool is a standardization tool for electronic product catalogs based on the German BMEcat specification. Those catalogs are used for eProcurement data interchange. Uses MySQL, HTML, PHP to store, retrieve and display catalog data.

Downloads: 0 This Week

Last Update: 2014-06-09
See Project
21

ebay mine

OO PHP Libraries for mining data from eBay into a MySQL database

I started this project for use in a new business and decided that the development time for the end result was going to be too long. This is basically an OO PHP API to retrieve data from eBay to be stored in a MySQL database for analysis. In a test run I retrieved over 804,000 completed item auction records from the consumer electronics category on eBay.

Downloads: 1 This Week

Last Update: 2012-07-09
See Project
22

CorpSite

This project is a website template for the MMORPG EVE Online. It consists of user management, a news feed, a gallery and community features, and uses the EVE API to retrieve useful information from the game.

Downloads: 0 This Week

Last Update: 2016-10-18
See Project
23

pachulib: pachube c api

Pachulib is a simple C library for accessing and managing Pachube's datastreams and feeds.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-22
See Project
24

Java IO Extension

Java IO Extension is an open source project which extends the Java IO package by providing a Java class library to access the IO system in a local area network. Developers can use IO Extension just like the Java IO API to access a remote IO system.

Downloads: 3 This Week

Last Update: 2015-03-31
See Project
25

Shopzilla Publisher Program fetcher

This project is intended for members of the Shopzilla affiliate program. The main idea is to provide a facility allowing users to retrieve data from the publisher program FTP and produce a flat file that can be used automatically.

1 Review

Downloads: 0 This Week

Last Update: 2014-08-28
See Project