data processing free download

Showing 125 open source projects for "data processing"

1

fluentbit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

Fluent Bit is a super-fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. A robust, lightweight, and portable architecture for high throughput with low CPU and memory usage from any data source to any destination. Proven across distributed cloud and container environments. Highly available with I/O handlers to store data for disaster recovery. Granular management of data parsing and routing....

Downloads: 2 This Week

Last Update: 20 hours ago
2

Jimp

An image processing library written entirely in JavaScript for Node

An image processing library for Node written entirely in JavaScript, with zero native dependencies. If you're using this library with TypeScript, the method of importing differs slightly from JavaScript. Instead of using require, you must import it with the ES6 default import scheme. If you're using a web bundler (webpack, rollup, parcel), you can benefit from using the module build of jimp. Using the module build will allow your bundler to understand your code better and exclude things you aren't...

Downloads: 9 This Week

Last Update: 2026-04-07
3

Tesla

The flexible HTTP client library for Elixir

The flexible HTTP client library for Elixir, with support for middleware and multiple adapters. Tesla is an HTTP client loosely based on Faraday. It embraces the concept of middleware when processing the request/response cycle. Define a module with use Tesla and choose from a variety of middleware. Tesla is built around the concept of composable middleware, which is very similar to how Plug Router works. All HTTP functions, such as Tesla.get/3 and Tesla.post/4, can take a dynamic client as the...

Downloads: 7 This Week

Last Update: 2026-01-26
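The composable-middleware idea in the Tesla description can be illustrated with a small, language-neutral sketch. The Python below is not Tesla's API. Every name in it (`build_client`, `base_url`, `headers`, `fake_adapter`, the `env` dictionary, and the `example.invalid` host) is a hypothetical stand-in, and the adapter performs no network I/O. It only shows how each middleware can transform a request and pass it on to the next step.

```python
from typing import Any, Callable, Dict, List

Env = Dict[str, Any]                      # hypothetical request/response record
Step = Callable[[Env], Env]               # the "rest of the pipeline"
Middleware = Callable[[Env, Step], Env]   # transforms env, then calls the next step


def base_url(prefix: str) -> Middleware:
    """Hypothetical middleware that prepends a base URL to the request path."""
    def middleware(env: Env, next_step: Step) -> Env:
        return next_step({**env, "url": prefix + env["url"]})
    return middleware


def headers(extra: Dict[str, str]) -> Middleware:
    """Hypothetical middleware that merges default headers into the request."""
    def middleware(env: Env, next_step: Step) -> Env:
        merged = {**env.get("headers", {}), **extra}
        return next_step({**env, "headers": merged})
    return middleware


def wrap(middleware: Middleware, next_step: Step) -> Step:
    """Bind one middleware to the step that follows it."""
    return lambda env: middleware(env, next_step)


def build_client(middlewares: List[Middleware], adapter: Step) -> Step:
    """Wrap the adapter so that the first middleware in the list runs first."""
    pipeline = adapter
    for middleware in reversed(middlewares):
        pipeline = wrap(middleware, pipeline)
    return pipeline


def fake_adapter(env: Env) -> Env:
    """Stand-in for a real HTTP adapter: marks the request as answered."""
    return {**env, "status": 200}


client = build_client(
    [base_url("https://example.invalid"), headers({"accept": "application/json"})],
    fake_adapter,
)
result = client({"method": "GET", "url": "/users"})
```

Because the pipeline is assembled from a list, swapping the adapter or reordering middleware needs no change to the calling code, which is the property the description attributes to the middleware approach.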
4

Acl

A powerful server and network library, including coroutine

The Acl (Advanced C/C++ Library) project is a powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written with Acl run on devices with Linux, Windows, iPhone and Android and serve billions of users. There are some important modules in the Acl project, including network communication, server framework, application protocols, multiple coders, etc. The common protocols such as...

Downloads: 10 This Week

Last Update: 2026-03-09
5

Qualitis

Qualitis is a one-stop data quality management platform

Qualitis is a data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model tasks to the Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, data quality report generation and so on. ...

Downloads: 3 This Week

Last Update: 2025-10-17
6

spider_collection

Collection of Python web scraping scripts for data extraction tasks

...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.

Downloads: 1 This Week

Last Update: 6 days ago
7

Python API for JMComic

Python crawler and API for downloading JMComic albums and images

...It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes. It supports both web-based and mobile API interfaces, enabling flexible interaction with the platform depending on the available endpoints. Its architecture includes components for configuration management, download orchestration, and client communication, allowing users to automate the retrieval of manga chapters or entire albums. ...

Downloads: 11 This Week

Last Update: 2026-04-07
8

Google Mobile Ads Unity Plugin

Unity Plugin for the Google Mobile Ads SDK

...The plugin provides a C# interface for requesting ads that is used by C# scripts in your Unity project. You can help improve the Google Mobile Ads Unity plugin by opting in to sending usage data to Google. The data collected is general information about how you are using the plugin (such as ad unit creation and processing errors).

Downloads: 4 This Week

Last Update: 2026-02-25
9

geckodriver

WebDriver for Firefox

geckodriver is an implementation of WebDriver, and WebDriver can be used for widely different purposes. How you invoke geckodriver largely depends on your use case. If you are using geckodriver through Selenium, you must ensure that you have version 3.11 or greater. Because geckodriver implements the W3C WebDriver standard and not the same Selenium wire protocol older drivers are using, you may experience incompatibilities and migration problems when making the switch from FirefoxDriver to...

Downloads: 63 This Week

Last Update: 2025-02-25
10

syslog-ng

Log management solution that improves the performance of SIEM

syslog-ng is the log management solution that improves the performance of your SIEM solution by reducing the amount and improving the quality of data feeding your SIEM. With syslog-ng Store Box, you can find the answer. Search billions of logs in seconds using full text queries with Boolean operators to pinpoint critical logs. syslog-ng Store Box provides secure, tamper-proof storage and custom reporting to demonstrate compliance. syslog-ng can deliver data from a wide variety of sources to...

Downloads: 20 This Week

Last Update: 2026-02-24
11

watercrawl

AI-ready web crawler that extracts and structures website content

WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website....

Downloads: 9 This Week

Last Update: 2026-03-11
12

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...

Downloads: 13 This Week

Last Update: 2026-03-31
13

douyin

Open source Douyin crawler for collecting and downloading public data

DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It...

Downloads: 7 This Week

Last Update: 2026-03-13
14

MDCx

Movie metadata scraper and organizer for media libraries and NFO

...It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports image processing tasks such as downloading and cropping artwork used by media centers. It includes several interfaces, allowing users to operate it through a graphical desktop application, a browser-based web interface, or command-line utilities depending on their workflow. ...

Downloads: 13 This Week

Last Update: 2026-03-10
15

Trafilatura

Python & command-line tool to gather text on the Web

Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. ...

Downloads: 0 This Week

Last Update: 2024-12-03
16

Shelf

Web server middleware for Dart

...Map server logic into a simple function: a single argument for the request, the response is the return value. Trivially mix and match synchronous and asynchronous processing. Flexibility to return a simple string or a byte stream with the same model. An adapter must handle all errors from the handler, including the handler returning a null response. It should print each error to the console if possible, then act as though the handler returned a 500 response. The adapter may include body data for the 500 response, but this body data must not include information about the error that occurred. ...

Downloads: 0 This Week

Last Update: 2026-02-26
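The adapter rules in the Shelf description (a handler is a function from request to response; the adapter catches every handler error, prints it to the console, and answers with a 500 whose body reveals nothing about the error) can be sketched as follows. This is a Python illustration, not Dart and not Shelf's actual API. `Request`, `Response`, `adapt`, `hello`, and `broken` are hypothetical names, and the generic body text is an assumption.

```python
import sys
import traceback
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    method: str
    path: str


@dataclass
class Response:
    status: int
    body: str = ""


# A handler maps one request to one response (or, erroneously, to None).
Handler = Callable[[Request], Optional[Response]]

GENERIC_500_BODY = "Internal Server Error"  # assumed text; carries no error details


def adapt(handler: Handler) -> Callable[[Request], Response]:
    """Wrap a handler so that every failure becomes a generic 500 response."""
    def safe_handler(request: Request) -> Response:
        try:
            response = handler(request)
        except Exception:
            # Print the error to the console, but keep it out of the response body.
            traceback.print_exc(file=sys.stderr)
            return Response(500, GENERIC_500_BODY)
        if response is None:
            # A missing response is treated the same way as a raised error.
            print("handler returned no response", file=sys.stderr)
            return Response(500, GENERIC_500_BODY)
        return response
    return safe_handler


def hello(request: Request) -> Response:
    """Well-behaved example handler."""
    return Response(200, f"Hello from {request.path}")


def broken(request: Request) -> Response:
    """Failing example handler whose message must not reach the client."""
    raise ValueError("sensitive detail that must stay server-side")


ok = adapt(hello)(Request("GET", "/"))
failed = adapt(broken)(Request("GET", "/boom"))
```

Here `ok` carries the handler's own 200 response, while `failed` is a 500 whose body contains only the generic text; the traceback goes to standard error.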
17

RESTinio

HTTP/WebSocket server C++14 library

...Async request handling. Cannot get the response data immediately? That's ok, store the request handle somewhere and/or pass it to another execution context and get back to it when the data is ready.

Downloads: 22 This Week

Last Update: 2026-04-02
18

BrowserOS

Agentic browser; privacy-first alternative to ChatGPT Atlas

BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than just doing standard browsing, it places AI intelligence at the core: you can connect your own API keys (e.g., for OpenAI, Anthropic, or Google Gemini) or run local models (e.g., via Ollama) so that your browsing data and automation stay on your machine; privacy and control are emphasized throughout. The interface remains familiar to users of...

Downloads: 30 This Week

Last Update: 7 days ago
19

Snoop Project

This is the most powerful software taking into account CIS location

...Snoop is a research work (own database / closed bugbounty) in the field of searching and processing public data on the Internet. In terms of specialized search, Snoop is able to compete with traditional search engines.

Downloads: 2 This Week

Last Update: 2026-01-01
20

QueryList

Progressive PHP web crawler framework with jQuery-like DOM parsing

QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data...

Downloads: 0 This Week

Last Update: 2026-03-10
21

news-please

Python tool for crawling and extracting structured data from news site

...It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. It combines several established technologies and libraries to perform web crawling and content extraction, enabling reliable processing across a wide range of news sources. Developers can use the software either as a standalone command line application or integrate it into their own Python applications through its library interface. Extracted article data can be stored in different formats and systems, including JSON files or database-backed storage solutions.

Downloads: 1 This Week

Last Update: 2026-04-08
22

diskover-community

Open source file indexing & storage analytics powered by Elasticsearch

Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...

Downloads: 0 This Week

Last Update: 2026-03-11
23

Tweepy

Twitter for Python

An easy-to-use Python library for accessing the Twitter API. You can also use Git to clone the repository from GitHub to install the latest development version. The easiest way to install the latest version from PyPI is by using pip. Twitter requires all requests to use OAuth for authentication. The API class provides access to all of the Twitter RESTful API methods. Each method can accept various parameters and return responses. When we invoke an API method most of the time returned back to...

Downloads: 2 This Week

Last Update: 2025-06-22
24

Aimeos headless distribution

Aimeos cloud-native, API-first ecommerce headless distribution

Aimeos Headless is an open-source headless eCommerce distribution built on top of the Laravel framework, designed to provide a fast and scalable API-driven commerce backend. The project exposes a comprehensive REST and GraphQL API that allows developers to build custom storefronts or commerce applications using any frontend technology. Because the platform follows a headless architecture, it separates the commerce logic from the presentation layer, enabling developers to build web, mobile,...

Downloads: 4 This Week

Last Update: 2026-03-15
25

SingleFile

Web Extension for saving a copy of a complete web page in a single file

Web Extension for Firefox/Chrome/MS Edge and CLI tool to save a faithful copy of an entire web page in a single HTML file. SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Vivaldi, Brave, Waterfox, Yandex Browser, and Opera. It helps you to save a complete web page into a single HTML file. Wait until the page is fully loaded. Click on the SingleFile button in the extension toolbar to save the page. You can click again on the...

Downloads: 14 This Week

Last Update: 2024-03-28