data processing free download

86 projects for "data processing" with 2 filters applied:

Filters: Internet, BSD

1

fluentbit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

Fluent Bit is a super-fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. A robust, lightweight, and portable architecture for high throughput with low CPU and memory usage from any data source to any destination. Proven across distributed cloud and container environments. Highly available with I/O handlers to store data for disaster recovery. Granular management of data parsing and routing....

Downloads: 2 This Week

Last Update: 1 day ago
2

Acl

A powerful server and network library, including coroutine

The Acl (Advanced C/C++ Library) project is a powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written with Acl run on Linux, Windows, iPhone, and Android devices and serve billions of users. The Acl project includes several important modules, including network communication, server framework, application protocols, multiple coders, etc. The common protocols such as...

Downloads: 10 This Week

Last Update: 2026-03-09
3

spider_collection

Collection of Python web scraping scripts for data extraction tasks

...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.

Downloads: 1 This Week

Last Update: 6 days ago
4

Python API for JMComic

Python crawler and API for downloading JMComic albums and images

...It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes. It supports both web-based and mobile API interfaces, enabling flexible interaction with the platform depending on the available endpoints. Its architecture includes components for configuration management, download orchestration, and client communication, allowing users to automate the retrieval of manga chapters or entire albums. ...

Downloads: 11 This Week

Last Update: 2026-04-07
5

syslog-ng

Log management solution that improves the performance of SIEM

syslog-ng is the log management solution that improves the performance of your SIEM solution by reducing the amount and improving the quality of data feeding your SIEM. With syslog-ng Store Box, you can find the answer. Search billions of logs in seconds using full text queries with Boolean operators to pinpoint critical logs. syslog-ng Store Box provides secure, tamper-proof storage and custom reporting to demonstrate compliance. syslog-ng can deliver data from a wide variety of sources to...

Downloads: 20 This Week

Last Update: 2026-02-24
6

watercrawl

AI-ready web crawler that extracts and structures website content

WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website....

Downloads: 9 This Week

Last Update: 2026-03-11
7

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...

Downloads: 13 This Week

Last Update: 2026-03-31
8

douyin

Open source Douyin crawler for collecting and downloading public data

DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It...

Downloads: 7 This Week

Last Update: 2026-03-13
9

MDCx

Movie metadata scraper and organizer for media libraries and NFO

...It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports image processing tasks such as downloading and cropping artwork used by media centers. It includes several interfaces, allowing users to operate it through a graphical desktop application, a browser-based web interface, or command-line utilities depending on their workflow. ...

Downloads: 13 This Week

Last Update: 2026-03-10
10

QueryList

Progressive PHP web crawler framework with jQuery-like DOM parsing

QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data...

Downloads: 0 This Week

Last Update: 2026-03-10
11

news-please

Python tool for crawling and extracting structured data from news site

...It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. It combines several established technologies and libraries to perform web crawling and content extraction, enabling reliable processing across a wide range of news sources. Developers can use the software either as a standalone command line application or integrate it into their own Python applications through its library interface. Extracted article data can be stored in different formats and systems, including JSON files or database-backed storage solutions.

Downloads: 1 This Week

Last Update: 2026-04-08
12

diskover-community

Open source file indexing & storage analytics powered by Elasticsearch

Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...

Downloads: 0 This Week

Last Update: 2026-03-11
13

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 3 This Week

Last Update: 2025-10-27
14

mendelson OFTP2

Implementation of the OFTP2 protocol (RFC 5024)

The mendelson OFTP2 is a JAVA-based open source OFTP2 solution. It contains a logging and configuration UI and supports encryption, digital signatures, message compression, TLS, certificate exchange, message routing, and mail notification.

Downloads: 46 This Week

Last Update: 2026-04-08
15

uriparser

RFC 3986 URI parsing and processing library

PLEASE NOTE that we are in the process of moving to GitHub (https://github.com/uriparser/uriparser). uriparser is a strictly RFC 3986 compliant URI parsing library written in C89. uriparser is cross-platform, fast, supports Unicode and is licensed under the New BSD license.

Downloads: 12 This Week

Last Update: 2025-12-15
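
For readers unfamiliar with what RFC 3986 parsing involves, the following is a minimal sketch of splitting a URI into its five generic components and resolving a relative reference against a base. It uses Python's standard `urllib.parse` module, not uriparser (which is a C library with its own API), and `urllib.parse` is not guaranteed to be strictly RFC 3986 compliant. The helper name `split_uri` is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def split_uri(uri: str) -> dict:
    """Split a URI into the five generic components named in RFC 3986.

    Illustrative only: this relies on Python's urllib.parse, not on
    uriparser, and the function name is an assumption for this sketch.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(split_uri("https://user@example.com:8080/a/b?x=1#frag"))
    # Relative reference resolution against a base URI.
    print(urljoin("http://a/b/c/d;p?q", "../g"))
```

Splitting and reference resolution are the two operations most callers need from a URI library; a strictly compliant parser additionally validates each component against the RFC grammar.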
16

queXF

Web based, Open Source alternative to Remark OMR or Teleform

queXF, a CADE (Computer Assisted Data Entry) Tool, processes filled paper forms that were created in queXML, such as survey questionnaires. queXF can be used as a web based, Open Source alternative to programs such as Cardiff Teleform and Remark OMR.

2 Reviews

Downloads: 9 This Week

Last Update: 2024-07-23
17

crawly

High-level web crawling and scraping framework for Elixir apps

Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content. ...

Downloads: 4 This Week

Last Update: 2026-03-11
18

mlscraper

ML-based HTML scraper that learns extraction rules from examples

mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper...

Downloads: 4 This Week

Last Update: 5 days ago
19

pspider

Simple Python framework for building multithreaded web crawlers

PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work...

Downloads: 1 This Week

Last Update: 6 days ago
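
The fetch, parse, and save stages connected by queues, as described for PSpider, can be sketched with Python's standard `threading` and `queue` modules. This is not PSpider's actual API: every name below (`fetch_worker`, `parse_worker`, `save_worker`, `run_pipeline`) is hypothetical, and the fetch step is a stand-in that fabricates a tiny HTML string instead of making a network request.

```python
import queue
import threading

SENTINEL = None  # placed on a queue to signal that no more work will arrive


def fetch_worker(urls: queue.Queue, fetched: queue.Queue) -> None:
    # Stand-in for an HTTP fetch: builds a fake page, no network access.
    while (url := urls.get()) is not SENTINEL:
        fetched.put((url, f"<title>{url}</title>"))
    fetched.put(SENTINEL)  # propagate shutdown to the next stage


def parse_worker(fetched: queue.Queue, parsed: queue.Queue) -> None:
    # Extracts the text between the <title> tags of the fake page.
    while (item := fetched.get()) is not SENTINEL:
        url, html = item
        title = html.removeprefix("<title>").removesuffix("</title>")
        parsed.put((url, title))
    parsed.put(SENTINEL)


def save_worker(parsed: queue.Queue, store: dict) -> None:
    # "Saves" results into an in-memory dict instead of a file or database.
    while (item := parsed.get()) is not SENTINEL:
        url, title = item
        store[url] = title


def run_pipeline(seed_urls: list) -> dict:
    urls, fetched, parsed = queue.Queue(), queue.Queue(), queue.Queue()
    store: dict = {}
    workers = [
        threading.Thread(target=fetch_worker, args=(urls, fetched)),
        threading.Thread(target=parse_worker, args=(fetched, parsed)),
        threading.Thread(target=save_worker, args=(parsed, store)),
    ]
    for worker in workers:
        worker.start()
    for url in seed_urls:
        urls.put(url)
    urls.put(SENTINEL)
    for worker in workers:
        worker.join()
    return store


if __name__ == "__main__":
    print(run_pipeline(["page-1", "page-2"]))
```

Each stage only touches its input and output queues, so a slow stage can be given more threads without changing the others; this sketch uses one thread per stage to keep the shutdown logic simple.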
20

Abot

Fast and flexible C# framework for building customizable web crawlers

...It focuses on speed, flexibility, and extensibility while handling the complex low-level tasks involved in web crawling. It manages essential components such as multithreading, HTTP requests, scheduling, and link parsing so developers can focus on processing the collected data. Abot follows a modular architecture that allows developers to customize nearly every stage of the crawl process by implementing or replacing core interfaces. Abot exposes an event-driven model that enables applications to react to crawling events such as page completion or crawl restrictions. It also provides configuration options that control crawling behavior including concurrency limits, crawl delays, and request parameters. ...

Downloads: 0 This Week

Last Update: 2026-04-08
21

ruia

Async Python framework for fast and flexible web scraping spiders

...It also supports middleware and plugin systems that allow customization of request handling, response processing, and additional functionality.

Downloads: 8 This Week

Last Update: 2026-03-11
22

csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster

csv-parser is a streaming CSV parsing library for Node.js designed for efficiency and correctness. It implements the stream API native to Node, allowing you to pipe a file or readable stream into the parser and process each row (as a JavaScript object or array) as soon as it's parsed — which is crucial for handling large CSV files without loading them entirely into memory. The parser handles standard CSV semantics including quoted fields, variable delimiters, escape sequences, and optional...

Downloads: 7 This Week

Last Update: 2025-12-04
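
The row-at-a-time idea behind a streaming CSV parser can be illustrated with a short sketch. This is not csv-parser's Node.js API: it uses Python's standard `csv` module, and the function name `stream_rows` is hypothetical. The point is only that rows are yielded one by one as the input is read, so the whole file never has to be held in memory.

```python
import csv
import io
from typing import Iterable, Iterator


def stream_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per CSV row, keyed by the header line.

    `lines` can be an open file or any iterable of text lines; rows are
    produced lazily as the caller iterates.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        yield row


if __name__ == "__main__":
    # An in-memory stand-in for a large file; the second field of the
    # first data row is quoted because it contains a comma.
    sample = io.StringIO('name,comment\nada,"hello, world"\nbob,plain\n')
    for row in stream_rows(sample):
        print(row)
```

In real use the argument would typically be a file opened with `newline=""`, as the `csv` module documentation recommends.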
23

CSSBox

Pure Java HTML / CSS rendering engine

CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide complete information about the rendered page, suitable for further processing. However, it also allows displaying the rendered document.

3 Reviews

Downloads: 13 This Week

Last Update: 2021-02-01
24

mod_psldap

Apache LDAP Directory Manager

mod_psldap is an Apache module for leveraging LDAP services built on the OpenLDAP library and the Apache APIs, to include web based A&A, web based updates to the LDAP store, server-side XSLT processing, and session management across servers.

Downloads: 0 This Week

Last Update: 2019-05-14
25

iText®, a JAVA PDF library

PDF Library for Developers

iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...

Downloads: 175 This Week

Last Update: 2024-06-01