Page 2 | Best Open Source Windows Big Data Tools 2026

Big Data Tools for Windows

1

SentimentAnalysis-Rick&Morty

Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

Remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is an application of data analysis that extracts relevant information from texts written in different linguistic contexts. Within natural language processing, sentiment analysis and classification stands out as a key application supported by text mining: by extracting information from textual data, it becomes possible to identify and understand the sentiments and emotions conveyed. In this end-of-degree project, we use Python to analyze and classify the dialogue of characters in the English-language television series "Rick and Morty". The objective is to identify and categorize the feelings and emotions expressed in the text, comparing the human perception of the characters' personalities with the results obtained using natural language processing techniques.

Downloads: 2 This Week

Last Update: 2023-07-12
See Project
2

GnuCopy

GnuCopy is an open-source tool for copying and archiving your important data. It supports the main archive types, such as Zip and Tar, to guarantee easy and secure exchange between operating systems. Additionally, you can create profiles that blacklist or whitelist specific file types or folders to separate your big data stores for backups.

Downloads: 1 This Week

Last Update: 2023-07-28
See Project
3

NCHC-Storm

NCHC's Storm Team

Shares the Storm applications developed by NCHC's Storm Team.

Downloads: 1 This Week

Last Update: 2022-12-21
See Project
4

AXYZ

Newsfeed aggregator

The article explains the design and characteristics of a new open source content aggregation system. Notable features of the program include a new processing engine for syndication channels, real-time monitoring of information retrieval, configurable aggregator behavior, automatic classification of content, and new models for representing information as interactive relational maps. The aggregation program, named AXYZ, is designed to manage thousands of syndication channels in RSS format. It also provides statistics that can be used to study the output of any given producer and the impact of the information published in other sources. The results obtained in the research make it possible to create modules that compare the relationships between news items or information from different sources, their degree of influence, and their detection through patterns.

Downloads: 0 This Week

Last Update: 2017-01-26
See Project
5

An introduction to Data Analysis in R

A guide for learning the basic tools of data analysis with R

An Introduction to Data Analysis in R [Book]. A guide for learning the basic tools of data analysis: process, visualize and learn from your data using R programming. This repository holds the data sets needed for the book "An introduction to Data Analysis in R", to be published in the Springer series Use R!. The book can be purchased in XXX. The book is meant as an introductory guide to manipulating data sets in the Big Data paradigm. One of its main goals is to take the analyst from the very first contact with the data to the final conclusions and presentation of the results of the analysis. We take into account the variety of fields where data analysis occurs nowadays. We pay special attention to the different ways of obtaining data and making it manageable before starting the analysis. The data analysis includes most of the basic visualization options and some advanced extra options. Finally, basic statistics are used to learn from the processed data.

Downloads: 0 This Week

Last Update: 2020-02-08
See Project
6

Apache Doris

MPP-based interactive SQL data warehousing for reporting and analysis

Apache Doris is a modern MPP analytical database. It provides sub-second queries and efficient real-time data analysis. With its distributed architecture, datasets of up to 10 PB are well supported and easy to operate. Apache Doris can meet various data analysis demands, including historical data reports, real-time data analysis, interactive data analysis, and exploratory data analysis. It supports standard SQL and is compatible with the MySQL protocol. The main advantages of Doris are its simplicity (of developing, deploying and using) and its ability to meet many data serving requirements in a single system. Doris mainly integrates the technology of Google Mesa and Apache Impala; it is based on a column-oriented storage engine and can communicate with MySQL clients.

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
7

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics. Hudi provides efficient upserts by mapping a given hoodie key (record key + partition path) consistently to a file id via an indexing mechanism. This mapping between record key and file group/file id never changes once the first version of a record has been written to a file. In short, the mapped file group contains all versions of a group of records.
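The key-to-file mapping described above can be illustrated with a toy in-memory index. This is only a sketch of the idea, not Hudi's implementation or API: the class name `ToyHudiIndex`, the `max_keys_per_file` parameter, and the file id format are all hypothetical.

```python
class ToyHudiIndex:
    """Toy model: a hoodie key (record key + partition path) maps to one file id, fixed at first write."""

    def __init__(self, max_keys_per_file=2):
        self.max_keys_per_file = max_keys_per_file  # hypothetical sizing knob
        self.key_to_file = {}    # hoodie key -> file id
        self.file_versions = {}  # file id -> list of (hoodie key, payload) versions
        self.open_file = {}      # partition path -> (file id, number of keys assigned)
        self._next_id = 0

    def _assign_file(self, partition_path):
        # Reuse the partition's current file until it holds max_keys_per_file keys.
        file_id, count = self.open_file.get(partition_path, (None, 0))
        if file_id is None or count >= self.max_keys_per_file:
            file_id = f"{partition_path}/file-{self._next_id}"
            self._next_id += 1
            count = 0
        self.open_file[partition_path] = (file_id, count + 1)
        return file_id

    def upsert(self, record_key, partition_path, payload):
        hoodie_key = (record_key, partition_path)
        file_id = self.key_to_file.get(hoodie_key)
        if file_id is None:
            # First version of this record: choose a file and remember it for good.
            file_id = self._assign_file(partition_path)
            self.key_to_file[hoodie_key] = file_id
        # Every later version of the record lands in the same file.
        self.file_versions.setdefault(file_id, []).append((hoodie_key, payload))
        return file_id


index = ToyHudiIndex()
first = index.upsert("user-1", "2024/01", {"score": 1})
again = index.upsert("user-1", "2024/01", {"score": 2})
print(first == again)  # the update is routed to the file chosen at first write
```

Because the lookup always returns the file chosen at first write, an update never has to search other files for older versions of the same record.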

Downloads: 0 This Week

Last Update: 2025-12-18
See Project
8

Apache RocketMQ

Distributed messaging and streaming platform with low latency

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability. Messaging patterns include publish/subscribe, request/reply and streaming. Financial-grade transactional messages. Built-in fault tolerance and high-availability configuration options based on DLedger. A variety of cross-language clients, such as Java, C/C++, Python, Go. Pluggable transport protocols, such as TCP, SSL, AIO. Built-in message tracing capability, with OpenTracing support. Versatile big-data and streaming ecosystem integration. Message retroactivity by time or offset. Reliable FIFO and strictly ordered messaging in the same queue. Efficient pull and push consumption model. Million-level message accumulation capacity in a single queue. Multiple messaging protocols like JMS and OpenMessaging. Flexible distributed scale-out deployment architecture. Lightning-fast batch message exchange system.

Downloads: 0 This Week

Last Update: 2026-04-10
See Project
9

Arroyo

Distributed stream processing engine in Rust

Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.

Downloads: 0 This Week

Last Update: 2025-12-01
See Project
10

BEAR

CBR Meets Big Data

Case-based regression learner for big data. The package contains source and binary files for running BEAR's method. BEAR utilizes EAR4 and locality sensitive hashing in its implementation.

Downloads: 0 This Week

Last Update: 2015-08-11
See Project
11

BIRT iHub F-Type

Free report server for creating data-driven apps

Open Source BIRT (Business Intelligence and Reporting Tools) is a visual-based development tool used to create data visualizations and reports that can be embedded into rich client and web applications. BIRT is a top-level designer tool within the Eclipse Foundation, an independent not-for-profit open source community and consortium of software vendors. For the first time ever, Open Source developers using BIRT can now leverage technology previously reserved for commercial applications with BIRT iHub F-Type, a free enterprise-grade server for managing and delivering BIRT content to any number of users, while instantly enhancing Open Source reporting functionality and reducing deployment time down to minutes. With out-of-the-box Big Data support, BIRT iHub F-Type connects to all relational databases and allows unlimited data-in for creating data visualizations and dynamic report views.

Downloads: 0 This Week

Last Update: 2017-04-04
See Project
12

Big Sack

Big Sack: A lightweight Java Key/Value store with undo and disk cache.

Big Sack is a Java persistence mechanism that allows storage of key-value pairs following the popular Big Data paradigms. It's a very simple and straightforward way to bridge the gap between in-memory data structures and long-term storage. It has the convenience of the Java SDK TreeMap and TreeSet classes and is used in the same easy way, but it includes rollback through undo logging to checkpoint data, so it does not wind up in an unknown state regardless of failures. Data storage in the exabyte range is possible using filesystem and/or memory-mapped IO. Three levels of configurable write-through caching at different granularities ensure performance.
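Rollback through undo logging can be sketched in a few lines. The example below is a purely in-memory illustration written in Python, not Big Sack's Java API; the class `UndoLogStore` and its method names are hypothetical, and no disk persistence or caching is modeled.

```python
class UndoLogStore:
    """Toy key/value store that records previous values so changes can be rolled back."""

    _MISSING = object()  # marks "key did not exist before this change"

    def __init__(self):
        self._data = {}
        self._undo = []  # (key, previous value) entries since the last checkpoint

    def put(self, key, value):
        # Log the old value before overwriting it.
        self._undo.append((key, self._data.get(key, self._MISSING)))
        self._data[key] = value

    def remove(self, key):
        if key in self._data:
            self._undo.append((key, self._data[key]))
            del self._data[key]

    def checkpoint(self):
        # Changes so far become permanent: forget how to undo them.
        self._undo.clear()

    def rollback(self):
        # Replay the undo log backwards to restore the last checkpointed state.
        while self._undo:
            key, old = self._undo.pop()
            if old is self._MISSING:
                self._data.pop(key, None)
            else:
                self._data[key] = old

    def items(self):
        # Sorted by key, loosely mirroring TreeMap iteration order.
        return sorted(self._data.items())


store = UndoLogStore()
store.put("a", 1)
store.checkpoint()
store.put("a", 2)
store.put("b", 3)
store.rollback()
print(store.items())  # back to the checkpointed state
```

Replaying the log in reverse order matters: when a key is changed several times after a checkpoint, the oldest logged value is restored last and therefore wins.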

Downloads: 0 This Week

Last Update: 2013-12-21
See Project
13

Blue Whale Configuration Platform

Blue Whale smart cloud configuration platform

Blue Whale has accumulated experience supporting hundreds of Tencent businesses and is compatible with a variety of complex system architectures. It was born in operations and maintenance and is proficient in it. From configuration management to job execution, task scheduling and self-healing monitoring, through to big data analysis of operations and maintenance that assists operational decision-making, it comprehensively covers full-cycle assurance management of business operations. The open PaaS has a powerful development framework and scheduling engine, as well as a complete training system for operations and maintenance development, which supports the rapid transformation and upgrading of operations and maintenance. The Blue Whale intelligent cloud system helps enterprises quickly automate basic operations and maintenance services, thereby accelerating the transition to DevOps, establishing a tool culture, and maximizing operational efficiency.

Downloads: 0 This Week

Last Update: 2025-05-30
See Project
14

Chordalysis

Log-linear analysis (data modelling) for high-dimensional data

===== Project moved to https://github.com/fpetitjean/Chordalysis ===== Log-linear analysis is the statistical method used to capture multi-way relationships between variables. However, due to its exponential nature, previous approaches did not scale to more than a dozen variables. We present here Chordalysis, a log-linear analysis method for big data. Chordalysis exploits recent discoveries in graph theory by representing complex models as compositions of triangular structures, also known as chordal graphs. Chordalysis makes it possible to discover the structure of datasets with thousands of variables on a standard desktop computer. Associated papers at ICDM 2013, ICDM 2014 and SDM 2015 can be found at http://www.francois-petitjean.com/Research/ YourKit supports the Chordalysis open source project with its full-featured Java Profiler. YourKit is the creator of innovative and intelligent tools for profiling Java and .NET applications. http://www.yourkit.com

Downloads: 0 This Week

Last Update: 2015-01-29
See Project
15

Cube Platform

Cube Platform is a decentralized grid computing system that uses the P2P Pastry protocol for communication between nodes. It is a big data storage system written in Java.

Downloads: 0 This Week

Last Update: 2013-04-23
See Project
16

Custom Apache Big data Distribution

A Custom Apache Distribution including Spark and Hadoop, for Windows.

This distribution has been customized to work out of the box. Just download it and unzip it, then set the Path variables for the bin folders, HADOOP_HOME, SPARK_HOME, and JAVA_HOME. That's it: you can use Hadoop and Spark natively on Windows.
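After setting the variables, a quick check can confirm that nothing was missed. The script below is a minimal sketch, assuming that each of the three home directories has a `bin` subfolder expected on `PATH`; the function name `check_environment` is hypothetical and not part of the distribution.

```python
import os

REQUIRED_VARS = ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME")


def check_environment(env=None):
    """Return a list of human-readable problems; an empty list means the setup looks complete."""
    env = os.environ if env is None else env
    # Normalize PATH entries so that case and trailing slashes do not cause false alarms.
    path_entries = [
        os.path.normcase(os.path.normpath(p))
        for p in env.get("PATH", "").split(os.pathsep)
        if p
    ]
    problems = []
    for name in REQUIRED_VARS:
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        # Assumption: the executables live in <home>/bin.
        bin_dir = os.path.normcase(os.path.normpath(os.path.join(home, "bin")))
        if bin_dir not in path_entries:
            problems.append(f"{name}'s bin folder is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

Running it in a fresh terminal is the more useful test, since environment variables set through the Windows system dialog only apply to processes started afterwards.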

Downloads: 0 This Week

Last Update: 2020-03-11
See Project
17

ElasticJob

Distributed scheduled job framework

ElasticJob is a distributed scheduling solution consisting of two separate projects, ElasticJob-Lite and ElasticJob-Cloud. ElasticJob-Lite is a lightweight, decentralized solution that provides distributed task sharding services. ElasticJob-Cloud uses Mesos to manage and isolate resources. Both projects use a unified job API, so developers only need to write code once and can deploy at will. It supports job sharding and high availability in distributed systems. Scale out for throughput and efficiency improvement. Job processing capacity is flexible and scales with the allocation of resources. Jobs execute at a suitable time on assigned resources. Instances of the same job are aggregated onto the same job executor. Resources are appended to newly assigned jobs dynamically. With ElasticJob, developers no longer need to worry about non-functional requirements such as scaling jobs out, so they can focus more on business code.
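The idea of job sharding can be sketched as splitting a fixed number of shard items across whichever instances are currently running. The function below is a simplified round-robin illustration in Python, not ElasticJob's API or its actual sharding strategy; `assign_shards` and the instance names are hypothetical.

```python
def assign_shards(shard_count, instances):
    """Distribute shard items 0..shard_count-1 across instances in round-robin order."""
    if not instances:
        return {}
    # Sort so that every node computes the same assignment independently.
    ordered = sorted(instances)
    assignment = {name: [] for name in ordered}
    for shard in range(shard_count):
        assignment[ordered[shard % len(ordered)]].append(shard)
    return assignment


# Two instances share five shards; if one goes away, the survivor takes them all.
print(assign_shards(5, ["worker-b", "worker-a"]))
print(assign_shards(5, ["worker-a"]))
```

Recomputing the assignment whenever the set of instances changes is what lets processing capacity follow the allocated resources.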

Downloads: 0 This Week

Last Update: 2026-01-31
See Project
18

Fluid

Fluid, elastic data abstraction and acceleration for BigData/AI apps

Fluid provides elastic data abstraction and acceleration for BigData/AI applications in the cloud. It provides a DataSet abstraction for underlying heterogeneous data sources with multidimensional management in a cloud environment. It enables dataset warmup and acceleration for data-intensive applications by using a distributed cache in Kubernetes with observability, portability, and scalability. It takes the characteristics of applications and data into consideration when scheduling cloud applications and datasets to improve performance.

Downloads: 0 This Week

Last Update: 2025-10-31
See Project
19

FrincBackup

Incremental backup tool supporting removable storage devices

FrincBackup stands for free incremental backup. It was developed for backing up an x TB NAS with storage devices in a logical volume to multiple removable storage devices, such as 500 GB USB hard drives. Files are backed up as files (not as an archive) and are readable without any tool, including FrincBackup itself (although there is a restore mode for better handling).

Downloads: 0 This Week

Last Update: 2014-07-18
See Project
20

GOBIG

GOBIG is a toolbox for detecting genetic variations. The project is intended to handle big data and, more importantly, to detect clusters of SNP variants. The toolbox is intended for use with both common and rare variants, for example to find the genetic map of genes causing complex diseases.

Downloads: 0 This Week

Last Update: 2015-09-10
See Project
21

Genie

Distributed Big Data Orchestration Service

Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.

Downloads: 0 This Week

Last Update: 2025-08-05
See Project
22

GridDB

GridDB is a next-generation open source database

A cyber-physical system is a system that collects a variety of data in physical space (the real world), analyzes and converts it into knowledge in cyberspace, and feeds the knowledge back to the real world to revitalize industry and solve social problems. GridDB is an open database that enables real-time processing of vast amounts of time-series data in physical space, which is necessary to realize a cyber-physical system. Its multi-model architecture supports various data stores, with time-series-oriented and pluggable data stores for efficient real-time processing and management of huge amounts of high-frequency time-series data. Various architectural innovations, such as in-memory orientation with "memory as the main unit and disk as the secondary unit" and event-driven design with minimal overhead, have been incorporated to achieve processing capabilities that can handle petabyte-scale applications.

Downloads: 0 This Week

Last Update: 2026-02-18
See Project
23

HugeGraph

A graph database that supports 100+ billion data items

HugeGraph is a convenient, efficient, and adaptable graph database compatible with the Apache TinkerPop3 framework and the Gremlin query language. HugeGraph supports fast import for graphs with more than 10 billion vertices and edges, offers millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios for HugeGraph include correlation search, fraud detection, and knowledge graphs. It supports not only the Gremlin graph query language and a RESTful API but also provides commonly used graph algorithm APIs. To help users easily implement various queries and analyses, HugeGraph has a full range of accessory tools, supports distributed storage, data replication, and horizontal scaling, and supports many built-in storage engine backends.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
24

JuiceFS

JuiceFS is a distributed POSIX file system built on top of Redis

A POSIX, HDFS and S3 compatible distributed file system for the cloud. JuiceFS is designed to bring the good old experience of local-disk file systems to the cloud. JuiceFS is POSIX compliant and is fully compatible with HDFS and S3. Building or migrating cloud apps and sharing files across geographies and clouds has become easier than ever before. Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility, availability, scalability and strong consistency for your data-intensive applications. Purpose-built to serve big data scenarios such as self-driving model training, recommendation engines, and next-generation gene sequencing, JuiceFS specializes in high performance and easier management of tens of billions of files. We bring JuiceFS to developers with the hope that it will be easy to use, reliable, high-performance, and solve all your file storage problems in a cloud environment.

Downloads: 0 This Week

Last Update: 22 hours ago
See Project
25

LEACrypt

TTAK.KO-12.0223 Lightweight Encryption Algorithm Tool

The Lightweight Encryption Algorithm (also known as LEA) is a 128-bit block cipher developed in South Korea in 2013 to provide confidentiality in high-speed environments such as big data and cloud computing, as well as lightweight environments such as IoT devices and mobile devices. LEA is one of the cryptographic algorithms approved by the Korean Cryptographic Module Validation Program (KCMVP) and is a national standard of the Republic of Korea (KS X 3246). LEA is included in the ISO/IEC 29192-2:2019 standard (Information security - Lightweight cryptography - Part 2: Block ciphers). This project is licensed under the ISC License. Copyright © 2020-2021 ALBANESE Research Lab Source code: https://github.com/pedroalbanese/leacrypt Visit: http://albanese.atwebpages.com

Downloads: 0 This Week

Last Update: 2022-12-16
See Project