Big Data Tools for Linux

View 44 business solutions

Browse free open source Big Data tools and projects for Linux below. Use the toggles on the left to filter open source Big Data tools by OS, license, language, programming language, and project status.

  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • 1
    pandas

    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain specific language. With pandas, performance, productivity and collaboration in doing data analysis in Python can significantly increase. pandas is continuously being developed to be a fundamental high-level building block for doing practical, real world data analysis in Python, as well as powerful and flexible open source data analysis/ manipulation tool for any language.
    Downloads: 76 This Week
    Last Update:
    See Project
  • 2
    MyCAT

    MyCAT

    Active, high-performance open source database middleware

    MyCAT is an Open-Source software, “a large database cluster” oriented to enterprises. MyCAT is an enforced database which is a replacement for MySQL and supports transaction and ACID. Regarded as MySQL cluster of enterprise database, MyCAT can take the place of expensive Oracle cluster. MyCAT is also a new type of database, which seems like a SQL Server integrated with the memory cache technology, NoSQL technology and HDFS big data. And as a new modern enterprise database product, MyCAT is combined with the traditional database and new distributed data warehouse. In a word, MyCAT is a fresh new middleware of database. MyCAT ’s objective is to smoothly migrate the current stand-alone database and applications to cloud side with low cost and to solve the bottleneck problem caused by the rapid growth of data storage and business scale.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    MOA - Massive Online Analysis

    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples, a data stream. Includes classification, regression, clustering, outlier detection and recommender systems. Related to the WEKA project, also written in Java, while scaling to adaptive large scale machine learning.
    Downloads: 40 This Week
    Last Update:
    See Project
  • 4
    XCharts

    XCharts

    A charting and data visualization library for Unity

    A charting and data visualization library for Unity. Unity data visualization chart plugin. A UGUIpowerful, easy-to-use, parameter-configurable data visualization chart plug-in. It supports ten built-in charts. A powerful, easy-to-use, configurable charting and data visualization library for Unity. Visual configuration of parameters, real-time preview of effects, and pure code drawing without additional resources. Support ten built-in charts such as line chart, column chart, pie chart, radar chart, scatter chart, heat map, ring chart, candlestick chart, polar coordinate, parallel coordinate and so on. Supports 3D column charts, funnel charts, pyramids, dashboards, water level charts, pictographic column charts, Gantt charts, rectangular tree charts and other extended charts. Line graphs such as line graphs, curve graphs, area graphs, and stepped line graphs are supported.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    QuickRedis

    QuickRedis

    QuickRedis is a free forever redis gui tool

    QuickRedis is a free forever Redis Desktop manager. It supports direct connection, sentinel, and cluster mode, supports multiple languages, supports hundreds of millions of keys, and has an amazing UI. Supports both Windows, Mac OS X and Linux platform.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 6
    .NET for Apache Spark

    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer. .NET for Apache Spark runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework. It also runs on all major cloud providers including Azure HDInsight Spark, Amazon EMR Spark, AWS & Azure Databricks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Vespa

    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data, in real-time. At any scale, with unbeatable performance. Vespa is a full-featured text search engine and supports both regular text search and fast approximate vector search (ANN). This makes it easy to create high-performing search applications at any scale, whether you want to use traditional techniques or a modern vector-based approach. You can even combine both approaches efficiently in the same query, something no other engine can do. Recommendation, personalization and targeting involves evaluating recommender models over content items to select the best ones. Vespa lets you build applications which does this online, typically combining fast vector search and filtering with evaluation of machine-learned models over the items. This makes it possible to make recommendations specifically for each user or situation, using completely up to date information.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    marimo

    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python, reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are reproducible, extremely interactive, designed for collaboration (git-friendly!), deployable as scripts or apps, and fit for modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. Version with git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state, delete a cell and marimo deletes its variables while updating affected cells.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Apache Doris

    Apache Doris

    MPP-based interactive SQL data warehousing for reporting and analysis

    Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With it's distributed architecture, up to 10PB level datasets will be well supported and easy to operate. Apache Doris can meet various data analysis demands, including history data reports, real-time data analysis, interactive data analysis, and exploratory data analysis. Make your data analysis easier! Support standard SQL language, compatible with MySQL protocol. The main advantages of Doris are the simplicity (of developing, deploying and using) and meeting many data serving requirements in a single system. Doris mainly integrates the technology of Google Mesa and Apache Impala, and it is based on a column-oriented storage engine and can communicate by MySQL client.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Apache Hudi

    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file id, via an indexing mechanism. This mapping between record key and file group/file id, never changes once the first version of a record has been written to a file. In short, the mapped file group contains all versions of a group of records.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Blue Whale Configuration Platform

    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    Has accumulated experience in supporting hundreds of Tencent businesses, compatible with various complex system architectures, born in operation and maintenance, and proficient in operation and maintenance. From configuration management to job execution, task scheduling and monitoring self-healing, and then through operation and maintenance big data analysis to assist operational decision-making, it covers the full-cycle assurance management of business operations in a comprehensive manner. The open PaaS has a powerful development framework and scheduling engine, as well as a complete operation and maintenance development training system, which helps the rapid transformation and upgrading of operation and maintenance. Through the Blue Whale intelligent cloud system, it can help enterprises quickly realize the automation of basic operation and maintenance services, thereby accelerating the transformation of DevOps, realizing a tool culture, and maximizing operational efficiency.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Open Source Data Quality and Profiling

    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data Quality includes profiling, filtering, governance, similarity check, data enrichment alteration, real time alerting, basket analysis, bubble chart Warehouse validation, single customer view etc. defined by Strategy. This tool is developing high performance integrated data management platform which will seamlessly do Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytic. It also had Hadoop ( Big data ) support to move files to/from Hadoop Grid, Create, Load and Profile Hive Tables. This project is also known as "Aggregate Profiler" Resful API for this project is getting built as (Beta Version) https://sourceforge.net/projects/restful-api-for-osdq/ apache spark based data quality is getting built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    Parkiet

    Parkiet

    Parquet format file GUI editor

    Parquet file viewer and editor written in Java and SWT. It uses Apache Avro library for reading and writing edited parquet files. Only Parquet files with simple data type columns are supported.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 14
    qvge

    qvge

    Qt Visual Graph Editor

    qvge is a multiplatform graph editor written in C++/Qt. Its main goal is to make possible visually edit two-dimensional graphs in a simple and intuitive way. Please note that qvge is not a replacement for such a software like Gephi, Graphvis, Dot, yEd, Dia and so on. It is neither a tool for "big data analysis" nor a math application. It is really just a simple graph editor :)
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    FastoNoSQL

    FastoNoSQL

    FastoNoSQL it is GUI platform for NoSQL databases.

    Gui managment admin tool for: Redis Memcached SSDB LevelDB RocksDB UnQLite LMDB UpscaleDB ForestDB
    Downloads: 13 This Week
    Last Update:
    See Project
  • 16

    json-scada

    A portable SCADA/IoT platform centered on the MongoDB database server.

    Standard IT tools applied to SCADA/IoT (MongoDB, PostgreSQL/TimescaleDB,Node.js, C#, Golang, Grafana, etc.). MongoDB as the real-time core database, persistence layer, config store, SOE historian. Portability and interoperability over Linux, Windows, x86/64, ARM. Horizontal scalability, from a single computer to big clusters (MongoDB-sharding), Bare Metal, Docker containers, VM, cloud, or hybrid deployments. Unlimited tags, servers, and users. HTML5 Web interface. UTF-8/I18N. Protocols: IEC61850 Client, IEC60870-5-101/104 Client and Server, DNP3 Client, OPC-UA Client/Server, MQTT/Sparkplug-B, Telegraf (various data sources for monitoring like Modbus, SNMP, etc.) Github. project https://github.com/riclolsen/json-scada Requirements for Windows Installer: Windows 10/11 64 bits or Server 2016, Windows PowerShell.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    BIRT Report Designer

    BIRT Report Designer

    Open Source Reporting & Data Visualization Platform

    BIRT is an open source technology platform used to create data visualizations and reports that can be embedded into rich client and web applications. Developers who use BIRT Designer are able to access information from multiple data sources easily and quickly in order to create reports and applications with stunning data visualizations. Actuate now provides a free report server, BIRT iHub F-Type, to deploy BIRT content so developers don't have to build their own infrastructure. With a flexible Open Data Access framework, developers can write custom data drivers to access data from any source, including Big Data sources like Apache Hadoop, Cassandra, and MongoDB, along with all traditional relational databases, Flat Files, XML data streams, and data stored in proprietary systems. Built for embedding, BIRT includes APIs for data access, chart generation, output formats, content execution, and integration within larger applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18

    X10

    Performance and Productivity at Scale

    X10 is a class-based, strongly-typed, garbage-collected, object-oriented language. To support concurrency and distribution, X10 uses the Asynchronous Partitioned Global Address Space programming model (APGAS). This model introduces two key concepts -- places and asynchronous tasks -- and a few mechanisms for coordination. With these, APGAS can express both regular and irregular parallelism, message-passing-style and active-message-style computations, fork-join and bulk-synchronous parallelism. Both its modern, type-safe sequential core and simple programming model for concurrency and distribution contribute to making X10 a high-productivity language in the HPC and Big Data spaces. User productivity is further enhanced by providing tools such as an Eclipse-based IDE (X10DT). Implementations of X10 are available for a wide variety of hardware and software platforms ranging from laptops, to commodity clusters, to supercomputers.
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19

    BEAR

    CBR Meets Big Data

    Case-based regression learner for big data. The package contains source and binary files for running BEAR's method. BEAR utilizes EAR4 and locality sensitive hashing in its implementation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    GnuCopy
    GnuCopy is an Open-Source tool to copy and archive all your important data. It supports all important archive typs like Zip and Tar to guaranty an easy and secure exchange between all types of operating systems. Additionally, you can create profiles to blacklist or whitelist specific file types or folders to seperate your big data stores for backups.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21

    LogicalSets

    Integrated Comprehensive Data Architecture & Methodology

    This is an advanced data architecture and methodology. A comprehensive Enterprise Resource Management System. A re-usable database with rules for customization, While being a data driven transaction processing engine, this system has very advanced reporting capabilities. This design eliminates up to 90% of business logic due to the way the data is structured. Uses a concept called Table Sets. Has a compound key that tells the programmer what tableset, which record which applet will view/edit the data. Developed in SAP PowerDesigner, for (Sybase) SQL Anywhere. Don't let the date fool you, this system is ahead of its time.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    gravitino

    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23

    An introduction to Data Analysis in R

    A guide for learning the basic tools on data anaylisis with R

    An Introduction to Data Analysis in R [Book] A guide for learning the basic tools on data anaylisis: process, visualize and learn from your data using R programming. This repository holds the necessary data sets for the book "An introduction to Data Analysis in R", to be published by Springer series Use R!. The book can be purchased in XXX. The book is meant as an introductory guide to manipulate data sets in the Big Data paradigm. One of the main goals of this book is to take the analyst from the very first moment when she/he contacts with data to the final conclusion and presentation of results of analysis. We take into account the variety of fields where data analysis occurs nowadays. We pay special attention to the different ways to obtain data and how to make it manageable before starting the analysis. The data analysis includes most of the basic visualization options and some advanced extra options. Finally, basic statistics is used to learn from the processed data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Apache HBase

    Apache HBase

    Get random, realtime read/write access to your Big Data

    Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables, billions of rows X millions of columns, atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX. Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Apache InLong

    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data. InLong (应龙) is a divine beast in Chinese mythology who guides the river into the sea, and it is regarded as a metaphor of the InLong system for reporting data streams. InLong was originally built at Tencent, which has served online businesses for more than 8 years, to support massive data (data scale of more than 80 trillion pieces of data per day) reporting services in big data scenarios. The entire platform has integrated 5 modules: Ingestion, Convergence, Caching, Sorting, and Management, so that the business only needs to provide data sources, data service quality, data landing clusters and data landing formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
Auth0 Logo