Big Data Tools for Linux

Browse free open source Big Data tools and projects for Linux below.

  • 1
    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user-friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain-specific language. With pandas, performance, productivity, and collaboration in Python data analysis can increase significantly. pandas is continuously being developed to be a fundamental high-level building block for practical, real-world data analysis in Python, as well as a powerful and flexible open source data analysis/manipulation tool for any language.
    Downloads: 79 This Week
  • 2
    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples (a data stream). It includes classification, regression, clustering, outlier detection, and recommender systems. It is related to the WEKA project and, like WEKA, is written in Java, while scaling to adaptive, large-scale machine learning.
    Downloads: 86 This Week
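
As a hedged illustration of learning from a data stream, the sketch below shows test-then-train ("prequential") evaluation, a style of evaluation commonly used by stream-learning frameworks such as MOA. It does not use MOA's Java API: `OnlineNaiveBayes` and `prequential_accuracy` are hypothetical names, and the learner is a toy categorical Naive Bayes that updates its counts one example at a time.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple

Example = Tuple[Sequence[Hashable], Hashable]  # (feature values, class label)


class OnlineNaiveBayes:
    """Toy incremental Naive Bayes for categorical features (hypothetical, not MOA's API)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        # (feature index, feature value) -> per-class counts
        self.feature_counts: Dict[Tuple[int, Hashable], Counter] = defaultdict(Counter)

    def predict(self, x: Sequence[Hashable]) -> Optional[Hashable]:
        if not self.class_counts:
            return None  # nothing learned yet
        total = sum(self.class_counts.values())
        best, best_score = None, -1.0
        for c, n_c in self.class_counts.items():
            score = n_c / total  # prior P(c)
            for i, v in enumerate(x):
                n_cv = self.feature_counts.get((i, v), Counter())[c]
                # Laplace-smoothed P(x_i = v | c); the +2 assumes binary feature values
                score *= (n_cv + 1) / (n_c + 2)
            if score > best_score:
                best, best_score = c, score
        return best

    def learn(self, x: Sequence[Hashable], y: Hashable) -> None:
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(i, v)][y] += 1


def prequential_accuracy(learner: OnlineNaiveBayes, stream: Iterable[Example]) -> float:
    """Test-then-train: each example is scored first, then used for training."""
    correct = seen = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)
        seen += 1
    return correct / seen if seen else 0.0
```

Because each example is scored before the model has seen it, the accuracy reflects performance on unseen data without a separate held-out set.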
  • 3
    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data quality includes profiling, filtering, governance, similarity checks, data enrichment and alteration, real-time alerting, basket analysis, bubble charts, warehouse validation, single customer view, etc., as defined by Strategy. The project is developing a high-performance integrated data management platform that will seamlessly handle Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Metadata Discovery, Anomaly Discovery, Data Cleansing, Reporting, and Analytics. It also has Hadoop (big data) support for moving files to and from a Hadoop grid and for creating, loading, and profiling Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/, and Apache Spark-based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/.
    Downloads: 49 This Week
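
Data profiling is the first item in the data quality list above. As a minimal sketch of what a column profile typically reports (assuming simple per-column statistics; this is not the project's own API), the hypothetical `profile_column` below counts values, nulls, and distinct values and records the minimum and maximum.

```python
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Hypothetical column profile. Treats None and "" as nulls and assumes the
    remaining values are hashable and mutually comparable (needed for min/max)."""
    non_null = [v for v in values if v is not None and v != ""]
    profile: Dict[str, Any] = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if non_null:
        profile["min"] = min(non_null)
        profile["max"] = max(non_null)
    return profile
```

Running it over each column of a table gives the kind of summary a profiler presents before cleansing or validation.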
  • 4
    Apache HBase

    Get random, realtime read/write access to your Big Data

    Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable, described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. It also offers a Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options; support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX; and convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 8 This Week
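
To make the Bigtable-style data model concrete, here is a hedged in-memory sketch: a table maps sorted row keys to `family:qualifier` columns, and each cell keeps several timestamped versions. `ToyVersionedTable` is hypothetical and is not the HBase client API; it only mimics the versioned, sorted-row layout described above.

```python
import bisect
import time
from typing import Dict, List, Optional, Tuple


class ToyVersionedTable:
    """Hypothetical Bigtable/HBase-style model (not the HBase client API):
    row key -> "family:qualifier" -> [(timestamp, value), ...], newest first."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self.row_keys: List[bytes] = []  # kept sorted for range scans
        self.rows: Dict[bytes, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: bytes, column: str, value: bytes, ts: Optional[int] = None) -> None:
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        if ts is None:
            ts = time.time_ns()  # default version: current time in nanoseconds
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first
        del versions[self.max_versions:]  # drop versions beyond the retention limit

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: bytes, stop: bytes) -> List[bytes]:
        """Return row keys in [start, stop) in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]
```

Keeping row keys sorted is what makes prefix and range scans cheap in this model, which is why row-key design matters so much in Bigtable-style stores.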
  • 5
    QuickRedis

    QuickRedis is a free-forever Redis GUI tool

    QuickRedis is a free-forever Redis desktop manager. It supports direct connection, Sentinel, and cluster modes, supports multiple languages, handles hundreds of millions of keys, and has an amazing UI. It supports Windows, Mac OS X, and Linux.
    Downloads: 36 This Week
  • 6
    Apache RocketMQ

    Distributed messaging and streaming platform with low latency

    Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability. It supports publish/subscribe, request/reply, and streaming messaging patterns, as well as financial-grade transactional messages. It offers built-in fault tolerance and high-availability configuration options based on DLedger, and a variety of cross-language clients, such as Java, C/C++, Python, and Go. Transport protocols are pluggable, such as TCP, SSL, and AIO. Message tracing is built in, with OpenTracing support. It provides versatile big data and streaming ecosystem integration, message retroactivity by time or offset, reliable FIFO and strictly ordered messaging in the same queue, efficient pull and push consumption models, and million-level message accumulation capacity in a single queue. It supports multiple messaging protocols, such as JMS and OpenMessaging, a flexible distributed scale-out deployment architecture, and a lightning-fast batch message exchange system.
    Downloads: 3 This Week
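
Strictly ordered messaging in the same queue relies on sending related messages to the same queue. A minimal sketch of that routing idea, assuming a hypothetical `select_queue` helper that hashes a sharding key such as an order ID (this is not RocketMQ's client API):

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (sharding key, body)


def select_queue(sharding_key: str, num_queues: int) -> int:
    # CRC32 is stable across processes, unlike Python's salted built-in hash() for str
    return zlib.crc32(sharding_key.encode("utf-8")) % num_queues


def publish(messages: List[Message], num_queues: int) -> Dict[int, List[Message]]:
    """Route each message to the queue chosen by its key; appending keeps send order."""
    queues: Dict[int, List[Message]] = defaultdict(list)
    for key, body in messages:
        queues[select_queue(key, num_queues)].append((key, body))
    return dict(queues)
```

Ordering holds only per queue: messages for different keys may interleave, while all messages for one key stay in send order.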
  • 7
    MyCAT

    Active, high-performance open source database middleware

    MyCAT is open source software: “a large database cluster” oriented to enterprises. MyCAT is an enhanced database that can replace MySQL and supports transactions and ACID. Regarded as an enterprise-grade MySQL cluster, MyCAT can take the place of an expensive Oracle cluster. MyCAT is also a new type of database, something like a SQL Server integrated with in-memory caching, NoSQL technology, and HDFS big data. As a new, modern enterprise database product, MyCAT combines a traditional database with a new distributed data warehouse. In short, MyCAT is a new kind of database middleware. MyCAT's objective is to smoothly migrate current stand-alone databases and applications to the cloud at low cost and to solve the bottlenecks caused by the rapid growth of data storage and business scale.
    Downloads: 3 This Week
  • 8
    GridDB

    GridDB is a next-generation open source database

    A cyber-physical system is a system that collects a variety of data in physical space (the real world), analyzes it and converts it into knowledge in cyberspace, and feeds that knowledge back to the real world to revitalize industry and solve social problems. GridDB is an open database that enables real-time processing of vast amounts of time-series data in physical space, which is necessary to realize a cyber-physical system. Its multi-model architecture supports various data stores, including time-series-oriented and pluggable data stores, for efficient real-time processing and management of huge amounts of high-frequency time-series data. Various architectural innovations, such as an in-memory orientation ("memory as the main unit and disk as the secondary unit") and an event-driven design with minimal overhead, have been incorporated to achieve processing capabilities that can handle petabyte-scale applications.
    Downloads: 2 This Week
  • 9
    X10

    Performance and Productivity at Scale

    X10 is a class-based, strongly-typed, garbage-collected, object-oriented language. To support concurrency and distribution, X10 uses the Asynchronous Partitioned Global Address Space programming model (APGAS). This model introduces two key concepts -- places and asynchronous tasks -- and a few mechanisms for coordination. With these, APGAS can express both regular and irregular parallelism, message-passing-style and active-message-style computations, fork-join and bulk-synchronous parallelism. Both its modern, type-safe sequential core and simple programming model for concurrency and distribution contribute to making X10 a high-productivity language in the HPC and Big Data spaces. User productivity is further enhanced by providing tools such as an Eclipse-based IDE (X10DT). Implementations of X10 are available for a wide variety of hardware and software platforms ranging from laptops, to commodity clusters, to supercomputers.
    Downloads: 32 This Week
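
APGAS is best seen in X10 itself, but as a rough analogy in Python, the sketch below partitions data across hypothetical `Place` objects, spawns one asynchronous task per place (like X10's `at (p) async`), and waits for all of them (like `finish`). Threads share one address space, so this only mimics the structure of places; it is not X10 and provides no distributed memory.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class Place:
    """Hypothetical stand-in for an X10 place: a local data partition plus its own
    worker thread. Real X10 places are separate address spaces; threads only mimic that."""

    def __init__(self, pid: int, data: List[int]) -> None:
        self.pid = pid
        self.data = data
        self.pool = ThreadPoolExecutor(max_workers=1)

    def async_at(self, fn: Callable[[List[int]], int]) -> Future:
        # Analogue of `at (p) async S`: run fn next to this place's data, without blocking
        return self.pool.submit(fn, self.data)


def finish(futures: List[Future]) -> list:
    # Analogue of X10's `finish`: block until every spawned task has completed
    return [f.result() for f in futures]


def partitioned_sum(values: List[int], num_places: int) -> int:
    chunk = -(-len(values) // num_places)  # ceiling division: block distribution
    places = [Place(i, values[i * chunk:(i + 1) * chunk]) for i in range(num_places)]
    try:
        partial_sums = finish([p.async_at(sum) for p in places])
    finally:
        for p in places:
            p.pool.shutdown()
    return sum(partial_sums)
```

The global result is assembled only after `finish` returns, mirroring how X10 code typically combines per-place partial results.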
  • 10
    Apache Doris

    MPP-based interactive SQL data warehousing for reporting and analysis

    Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With its distributed architecture, datasets of up to 10 PB are well supported and easy to operate. Apache Doris can meet various data analysis demands, including historical data reports, real-time data analysis, interactive data analysis, and exploratory data analysis. Make your data analysis easier! It supports standard SQL and is compatible with the MySQL protocol. The main advantages of Doris are simplicity (of development, deployment, and use) and the ability to meet many data serving requirements in a single system. Doris mainly integrates technology from Google Mesa and Apache Impala; it is based on a column-oriented storage engine and can communicate with the MySQL client.
    Downloads: 1 This Week
  • 11
    Fluid

    Fluid, elastic data abstraction and acceleration for BigData/AI apps

    Fluid provides elastic data abstraction and acceleration for BigData/AI applications in the cloud. It provides a DataSet abstraction for underlying heterogeneous data sources, with multidimensional management in a cloud environment. It enables dataset warmup and acceleration for data-intensive applications by using a distributed cache in Kubernetes, with observability, portability, and scalability. It takes the characteristics of both the application and the data into consideration when scheduling cloud applications and datasets, to improve performance.
    Downloads: 1 This Week
  • 12
    Vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python

    Data science solutions, insights, dashboards, machine learning, deployment. We start at 100GB. Vaex is a high-performance Python library for lazy, out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count, and standard deviation on an N-dimensional grid at more than a billion (10^9) samples/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted). Cut development time by 80%. Your prototype is your solution. Create automatic pipelines for any model.
    Downloads: 1 This Week
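
The core idea of computing statistics on a grid while streaming over data that need not fit in memory can be sketched with the standard library alone. The hypothetical `binned_mean` below computes the mean of `y` in 1-D bins of `x` in a single pass. Vaex itself uses memory mapping and N-dimensional grids; this sketch substitutes plain fixed-size chunked reads and a single dimension for brevity, and is not Vaex's API.

```python
import math
import os
import tempfile
from array import array
from typing import List, Sequence, Tuple


def write_xy(path: str, points: Sequence[Tuple[float, float]]) -> None:
    """Store (x, y) rows as raw native-endian float64 values, x and y interleaved."""
    flat = array("d", [v for row in points for v in row])
    with open(path, "wb") as f:
        flat.tofile(f)


def binned_mean(path: str, limits: Tuple[float, float], shape: int,
                chunk_rows: int = 4096) -> List[float]:
    """One pass over the file, one chunk in memory at a time: mean of y per x-bin.
    Rows with x outside [lo, hi) are ignored; empty bins yield NaN."""
    lo, hi = limits
    counts = [0] * shape
    sums = [0.0] * shape
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_rows * 2 * 8)  # 2 float64 values (8 bytes each) per row
            if not buf:
                break
            vals = array("d")
            vals.frombytes(buf)
            for i in range(0, len(vals), 2):
                x, y = vals[i], vals[i + 1]
                if lo <= x < hi:
                    b = min(int((x - lo) / (hi - lo) * shape), shape - 1)  # guard rounding near hi
                    counts[b] += 1
                    sums[b] += y
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def demo() -> List[float]:
    points = [(0.5, 1.0), (0.5, 3.0), (1.5, 10.0), (5.0, 99.0)]  # last row is outside the limits
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_xy(path, points)
        return binned_mean(path, limits=(0.0, 2.0), shape=2, chunk_rows=1)
    finally:
        os.remove(path)
```

Only the per-bin counts and sums stay in memory, so the working set depends on the grid size, not on the number of rows.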
  • 13
    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm, and many other Java-based “big data” applications.
    Downloads: 1 This Week
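
Spatial processing in MapReduce jobs often reduces to tests such as "which polygon contains this point?". A minimal sketch of that primitive using even-odd ray casting follows; it is not the Esri Geometry API, and `point_in_polygon` and `count_points_per_region` are hypothetical names.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: toggle on each polygon edge crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y (horizontal edges are skipped)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_region(points: Sequence[Point],
                            regions: Dict[str, Sequence[Point]]) -> Dict[str, int]:
    counts = {name: 0 for name in regions}
    for pt in points:
        for name, poly in regions.items():
            if point_in_polygon(pt, poly):
                counts[name] += 1
    return counts
```

In a Hadoop job, the containment test could run in the map phase, emitting one record per matching region for a reducer to count.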
  • 14
    BIRT Report Designer

    Open Source Reporting & Data Visualization Platform

    BIRT is an open source technology platform used to create data visualizations and reports that can be embedded into rich client and web applications. Developers who use BIRT Designer are able to access information from multiple data sources easily and quickly in order to create reports and applications with stunning data visualizations. Actuate now provides a free report server, BIRT iHub F-Type, to deploy BIRT content so developers don't have to build their own infrastructure. With a flexible Open Data Access framework, developers can write custom data drivers to access data from any source, including Big Data sources like Apache Hadoop, Cassandra, and MongoDB, along with all traditional relational databases, Flat Files, XML data streams, and data stored in proprietary systems. Built for embedding, BIRT includes APIs for data access, chart generation, output formats, content execution, and integration within larger applications.
    Downloads: 11 This Week
  • 15
    Universal Java Matrix Package

    sparse and dense matrix, linear algebra, visualization, big data

    The Universal Java Matrix Package (UJMP) is an open source Java library which provides sparse and dense matrix classes, as well as a large number of calculations for linear algebra such as matrix multiplication or matrix inverse. Operations such as mean, correlation, standard deviation, replacement of missing values or the calculation of mutual information are supported, too. The Universal Java Matrix Package provides various visualization methods, import and export filters for a large number of file formats, and even the possibility to link to JDBC databases. Multi-dimensional matrices as well as generic matrices with a specified object type are supported and very large matrices can be handled even when they do not fit into memory.
    Downloads: 11 This Week
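
Sparse matrices store only their non-zero entries. As a hedged sketch of why that pays off (plain Python, not UJMP's Java API), here is a dictionary-of-keys sparse matrix multiplication that only visits pairs of non-zero entries that actually meet; `sparse_matmul` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

SparseMatrix = Dict[Tuple[int, int], float]  # (row, col) -> non-zero value


def sparse_matmul(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    # Index b's non-zeros by row so each a[i, k] is paired only with b[k, *]
    b_rows: DefaultDict[int, List[Tuple[int, float]]] = defaultdict(list)
    for (k, j), v in b.items():
        b_rows[k].append((j, v))
    result: DefaultDict[Tuple[int, int], float] = defaultdict(float)
    for (i, k), av in a.items():
        for j, bv in b_rows.get(k, ()):
            result[(i, j)] += av * bv
    # Drop entries that cancelled out to exactly zero
    return {key: v for key, v in result.items() if v != 0.0}
```

The work is proportional to the number of matching non-zero pairs rather than to the full matrix dimensions.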
  • 16
    qvge

    Qt Visual Graph Editor

    qvge is a multiplatform graph editor written in C++/Qt. Its main goal is to make it possible to visually edit two-dimensional graphs in a simple and intuitive way. Please note that qvge is not a replacement for software such as Gephi, Graphviz, Dot, yEd, Dia, and so on. It is neither a tool for "big data analysis" nor a math application. It is really just a simple graph editor :)
    Downloads: 6 This Week
  • 17
    FastoRedis

    Cross-platform open source Redis DB management tool

    FastoRedis (a fork of FastoNoSQL) is a cross-platform open source Redis management tool (i.e., an admin GUI). It uses the same engine that powers Redis's redis-cli shell. Everything you can write in the redis-cli shell, you can write in FastoRedis! Our program works on most Linux systems, as well as on Windows, Mac OS X, FreeBSD, and Android, on desktops and embedded devices.
    Downloads: 7 This Week
  • 18
    Augustus

    PMML-compliant scoring engine and analytic toolkit

    Augustus development has moved to Google Code. The new project page is augustus.googlecode.com. New releases of the project are not currently being published to SourceForge. Augustus is designed for statistical and data mining models and produces and consumes models with tens of thousands of segments. Versions of Augustus support PMML 3, 4.0.1, and 4.1.
    Downloads: 3 This Week
  • 19
    FastoNoSQL

    FastoNoSQL is a GUI platform for NoSQL databases.

    GUI management and admin tool for Redis, Memcached, SSDB, LevelDB, RocksDB, UnQLite, LMDB, UpscaleDB, and ForestDB.
    Downloads: 4 This Week
  • 20
    json-scada

    A portable SCADA/IoT platform centered on the MongoDB database server.

    Standard IT tools applied to SCADA/IoT (MongoDB, PostgreSQL/TimescaleDB, Node.js, C#, Golang, Grafana, etc.). MongoDB serves as the real-time core database, persistence layer, config store, and SOE historian. Portability and interoperability across Linux, Windows, x86/64, and ARM. Horizontal scalability, from a single computer to big clusters (MongoDB sharding), on bare metal, Docker containers, VMs, cloud, or hybrid deployments. Unlimited tags, servers, and users. HTML5 web interface. UTF-8/I18N. Protocols: IEC61850 Client, IEC60870-5-101/104 Client and Server, DNP3 Client, OPC-UA Client/Server, MQTT/Sparkplug-B, Telegraf (various data sources for monitoring, like Modbus, SNMP, etc.). GitHub project: https://github.com/riclolsen/json-scada. Requirements for the Windows installer: Windows 10/11 64-bit or Server 2016, Windows PowerShell.
    Downloads: 2 This Week
  • 21
    BEAR

    CBR Meets Big Data

    Case-based regression learner for big data. The package contains source and binary files for running BEAR's method. BEAR uses EAR4 and locality-sensitive hashing in its implementation.
    Downloads: 1 This Week
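
BEAR's exact method (including EAR4) is not reproduced here. As a generic illustration of how locality-sensitive hashing can speed up case-based regression, the hypothetical `LSHRegressor` below buckets training cases by random-hyperplane signatures and predicts the mean target of the cases sharing the query's bucket, falling back to the global mean for an unseen bucket.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[bool, ...]


class LSHRegressor:
    """Hypothetical LSH-bucketed nearest-neighbor regressor (not BEAR's method)."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit: which side of it a case lies on
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: Dict[Signature, List[float]] = defaultdict(list)
        self.all_targets: List[float] = []

    def _signature(self, x: Sequence[float]) -> Signature:
        return tuple(sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0.0 for p in self.planes)

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[float]) -> "LSHRegressor":
        for x, y in zip(xs, ys):
            self.buckets[self._signature(x)].append(y)
            self.all_targets.append(y)
        return self

    def predict(self, x: Sequence[float]) -> float:
        # Only cases in the same bucket are consulted; fall back to the global mean
        cases = self.buckets.get(self._signature(x)) or self.all_targets
        return sum(cases) / len(cases)
```

Random hyperplanes approximate angular similarity, so this groups cases pointing in similar directions; other LSH families suit other distance measures.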
  • 22
    Big Sack

    Big Sack: A lightweight Java Key/Value store with undo and disk cache.

    Big Sack is a Java persistence mechanism that allows storage of key/value pairs following the popular Big Data paradigms. It's a very simple and straightforward way to bridge the gap between in-memory data structures and long-term storage. It has the convenience of the Java SDK's TreeMap and TreeSet classes and is used in the same easy way, but it includes rollback through undo logging to checkpoint data, so it does not wind up in an unknown state regardless of failures. Data storage in the exabyte range is possible using filesystem and/or memory-mapped IO. Three levels of configurable write-through caching at different granularities ensure performance.
    Downloads: 1 This Week
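
Rollback through undo logging can be sketched in a few lines. The hypothetical `UndoLoggedStore` below records each key's previous value before every write, so `rollback()` can restore the state at the last `checkpoint()`. Unlike Big Sack, it keeps the log in memory rather than on disk, so it illustrates the bookkeeping only, not crash recovery.

```python
from typing import Any, Dict, Hashable, List, Tuple

_MISSING = object()  # marks "key did not exist before this write"


class UndoLoggedStore:
    """Hypothetical in-memory key/value store with an undo log (not Big Sack's API)."""

    def __init__(self) -> None:
        self.data: Dict[Hashable, Any] = {}
        self.undo_log: List[Tuple[Hashable, Any]] = []

    def put(self, key: Hashable, value: Any) -> None:
        self.undo_log.append((key, self.data.get(key, _MISSING)))  # save old value first
        self.data[key] = value

    def remove(self, key: Hashable) -> None:
        if key in self.data:
            self.undo_log.append((key, self.data[key]))
            del self.data[key]

    def checkpoint(self) -> None:
        self.undo_log.clear()  # changes so far can no longer be undone

    def rollback(self) -> None:
        # Undo in reverse order so earlier saved values win
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
```

Replaying the log newest-first guarantees that a key written several times since the checkpoint ends up with its checkpointed value.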
  • 23
    wzd

    Powerful storage server, designed for big data storage systems

    wZD is a server written in Go that uses a modified version of the BoltDB database as a backend. It saves and distributes any number of small and large files and NoSQL keys/values in compact form inside micro Bolt databases (archives); files and values are distributed across BoltDB databases according to the number of directories or subdirectories and the overall directory structure. Using wZD can permanently solve the problem of a large number of files on any POSIX-compatible file system, including a clustered one. Outwardly, it works like a regular WebDAV server.
    Downloads: 1 This Week
  • 24
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark for working with structured data, and Spark Structured Streaming for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code, allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer. .NET for Apache Spark runs on Windows, Linux, and macOS using .NET Core, or on Windows using .NET Framework. It also runs on all major cloud providers, including Azure HDInsight Spark, Amazon EMR Spark, and AWS & Azure Databricks.
    Downloads: 0 This Week
  • 25
    An introduction to Data Analysis in R

    A guide to learning the basic tools of data analysis with R

    An Introduction to Data Analysis in R [Book]: a guide to learning the basic tools of data analysis: process, visualize, and learn from your data using R programming. This repository holds the necessary data sets for the book "An introduction to Data Analysis in R", to be published in Springer's Use R! series. The book can be purchased in XXX. The book is meant as an introductory guide to manipulating data sets in the Big Data paradigm. One of the main goals of this book is to take the analyst from the very first moment she/he comes into contact with the data to the final conclusion and presentation of the results of the analysis. We take into account the variety of fields where data analysis occurs nowadays. We pay special attention to the different ways to obtain data and how to make it manageable before starting the analysis. The data analysis covers most of the basic visualization options and some advanced extra options. Finally, basic statistics are used to learn from the processed data.
    Downloads: 0 This Week