Alternatives to Apache Phoenix
Compare Apache Phoenix alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Phoenix in 2026. Compare features, ratings, user reviews, pricing, and more from Apache Phoenix competitors and alternatives in order to make an informed decision for your business.
-
1
Teradata VantageCloud
Teradata
Teradata VantageCloud: The complete cloud analytics and data platform for AI. Teradata VantageCloud is an enterprise-grade, cloud-native data and analytics platform that unifies data management, advanced analytics, and AI/ML capabilities in a single environment. Designed for scalability and flexibility, VantageCloud supports multi-cloud and hybrid deployments, enabling organizations to manage structured and semi-structured data across AWS, Azure, Google Cloud, and on-premises systems. It offers full ANSI SQL support, integrates with open-source tools like Python and R, and provides built-in governance for secure, trusted AI. VantageCloud empowers users to run complex queries, build data pipelines, and operationalize machine learning models—all while maintaining interoperability with modern data ecosystems. -
2
RaimaDB
Raima
RaimaDB is an embedded time series database for IoT and Edge devices that can run in-memory. It is an extremely powerful, lightweight and secure RDBMS. Field tested by over 20 000 developers worldwide and has more than 25 000 000 deployments. RaimaDB is a high-performance, cross-platform embedded database designed for mission-critical applications, particularly in the Internet of Things (IoT) and edge computing markets. It offers a small footprint, making it suitable for resource-constrained environments, and supports both in-memory and persistent storage configurations. RaimaDB provides developers with multiple data modeling options, including traditional relational models and direct relationships through network model sets. It ensures data integrity with ACID-compliant transactions and supports various indexing methods such as B+Tree, Hash Table, R-Tree, and AVL-Tree. -
3
InterBase
Embarcadero
Ultrafast, scalable, embeddable SQL database with commercial-grade data security, disaster recovery, and change synchronization. Cross-platform, zero-install, embedded database as a direct-access library. Cross-platform, zero-install, embedded database with database-level and column-level AES and DES encryption. Concurrent applications/client access to the database on Windows with database-level and column-level AES and DES encryption. Ultrafast, scalable, SQL server database for Windows and Linux with commercial-grade data security, disaster recovery and change synchronization. Attacks on databases and loss of data can be costly and lead to loss of customers’ trust (and business), regulatory action, and heavy fines. InterBase provides over-the-wire and at-rest encryption, separate security login, and role-based user security. InterBase maintains full on-disk encryption while adding negligible overhead to database speed and performance. -
4
CUBRID
CUBRID
CUBRID is a relational DBMS optimized for online transaction processing (OLTP) that complies with ANSI SQL standards and provides MVCC support, High-Availability (HA) capabilities, and GUI-based tools for DB management/migration. It also provides Oracle/MySQL compatibility and supports a variety of interfaces, including JDBC. CUBRID provides ease of installation and native GUI-based administration tools for developers' convenience. Multi-threaded, multi-server architecture, native broker middleware, cost-based optimizer, and intensive caching techniques for your OLTP services. Very accurate predictable automatic fail-over built-in technology, based on the CUBRID Heartbeat native engine core. Multi-volume support, automatic volume expansion, and unlimited number and size of databases/ tables/indexes.Starting Price: $0.01/one-time/user -
5
Apache Trafodion
Apache Software Foundation
Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop. Full-functioned ANSI SQL language support. JDBC/ODBC connectivity for Linux/Windows clients. Distributed ACID transaction protection across multiple statements, tables, and rows. Performance improvements for OLTP workloads with compile-time and run-time optimizations. Support for large data sets using a parallel-aware query optimizer. Reuse existing SQL skills and improve developer productivity. Distributed ACID transactions guarantee data consistency across multiple rows and tables. Interoperability with existing tools and applications. Hadoop and Linux distribution neutral. Easy to add to your existing Hadoop infrastructure.Starting Price: Free -
6
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources. -
7
HugeGraph
HugeGraph
HugeGraph is a fast-speed and highly-scalable graph database. Billions of vertices and edges can be easily stored into and queried from HugeGraph due to its excellent OLTP ability. As compliance to Apache TinkerPop 3 framework, various complicated graph queries can be accomplished through Gremlin (a powerful graph traversal language). Among its features, it provides compliance to Apache TinkerPop 3, supporting Gremlin. Schema Metadata Management, including VertexLabel, EdgeLabel, PropertyKey and IndexLabel. Multi-type Indexes, supporting exact query, range query and complex conditions combination query. Plug-in Backend Store Driver Framework, supporting RocksDB, Cassandra, ScyllaDB, HBase and MySQL now and easy to add other backend store driver if needed. Integration with Hadoop/Spark. HugeGraph relies on the TinkerPop framework, we refer to the storage structure of Titan and the schema definition of DataStax. -
8
Apache Derby
Apache
Apache Derby, an Apache DB subproject, is an open source relational database implemented entirely in Java and available under the Apache License, Version 2.0. Derby has a small footprint - about 3.5 megabytes for the base engine and embedded JDBC driver. Derby provides an embedded JDBC driver that lets you embed Derby in any Java-based solution. Derby also supports the more familiar client/server mode with the Derby Network Client JDBC driver and Derby Network Server. -
9
MLlib
Apache Software Foundation
Apache Spark's MLlib is a scalable machine learning library that integrates seamlessly with Spark's APIs, supporting Java, Scala, Python, and R. It offers a comprehensive suite of algorithms and utilities, including classification, regression, clustering, collaborative filtering, and tools for constructing machine learning pipelines. MLlib's high-quality algorithms leverage Spark's iterative computation capabilities, delivering performance up to 100 times faster than traditional MapReduce implementations. It is designed to operate across diverse environments, running on Hadoop, Apache Mesos, Kubernetes, standalone clusters, or in the cloud, and accessing various data sources such as HDFS, HBase, and local files. This flexibility makes MLlib a robust solution for scalable and efficient machine learning tasks within the Apache Spark ecosystem. -
10
Apache Bigtop
Apache Software Foundation
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark. Bigtop packages Hadoop RPMs and DEBs, so that you can manage and maintain your Hadoop cluster. Bigtop provides an integrated smoke testing framework, alongside a suite of over 50 test files. Bigtop provides vagrant recipes, raw images, and (work-in-progress) docker recipes for deploying Hadoop from zero. Bigtop support many Operating Systems, including Debian, Ubuntu, CentOS, Fedora, openSUSE and many others. Bigtop includes tools and a framework for testing at various levels (packaging, platform, runtime, etc.) for both initial deployments as well as upgrade scenarios for the entire data platform, not just the individual components. -
11
H2
H2
Welcome to H2, the Java SQL database. In embedded mode, an application opens a database from within the same JVM using JDBC. This is the fastest and easiest connection mode. The disadvantage is that a database may only be open in one virtual machine (and class loader) at any time. As in all modes, both persistent and in-memory databases are supported. There is no limit on the number of database open concurrently, or on the number of open connections. The mixed mode is a combination of the embedded and the server mode. The first application that connects to a database does that in embedded mode, but also starts a server so that other applications (running in different processes or virtual machines) can concurrently access the same data. The local connections are as fast as if the database is used in just the embedded mode, while the remote connections are a bit slower. -
12
Apache Hive
Apache Software Foundation
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive. Apache Hive is an open source project run by volunteers at the Apache Software Foundation. Previously it was a subproject of Apache® Hadoop®, but has now graduated to become a top-level project of its own. We encourage you to learn about the project and contribute your expertise. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. -
13
Apache HBase
The Apache Software Foundation
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Automatic failover support between RegionServers. Easy to use Java API for client access. Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX. -
14
VoltDB
VoltDB
Volt Active Data is a data platform built to make your entire tech stack leaner, faster, and less expensive, so that your applications (and your company) can scale seamlessly to meet the ultra-low latency SLAs of 5G, IoT, edge computing, and whatever comes next. Designed to augment your existing big data investments, such as NoSQL, Hadoop, Kubernetes, Kafka, and traditional databases or data warehouses, Volt Active Data replaces the various layers typically required to make contextual decisions on streaming data with a single, unified layer that can handle ingest to action in less than 10 milliseconds. The world is full of data that’s generated, stored, forgotten, and then deleted. “Active Data” is data that needs to be acted on immediately to gain business value from it. There are lots of traditional and NoSQL data storage products that you can use to keep such data. There’s also data that you can make money from, if only you can act on it fast enough to ‘influence the moment’. -
15
Amazon EMR
Amazon
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. With EMR you can run Petabyte-scale analysis at less than half of the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. For short-running jobs, you can spin up and spin down clusters and pay per second for the instances used. For long-running workloads, you can create highly available clusters that automatically scale to meet demand. If you have existing on-premises deployments of open-source tools such as Apache Spark and Apache Hive, you can also run EMR clusters on AWS Outposts. Analyze data using open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet. Connect to Amazon SageMaker Studio for large-scale model training, analysis, and reporting. -
16
Apache Sentry
Apache Software Foundation
Apache Sentry™ is a system for enforcing fine grained role based authorization to data and metadata stored on a Hadoop cluster. Apache Sentry has successfully graduated from the Incubator in March of 2016 and is now a Top-Level Apache project. Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala and HDFS (limited to Hive table data). Sentry is designed to be a pluggable authorization engine for Hadoop components. It allows you to define authorization rules to validate a user or application’s access requests for Hadoop resources. Sentry is highly modular and can support authorization for a wide variety of data models in Hadoop. -
17
Google Cloud Bigtable
Google
Google Cloud Bigtable is a fully managed, scalable NoSQL database service for large analytical and operational workloads. Fast and performant: Use Cloud Bigtable as the storage engine that grows with you from your first gigabyte to petabyte-scale for low-latency applications as well as high-throughput data processing and analytics. Seamless scaling and replication: Start with a single node per cluster, and seamlessly scale to hundreds of nodes dynamically supporting peak demand. Replication also adds high availability and workload isolation for live serving apps. Simple and integrated: Fully managed service that integrates easily with big data tools like Hadoop, Dataflow, and Dataproc. Plus, support for the open source HBase API standard makes it easy for development teams to get started. -
18
Empress RDBMS
Empress Software
Empress Embedded Database engine is the heartbeat of EMPRESS RDBMS, a relational database management system specializing in embedded database technology – from car navigation systems to mission critical military command and control, from Internet routers to complex medical systems, EMPRESS beats steadily, 24/7 at the core of embedded systems applications everywhere. Empress kernel level mr API is a unique feature of Empress that gives users access to the Embedded Database kernel libraries. This Empress API provides the fastest means of accessing Empress databases. MR Routines give the developer maximum control over time and space in developing real-time embedded database applications. Empress ODBC and JDBC APIs applications to access Empress databases in both standalone and client/server mode. Empress ODBC and JDBC APIs enable many 3rd party ODBC and JDBC capable software packages to access a local Empress database or via Empress Connectivity Server. -
19
E-MapReduce
Alibaba
EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm. Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR allows you to use the Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored on different Alibaba Cloud data storage service, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS). You can quickly create clusters without the need to configure hardware and software. All maintenance operations are completed on its Web interface. -
20
Oracle Big Data SQL Cloud Service enables organizations to immediately analyze data across Apache Hadoop, NoSQL and Oracle Database leveraging their existing SQL skills, security policies and applications with extreme performance. From simplifying data science efforts to unlocking data lakes, Big Data SQL makes the benefits of Big Data available to the largest group of end users possible. Big Data SQL gives users a single location to catalog and secure data in Hadoop and NoSQL systems, Oracle Database. Seamless metadata integration and queries which join data from Oracle Database with data from Hadoop and NoSQL databases. Utilities and conversion routines support automatic mappings from metadata stored in HCatalog (or the Hive Metastore) to Oracle Tables. Enhanced access parameters give administrators the flexibility to control column mapping and data access behavior. Multiple cluster support enables one Oracle Database to query multiple Hadoop clusters and/or NoSQL systems.
-
21
SAP HANA
SAP
SAP HANA in-memory database is for transactional and analytical workloads with any data type — on a single data copy. It breaks down the transactional and analytical silos in organizations, for quick decision-making, on premise and in the cloud. Innovate without boundaries on a database management system, where you can develop intelligent and live solutions for quick decision-making on a single data copy. And with advanced analytics, you can support next-generation transactional processing. Build data solutions with cloud-native scalability, speed, and performance. With the SAP HANA Cloud database, you can gain trusted, business-ready information from a single solution, while enabling security, privacy, and anonymization with proven enterprise reliability. An intelligent enterprise runs on insight from data – and more than ever, this insight must be delivered in real time. -
22
Firebird
Firebird Foundation
Firebird is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms. Firebird offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers. It has been used in production systems, under a variety of names, since 1981. The Firebird Project is a commercially independent project of C and C++ programmers, technical advisors and supporters developing and enhancing a multi-platform relational database management system based on the source code released by Inprise Corp (now known as Borland Software Corp) on 25 July, 2000. The Firebird Project supplies users, developers, and administrators with various kinds of documentation, from Quick Start guides to expert-level articles devoted to various aspects of Firebird. -
23
MySQL
Oracle
MySQL is the world's most popular open source database. With its proven performance, reliability, and ease-of-use, MySQL has become the leading database choice for web-based applications, used by high profile web properties including Facebook, Twitter, YouTube, and all five of the top five websites*. Additionally, it is an extremely popular choice as embedded database, distributed by thousands of ISVs and OEMs.Starting Price: Free -
24
Apache Kylin
Apache Software Foundation
Apache Kylin™ is an open source, distributed Analytical Data Warehouse for Big Data; it was designed to provide OLAP (Online Analytical Processing) capability in the big data era. By renovating the multi-dimensional cube and precalculation technology on Hadoop and Spark, Kylin is able to achieve near constant query speed regardless of the ever-growing data volume. Reducing query latency from minutes to sub-second, Kylin brings online analytics back to big data. Kylin can analyze 10+ billions of rows in less than a second. No more waiting on reports for critical decisions. Kylin connects data on Hadoop to BI tools like Tableau, PowerBI/Excel, MSTR, QlikSense, Hue and SuperSet, making the BI on Hadoop faster than ever. As an Analytical Data Warehouse, Kylin offers ANSI SQL on Hadoop/Spark and supports most ANSI SQL query functions. Kylin can support thousands of interactive queries at the same time, thanks to the low resource consumption of each query. -
25
Apache PredictionIO
Apache
Apache PredictionIO® is an open-source machine learning server built on top of a state-of-the-art open-source stack for developers and data scientists to create predictive engines for any machine learning task. It lets you quickly build and deploy an engine as a web service on production with customizable templates. Respond to dynamic queries in real-time once deployed as a web service, evaluate and tune multiple engine variants systematically, and unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics. Speed up machine learning modeling with systematic processes and pre-built evaluation measures, support machine learning and data processing libraries such as Spark MLLib and OpenNLP. Implement your own machine learning models and seamlessly incorporate them into your engine. Simplify data infrastructure management. Apache PredictionIO® can be installed as a full machine learning stack, bundled with Apache Spark, MLlib, HBase, Akka HTTP, etc.Starting Price: Free -
26
Apache Impala
Apache
Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication. For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the implementation wheel. With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata stored from source through analysis.Starting Price: Free -
27
JProfiler
ej-technologies GmbH
When you profile, you need the most powerful tool you can get. At the same time, you do not want to spend time learning how to use the tool. JProfiler is just that: simple and powerful at the same time. Configuring sessions is straight-forward, third party integrations make getting started a breeze and profiling data is presented in a natural way. On all levels, JProfiler has been carefully designed to help you get started with solving your problems. Database calls are the top reasons for performance problems in business applications. JProfiler's JDBC and JPA/Hibernate probes as well as the NoSQL probes for MongoDB, Cassandra and HBase show the reasons for slow database access and how slow statements are called by your code. From the JDBC timeline view that shows you all JDBC connections with their activities, through the hot spots view that shows you slow statements to various telemetry views and a list of single events. -
28
Apache Ranger
The Apache Software Foundation
Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data lake architecture. Enterprises can potentially run multiple workloads, in a multi tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access. Centralized security administration to manage all security related tasks in a central UI or using REST APIs. Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool. Standardize authorization method across all Hadoop components. Enhanced support for different authorization methods - Role based access control etc. -
29
Apache Mahout
Apache Software Foundation
Apache Mahout is a powerful, scalable, and versatile machine learning library designed for distributed data processing. It offers a comprehensive set of algorithms for various tasks, including classification, clustering, recommendation, and pattern mining. Built on top of the Apache Hadoop ecosystem, Mahout leverages MapReduce and Spark to enable data processing on large-scale datasets. Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end or can be extended to other distributed backends. Matrix computations are a fundamental part of many scientific and engineering applications, including machine learning, computer vision, and data analysis. Apache Mahout is designed to handle large-scale data processing by leveraging the power of Hadoop and Spark. -
30
Azure HDInsight
Microsoft
Run popular open-source frameworks—including Apache Hadoop, Spark, Hive, Kafka, and more—using Azure HDInsight, a customizable, enterprise-grade service for open-source analytics. Effortlessly process massive amounts of data and get all the benefits of the broad open-source project ecosystem with the global scale of Azure. Easily migrate your big data workloads and processing to the cloud. Open-source projects and clusters are easy to spin up quickly without the need to install hardware or manage infrastructure. Big data clusters reduce costs through autoscaling and pricing tiers that allow you to pay for only what you use. Enterprise-grade security and industry-leading compliance with more than 30 certifications helps protect your data. Optimized components for open-source technologies such as Hadoop and Spark keep you up to date. -
31
SingleStore
SingleStore
SingleStore (formerly MemSQL) is a distributed, highly-scalable SQL database that can run anywhere. We deliver maximum performance for transactional and analytical workloads with familiar relational models. SingleStore is a scalable SQL database that ingests data continuously to perform operational analytics for the front lines of your business. Ingest millions of events per second with ACID transactions while simultaneously analyzing billions of rows of data in relational SQL, JSON, geospatial, and full-text search formats. SingleStore delivers ultimate data ingestion performance at scale and supports built in batch loading and real time data pipelines. SingleStore lets you achieve ultra fast query response across both live and historical data using familiar ANSI SQL. Perform ad hoc analysis with business intelligence tools, run machine learning algorithms for real-time scoring, perform geoanalytic queries in real time.Starting Price: $0.69 per hour -
32
Hadoop
Apache Software Foundation
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page. Apache Hadoop 3.3.4 incorporates a number of significant enhancements over the previous major release line (hadoop-3.2). -
33
IBM Analytics Engine provides an architecture for Hadoop clusters that decouples the compute and storage tiers. Instead of a permanent cluster formed of dual-purpose nodes, the Analytics Engine allows users to store data in an object storage layer such as IBM Cloud Object Storage and spins up clusters of computing notes when needed. Separating compute from storage helps to transform the flexibility, scalability and maintainability of big data analytics platforms. Build on an ODPi compliant stack with pioneering data science tools with the broader Apache Hadoop and Apache Spark ecosystem. Define clusters based on your application's requirements. Choose the appropriate software pack, version, and size of the cluster. Use as long as required and delete as soon as an application finishes jobs. Configure clusters with third-party analytics libraries and packages. Deploy workloads from IBM Cloud services like machine learning.Starting Price: $0.014 per hour
-
34
JanusGraph
JanusGraph
JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a project under The Linux Foundation, and includes participants from Expero, Google, GRAKN.AI, Hortonworks, IBM and Amazon. Elastic and linear scalability for a growing data and user base. Data distribution and replication for performance and fault tolerance. Multi-datacenter high availability and hot backups. All functionality is totally free. No need to buy commercial licenses. JanusGraph is fully open source under the Apache 2 license. JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. Support for ACID and eventual consistency. In addition to online transactional processing (OLTP), JanusGraph supports global graph analytics (OLAP) with its Apache Spark integration. -
35
Deeplearning4j
Deeplearning4j
DL4J takes advantage of the latest distributed computing frameworks including Apache Spark and Hadoop to accelerate training. On multi-GPUs, it is equal to Caffe in performance. The libraries are completely open-source, Apache 2.0, and maintained by the developer community and Konduit team. Deeplearning4j is written in Java and is compatible with any JVM language, such as Scala, Clojure, or Kotlin. The underlying computations are written in C, C++, and Cuda. Keras will serve as the Python API. Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs. There are a lot of parameters to adjust when you're training a deep-learning network. We've done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala, Clojure, and Kotlin programmers. -
36
ZetaAnalytics
Halliburton
The ZetaAnalytics product requires a compatible database appliance for its Data Warehouse. Landmark has qualified the ZetaAnalytics software using Teradata, EMC Greenplum, and IBM Netezza. Please see the ZetaAnalytics Release Notes for the most up to date qualified versions. Before installing and configuring ZetaAnalytics software, ensure that the Data Warehouse you use for drilling data is created and running. Scripts to create the various Zeta-specific database components within the Data Warehouse will need to be run as part of the installation process. These require database administrator (DBA) rights. The ZetaAnalytics product requires Apache Hadoop for model scoring and real-time streaming. If you do not already have an Apache Hadoop cluster installed in your environment, please install it before running the ZetaAnalytics installer, which will prompt you for the name and port number of your Hadoop Name Server and Map Reducer. -
37
CockroachDB
Cockroach Labs
CockroachDB: Cloud-native, distributed SQL. Your cloud applications deserve a cloud-native database. Cloud-based apps and services deserve a database that scales across clouds, eases operational complexity, and improves reliability. CockroachDB delivers resilient, distributed SQL with ACID transactions and data partitioned by location. Automate operations for mission-critical applications by pairing CockroachDB with orchestration tools like Kubernetes and Mesosphere DC/OS. Every node can service both reads and writes so that you can scale query throughput and database capacity by simply adding more endpoints. Just add new nodes to CockroachDB, and it automatically rebalances data, completely removing the pain of manual sharding. As demand shifts, CockroachDB detects hotspots and intelligently distributes data to maintain performance. Tune your database at the row level so that data lives close to your users and you can minimize query latency. -
38
Valentina Studio
Paradigma Software
Create, administer, query and explore Valentina DB, MySQL, MariaDB, PostgreSQL and SQLite databases for FREE. Design business reports running in Valentina Studio Pro, on Valentina Server or in an application with an Application Developer Kit. Backward Engineering in Standard with forwarding Engineering in Valentina Studio Pro. Reverse engineering and create diagrams for existing databases. Add new objects to diagrams. Write SQL queries with auto-completion, color syntax. Define, manage, save favorite queries; access recent queries. Function browser dictionary of each function. Consoles for errors, warnings, and performance. Search, Export result records into CSV, JSON, Excel. Edit properties of multiple objects at the same time. Drill down to tables and fields; incredible fast searching. Reverse engineering and create diagrams for existing databases. Add new objects to diagrams. Add/drop users, and groups, and manage privileges.Starting Price: $79 -
39
BigBI
BigBI
BigBI enables data specialists to build their own powerful big data pipelines interactively & efficiently, without any coding! BigBI unleashes the power of Apache Spark enabling: Scalable processing of real Big Data (up to 100X faster) Integration of traditional data (SQL, batch files) with modern data sources including semi-structured (JSON, NoSQL DBs, Elastic, Hadoop), and unstructured (Text, Audio, video), Integration of streaming data, cloud data, AI/ML & graphs -
40
Actian Zen
Actian
Actian Zen is an embedded, high-performance, and low-maintenance database management system designed for edge applications, mobile devices, and IoT environments. It offers a seamless integration of SQL and NoSQL data models, providing flexibility for developers working with structured and unstructured data. Actian Zen is known for its small footprint, scalability, and high reliability, making it ideal for resource-constrained environments where consistent performance and minimal administrative overhead are essential. With built-in security features and a self-tuning architecture, it supports real-time data processing and analytics without the need for constant monitoring or maintenance. Actian Zen is widely used in industries like healthcare, retail, and manufacturing, where edge computing and distributed data environments are critical for business operations. -
41
NuoDB
NuoDB
The world is moving to distributed applications and architectures, and your database should too. Learn how you can deploy where you want, when you want, and how you want with a distributed SQL database. Migrate existing SQL applications to a distributed, multi-node architecture that can dynamically scale out and in. Our Transaction Engines (TEs) and Storage Managers (SMs) work together to ensure ACID compliance across multiple nodes. Deploy in a distributed architecture. When you deploy your database with multiple nodes, the loss of one or multiple nodes will not result in the loss of database access. Deploy TEs and SMs to meet your variable workload needs, or deploy in the different environments the teams in your organization uses: in private and public clouds, in hybrid environments, and across clouds. -
42
RoyalCyber eCatalyst
RoyalCyber
Ecatalyst is a proprietary product which can be integrated with ecommerce software’s like Hybris, Magento etc., and uses events generated by the sites to provide predictions such as personalized predictions, Similar Product predictions, Complementary predictions, Contextual predictions to the users. A proprietary decision making engine that provides predictions and suggestions about the products based on the event traffic of the products in a website. Uses advanced statistical and machine learning techniques for predictions and suggestions. It’s built on top of Big Data architecture and uses HBase and Apache Spark. Highly customizable and provides Intelligent and personalized recommendations to the customers. Captures all the events in real time and provides contextual recommendations to the user. Built on top of Big Data architecture providing great scalability without compromising performance. -
43
HyperSQL DataBase
The hsql Development Group
HSQLDB (HyperSQL DataBase) is the leading SQL relational database system written in Java. It offers a small, fast multithreaded and transactional database engine with in-memory and disk-based tables and supports embedded and server modes. It includes a powerful command line SQL tool and simple GUI query tools. HSQLDB supports the widest range of SQL Standard features seen in any open source database engine: SQL:2016 core language features and an extensive list of SQL:2016 optional features. It supports full Advanced ANSI-92 SQL with only two exceptions. Many extensions to the Standard, including syntax compatibility modes and features of other popular database engines, are also supported. -
44
Serverless, interactive querying for analyzing data in IBM Cloud Object Storage. Query your data directly where it is stored, there's no ETL, no databases, and no infrastructure to manage. IBM Cloud SQL Query uses Apache Spark, an open-source, fast, extensible, in-memory data processing engine optimized for low latency and ad hoc analysis of data. No ETL or schema definition needed to enable SQL queries. Analyze data where it sits in IBM Cloud Object Storage using our query editor and REST API. Run as many queries as you need; with pay-per-query pricing, you pay only for the data scan. Compress or partition data to drive savings and performance. IBM Cloud SQL Query is highly available and executes queries using compute resources across multiple facilities. IBM Cloud SQL Query supports a variety of data formats such as CSV, JSON and Parquet, and allows for standard ANSI SQL.Starting Price: $5.00/Terabyte-Month
-
45
IBM Db2 Big SQL
IBM
A hybrid SQL-on-Hadoop engine delivering advanced, security-rich data query across enterprise big data sources, including Hadoop, object storage and data warehouses. IBM Db2 Big SQL is an enterprise-grade, hybrid ANSI-compliant SQL-on-Hadoop engine, delivering massively parallel processing (MPP) and advanced data query. Db2 Big SQL offers a single database connection or query for disparate sources such as Hadoop HDFS and WebHDFS, RDMS, NoSQL databases, and object stores. Benefit from low latency, high performance, data security, SQL compatibility, and federation capabilities to do ad hoc and complex queries. Db2 Big SQL is now available in 2 variations. It can be integrated with Cloudera Data Platform, or accessed as a cloud-native service on the IBM Cloud Pak® for Data platform. Access and analyze data and perform queries on batch and real-time data across sources, like Hadoop, object stores and data warehouses. -
46
MonetDB
MonetDB
Choose from a wide range of SQL features to realise your applications from pure analytics to hybrid transactional/analytical processing. When you're curious about what's in your data; when you want to work efficiently; when your deadline is closing: MonetDB returns query result in mere seconds or even less. When you want to (re)use your own code; when you need specialised functions: use the hooks to add your own user-defined functions in SQL, Python, R or C/C++. Join us and expand the MonetDB community spread over 130+ countries with students, teachers, researchers, start-ups, small businesses and multinational enterprises. Join the leading Database in Analytical Jobs and surf the innovation! Don’t lose time with complex installation, use MonetDB’s easy setup to get your DBMS up and running quickly. -
47
ArcadeDB
ArcadeDB
ArcadeDB is an open-source, next-generation multi-model database. Forget Polyglot Persistence — store graphs, documents, key-value pairs, search engine indexes, vectors, and time-series data all in one database with native support for every model. No translation layers, no performance penalties. Process over 10 million records per second. Traversal speed stays constant whether your database has hundreds or billions of records. Query in the language you prefer: SQL, Cypher, Gremlin, GraphQL, MongoDB API, or Java. Deploy ArcadeDB embedded in your JVM application, on a standalone server, or distributed across multiple nodes with Raft Consensus for high availability. Fully ACID-compliant. Super lightweight. Apache 2.0 licensed — free for production and commercial use.Starting Price: Free -
48
Apache Drill
The Apache Software Foundation
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage -
49
Apache Atlas
Apache Software Foundation
Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem. Apache Atlas provides open metadata management and governance capabilities for organizations to build a catalog of their data assets, classify and govern these assets and provide collaboration capabilities around these data assets for data scientists, analysts and the data governance team. Pre-defined types for various Hadoop and non-Hadoop metadata. Ability to define new types for the metadata to be managed. Types can have primitive attributes, complex attributes, object references; can inherit from other types. Instances of types, called entities, capture metadata object details and their relationships. REST APIs to work with types and instances allow easier integration. -
50
Apache Druid
Druid
Apache Druid is an open source distributed data store. Druid’s core design combines ideas from data warehouses, timeseries databases, and search systems to create a high performance real-time analytics database for a broad range of use cases. Druid merges key characteristics of each of the 3 systems into its ingestion layer, storage format, querying layer, and core architecture. Druid stores and compresses each column individually, and only needs to read the ones needed for a particular query, which supports fast scans, rankings, and groupBys. Druid creates inverted indexes for string values for fast search and filter. Out-of-the-box connectors for Apache Kafka, HDFS, AWS S3, stream processors, and more. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures.