Alternatives to Apache Arrow

Compare Apache Arrow alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Arrow in 2024. Compare features, ratings, user reviews, pricing, and more from Apache Arrow competitors and alternatives in order to make an informed decision for your business.

  • 1
    IBM SPSS Statistics
    IBM SPSS Statistics software is used by a variety of customers to solve industry-specific business issues and drive quality decision-making. Advanced statistical procedures and visualization provide a robust, user-friendly, and integrated platform for understanding your data and solving complex business and research problems. • Addresses all facets of the analytical process, from data preparation and management to analysis and reporting • Provides tailored functionality and customizable interfaces for different skill levels and functional responsibilities • Delivers graphs and presentation-ready reports to easily communicate results Organizations of all types have relied on proven IBM SPSS Statistics technology to increase revenue, outmaneuver competitors, conduct research, and make data-driven decisions.
  • 2
    IBM Cognos Analytics
    IBM Cognos Analytics acts as your trusted co-pilot for business with the aim of making you smarter, faster, and more confident in your data-driven decisions. IBM Cognos Analytics gives every user — whether data scientist, business analyst or non-IT specialist — more power to perform relevant analysis in a way that ties back to organizational objectives. It shortens each user’s journey from simple to sophisticated analytics, allowing them to harness data to explore the unknown, identify new relationships, get a deeper understanding of outcomes and challenge the status quo. Visualize, analyze and share actionable insights about your data with anyone in your organization with IBM Cognos Analytics.
  • 3
    Looker
    Google
    Looker, Google Cloud’s business intelligence platform, enables you to chat with your data. Organizations turn to Looker for self-service and governed BI, to build custom applications with trusted metrics, or to bring Looker modeling to their existing environment. The result is improved data engineering efficiency and true business transformation. Looker is reinventing business intelligence for the modern company. Looker works the way the web does: it is browser-based, and its unique modeling language lets any employee leverage the work of your best data analysts. Operating 100% in-database, Looker capitalizes on the newest, fastest analytic databases to get real results, in real time.
  • 4
    StarTree
    StarTree Cloud is a fully-managed real-time analytics platform designed for OLAP at massive speed and scale for user-facing applications. Powered by Apache Pinot, StarTree Cloud provides enterprise-grade reliability and advanced capabilities such as tiered storage, plus additional indexes and connectors. It integrates seamlessly with transactional databases and event streaming platforms, ingesting data at millions of events per second and indexing it for lightning-fast query responses. StarTree Cloud is available on your favorite public cloud or for private SaaS deployment. • Gain critical real-time insights to run your business • Seamlessly integrate data streaming and batch data • High performance in throughput and low-latency at petabyte scale • Fully-managed cloud service • Tiered storage to optimize cloud performance & spend • Fully-secure & enterprise-ready
  • 5
    MicroStrategy
    Quickly deploy consumer-grade BI experiences for every role, on any device, with the platform that provides sub-second response at enterprise scale. Build consumer-grade intelligence applications, empower users with data discovery, and seamlessly push content to employees, partners, and customers in minutes. Using our open platform, inject the data you trust into the tools you love. Learn about MicroStrategy's #1-rated platform for Embedded Analytics. Deploy mobile intelligence solutions for every user on any device, customized for your organization with no coding required. The fastest, most efficient way to run your Intelligent Enterprise.
  • 6
    Apache Iceberg
    Apache Software Foundation
    Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. Iceberg can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates. Iceberg handles the tedious and error-prone task of producing partition values for rows in a table and skips unnecessary partitions and files automatically. No extra filters are needed for fast queries, and the table layout can be updated as data or queries change.
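    To give a sense of what those SQL commands look like in practice, here is a minimal sketch of an upsert into an Iceberg table from PySpark. It assumes a Spark session already configured with an Iceberg catalog named "demo"; the table, view, and column names are hypothetical.

      from pyspark.sql import SparkSession

      # Assumes pyspark was launched with the Iceberg runtime and a catalog
      # named "demo" configured; all names below are illustrative only.
      spark = SparkSession.builder.appName("iceberg-merge-sketch").getOrCreate()

      # Stage some incoming rows as a temporary view to merge from.
      spark.createDataFrame(
          [(1, "click"), (2, "view")], ["event_id", "event_type"]
      ).createOrReplaceTempView("updates")

      # Iceberg's MERGE INTO updates matching rows and inserts new ones atomically.
      spark.sql("""
          MERGE INTO demo.db.events AS t
          USING updates AS s
          ON t.event_id = s.event_id
          WHEN MATCHED THEN UPDATE SET *
          WHEN NOT MATCHED THEN INSERT *
      """)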
  • 7
    Dremio
    Dremio delivers lightning-fast queries and a self-service semantic layer directly on your data lake storage. No moving data to proprietary data warehouses, no cubes, no aggregation tables or extracts. Just flexibility and control for data architects, and self-service for data consumers. Dremio technologies like Data Reflections, Columnar Cloud Cache (C3) and Predictive Pipelining work alongside Apache Arrow to make queries on your data lake storage very, very fast. An abstraction layer enables IT to apply security and business meaning, while enabling analysts and data scientists to explore data and derive new virtual datasets. Dremio’s semantic layer is an integrated, searchable catalog that indexes all of your metadata, so business users can easily make sense of your data. Virtual datasets and spaces make up the semantic layer, and are all indexed and searchable.
  • 8
    Exasol
    With an in-memory, columnar database and MPP architecture, you can query billions of rows in seconds. Queries are distributed across all nodes in a cluster, providing linear scalability for more users and advanced analytics. MPP, in-memory, and columnar storage add up to the fastest database built for data analytics. With SaaS, cloud, on-premises, and hybrid deployment options, you can analyze data wherever it lives. Automatic query tuning reduces maintenance and overhead. Seamless integrations and performance efficiency get you more power at a fraction of normal infrastructure costs. Smart in-memory query processing allowed one social networking company to boost performance, processing 10B data sets a year. A single data repository and speed engine accelerates critical analytics, delivering improved patient outcomes and bottom-line results.
  • 9
    Upsolver
    Upsolver makes it incredibly simple to build a governed data lake and to manage, integrate and prepare streaming data for analysis. Define pipelines using only SQL on auto-generated schema-on-read. Easy visual IDE to accelerate building pipelines. Add Upserts and Deletes to data lake tables. Blend streaming and large-scale batch data. Automated schema evolution and reprocessing from previous state. Automatic orchestration of pipelines (no DAGs). Fully-managed execution at scale. Strong consistency guarantee over object storage. Near-zero maintenance overhead for analytics-ready data. Built-in hygiene for data lake tables including columnar formats, partitioning, compaction and vacuuming. 100,000 events per second (billions daily) at low cost. Continuous lock-free compaction to avoid “small files” problem. Parquet-based tables for fast queries.
  • 10
    Actian Vector
    High-performance vectorized columnar analytics database. Consistent performance leader on the TPC-H decision support benchmark over the last 5 years. Industry-standard ANSI SQL:2003 support plus integration for an extensive set of data formats. Updates, security, management, replication. Actian Vector is the industry’s fastest analytic database. Vector’s ability to handle continuous updates without a performance penalty makes it an Operational Data Warehouse (ODW) capable of incorporating the latest business information into your analytic decision-making. Vector achieves extreme performance with full ACID compliance on commodity hardware, with the flexibility to deploy on premises, on AWS, or on Azure, with little or no database tuning. Actian Vector is available on Microsoft Windows for single-server deployment. The distribution includes Actian Director for easy GUI-based management, in addition to the command-line interface for easy scripting.
  • 11
    TIBCO ActiveSpaces
    In-memory computing with the TIBCO ActiveSpaces® in-memory data grid provides a distributed, consistent, fault-tolerant database that supports scalability for mixed read/write workloads and full system-of-record capabilities. ActiveSpaces® technology draws on available server memory but also stores data to local disks for safety and for scaling to handle the largest data volumes. With the ActiveSpaces solution, contextual, reference, and operational data normally housed in back-end applications can be stored in memory for lightning-fast performance. It's the kind of performance you need to delight customers and beat the other guys. Any data stored anywhere can now be included in real-time processing and decision-making. ActiveSpaces technology handles large data volumes, and you can dynamically add capacity without a system restart. Persistence is distributed, so it's often possible to remove costly databases and associated failure points from legacy implementations.
  • 12
    Apache Druid
    Apache Druid is an open source distributed data store. Druid’s core design combines ideas from data warehouses, timeseries databases, and search systems to create a high-performance real-time analytics database for a broad range of use cases. Druid merges key characteristics of each of these three systems into its ingestion layer, storage format, querying layer, and core architecture. Druid stores and compresses each column individually, and only reads the columns needed for a particular query, which supports fast scans, rankings, and groupBys. Druid creates inverted indexes for string values for fast search and filter. Out-of-the-box connectors are available for Apache Kafka, HDFS, AWS S3, stream processors, and more. Druid intelligently partitions data based on time, so time-based queries are significantly faster than in traditional databases. Scale up or down by just adding or removing servers, and Druid rebalances automatically. A fault-tolerant architecture routes around server failures.
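    As an illustration of how Druid's time partitioning surfaces at query time, below is a minimal sketch that posts a SQL query to Druid's HTTP SQL endpoint. The broker address and the "events" datasource are assumptions, not part of the listing above.

      import requests

      # Druid exposes SQL over HTTP; host/port and datasource here are hypothetical.
      resp = requests.post(
          "http://localhost:8888/druid/v2/sql/",
          json={
              "query": """
                  SELECT channel, COUNT(*) AS edits
                  FROM events
                  WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
                  GROUP BY channel
                  ORDER BY edits DESC
                  LIMIT 10
              """
          },
          timeout=30,
      )
      resp.raise_for_status()
      for row in resp.json():   # results arrive as a JSON array of row objects
          print(row)

    The filter on the __time column is what lets Druid's time-based partitioning skip segments outside the last hour.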
  • 13
    SAP HANA
    SAP HANA in-memory database is for transactional and analytical workloads with any data type — on a single data copy. It breaks down the transactional and analytical silos in organizations, for quick decision-making, on premise and in the cloud. Innovate without boundaries on a database management system, where you can develop intelligent and live solutions for quick decision-making on a single data copy. And with advanced analytics, you can support next-generation transactional processing. Build data solutions with cloud-native scalability, speed, and performance. With the SAP HANA Cloud database, you can gain trusted, business-ready information from a single solution, while enabling security, privacy, and anonymization with proven enterprise reliability. An intelligent enterprise runs on insight from data – and more than ever, this insight must be delivered in real time.
  • 14
    BryteFlow
    BryteFlow builds the most efficient automated environments for analytics ever. It converts Amazon S3 into an awesome analytics platform by leveraging the AWS ecosystem intelligently to deliver data at lightning speeds. It complements AWS Lake Formation and automates the Modern Data Architecture, providing performance and productivity. You can completely automate data ingestion with BryteFlow Ingest’s simple point-and-click interface, while BryteFlow XL Ingest is great for the initial full ingest of very large datasets. No coding is needed! With BryteFlow Blend you can merge data from varied sources such as Oracle, SQL Server, Salesforce, and SAP, and transform it to make it ready for analytics and machine learning. BryteFlow TruData reconciles the data at the destination with the source continually or at a frequency you select. If data is missing or incomplete you get an alert so you can fix the issue easily.
  • 15
    Google Cloud Dataproc
    Dataproc makes open source data and analytics processing fast, easy, and more secure in the cloud. Build custom OSS clusters on custom machines faster. Whether you need extra memory for Presto or GPUs for Apache Spark machine learning, Dataproc can help accelerate your data and analytics processing by spinning up a purpose-built cluster in 90 seconds. Easy and affordable cluster management. With autoscaling, idle cluster deletion, per-second pricing, and more, Dataproc can help reduce the total cost of ownership of OSS so you can focus your time and resources elsewhere. Security built in by default. Encryption by default helps ensure no piece of data is unprotected. With JobsAPI and Component Gateway, you can define permissions for Cloud IAM clusters, without having to set up networking or gateway nodes.
  • 16
    SigView
    Sigmoid
    Get access to granular data for effortless slice & dice on billions of rows, and ensure real-time reporting in seconds! Sigview is a plug-and-play real-time data analytics tool by Sigmoid for carrying out exploratory data analysis. Custom built on Apache Spark, Sigview is capable of drilling down into massive data sets within a few seconds. Used by around 30k users across the globe to analyze billions of ad impressions, Sigview is designed to give real-time access to your programmatic and non-programmatic data by analyzing enormous data sets while creating real-time reports. Whether it is optimizing your ad campaigns, discovering new inventory, or generating revenue opportunities with changing times, Sigview is your go-to platform for all your reporting needs. It connects to multiple data sources such as DFP, pixel servers, and audience and viewability partners to ingest data in any format and location, maintaining data latency of less than 15 minutes.
  • 17
    GeoSpock
    GeoSpock enables data fusion for the connected world with GeoSpock DB – the space-time analytics database. GeoSpock DB is a unique, cloud-native database optimised for querying for real-world use cases, able to fuse multiple sources of Internet of Things (IoT) data together to unlock its full value, whilst simultaneously reducing complexity and cost. GeoSpock DB enables efficient storage, data fusion, and rapid programmatic access to data, and allows you to run ANSI SQL queries and connect to analytics tools via JDBC/ODBC connectors. Users are able to perform analysis and share insights using familiar toolsets, with support for common BI tools (such as Tableau™, Amazon QuickSight™, and Microsoft Power BI™), and Data Science and Machine Learning environments (including Python Notebooks and Apache Spark). The database can also be integrated with internal applications and web services – with compatibility for open-source and visualisation libraries such as Kepler and Cesium.js.
  • 18
    Sadas Engine
    Sadas Engine is the fastest columnar Database Management System both in the cloud and on premise. Turn data into information with the fastest columnar Database Management System, able to perform 100 times faster than transactional DBMSs and able to carry out searches on huge quantities of data over a period even longer than 10 years. Every day we work to ensure impeccable service and appropriate solutions to enhance the activities of your specific business. SADAS srl, a company of the AS Group, is dedicated to the development of Business Intelligence solutions, data analysis applications and DWH tools, relying on cutting-edge technology. The company operates in many sectors: banking, insurance, leasing, commercial, media and telecommunications, and in the public sector. Innovative software solutions for daily management needs and decision-making processes, in any sector.
  • 19
    biGENIUS
    biGENIUS AG
    biGENIUS automates the entire lifecycle of analytical data management solutions (e.g. data warehouses, data lakes, data marts, real-time analytics, etc.), thus providing the foundation for turning your data into business value as quickly and cost-efficiently as possible. Save time, effort, and cost in building and maintaining your data analytics solutions. Integrate new ideas and data into your data analytics solutions easily. Benefit from new technologies thanks to the metadata-driven approach. Advancing digitalization challenges traditional data warehouse (DWH) and business intelligence systems to leverage an increasing wealth of data. To accommodate today’s business decision making, analytical data management is required to integrate new data sources, support new data formats as well as technologies, and deliver effective solutions faster than ever before, ideally with limited resources.
  • 20
    MotherDuck
    We’re MotherDuck, a software company founded by a passionate flock of experienced data geeks. We’ve worked as leaders for some of the greatest companies in data. Scale-out is expensive and slow, let’s scale up. Big Data is dead, long live easy data. Your laptop is faster than your data warehouse. Why wait for the cloud? DuckDB slaps, so let’s supercharge it. When we founded MotherDuck we recognized that DuckDB might just be the next major game changer thanks to its ease of use, portability, lightning-fast performance, and rapid pace of community-driven innovation. At MotherDuck, we want to help the community, the DuckDB Foundation, and DuckDB Labs build greater awareness and adoption of DuckDB, whether users are working locally or want a serverless always-on way to execute their SQL. We are a world-class team of engineers and leaders with experience working on databases and cloud services at AWS, Databricks, Elastic, Facebook, Firebolt, Google BigQuery, Neo4j, SingleStore, and more.
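    To make the "your laptop is faster than your data warehouse" claim concrete, here is a minimal local sketch using DuckDB, the engine MotherDuck builds on; the Parquet file name is hypothetical.

      import duckdb

      # DuckDB runs in-process; no server or cluster required.
      con = duckdb.connect()            # in-memory database
      rows = con.execute(
          "SELECT category, COUNT(*) AS n "
          "FROM 'events.parquet' "       # DuckDB can scan Parquet files directly
          "GROUP BY category ORDER BY n DESC"
      ).fetchall()
      print(rows)

    Per MotherDuck's own documentation, the same API can be pointed at a hosted MotherDuck database via an "md:" connection string, which is the serverless, always-on path the description alludes to.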
  • 21
    Qubole
    Qubole is a simple, open, and secure Data Lake Platform for machine learning, streaming, and ad-hoc analytics. Our platform provides end-to-end services that reduce the time and effort required to run Data pipelines, Streaming Analytics, and Machine Learning workloads on any cloud. No other platform offers the openness and data workload flexibility of Qubole while lowering cloud data lake costs by over 50 percent. Qubole delivers faster access to petabytes of secure, reliable and trusted datasets of structured and unstructured data for Analytics and Machine Learning. Users conduct ETL, analytics, and AI/ML workloads efficiently in end-to-end fashion across best-of-breed open source engines, multiple formats, libraries, and languages adapted to data volume, variety, SLAs and organizational policies.
  • 22
    IBM Transformation Extender
    IBM® Sterling Transformation Extender enables your organization to integrate industry-based customer, supplier and business partner transactions across the enterprise. It helps automate complex transformation and validation of data between a range of different formats and standards. Data can be transformed either on-premises or in the cloud. Additional available advanced transformation support provides metadata for mapping, compliance checking and related processing functions for specific industries, including finance, healthcare, and supply chain. Industry standards, structured or unstructured data and custom formats. On-premises and hybrid, private or public cloud. With a robust user experience and RESTful APIs. Automates complex transformation and validation of data between various formats and standards. Any-to-any data transformation. Containerized for cloud deployments. Modern user experience. ITX industry-specific packs.
  • 23
    Scribble Data
    Scribble Data empowers organizations to enrich their raw data and easily transform it to enable reliable and fast decision-making for persistent business problems. Data-driven decision support for your business. A data-to-decision platform that helps you generate high-fidelity insights to automate decision-making. Solve your persistent business decision-making problems instantly with advanced analytics powered by machine learning. Rest easy and focus your energy on critical tasks, while Enrich does the heavy lifting to ensure the availability of reliable and trustworthy data for decision-making. Leverage customized data-driven workflows for easy consumption of data, and reduce your dependence on data science and machine learning engineering teams. Go from concept to operational data product in a few weeks, not months, with feature engineering capabilities that can prepare high-volume and high-complexity data at scale.
  • 24
    Qlik Sense
    Empower people at all skill levels to make data-driven decisions and take action when it matters most. Deeper interactivity. Broader context. Lightning fast. No one else compares. Qlik’s one-of-a-kind Associative technology brings unmatched power to the core of our industry-leading analytics experience. Empower all your users to explore freely at the speed of thought with hyperfast calculations, always in context, at scale. Yeah, it’s a big deal. And it’s why Qlik Sense takes you way beyond the limits of query-based analytics and dashboards our competitors offer. Insight Advisor in Qlik Sense uses AI to help your users understand and use data more effectively, minimizing cognitive bias, amplifying discovery, and elevating data literacy. Organizations need a dynamic relationship with information that reflects the current moment. Traditional, passive BI falls short.
  • 25
    USEReady
    USEReady employs AI to make BI more human and conversational as you take your data intelligence to the next level. USEReady has built a culture of data democracy, self-service, and community to benefit all users collectively. Make a positive impact on business by using data visualization and insights to improve corporate decision-making at all levels. Harness the power of collective intelligence through our Analytics Community Platform. Faster and better actionable insights helped drive impactful social welfare campaigns and improve public participation. Watch how the decision to move their BI to the cloud completely transformed the analytical culture and capability. A completely integrated solution where business users, data scientists, and marketers can all connect with experts and uncover intelligent data, collaborate, learn, and accelerate innovation – all made available from a vast collection of resources.
  • 26
    OptimalPlus
    Use advanced, actionable analytics to maximize your manufacturing efficiency, accelerate new product ramp, and, at the same time, make your product more reliable than ever. Harness the industry’s leading big data analytics platform and over a decade of domain expertise to take your manufacturing efficiency, quality, and reliability to the next level. Use advanced, actionable analytics to maximize your manufacturing efficiency, accelerate new product ramp, and gain visibility into your supply chain. We are a lifecycle analytics company that helps automotive and semiconductor manufacturing organizations make the most of their data. Our unique open platform was designed for your industry to give you a deep understanding of all the attributes of your products and to accelerate innovation by providing a comprehensive end-to-end solution for advanced analytics, artificial intelligence, and machine learning.
  • 27
    Crux
    Find out why the heavy hitters are using the Crux external data automation platform to scale external data integration, transformation, and observability without increasing headcount. Our cloud-native data integration technology accelerates the ingestion, preparation, observability and ongoing delivery of any external dataset. The result is that we can ensure you get quality data in the right place, in the right format when you need it. Leverage automatic schema detection, delivery schedule inference, and lifecycle management to build pipelines from any external data source quickly. Enhance discoverability throughout your organization through a private catalog of linked and matched data products. Enrich, validate, and transform any dataset to quickly combine it with other data sources and accelerate analytics.
  • 28
    Ideata Analytics
    Ideata Analytics is a unified business intelligence platform that helps you prepare and analyze data at scale. Transform and visualize data with Ideata to help your organization view insights like never before. With Ideata’s suggestive data preparation and enrichment, users can perform data transformation and preparation on their data with zero coding. Ideata’s web-based, drag-and-drop interface with powerful visualization and dashboarding capabilities empowers users to identify hidden patterns and insights with ease. Whether you are viewing your data on mobile or on an iPad, your dashboard looks beautiful. Ideata Analytics works everywhere. Start building your dashboard in any modern web browser on your laptop, and see it on the go on your iPad or phone, looking just as beautiful as you designed it.
  • 29
    Seerene
    Seerene’s Digital Engineering Platform is a software analytics and process mining technology that analyzes and visualizes the software development processes in your company. It reveals weaknesses and turns your organization into a well-oiled machine, delivering software efficiently, cost-effectively, quickly, and with the highest quality. Seerene provides decision-makers with the information needed to actively drive their organization towards 360° software excellence. Reveal code that frequently contains defects and kills developer productivity. Reveal lighthouse teams and transfer their best-practice processes across the entire workforce. Reveal defect risks in release candidates with a holistic X-ray of code, development hotspots, and tests. Reveal features with a mismatch between invested developer time and created user value. Reveal code that is never executed by end users and produces unnecessary maintenance costs.
  • 30
    Katana Graph
    Simplified distributed computing drives huge graph-analytics performance gains without the need for major infrastructure. Strengthen insights by bringing in a wider array of data to be standardized and plotted onto the graph. Pairing innovations in graph and deep learning has yielded efficiencies that allow timely insights on the world’s biggest graphs. From comprehensive fraud detection in real time to 360° views of the customer, Katana Graph empowers Financial Services organizations to unlock the tremendous potential of graph analytics and AI at scale. Drawing from advances in high-performance parallel computing (HPC), Katana Graph’s intelligence platform assesses risk and draws customer insights from the largest data sources using high-speed analytics and AI that go well beyond what is possible using other graph technologies.
  • 31
    Trino
    Trino is a query engine that runs at ludicrous speed: a fast, distributed SQL query engine for big data analytics that helps you explore your data universe. Trino is a highly parallel and distributed query engine that is built from the ground up for efficient, low-latency analytics. The largest organizations in the world use Trino to query exabyte-scale data lakes and massive data warehouses alike. It supports diverse use cases: ad-hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries. Trino is an ANSI SQL-compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others. You can natively query data in Hadoop, S3, Cassandra, MySQL, and many other systems, without the need for complex, slow, and error-prone processes for copying the data, and you can access data from multiple systems within a single query.
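    A minimal sketch of that cross-system query from Python using the Trino DB-API client; the coordinator address, user, and the catalog/schema/table names are hypothetical.

      import trino  # the Trino Python DB-API client

      conn = trino.dbapi.connect(
          host="localhost", port=8080, user="analyst",
          catalog="hive", schema="default",
      )
      cur = conn.cursor()

      # One query spanning two catalogs (a Hive data lake and a MySQL database),
      # with no copying of data between systems.
      cur.execute("""
          SELECT c.name, SUM(s.amount) AS total
          FROM hive.default.sales AS s
          JOIN mysql.crm.customers AS c ON s.customer_id = c.id
          GROUP BY c.name
          ORDER BY total DESC
      """)
      for row in cur.fetchall():
          print(row)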
  • 32
    Sesame Software
    Sesame Software specializes in secure, efficient data integration and replication across diverse cloud, hybrid, and on-premise sources. Our patented scalability ensures comprehensive access to critical business data, facilitating a holistic view in the BI tools of your choice. This unified perspective empowers your own robust reporting and analytics, enabling your organization to regain control of your data with confidence. At Sesame Software, we understand what’s at stake when you need to move a massive amount of data between environments quickly—while keeping it protected, maintaining centralized access, and ensuring compliance with regulations. Over the past 23+ years, we’ve helped hundreds of organizations like Procter & Gamble, Bank of America, and the U.S. government connect, move, store, and protect their data.
  • 33
    Elasticsearch
    Elastic is a search company. As the creators of the Elastic Stack (Elasticsearch, Kibana, Beats, and Logstash), Elastic builds self-managed and SaaS offerings that make data usable in real time and at scale for search, logging, security, and analytics use cases. Elastic's global community has more than 100,000 members across 45 countries. Since its initial release, Elastic's products have achieved more than 400 million cumulative downloads. Today thousands of organizations, including Cisco, eBay, Dell, Goldman Sachs, Groupon, HP, Microsoft, Netflix, The New York Times, Uber, Verizon, Yelp, and Wikipedia, use the Elastic Stack, and Elastic Cloud to power mission-critical systems that drive new revenue opportunities and massive cost savings. Elastic has headquarters in Amsterdam, The Netherlands, and Mountain View, California; and has over 1,000 employees in more than 35 countries around the world.
  • 34
    Apache Storm
    Apache Software Foundation
    Apache Storm is a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Apache Storm is simple, can be used with any programming language, and is a lot of fun to use! Apache Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Apache Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Apache Storm integrates with the queueing and database technologies you already use. An Apache Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.
  • 35
    eXtremeDB
    McObject
    How is platform-independent eXtremeDB different? - Hybrid data storage: unlike other IMDSs, eXtremeDB can be all-in-memory, all-persistent, or have a mix of in-memory tables and persistent tables. - Active Replication Fabric™ is unique to eXtremeDB, offering bidirectional replication, multi-tier replication (e.g. edge-to-gateway-to-gateway-to-cloud), compression to maximize limited-bandwidth networks, and more. - Row & columnar flexibility for time series data supports database designs that combine row-based and column-based layouts, in order to best leverage the CPU cache speed. - Embedded and client/server: fast, flexible eXtremeDB is data management wherever you need it, and can be deployed as an embedded database system and/or as a client/server database system. - A hard real-time deterministic option in eXtremeDB/rt is designed for use in resource-constrained, mission-critical embedded systems. Found in everything from routers to satellites to trains to stock markets worldwide.
  • 36
    Arcadia Data
    Arcadia Data provides the first visual analytics and BI platform native to Hadoop and cloud (big data) that delivers the scale, performance, and agility business users need for both real-time and historical insights. Its flagship product, Arcadia Enterprise, was built from inception for big data platforms such as Apache Hadoop, Apache Spark, Apache Kafka, and Apache Solr, in the cloud and/or on-premises. Using artificial intelligence (AI) and machine learning (ML), Arcadia Enterprise streamlines the self-service analytics process with search-based BI and visualization recommendations. It enables real-time, high-definition insights in use cases like data lakes, cybersecurity, connected IoT devices, and customer intelligence. Arcadia Enterprise is deployed by some of the world’s leading brands, including Procter & Gamble, Citibank, Nokia, Royal Bank of Canada, Kaiser Permanente, HPE, and Neustar.
  • 37
    Apache Spark
    Apache Software Foundation
    Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
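    A minimal PySpark sketch of the DataFrame API mentioned above; the CSV file and column names are hypothetical.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("spark-sketch").getOrCreate()

      # Read a CSV file and aggregate it with DataFrame operators.
      df = spark.read.csv("sales.csv", header=True, inferSchema=True)
      (df.groupBy("region")
         .agg(F.sum("amount").alias("total"))
         .orderBy(F.desc("total"))
         .show())

      spark.stop()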
  • 38
    Hadoop
    Apache Software Foundation
    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page. Apache Hadoop 3.3.4 incorporates a number of significant enhancements over the previous major release line (hadoop-3.2).
  • 39
    Azure HDInsight
    Microsoft
    Run popular open-source frameworks—including Apache Hadoop, Spark, Hive, Kafka, and more—using Azure HDInsight, a customizable, enterprise-grade service for open-source analytics. Effortlessly process massive amounts of data and get all the benefits of the broad open-source project ecosystem with the global scale of Azure. Easily migrate your big data workloads and processing to the cloud. Open-source projects and clusters are easy to spin up quickly without the need to install hardware or manage infrastructure. Big data clusters reduce costs through autoscaling and pricing tiers that allow you to pay for only what you use. Enterprise-grade security and industry-leading compliance with more than 30 certifications helps protect your data. Optimized components for open-source technologies such as Hadoop and Spark keep you up to date.
  • 40
    Kyligence
    Let Kyligence Zen take care of collecting, organizing, and analyzing your metrics so you can focus more on taking action. Kyligence Zen is the go-to low-code metrics platform to define, collect, and analyze your business metrics. It empowers users to quickly connect their data sources, define their business metrics, uncover hidden insights in minutes, and share them across their organization. Kyligence Enterprise provides diverse solutions based on on-premise, public cloud, and private cloud deployments, helping enterprises of any size to simplify multidimensional analysis of massive amounts of data according to their business needs. Kyligence Enterprise, based on Apache Kylin, provides sub-second standard SQL query responses on PB-scale datasets, simplifying multidimensional data analysis on data lakes for enterprises and enabling business users to quickly discover business value in massive amounts of data and drive better business decisions.
  • 41
    Bizintel360
    Bizdata
    AI-powered self-service advanced analytics platform. Connect data sources and derive visualizations without any programming. A cloud-native advanced analytics platform that provides high-quality data supply and intelligent real-time analysis across the enterprise without any code. Connect different data sources of different formats. Enables identification of root-cause problems. Reduce cycle time from source to target. Analytics without programming knowledge. Real-time data refresh on the go. Connect a data source of any format, stream data in real time or at a defined frequency to the data lake, and visualize it in advanced interactive search-engine-based dashboards. Descriptive, predictive, and prescriptive analytics in a single platform with the power of a search engine and advanced visualization. No traditional technology is required to see data in various visualization formats. Roll up, slice, and dice data with various mathematical computations right inside Bizintel360 visualizations.
  • 42
    Hazelcast
    In-Memory Computing Platform. The digital world is different. Microseconds matter. That's why the world's largest organizations rely on us to power their most time-sensitive applications at scale. New data-enabled applications can deliver transformative business power – if they meet today’s requirement of immediacy. Hazelcast solutions complement virtually any database to deliver results that are significantly faster than a traditional system of record. Hazelcast’s distributed architecture provides redundancy for continuous cluster up-time and always available data to serve the most demanding applications. Capacity grows elastically with demand, without compromising performance or availability. The fastest in-memory data grid, combined with third-generation high-speed event processing, delivered through the cloud.
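    A minimal sketch of reading and writing the in-memory data grid from the Hazelcast Python client; the cluster address, map name, key, and value are assumptions for illustration.

      import hazelcast

      # Connects to a running Hazelcast cluster member; the address is hypothetical.
      client = hazelcast.HazelcastClient(cluster_members=["127.0.0.1:5701"])

      prices = client.get_map("prices").blocking()  # distributed map held in cluster memory
      prices.put("AAPL", 187.42)                    # write is stored across the grid
      print(prices.get("AAPL"))                     # low-latency read served from memory

      client.shutdown()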
  • 43
    GigaSpaces
    Smart DIH is an operational data hub that powers real-time modern applications. It unleashes the power of customers’ data by transforming data silos into assets, turning organizations into data-driven enterprises. Smart DIH consolidates data from multiple heterogeneous systems into a highly performant data layer. Low-code tools empower data professionals to deliver data microservices in hours, shortening development cycles and ensuring data consistency across all digital channels. XAP Skyline is a cloud-native, in-memory data grid (IMDG) and developer framework designed for mission-critical, cloud-native apps. XAP Skyline delivers maximal throughput, microsecond latency, and scale, while maintaining transactional consistency. It provides extreme performance, significantly reducing data access time, which is crucial for real-time decisioning and transactional applications. XAP Skyline is used in financial services, retail, and other industries where speed and scalability are critical.
  • 44
    Vaex
    At Vaex.io we aim to democratize big data and make it available to anyone, on any machine, at any scale. Cut development time by 80%: your prototype is your solution. Create automatic pipelines for any model. Empower your data scientists. Turn any laptop into a big data powerhouse, no clusters, no engineers. We provide reliable and fast data-driven solutions. With our state-of-the-art technology we build and deploy machine learning models faster than anyone on the market. Turn your data scientists into big data engineers. We provide comprehensive training for your employees, enabling you to take full advantage of our technology. Vaex combines memory mapping, a sophisticated expression system, and fast out-of-core algorithms. Efficiently visualize and explore big datasets, and build machine learning models on a single machine.
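    A minimal sketch of Vaex's memory-mapped, out-of-core workflow; the HDF5 file and column names are hypothetical.

      import vaex

      # vaex.open memory-maps the file, so even a very large dataset is not loaded into RAM.
      df = vaex.open("taxi.hdf5")

      # Virtual column: defined by an expression, computed lazily, no data copied.
      df["tip_pct"] = df.tip_amount / df.total_amount

      # Aggregations run out-of-core and in parallel on a single machine.
      print(df.tip_pct.mean())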
  • 45
    QuerySurge
    QuerySurge leverages AI to automate the data validation and ETL testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Apps/ERPs with full DevOps functionality for continuous testing. Use Cases - Data Warehouse & ETL Testing - Hadoop & NoSQL Testing - DevOps for Data / Continuous Testing - Data Migration Testing - BI Report Testing - Enterprise App/ERP Testing QuerySurge Features - Projects: Multi-project support - AI: automatically create data validation tests based on data mappings - Smart Query Wizards: Create tests visually, without writing SQL - Data Quality at Speed: Automate the launch, execution, and comparison, and see results quickly - Test across 200+ platforms: Data Warehouses, Hadoop & NoSQL lakes, databases, flat files, XML, JSON, BI Reports - DevOps for Data & Continuous Testing: RESTful API with 60+ calls & integration with all mainstream solutions - Data Analytics & Data Intelligence: Analytics dashboard & reports
  • 46
    TIBCO Clarity
    TIBCO Clarity is a data preparation tool that offers you on-demand software services from the web in the form of Software-as-a-Service. You can use TIBCO Clarity to discover, profile, cleanse, and standardize raw data collated from disparate sources and provide good-quality data for accurate analysis and intelligent decision-making. You can collect your raw data from disparate sources in a variety of data formats. The supported data sources are disk drives, databases, tables, and spreadsheets, both cloud and on-premise. TIBCO Clarity detects data patterns and data types for automatic metadata generation. You can profile row and column data for completeness, uniqueness, and variation. Predefined facets categorize data based on text occurrences and text patterns. You can use the numeric distributions to identify variations and outliers in the data.
  • 47
    OpenText Magellan
    Machine learning and predictive analytics platform. Augment data-driven decision making and accelerate business with advanced artificial intelligence in a pre-built machine learning and big data analytics platform. OpenText Magellan uses AI technologies to provide predictive analytics in easy-to-consume, flexible data visualizations that maximize the value of business intelligence. Artificial intelligence software eliminates the need for manual big data processing by presenting valuable business insights in a way that is accessible and related to the most critical objectives of the organization. By augmenting business processes through a curated mix of capabilities, including predictive modeling, data discovery tools, data mining techniques, IoT data analytics and more, organizations can use their data to improve decision making based on real business intelligence and analytics.
  • 48
    SHREWD Platform
    Transforming Systems
    Harness your whole system’s data with ease, with our SHREWD Platform tools and open APIs. SHREWD Platform provides the integration and data collection tools the SHREWD modules operate from. The tools aggregate data, storing it in our secure, UK-based data lake. This data is then accessed by the SHREWD modules or an API to transform the data into meaningful information with targeted functions. Data can be ingested by SHREWD Platform in almost any format, from analog data in spreadsheets to digital systems via APIs. The system’s open API can also allow third-party connections to use the information held in the data lake, if required. SHREWD Platform provides an operational data layer that is a single source of truth in real time, allowing the SHREWD modules to provide intelligent insights, and managers and key decision-makers to take the right action at the right time.
  • 49
    TIBCO Data Science
    TIBCO Software
    Democratize, collaborate on, and operationalize machine learning across your organization. Data science is a team sport. Data scientists, citizen data scientists, data engineers, business users, and developers need flexible and extensible tools that promote collaboration, automation, and reuse of analytic workflows. But algorithms are only one piece of the advanced analytics puzzle. To deliver predictive insights, companies need to increase focus on the deployment, management, and monitoring of analytic models. Smart businesses rely on platforms that support the end-to-end analytics lifecycle while providing enterprise security and governance. TIBCO® Data Science software helps organizations innovate and solve complex problems faster to ensure predictive findings quickly turn into optimal outcomes. TIBCO Data Science allows organizations to expand data science deployments across the organization by providing flexible authoring and deployment capabilities.
  • 50
    doolytic
    doolytic is leading the way in big data discovery, the convergence of data discovery, advanced analytics, and big data. doolytic is rallying expert BI users to the revolution in self-service exploration of big data, revealing the data scientist in all of us. doolytic is an enterprise software solution for native discovery on big data, based on best-of-breed, scalable, open-source technologies. Lightning performance on billions of records and petabytes of data. Structured, unstructured, and real-time data from any source. Sophisticated advanced query capabilities for expert users, and integration with R for advanced and predictive applications. Search, analyze, and visualize data from any format and any source in real time with the flexibility of Elastic. Leverage the power of Hadoop data lakes with no latency and concurrency issues. doolytic solves common BI problems and enables big data discovery without clumsy and inefficient workarounds.