Alternatives to Polars

Compare Polars alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Polars in 2024. Compare features, ratings, user reviews, pricing, and more from Polars competitors and alternatives in order to make an informed decision for your business.

  • 1
    Google Cloud BigQuery
    BigQuery is a serverless, multicloud data warehouse that simplifies the process of working with all types of data so you can focus on getting valuable business insights quickly. At the core of Google’s data cloud, BigQuery allows you to simplify data integration, cost effectively and securely scale analytics, share rich data experiences with built-in business intelligence, and train and deploy ML models with a simple SQL interface, helping to make your organization’s operations more data-driven.
    Compare vs. Polars View Software
    Visit Website
  • 2
    Apache Spark

    Apache Spark

    Apache Software Foundation

    Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
  • 3
    PySpark

    PySpark

    PySpark

    PySpark is an interface for Apache Spark in Python. It not only allows you to write Spark applications using Python APIs, but also provides the PySpark shell for interactively analyzing your data in a distributed environment. PySpark supports most of Spark’s features such as Spark SQL, DataFrame, Streaming, MLlib (Machine Learning) and Spark Core. Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrame and can also act as distributed SQL query engine. Running on top of Spark, the streaming feature in Apache Spark enables powerful interactive and analytical applications across both streaming and historical data, while inheriting Spark’s ease of use and fault tolerance characteristics.
  • 4
    Trino

    Trino

    Trino

    Trino is a query engine that runs at ludicrous speed. Fast-distributed SQL query engine for big data analytics that helps you explore your data universe. Trino is a highly parallel and distributed query engine, that is built from the ground up for efficient, low-latency analytics. The largest organizations in the world use Trino to query exabyte-scale data lakes and massive data warehouses alike. Supports diverse use cases, ad-hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries. Trino is an ANSI SQL-compliant query engine, that works with BI tools such as R, Tableau, Power BI, Superset, and many others. You can natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data. Access data from multiple systems within a single query.
    Starting Price: Free
  • 5
    IBM Db2 Big SQL
    A hybrid SQL-on-Hadoop engine delivering advanced, security-rich data query across enterprise big data sources, including Hadoop, object storage and data warehouses. IBM Db2 Big SQL is an enterprise-grade, hybrid ANSI-compliant SQL-on-Hadoop engine, delivering massively parallel processing (MPP) and advanced data query. Db2 Big SQL offers a single database connection or query for disparate sources such as Hadoop HDFS and WebHDFS, RDMS, NoSQL databases, and object stores. Benefit from low latency, high performance, data security, SQL compatibility, and federation capabilities to do ad hoc and complex queries. Db2 Big SQL is now available in 2 variations. It can be integrated with Cloudera Data Platform, or accessed as a cloud-native service on the IBM Cloud Pak® for Data platform. Access and analyze data and perform queries on batch and real-time data across sources, like Hadoop, object stores and data warehouses.
  • 6
    Dremio

    Dremio

    Dremio

    Dremio delivers lightning-fast queries and a self-service semantic layer directly on your data lake storage. No moving data to proprietary data warehouses, no cubes, no aggregation tables or extracts. Just flexibility and control for data architects, and self-service for data consumers. Dremio technologies like Data Reflections, Columnar Cloud Cache (C3) and Predictive Pipelining work alongside Apache Arrow to make queries on your data lake storage very, very fast. An abstraction layer enables IT to apply security and business meaning, while enabling analysts and data scientists to explore data and derive new virtual datasets. Dremio’s semantic layer is an integrated, searchable catalog that indexes all of your metadata, so business users can easily make sense of your data. Virtual datasets and spaces make up the semantic layer, and are all indexed and searchable.
  • 7
    Qubole

    Qubole

    Qubole

    Qubole is a simple, open, and secure Data Lake Platform for machine learning, streaming, and ad-hoc analytics. Our platform provides end-to-end services that reduce the time and effort required to run Data pipelines, Streaming Analytics, and Machine Learning workloads on any cloud. No other platform offers the openness and data workload flexibility of Qubole while lowering cloud data lake costs by over 50 percent. Qubole delivers faster access to petabytes of secure, reliable and trusted datasets of structured and unstructured data for Analytics and Machine Learning. Users conduct ETL, analytics, and AI/ML workloads efficiently in end-to-end fashion across best-of-breed open source engines, multiple formats, libraries, and languages adapted to data volume, variety, SLAs and organizational policies.
  • 8
    Tabular

    Tabular

    Tabular

    Tabular is an open table store from the creators of Apache Iceberg. Connect multiple computing engines and frameworks. Decrease query time and storage costs by up to 50%. Centralize enforcement of data access (RBAC) policies. Connect any query engine or framework, including Athena, BigQuery, Redshift, Snowflake, Databricks, Trino, Spark, and Python. Smart compaction, clustering, and other automated data services reduce storage costs and query times by up to 50%. Unify data access at the database or table. RBAC controls are simple to manage, consistently enforced, and easy to audit. Centralize your security down to the table. Tabular is easy to use plus it features high-powered ingestion, performance, and RBAC under the hood. Tabular gives you the flexibility to work with multiple “best of breed” compute engines based on their strengths. Assign privileges at the data warehouse database, table, or column level.
    Starting Price: $100 per month
  • 9
    Databricks Data Intelligence Platform
    The Databricks Data Intelligence Platform allows your entire organization to use data and AI. It’s built on a lakehouse to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data. The winners in every industry will be data and AI companies. From ETL to data warehousing to generative AI, Databricks helps you simplify and accelerate your data and AI goals. Databricks combines generative AI with the unification benefits of a lakehouse to power a Data Intelligence Engine that understands the unique semantics of your data. This allows the Databricks Platform to automatically optimize performance and manage infrastructure in ways unique to your business. The Data Intelligence Engine understands your organization’s language, so search and discovery of new data is as easy as asking a question like you would to a coworker.
  • 10
    Starburst Enterprise

    Starburst Enterprise

    Starburst Data

    Starburst helps you make better decisions with fast access to all your data; Without the complexity of data movement and copies. Your company has more data than ever before, but your data teams are stuck waiting to analyze it. Starburst unlocks access to data where it lives, no data movement required, giving your teams fast & accurate access to more data for analysis. Starburst Enterprise is a fully supported, production-tested and enterprise-grade distribution of open source Trino (formerly Presto® SQL). It improves performance and security while making it easy to deploy, connect, and manage your Trino environment. Through connecting to any source of data – whether it’s located on-premise, in the cloud, or across a hybrid cloud environment – Starburst lets your team use the analytics tools they already know & love while accessing data that lives anywhere.
  • 11
    Apache Impala
    Impala provides low latency and high concurrency for BI/analytic queries on the Hadoop ecosystem, including Iceberg, open data formats, and most cloud storage options. Impala also scales linearly, even in multitenant environments. Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Ranger module, you can ensure that the right users and applications are authorized for the right data. Utilize the same file and data formats and metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication. For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the implementation wheel. With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata stored from source through analysis.
    Starting Price: Free
  • 12
    Presto

    Presto

    Presto Foundation

    Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse. Different engines for different workloads means you will have to re-platform down the road. With Presto, you get 1 familar ANSI SQL language and 1 engine for your data analytics so you don't need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads, small and large amounts of data, and scales from a few to thousands of users. Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together.
  • 13
    Baidu Palo

    Baidu Palo

    Baidu AI Cloud

    Palo helps enterprises to create the PB-level MPP architecture data warehouse service within several minutes and import the massive data from RDS, BOS, and BMR. Thus, Palo can perform the multi-dimensional analytics of big data. Palo is compatible with mainstream BI tools. Data analysts can analyze and display the data visually and gain insights quickly to assist decision-making. It has the industry-leading MPP query engine, with column storage, intelligent index,and vector execution functions. It can also provide in-library analytics, window functions, and other advanced analytics functions. You can create a materialized view and change the table structure without the suspension of service. It supports flexible and efficient data recovery.
  • 14
    Axibase Time Series Database
    Parallel query engine with time- and symbol-indexed data access. Extended SQL syntax with advanced filtering and aggregations. Consolidate quotes, trades, snapshots, and reference data in one place. Strategy backtesting on high-frequency data. Quantitative and market microstructure research. Granular transaction cost analysis and rollup reporting. Market surveillance and anomaly detection. Non-transparent ETF/ETN decomposition. FAST, SBE, and proprietary protocols. Plain text protocol. Consolidated and direct feeds. Built-in latency monitoring tools. End-of-day archives. ETL from institutional and retail financial data platforms. Parallel SQL engine with syntax extensions. Advanced filtering by trading session, auction stage, index composition. Optimized aggregates for OHLCV and VWAP calculations. Interactive SQL console with auto-completion. API endpoint for programmatic integration. Scheduled SQL reporting with email, file, and web delivery. JDBC and ODBC drivers.
  • 15
    labPortal

    labPortal

    Analytical Information Systems

    Perhaps you want to give your clients access to their LIMS data and reports via the web. AIS labPortal allows you to do just that. Paper copies of sample analyses needn’t be sent out in the post to customers. Using their unique login and security password, clients can access data from their computer, which is not only safer and less time-consuming but also more environmentally friendly. labPortal is a web-based portal that securely stores your clients’ sample information and data in the cloud, allowing them to easily access it instantly from their own desktop, tablet or phone. The labPortal interface is 'inbox' style which is simple and easy to use with an enhanced query engine, conditional highlighting and Microsoft Excel export. The software features a simple and easy-to-use sample registration form which allows users to pre-register samples online. Transcribing data is a time-consuming and tedious activity.
    Starting Price: $200 per month
  • 16
    QuasarDB

    QuasarDB

    QuasarDB

    Quasar's brain is QuasarDB, a high-performance, distributed, column-oriented timeseries database management system designed from the ground up to deliver real-time on petascale use cases. Up to 20X less disk usage. Quasardb ingestion and compression capabilities are unmatched. Up to 10,000X faster feature extraction. QuasarDB can extract features in real-time from the raw data, thanks to the combination of a built-in map/reduce query engine, an aggregation engine that leverages SIMD from modern CPUs, and stochastic indexes that use virtually no disk space. The most cost-effective timeseries solution, thanks to its ultra-efficient resource usage, the capability to leverage object storage (S3), unique compression technology, and fair pricing model. Quasar runs everywhere, from 32-bit ARM devices to high-end Intel servers, from Edge Computing to the cloud or on-premises.
  • 17
    Motif Analytics

    Motif Analytics

    Motif Analytics

    Rich interactive visualizations for identifying patterns in user and business flows, with full visibility into underlying computation. A small set of sequence operations providing full expressivity and fine-grained control in under 10 lines of code. An incremental query engine to seamlessly trade between query precision, speed and cost according to your needs. Currently Motif uses a tiny custom-built DSL called Sequence Operations Language (SOL), which we believe is more natural to use than SQL and more powerful than a drag-and-drop interface. We built a custom engine to optimize sequence queries and are also trading off precision, which goes unused in decision-making, for query speed.
  • 18
    Apache Hive

    Apache Hive

    Apache Software Foundation

    The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive. Apache Hive is an open source project run by volunteers at the Apache Software Foundation. Previously it was a subproject of Apache® Hadoop®, but has now graduated to become a top-level project of its own. We encourage you to learn about the project and contribute your expertise. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API.
  • 19
    PuppyGraph

    PuppyGraph

    PuppyGraph

    PuppyGraph empowers you to seamlessly query one or multiple data stores as a unified graph model. Graph databases are expensive, take months to set up, and need a dedicated team. Traditional graph databases can take hours to run multi-hop queries and struggle beyond 100GB of data. A separate graph database complicates your architecture with brittle ETLs and inflates your total cost of ownership (TCO). Connect to any data source anywhere. Cross-cloud and cross-region graph analytics. No complex ETLs or data replication is required. PuppyGraph enables you to query your data as a graph by directly connecting to your data warehouses and lakes. This eliminates the need to build and maintain time-consuming ETL pipelines needed with a traditional graph database setup. No more waiting for data and failed ETL processes. PuppyGraph eradicates graph scalability issues by separating computation and storage.
    Starting Price: Free
  • 20
    IRI CoSort

    IRI CoSort

    IRI, The CoSort Company

    What is CoSort? IRI CoSort® is a fast, affordable, and easy-to-use sort/merge/report utility, and a full-featured data transformation and preparation package. The world's first sort product off the mainframe, CoSort continues to deliver maximum price-performance and functional versatility for the manipulation and blending of big data sources. CoSort also powers the IRI Voracity data management platform and many third-party tools. What does CoSort do? CoSort runs multi-threaded sort/merge jobs AND many other high-volume (big data) manipulations separately, or in combination. It can also cleanse, mask, convert, and report at the same time. Self-documenting 4GL scripts supported in Eclipse™ help you speed or leave legacy: sort, ETL and BI tools; COBOL and SQL programs, plus Hadoop, Perl, Python, and other batch jobs. Use CoSort to sort, join, aggregate, and load 2-20X faster than data wrangling and BI tools, 10x faster than SQL transforms, and 6x faster than most ETL tools.
    Starting Price: From $4K USD perpetual use
  • 21
    Amazon Athena
    Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets. Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning.
  • 22
    SPListX for SharePoint

    SPListX for SharePoint

    Vyapin Software Systems

    SPListX for SharePoint is a powerful rule-based query engine application to export document / picture library contents and associated metadata and list items, including associated file attachments to Windows File System. Export SharePoint site, libraries, folders, documents, list items, version histories, metadata and permissions to the desired destination location in Windows File System. SPListX supports SharePoint 2019 / SharePoint 2016 / SharePoint 2013 / SharePoint 2010 / SharePoint 2007 / SharePoint 2003 & Office 365.
    Starting Price: $1,299.00
  • 23
    VeloDB

    VeloDB

    VeloDB

    Powered by Apache Doris, VeloDB is a modern data warehouse for lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within seconds. Storage engine with real-time upsert、append and pre-aggregation. Unparalleled performance in both real-time data serving and interactive ad-hoc queries. Not just structured but also semi-structured data. Not just real-time analytics but also batch processing. Not just run queries against internal data but also work as a federate query engine to access external data lakes and databases. Distributed design to support linear scalability. Whether on-premise deployment or cloud service, separation or integration of storage and compute, resource usage can be flexibly and efficiently adjusted according to workload requirements. Built on and fully compatible with open source Apache Doris. Support MySQL protocol, functions, and SQL for easy integration with other data tools.
  • 24
    Amazon Timestream
    Amazon Timestream is a fast, scalable, and serverless time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day up to 1,000 times faster and at as little as 1/10th the cost of relational databases. Amazon Timestream saves you time and cost in managing the lifecycle of time series data by keeping recent data in memory and moving historical data to a cost optimized storage tier based upon user defined policies. Amazon Timestream’s purpose-built query engine lets you access and analyze recent and historical data together, without needing to specify explicitly in the query whether the data resides in the in-memory or cost-optimized tier. Amazon Timestream has built-in time series analytics functions, helping you identify trends and patterns in your data in near real-time.
  • 25
    ClickHouse

    ClickHouse

    ClickHouse

    ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows to generate analytical reports using SQL queries in real-time. ClickHouse's performance exceeds comparable column-oriented database management systems currently available on the market. It processes hundreds of millions to more than a billion rows and tens of gigabytes of data per single server per second. ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing performance for a single query stands at more than 2 terabytes per second (after decompression, only used columns). In distributed setup reads are automatically balanced among healthy replicas to avoid increasing latency. ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. All nodes are equal, which allows avoiding having single points of failure.
  • 26
    StarRocks

    StarRocks

    StarRocks

    Whether you're working with a single table or multiple, you'll experience at least 300% better performance on StarRocks compared to other popular solutions. From streaming data to data capture, with a rich set of connectors, you can ingest data into StarRocks in real time for the freshest insights. A query engine that adapts to your use cases. Without moving your data or rewriting SQL, StarRocks provides the flexibility to scale your analytics on demand with ease. StarRocks enables a rapid journey from data to insight. StarRocks' performance is unmatched and provides a unified OLAP solution covering the most popular data analytics scenarios. Whether you're working with a single table or multiple, you'll experience at least 300% better performance on StarRocks compared to other popular solutions. StarRocks' built-in memory-and-disk-based caching framework is specifically designed to minimize the I/O overhead of fetching data from external storage to accelerate query performance.
    Starting Price: Free
  • 27
    LlamaIndex

    LlamaIndex

    LlamaIndex

    LlamaIndex is a “data framework” to help you build LLM apps. Connect semi-structured data from API's like Slack, Salesforce, Notion, etc. LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. LlamaIndex provides the key tools to augment your LLM applications with data. Connect your existing data sources and data formats (API's, PDF's, documents, SQL, etc.) to use with a large language model application. Store and index your data for different use cases. Integrate with downstream vector store and database providers. LlamaIndex provides a query interface that accepts any input prompt over your data and returns a knowledge-augmented response. Connect unstructured sources such as documents, raw text files, PDF's, videos, images, etc. Easily integrate structured data sources from Excel, SQL, etc. Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
  • 28
    ksqlDB

    ksqlDB

    Confluent

    Now that your data is in motion, it’s time to make sense of it. Stream processing enables you to derive instant insights from your data streams, but setting up the infrastructure to support it can be complex. That’s why Confluent developed ksqlDB, the database purpose-built for stream processing applications. Make your data immediately actionable by continuously processing streams of data generated throughout your business. ksqlDB’s intuitive syntax lets you quickly access and augment data in Kafka, enabling development teams to seamlessly create real-time innovative customer experiences and fulfill data-driven operational needs. ksqlDB offers a single solution for collecting streams of data, enriching them, and serving queries on new derived streams and tables. That means less infrastructure to deploy, maintain, scale, and secure. With less moving parts in your data architecture, you can focus on what really matters -- innovation.
  • 29
    SSuite MonoBase Database

    SSuite MonoBase Database

    SSuite Office Software

    Create relational or flat file databases with unlimited tables, fields, and rows. Includes a custom report builder. Interface with ODBC compatible databases and create custom reports for them. Create your own personal and custom databases. Some Highlights: - Filter tables instantly - Ultra simple graphical-user-interface - One click table and data form creation - Open up to 5 databases simultaneously - Export your data to comma separated files - Create custom reports for all your databases - Full helpfile to assist in creating database reports - Print tables and queries directly from the data grid - Supports any SQL standard that your ODBC compatible database requires Please install and run this database application with full administrator rights for best performance and user experience. Requires: . 1024x768 Display Size . Windows 98 / XP / 7 / 8 / 10 - 32bit and 64bit No Java or DotNet required. Green Energy Software. Saving the planet one bit at a time...
    Starting Price: Free
  • 30
    Backtrace

    Backtrace

    Backtrace

    Don’t let app, device, or game crashes get in the way of a great experience. Backtrace takes all the manual labor out of cross-platform crash and exception management so you can focus on shipping. Cross-platform callstack and event aggregation and monitoring. Process errors from panics, core dumps, minidumps, and during runtime across your stack with a single system. Backtrace generates structured, searchable error reports from your data. Automated analysis cuts down on time to resolution by surfacing important signals that lead engineers to crash root cause. Never worry about missing a clue with rich integrations into dashboards, notification, and workflow systems. Answer the questions that matter to you with Backtrace’s rich query engine. View a high-level overview of error frequency, prioritization, and trends across all your projects. Search through key data points and your own custom data across all your errors.
  • 31
    Timeplus

    Timeplus

    Timeplus

    Timeplus is a simple, powerful, and cost-efficient stream processing platform. All in a single binary, easily deployed anywhere. We help data teams process streaming and historical data quickly and intuitively, in organizations of all sizes and industries. Lightweight, single binary, without dependencies. End-to-end analytic streaming and historical functionalities. 1/10 the cost of similar open source frameworks. Turn real-time market and transaction data into real-time insights. Leverage append-only streams and key-value streams to monitor financial data. Implement real-time feature pipelines using Timeplus. One platform for all infrastructure logs, metrics, and traces, the three pillars supporting observability. In Timeplus, we support a wide range of data sources in our web console UI. You can also push data via REST API, or create external streams without copying data into Timeplus.
    Starting Price: $199 per month
  • 32
    GeoSpock

    GeoSpock

    GeoSpock

    GeoSpock enables data fusion for the connected world with GeoSpock DB – the space-time analytics database. GeoSpock DB is a unique, cloud-native database optimised for querying for real-world use cases, able to fuse multiple sources of Internet of Things (IoT) data together to unlock its full value, whilst simultaneously reducing complexity and cost. GeoSpock DB enables efficient storage, data fusion, and rapid programmatic access to data, and allows you to run ANSI SQL queries and connect to analytics tools via JDBC/ODBC connectors. Users are able to perform analysis and share insights using familiar toolsets, with support for common BI tools (such as Tableau™, Amazon QuickSight™, and Microsoft Power BI™), and Data Science and Machine Learning environments (including Python Notebooks and Apache Spark). The database can also be integrated with internal applications and web services – with compatibility for open-source and visualisation libraries such as Kepler and Cesium.js.
  • 33
    Deepnote

    Deepnote

    Deepnote

    Deepnote is building the best data science notebook for teams. In the notebook, users can connect their data, explore, and analyze it with real-time collaboration and version control. Users can easily share project links with team collaborators, or with end-users to present polished assets. All of this is done through a powerful, browser-based UI that runs in the cloud. We built Deepnote because data scientists don't work alone. Features: - Sharing notebooks and projects via URL - Inviting others to view, comment and collaborate, with version control - Publishing notebooks with visualizations for presentations - Sharing datasets between projects - Set team permissions to decide who can edit vs view code - Full linux terminal access - Code completion - Automatic python package management - Importing from github - PostgreSQL DB connection
    Starting Price: Free
  • 34
    Stata

    Stata

    StataCorp

    Stata is a complete, integrated software package that provides all your data science needs: data manipulation, visualization, statistics, and automated reporting. Stata is fast and accurate. It is easy to learn through the extensive graphical interface yet completely programmable. With Stata's menus and dialogs, you get the best of both worlds. You can easily point and click or drag and drop your way to all of Stata's statistical, graphical, and data management features. Use Stata's intuitive command syntax to quickly execute commands. Whether you enter commands directly or use the menus and dialogs, you can create a log of all actions and their results to ensure the reproducibility and integrity of your analysis. Stata also has complete command-line scripting and programming facilities, including a full matrix programming language. You have access to everything you need to script your analysis or even to create new Stata commands--commands that work just like those shipped with Stata.
    Starting Price: $48.00/6-month/student
  • 35
    Saturn Cloud

    Saturn Cloud

    Saturn Cloud

    Saturn Cloud is an award-winning ML platform for any cloud with 100,000+ users, including NVIDIA, CFA Institute, Snowflake, Flatiron School, Nestle, and more. It is an all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Users can spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, build large language models, and more in a completely hosted environment. Data professionals can use your preferred languages, IDEs, and machine-learning libraries in Saturn Cloud. We offer full Git integration, shared custom images, and secure credential storage, making scaling and building your team in the cloud easy. We support the entire machine learning lifecycle from experimentation to production with features like jobs and deployments. These features and built-in tools are easily shareable within teams, so time is saved and work is reproducible.
    Leader badge
    Starting Price: $0.005 per GB per hour
  • 36
    JetBrains DataSpell
    Switch between command and editor modes with a single keystroke. Navigate over cells with arrow keys. Use all of the standard Jupyter shortcuts. Enjoy fully interactive outputs – right under the cell. When editing code cells, enjoy smart code completion, on-the-fly error checking and quick-fixes, easy navigation, and much more. Work with local Jupyter notebooks or connect easily to remote Jupyter, JupyterHub, or JupyterLab servers right from the IDE. Run Python scripts or arbitrary expressions interactively in a Python Console. See the outputs and the state of variables in real-time. Split Python scripts into code cells with the #%% separator and run them individually as you would in a Jupyter notebook. Browse DataFrames and visualizations right in place via interactive controls. All popular Python scientific libraries are supported, including Plotly, Bokeh, Altair, ipywidgets, and others.
    Starting Price: $229
  • 37
    Apache Drill

    Apache Drill

    The Apache Software Foundation

    Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
  • 38
    Snowflake

    Snowflake

    Snowflake

    Your cloud data platform. Secure and easy access to any data with infinite scalability. Get all the insights from all your data by all your users, with the instant and near-infinite performance, concurrency and scale your organization requires. Seamlessly share and consume shared data to collaborate across your organization, and beyond, to solve your toughest business problems in real time. Boost the productivity of your data professionals and shorten your time to value in order to deliver modern and integrated data solutions swiftly from anywhere in your organization. Whether you’re moving data into Snowflake or extracting insight out of Snowflake, our technology partners and system integrators will help you deploy Snowflake for your success.
    Starting Price: $40.00 per month
  • 39
    DuckDB

    DuckDB

    DuckDB

    Processing and storing tabular datasets, e.g. from CSV or Parquet files. Large result set transfer to client. Large client/server installations for centralized enterprise data warehousing. Writing to a single database from multiple concurrent processes. DuckDB is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. A relation is essentially a mathematical term for a table. Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.
  • 40
    Arroyo

    Arroyo

    Arroyo

    Scale from zero to millions of events per second. Arroyo ships as a single, compact binary. Run locally on MacOS or Linux for development, and deploy to production with Docker or Kubernetes. Arroyo is a new kind of stream processing engine, built from the ground up to make real-time easier than batch. Arroyo was designed from the start so that anyone with SQL experience can build reliable, efficient, and correct streaming pipelines. Data scientists and engineers can build end-to-end real-time applications, models, and dashboards, without a separate team of streaming experts. Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results. Your streaming pipelines shouldn't page someone just because Kubernetes decided to reschedule your pods. Arroyo is built to run in modern, elastic cloud environments, from simple container runtimes like Fargate to large, distributed deployments on the Kubernetes logo Kubernetes.
  • 41
    Forestpin Analytics
    Forestpin Analytics runs a series of complex mathematical tests on your data and provides you with easy-to-use analyses which highlight transactions that fall outside the norm. These outliers can be related to fraud, mistakes, manipulation, opportunity loss and business process improvements. No more queries. Just point, click, and drag. Easily add custom filters to your data to filter out the most relevant data you are looking for. Filter your data by date, date ranges, district, salesperson, product, product combinations, materials, sales outlet or any other data category your dataset includes. Customizable dashboards are automatically populated by the analyses most relevant for your data. Copy and paste data from spreadsheets or open CSV files. Forestpin also integrates with your existing ERP or finace systems so you don’t have to worry about implications.
  • 42
    Azure Data Lake Analytics
    Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job. Process big data jobs in seconds with Azure Data Lake Analytics. There is no infrastructure to worry about because there are no servers, virtual machines, or clusters to wait for, manage, or tune. Instantly scale the processing power, measured in Azure Data Lake Analytics Units (AU), from one to thousands for each job. You only pay for the processing that you use per job. Act on all of your data with optimized data virtualization of your relational sources such as Azure SQL Database and Azure Synapse Analytics. Your queries are automatically optimized by moving processing close to the source data without data movement, which maximizes performance and minimizes latency.
    Starting Price: $2 per hour
  • 43
    NVIDIA RAPIDS
    The RAPIDS suite of software libraries, built on CUDA-X AI, gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes. Accelerate your Python data science toolchain with minimal code changes and no new tools to learn. Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.
  • 44
    Oracle Big Data Service
    Oracle Big Data Service makes it easy for customers to deploy Hadoop clusters of all sizes, with VM shapes ranging from 1 OCPU to a dedicated bare metal environment. Customers choose between high-performance NVmE storage or cost-effective block storage, and can grow or shrink their clusters. Quickly create Hadoop-based data lakes to extend or complement customer data warehouses, and ensure that all data is both accessible and managed cost-effectively. Query, visualize and transform data so data scientists can build machine learning models using the included notebook with its R, Python and SQL support. Move customer-managed Hadoop clusters to a fully-managed cloud-based service, reducing management costs and improving resource utilization.
    Starting Price: $0.1344 per hour
  • 45
    Azure Databricks
    Unlock insights from all your data and build artificial intelligence (AI) solutions with Azure Databricks, set up your Apache Spark™ environment in minutes, autoscale, and collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn. Azure Databricks provides the latest versions of Apache Spark and allows you to seamlessly integrate with open source libraries. Spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure. Clusters are set up, configured, and fine-tuned to ensure reliability and performance without the need for monitoring. Take advantage of autoscaling and auto-termination to improve total cost of ownership (TCO).
  • 46
    Omniscope Evo
    Visokio builds Omniscope Evo, complete and extensible BI software for data processing, analytics and reporting. A smart experience on any device. Start from any data in any shape, load, edit, blend, transform while visually exploring it, extract insights through ML algorithms, automate your data workflows, and publish interactive reports and dashboards to share your findings. Omniscope is not only an all-in-one BI tool with a responsive UX on all modern devices, but also a powerful and extensible platform: you can augment data workflows with Python / R scripts and enhance reports with any JS visualisation. Whether you’re a data manager, scientist or analyst, Omniscope is your complete solution: from data, through analytics to visualisation.
    Starting Price: $59/month/user
  • 47
    Atlan

    Atlan

    Atlan

    The modern data workspace. Make all your data assets from data tables to BI reports, instantly discoverable. Our powerful search algorithms combined with easy browsing experience, make finding the right asset, a breeze. Atlan auto-generates data quality profiles which make detecting bad data, dead easy. From automatic variable type detection & frequency distribution to missing values and outlier detection, we’ve got you covered. Atlan takes the pain away from governing and managing your data ecosystem! Atlan’s bots parse through SQL query history to auto construct data lineage and auto-detect PII data, allowing you to create dynamic access policies & best in class governance. Even non-technical users can directly query across multiple data lakes, warehouses & DBs using our excel-like query builder. Native integrations with tools like Tableau and Jupyter makes data collaboration come alive.
  • 48
    GraphDB

    GraphDB

    Ontotext

    *GraphDB allows you to link diverse data, index it for semantic search and enrich it via text analysis to build big knowledge graphs.* GraphDB is a highly efficient and robust graph database with RDF and SPARQL support. The GraphDB database supports a highly available replication cluster, which has been proven in a number of enterprise use cases that required resilience in data loading and query answering. If you need a quick overview of GraphDB or a download link to its latest releases, please visit the GraphDB product section. GraphDB uses RDF4J as a library, utilizing its APIs for storage and querying, as well as the support for a wide variety of query languages (e.g., SPARQL and SeRQL) and RDF syntaxes (e.g., RDF/XML, N3, Turtle).
  • 49
    Hopsworks

    Hopsworks

    Logical Clocks

    Hopsworks is an open-source Enterprise platform for the development and operation of Machine Learning (ML) pipelines at scale, based around the industry’s first Feature Store for ML. You can easily progress from data exploration and model development in Python using Jupyter notebooks and conda to running production quality end-to-end ML pipelines, without having to learn how to manage a Kubernetes cluster. Hopsworks can ingest data from the datasources you use. Whether they are in the cloud, on‑premise, IoT networks, or from your Industry 4.0-solution. Deploy on‑premises on your own hardware or at your preferred cloud provider. Hopsworks will provide the same user experience in the cloud or in the most secure of air‑gapped deployments. Learn how to set up customized alerts in Hopsworks for different events that are triggered as part of the ingestion pipeline.
    Starting Price: $1 per month
  • 50
    QuerySurge
    QuerySurge leverages AI to automate the data validation and ETL testing of Big Data, Data Warehouses, Business Intelligence Reports and Enterprise Apps/ERPs with full DevOps functionality for continuous testing. Use Cases - Data Warehouse & ETL Testing - Hadoop & NoSQL Testing - DevOps for Data / Continuous Testing - Data Migration Testing - BI Report Testing - Enterprise App/ERP Testing QuerySurge Features - Projects: Multi-project support - AI: automatically create datas validation tests based on data mappings - Smart Query Wizards: Create tests visually, without writing SQL - Data Quality at Speed: Automate the launch, execution, comparison & see results quickly - Test across 200+ platforms: Data Warehouses, Hadoop & NoSQL lakes, databases, flat files, XML, JSON, BI Reports - DevOps for Data & Continuous Testing: RESTful API with 60+ calls & integration with all mainstream solutions - Data Analytics & Data Intelligence:  Analytics dashboard & reports