Business Software for Databricks Data Intelligence Platform - Page 7

Top Software that integrates with Databricks Data Intelligence Platform as of July 2025 - Page 7

  • 1
    lakeFS

    Treeverse

    lakeFS enables you to manage your data lake the way you manage your code. Run parallel pipelines for experimentation and CI/CD for your data, simplifying the lives of the engineers, data scientists, and analysts who are transforming the world with data. lakeFS is an open source platform that delivers resilience and manageability to object-storage-based data lakes. With lakeFS you can build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage (GCS) as its underlying storage service. It is API-compatible with S3 and works seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto. lakeFS provides a Git-like branching and committing model that scales to exabytes of data by utilizing S3, GCS, or Azure Blob for storage.
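
    Because lakeFS is API-compatible with S3, existing S3 clients can address branches as path prefixes. Below is a minimal, hypothetical sketch using Python's boto3; the endpoint URL, credentials, repository ("my-repo"), and branch names ("main", "experiment-1") are placeholders, and committing or merging changes would be done separately through the lakeFS UI, the lakectl CLI, or the API.

    ```python
    # Minimal sketch: reading from and writing to a lakeFS branch through its
    # S3-compatible API. Endpoint, credentials, repository, and branch names
    # are hypothetical placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://lakefs.example.com",  # your lakeFS server
        aws_access_key_id="AKIA...",                # lakeFS access key
        aws_secret_access_key="...",                # lakeFS secret key
    )

    # In lakeFS, the "bucket" is the repository and keys are prefixed by a
    # branch, so the same object can exist in different versions per branch.
    s3.put_object(Bucket="my-repo", Key="experiment-1/data/users.parquet", Body=b"...")
    obj = s3.get_object(Bucket="my-repo", Key="main/data/users.parquet")
    ```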
  • 2
    Talend Data Integration
    Talend Data Integration lets you connect and manage all your data, no matter where it lives. Use more than 1,000 connectors and components to connect virtually any data source with virtually any data environment, in the cloud or on premises. Easily develop and deploy reusable data pipelines with a drag-and-drop interface that’s 10 times faster than hand-coding. Talend has long supported scaling massive data sets for advanced data analytics on platforms such as Spark. We also partner with leading cloud service providers, data warehouses, and analytics platforms, including Amazon Web Services, Microsoft Azure, Google Cloud Platform, Snowflake, and Databricks. With Talend, data quality is embedded into every step of the data integration process. Discover, highlight, and fix issues as data moves through your systems, before inconsistencies can disrupt or impact crucial decisions. Connect to data where it lives, use it where you need it.
  • 3
    AnalyticsIQ

    AnalyticsIQ works with marketers from a wide variety of industries that are as excited about great data as we are. Financial institutions, non-profits, auto manufacturers, retailers, agencies, and travel providers all have unique data needs. But the ones that work with us want to personalize experiences for their customers, use data in a responsible way, and perform the best they can. Truly knowing your customers requires more than simply using the most accurate demographic data like income, age, and gender. It even involves going beyond behavioral data like past purchases, lifestyle interests, and channel preferences. Although this data is important, getting into the psyche of your customers can help you connect with them like never before. This is where our psychological data is a game changer.
  • 4
    Trillium Geolocation
    Improve the accuracy and efficiency of your business applications with real-time global postal address validation and geocoding integration. Acquiring a global customer base takes a lot of effort, and you want to provide the best experience to keep customers satisfied. From online forms to customer service to timely delivery, you need to meet their expectations, no matter their country. Managing worldwide address standards and geocode information, however, is a challenge. Trillium Geolocation provides the appropriate formats, character sets, rules, and postal standards for more than 240 countries and territories, with Unicode support for a broad range of languages, and it provides the intelligence to identify and apply the data to standard address formats. It helps you avoid costly billing and shipping errors, wasted mailings, misdirected customer communication, and more. Data entry errors are unavoidable, but the goal is to minimize them wherever you can.
  • 5
    DuckDB

    DuckDB is designed for processing and storing tabular datasets (e.g., from CSV or Parquet files) and for transferring large result sets to a client; it is not intended for large client/server installations for centralized enterprise data warehousing, or for writing to a single database from multiple concurrent processes. DuckDB is a relational database management system (RDBMS), that is, a system for managing data stored in relations. A relation is essentially a mathematical term for a table. Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.
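
    As a quick illustration of the in-process workflow described above, the sketch below queries a CSV file with SQL through DuckDB's Python API; the file and column names are placeholders.

    ```python
    # Query a CSV file directly with SQL, no server required; results come
    # back as a pandas DataFrame via fetchdf().
    import duckdb

    con = duckdb.connect()  # in-memory; pass a file path for a persistent DB
    con.execute("""
        CREATE TABLE trips AS
        SELECT * FROM read_csv_auto('trips.csv')
    """)
    df = con.execute("""
        SELECT pickup_zone, count(*) AS n
        FROM trips
        GROUP BY pickup_zone
        ORDER BY n DESC
        LIMIT 10
    """).fetchdf()
    print(df)
    ```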
  • 6
    DataSentics

    Making data science and machine learning have a real impact on organizations. We are an AI product studio, a group of 100 experienced data scientists and data engineers with experience from both the agile world of digital start-ups and major international corporations. We don’t end with nice slides and dashboards; the result that counts is an automated data solution in production, integrated inside a real process. We employ not report clickers but data scientists and data engineers, with a strong focus on productionalizing data science solutions in the cloud with high standards of CI and automation. Our ambition is to build the greatest concentration of the smartest and most creative data scientists and engineers by being the most exciting and fulfilling place for them to work in Central Europe, giving them the freedom to use our critical mass of expertise to find and iterate on the most promising data-driven opportunities, both for our clients and for our own products.
  • 7
    Azure Databricks
    Unlock insights from all your data and build artificial intelligence (AI) solutions with Azure Databricks: set up your Apache Spark™ environment in minutes, autoscale, and collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn. Azure Databricks provides the latest versions of Apache Spark and allows you to seamlessly integrate with open source libraries. Spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure. Clusters are set up, configured, and fine-tuned to ensure reliability and performance without the need for monitoring. Take advantage of autoscaling and auto-termination to improve total cost of ownership (TCO).
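
    For a sense of the developer experience, here is a minimal PySpark sketch as it might appear in an Azure Databricks notebook, where a SparkSession named spark is provided automatically; the storage path and table names are hypothetical placeholders.

    ```python
    # Read raw CSV files from Azure Data Lake Storage, aggregate, and persist
    # the result as a Delta table for downstream SQL and BI tools.
    # The ADLS path and table name are placeholders.
    df = (
        spark.read
        .option("header", "true")
        .csv("abfss://raw@mystorageaccount.dfs.core.windows.net/events/")
    )

    daily = df.groupBy("event_date").count()

    daily.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_events")
    ```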
  • 8
    Great Expectations

    Great Expectations is a shared, open standard for data quality. It helps data teams eliminate pipeline debt through data testing, documentation, and profiling. We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the supporting resources. There are many amazing companies using Great Expectations these days. Check out some of our case studies with companies that we've worked closely with to understand how they are using Great Expectations in their data stack. Great Expectations Cloud is a fully managed SaaS offering. We're taking on new private alpha members, who get first access to new features and input to the roadmap.
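
    A minimal sketch of data testing with Great Expectations is shown below, using the classic pandas-based API (newer releases organize this around a data context); the file and column names are placeholders.

    ```python
    # Wrap a pandas DataFrame, declare expectations about the data, and
    # validate them; `result.success` reports whether all expectations passed.
    import great_expectations as ge
    import pandas as pd

    df = ge.from_pandas(pd.read_csv("orders.csv"))

    df.expect_column_values_to_not_be_null("order_id")
    df.expect_column_values_to_be_between("amount", min_value=0, max_value=100000)

    result = df.validate()
    print(result.success)
    ```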
  • 9
    Wallaroo.AI

    Wallaroo facilitates the last mile of your machine learning journey, getting ML into your production environment to impact the bottom line with incredible speed and efficiency. Unlike Apache Spark or heavyweight containers, Wallaroo is purpose-built from the ground up to be the easy way to deploy and manage ML in production. Run ML at up to 80% lower cost, and easily scale to more data, more models, and more complex models. Wallaroo is designed to enable data scientists to quickly and easily deploy their ML models against live data, whether to testing environments, staging, or production. Wallaroo supports the largest possible set of machine learning training frameworks. You’re free to focus on developing and iterating on your models while letting the platform take care of deployment and inference at speed and scale.
  • 10
    Eureka

    Eureka automatically discovers all types of deployed data stores, understanding the data and identifying your real-time risk. Eureka lets you choose, customize, and create policies, automatically translating them into platform-specific controls for all of your relevant data stores. Eureka continuously compares real-world implementation to desired policy, alerting on gaps and policy drift before recommending risk-prioritized remediations, actions, and controls. Understand your entire cloud data store footprint, data store content, and security and compliance risk. Implement change rapidly and non-intrusively with agentless discovery and risk monitoring. Continuously monitor, improve, and communicate cloud data security posture and compliance. Store, access, and leverage data with guardrails that don’t interfere with business agility and operations. Eureka delivers broad visibility, policy, and control management, as well as continuous monitoring and alerting.
  • 11
    SQL

    SQL is a domain-specific programming language used for accessing, managing, and manipulating relational databases and relational database management systems.
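
    As a quick illustration, the sketch below runs a few representative SQL statements (CREATE TABLE, INSERT, SELECT) against an in-memory SQLite database via Python's standard library; the table and column names are arbitrary.

    ```python
    # Core SQL operations (DDL, INSERT, parameterized queries) demonstrated
    # with Python's built-in sqlite3 module.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    con.executemany(
        "INSERT INTO users (name, age) VALUES (?, ?)",
        [("Ada", 36), ("Grace", 45)],
    )
    for row in con.execute("SELECT name, age FROM users WHERE age > 40 ORDER BY name"):
        print(row)  # ('Grace', 45)
    ```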
  • 12
    Habu

    Connect to data wherever it lives, even across a disparate universe. Data and model enrichment is the #1 way to increase and enhance acquisition and retention. Through machine learning, you will unlock new insights by bringing proprietary models, like propensity models, together with data in a protected way to supercharge your customer profiles and models and scale rapidly. It’s not enough to enrich the data; your team must seamlessly go from insight to activation. Automate audience segmentation and immediately push your campaigns across disparate channels. Be smarter about who you target to save on budget and reduce churn. Know where to target and when, and have the tools to act on data in the moment. Identifying the entire customer journey, including different types of data, has always been a challenge. As privacy regulations get stricter and data becomes more distributed, secure and easy access to those intent signals is more critical than ever.
  • 13
    Feast

    Tecton

    Make your offline data available for real-time predictions without having to build custom pipelines. Ensure data consistency between offline training and online inference, eliminating train-serve skew. Standardize data engineering workflows under one consistent framework. Teams use Feast as the foundation of their internal ML platforms. Feast doesn’t require the deployment and management of dedicated infrastructure; instead, it reuses existing infrastructure and spins up new resources when needed. Feast is a good fit if you are not looking for a managed solution and are willing to manage and maintain your own implementation, if you have engineers able to support the implementation and management of Feast, if you want to run pipelines that transform raw data into features in a separate system and integrate with it, or if you have unique requirements and want to build on top of an open source solution.
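
    A minimal sketch of online feature retrieval with Feast follows; it assumes a feature repository already defines a feature view named driver_hourly_stats keyed by driver_id (placeholder names).

    ```python
    # Fetch online features for one entity at inference time. The repo path
    # points at a directory containing feature_store.yaml; the feature view
    # and entity names are placeholders.
    from feast import FeatureStore

    store = FeatureStore(repo_path=".")

    features = store.get_online_features(
        features=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:avg_daily_trips",
        ],
        entity_rows=[{"driver_id": 1001}],
    ).to_dict()

    print(features)
    ```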
  • 14
    Polytomic

    Sync data from your app database, data warehouse, spreadsheets, or even arbitrary APIs, with no coding required. See a live view of all the customer data you need right in Salesforce, Marketo, HubSpot, and other business systems. Automatically pipe combined data from any number of databases, data warehouses, spreadsheets, and APIs, and choose which fields to sync so you only get the data you care about. Integrate with all of your favorite tools at the click of a button. A point-and-click interface syncs the data you need from your databases and spreadsheets to your business applications. Give your customer success and sales teams a full view of all your customer data right from their sales CRM. Automatic syncs from your data warehouses and databases to all business systems and spreadsheets. See all proprietary user and company attributes automatically synced to your CRM. Give your support team an instant live view of the customer data they need right from their support system.
  • 15
    Wizata

    With Digital Twin & Data Explorer, AI Solutions Builder, and Automation of Production features, the Wizata Platform empowers the manufacturing industry to drive its digital transformation, facilitating the development of AI solutions from proof of concept to real-time production recommendations for a complete process-control loop through AI. This open-architecture SaaS (software as a service) platform acts as an orchestrator of your different assets (machines, sensors, AI, cloud, edge) and ensures you can easily gather and explore your data, which stays under your sole control. Control the resources invested in AI experiments step by step, and prioritize your projects according to how your AI solutions solve your business pains and improve production processes, their return on investment, and the data science best practices in metallurgy that we have developed over four years around the world.
  • 16
    Theom

    Theom is a cloud data security product that discovers and protects all data in cloud stores, APIs, and message queues. Like a bodyguard who closely follows and protects a high-value asset, Theom ensures controls follow the data regardless of how it is stored or accessed. Theom identifies PII, PHI, financial information, and trade secrets using agentless scanning and NLP classifiers, which support custom taxonomies. Theom discovers dark data (data that is never accessed) and shadow data (data whose security posture differs from the primary copy). Theom pinpoints confidential data, e.g., developer keys, in APIs and message queues. Theom estimates the financial value of data to help prioritize risks. Theom maps the relationships between data, access identities, and security attributes to uncover risks to data, showing how high-value data is accessed by identities (users and roles) together with security attributes such as user location and atypical access patterns.
  • 17
    Sentra

    Strengthen your cloud data security posture without slowing down your business. Sentra’s agentless solution discovers and scans cloud data stores to find sensitive data without any impact on performance. Sentra's data-centric approach focuses on securing your company's most valuable data. Automatically detect all managed and unmanaged cloud-native data stores. Sentra uses both existing and custom data recognition tools to identify sensitive cloud data. By leveraging data scanning technologies based on smart metadata clustering and data sampling, users can reduce cloud costs by three orders of magnitude compared to existing solutions. Sentra’s API-first and extensible classification easily integrates with your existing data catalogs and security tools. Assess the risk to your data stores by looking at both compliance requirements and your security posture. Sentra also integrates with your existing security tools, so you always have the full context.
  • 18
    Amazon SageMaker Feature Store
    Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams, and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it’s hard to keep the two feature stores synchronized. SageMaker Feature Store provides a secure and unified store for feature use across the ML lifecycle. Store, share, and manage ML model features for training and inference to promote feature reuse across ML applications. Ingest features from any data source, streaming or batch, such as application logs, service logs, clickstreams, and sensors.
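
    A hedged sketch of feature ingestion with the SageMaker Python SDK follows; it assumes a feature group named "customers" already exists and that the DataFrame columns match its feature definitions (all names are placeholders).

    ```python
    # Ingest a small batch of feature records into an existing SageMaker
    # feature group; records become available in the online/offline stores.
    import pandas as pd
    import sagemaker
    from sagemaker.feature_store.feature_group import FeatureGroup

    session = sagemaker.Session()
    fg = FeatureGroup(name="customers", sagemaker_session=session)

    df = pd.DataFrame(
        {"customer_id": [1, 2], "avg_rating": [4.2, 3.7], "event_time": [1.7e9, 1.7e9]}
    )
    fg.ingest(data_frame=df, max_workers=2, wait=True)
    ```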
  • 19
    Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data you want from a wide variety of data sources and import it quickly. Next, you can use the Data Quality and Insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over 300 built-in data transformations, so you can quickly transform data without writing any code. Once you have completed your data preparation workflow, you can scale it to your full datasets using SageMaker data processing jobs, and then train, tune, and deploy models.
  • 20
    Sana

    Sana Labs

    One home for all your learning and knowledge. Sana is an AI-powered learning platform that empowers teams to find, share, and harness the knowledge they need to achieve their missions. Give everyone a more immersive learning experience by blending live collaborative sessions with personalized self-paced courses, all from one platform. Lower the barrier to sharing knowledge by letting Sana Assistant generate questions, explanations, images, and even entire courses from scratch. Empower anyone to keep up the energy and engagement with interactive quizzes, Q&A, polls, sticky notes, reflection cards, recordings, and more. Integrate Sana with all your team apps and make your entire company’s knowledge searchable in under 100 ms. GitHub, Google Workspace, Notion, Slack, Salesforce; you name it, Sana can query it.
  • 21
    Robust Intelligence

    The Robust Intelligence Platform integrates seamlessly into your ML lifecycle to eliminate model failures. The platform detects your model’s vulnerabilities, prevents aberrant data from entering your AI system, and detects statistical data issues like drift. At the core of our test-based approach is the single test: each test measures your model’s robustness to a specific type of production model failure. Stress Testing runs hundreds of these tests to measure model production readiness. The results of these tests are used to auto-configure a custom AI Firewall that protects the model against the specific forms of failure to which it is susceptible. Finally, Continuous Testing runs these tests during production, providing automated root cause analysis informed by the underlying cause of any single test failure. Using all three elements of the Robust Intelligence platform together helps ensure ML Integrity.
  • 22
    TextQL

    The platform indexes BI tools and semantic layers, documents data in dbt, and uses OpenAI and language models to provide self-serve power analytics. With TextQL, non-technical users can easily and quickly work with data by asking questions in their work context (Slack/Teams/email) and getting automated answers quickly and safely. The platform also leverages NLP and semantic layers, including the dbt Labs semantic layer, to ensure reasonable solutions. TextQL's elegant handoffs to human analysts, when required, dramatically simplify the whole question-to-answer process with AI. At TextQL, our mission is to empower business teams to access the data that they're looking for in less than a minute. To accomplish this, we help data teams surface and create documentation for their data so that business teams can trust that their reports are up to date.
  • 23
    Optable

    An end-to-end data clean room platform, integrated for activation. Publishers and advertisers use Optable data clean room technology to securely plan, activate, and measure advertising campaigns. A new generation of privacy-preserving data collaboration software. Optable customers can collaborate with their own customers and partners, including those who aren't Optable customers themselves; using the platform's Flash Nodes, they can invite other parties into a secure environment. Optable offers a decentralized identity infrastructure that allows customers to build private identity graphs. The infrastructure provides the means for creating purpose-limited, permissioned data clean rooms that minimize data movement. Interoperability with data warehouses and other data clean rooms is key. Our open source software allows third-party platforms to match data with Optable customers, as well as implement secure clean room functions for their own use.
  • 24
    Mimic

    Facteus

    Advanced technology and services to safely transform and enhance sensitive data into actionable insights, help drive innovation, and open new revenue streams. Using the Mimic synthetic data engine, companies can safely synthesize their data assets, protecting consumer privacy information from exposure while still maintaining the statistical relevancy of the data. The synthetic data can then be used for internal initiatives like analytics, machine learning and AI, marketing and segmentation activities, and new revenue streams through external data monetization. Mimic enables you to safely move statistically relevant synthetic data to the cloud ecosystem of your choice to get the most out of your data. Analytics, insights, product development, testing, and third-party data sharing can all be done in the cloud with the enhanced synthetic data, which has been certified to be compliant with regulatory and privacy laws.
  • 25
    Qualytics

    Helping enterprises proactively manage their full data quality lifecycle through contextual data quality checks, anomaly detection, and remediation. Expose anomalies and metadata to help teams take corrective actions. Automatically trigger remediation workflows to resolve errors quickly and efficiently. Maintain high data quality and prevent errors from affecting business decisions. The SLA chart provides an overview of SLAs, including the total number of SLA monitoring checks performed and any violations that have occurred. This chart can help you identify areas of your data that may require further investigation or improvement.
  • 26
    LlamaIndex

    LlamaIndex is a “data framework” to help you build LLM apps. Connect semi-structured data from APIs like Slack, Salesforce, Notion, etc. LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. LlamaIndex provides the key tools to augment your LLM applications with data. Connect your existing data sources and data formats (APIs, PDFs, documents, SQL, etc.) for use with a large language model application. Store and index your data for different use cases, and integrate with downstream vector store and database providers. LlamaIndex provides a query interface that accepts any input prompt over your data and returns a knowledge-augmented response. Connect unstructured sources such as documents, raw text files, PDFs, videos, and images, and easily integrate structured data sources from Excel, SQL, etc. It provides ways to structure your data (indices, graphs) so that it can be easily used with LLMs.
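
    A minimal sketch of a retrieval-augmented query with LlamaIndex follows, using the llama_index.core module layout of recent releases (older versions import from llama_index directly); it assumes an LLM API key in the environment and a local ./data directory of documents.

    ```python
    # Load documents, build a vector index over them, and ask a question;
    # the query engine retrieves relevant chunks and has the LLM answer.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    documents = SimpleDirectoryReader("data").load_data()  # PDFs, text, etc.
    index = VectorStoreIndex.from_documents(documents)     # embed + index

    query_engine = index.as_query_engine()
    response = query_engine.query("What did the Q3 report say about churn?")
    print(response)
    ```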
  • 27
    Wherobots

    Wherobots enables users to easily develop, test, and deploy geospatial data analytics and AI pipelines within the user's existing data stack, deployed in the cloud. Users do not have to worry about the hassle of resource administration, workload scalability, or geospatial processing support and optimization. Connect your Wherobots account to the cloud database where the data is stored using our SaaS web interface. Develop your geospatial data science, machine learning, or analytics application using the Sedona Developer Tool. Schedule automatic deployment of your geospatial pipeline to the cloud data platform and monitor its performance in Wherobots. Consume the outcome of your geospatial analytics task, whether through a single geospatial map visualization or API calls.
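
    Wherobots builds on Apache Sedona, so the flavor of geospatial pipeline it runs can be sketched with Sedona's PySpark API (the SedonaContext entry point of Sedona 1.4+; earlier versions used SedonaRegistrator). The file path, column names, and coordinates below are placeholders.

    ```python
    # Register Sedona's spatial SQL functions on a Spark session, then run a
    # proximity query over a table of points of interest.
    from sedona.spark import SedonaContext

    config = SedonaContext.builder().getOrCreate()
    sedona = SedonaContext.create(config)

    df = sedona.read.option("header", "true").csv("s3://bucket/pois.csv")
    df.createOrReplaceTempView("pois")

    # Count points of interest within ~10 km of a reference location.
    result = sedona.sql("""
        SELECT count(*) AS nearby
        FROM pois
        WHERE ST_DistanceSphere(
            ST_Point(CAST(lon AS DOUBLE), CAST(lat AS DOUBLE)),
            ST_Point(-73.98, 40.75)
        ) < 10000
    """)
    result.show()
    ```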
  • 28
    Acryl Data

    No more data catalog ghost towns. Acryl Cloud drives fast time-to-value via Shift Left practices for data producers and an intuitive UI for data consumers. Continuously detect data quality incidents in real-time, automate anomaly detection to prevent breakages, and drive fast resolution when they do occur. Acryl Cloud supports both push-based and pull-based metadata ingestion for easy maintenance, ensuring information is trustworthy, up-to-date, and definitive. Data should be operational. Go beyond simple visibility and use automated Metadata Tests to continuously expose data insights and surface new areas for improvement. Reduce confusion and accelerate resolution with clear asset ownership, automatic detection, streamlined alerts, and time-based lineage for tracing root causes.
  • 29
    Modelbit

    Don't change your day-to-day; Modelbit works with Jupyter Notebooks and any other Python environment. Simply call modelbit.deploy to deploy your model, and let Modelbit carry it, and all its dependencies, to production. ML models deployed with Modelbit can be called directly from your warehouse as easily as calling a SQL function. They can also be called as a REST endpoint directly from your product. Modelbit is backed by your git repo, whether GitHub, GitLab, or homegrown. Code review, CI/CD pipelines, PRs and merge requests: bring your whole git workflow to your Python ML models. Modelbit integrates seamlessly with Hex, DeepNote, Noteable, and more. Take your model straight from your favorite cloud notebook into production. Sick of VPC configurations and IAM roles? Seamlessly redeploy your SageMaker models to Modelbit and immediately reap the benefits of Modelbit's platform with the models you've already built.
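
    A hedged sketch of the modelbit.deploy workflow described above follows; the login flow, function name, and model logic are illustrative placeholders, so consult Modelbit's documentation for the exact API.

    ```python
    # From a notebook: authenticate, define an inference function, and deploy
    # it (with its dependencies) as a production endpoint.
    import modelbit

    mb = modelbit.login()  # authenticate the notebook session

    def predict_churn(age: int, monthly_spend: float) -> float:
        # In practice this would call a trained model loaded above;
        # the rule here is a stand-in.
        return 0.1 if monthly_spend > 50 else 0.4

    mb.deploy(predict_churn)  # ships the function to production
    ```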
  • 30
    Demyst

    External data is the next frontier of business impact, powering competitive advantages across industries, but businesses struggle with the complexity of implementation. Demyst provides the end-to-end tools you need to discover, onboard, and ingest the right external data, with our experts working closely with you every step of the way. Browse and instantly deploy the right data from Demyst’s catalog of data sources, or our expert team will recommend and onboard something new for you from any external data provider around the globe. Demyst's data provider certification program means we procure and diligence data for your use, all covered under our contract. Demyst removes the "compliance versus speed" trade-off, performing ongoing legal, privacy and security due diligence for your safe and compliant data access, whilst typically onboarding new data in 4 weeks or less. Demyst performs the last mile. Deploy and monitor the data you need with consistently formatted APIs or files.