Best Artificial Intelligence Software for Apache Spark - Page 2

Compare the Top Artificial Intelligence Software that integrates with Apache Spark as of November 2025 - Page 2

This is a list of Artificial Intelligence software that integrates with Apache Spark. Use the filters on the left to narrow the results, and view the products that work with Apache Spark in the list below.

  • 1
    Tonic
    Tonic automatically creates mock data that preserves key characteristics of secure datasets so that developers, data scientists, and salespeople can work conveniently without breaching privacy. Tonic mimics your production data to create de-identified, realistic, and safe data for your test environments. With Tonic, your data is modeled from your production data to help you tell an identical story in your testing environments. Safe, useful data created to mimic your real-world data, at scale. Generate data that looks, acts, and feels just like your production data and safely share it across teams, businesses, and international borders. PII/PHI identification, obfuscation, and transformation. Proactively protect your sensitive data with automatic scanning, alerts, de-identification, and mathematical guarantees of data privacy. Advanced subsetting across diverse database types. Collaboration, compliance, and data workflows, perfectly automated.
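    For illustration only, here is a minimal Python sketch of deterministic de-identification, the general technique behind this kind of data mimicry; it is generic code, not Tonic's API, and the salt and field names are hypothetical.

        import hashlib

        # Hash the same real value to the same fake value so that joins
        # across tables (referential integrity) still line up.
        def mask_email(email: str, salt: str = "per-project-salt") -> str:
            digest = hashlib.sha256((salt + email).encode()).hexdigest()[:10]
            return f"user_{digest}@example.com"

        row = {"name": "Ada Lovelace", "email": "ada@company.com"}
        masked = {**row, "email": mask_email(row["email"])}
        print(masked)  # the same input email always yields the same masked value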
  • 2
    NVIDIA RAPIDS
    The RAPIDS suite of software libraries, built on CUDA-X AI, gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger datasets. Accelerate your Python data science toolchain with minimal code changes and no new tools to learn. Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.
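    A minimal sketch of the cuDF DataFrame API that RAPIDS exposes, assuming a CUDA-capable GPU and the cudf package; the file and column names are placeholders.

        import cudf

        df = cudf.read_csv("transactions.csv")           # load straight into GPU memory
        by_user = df.groupby("user_id")["amount"].sum()  # GPU-accelerated aggregation
        top = by_user.sort_values(ascending=False).head(10)
        print(top.to_pandas())                           # copy the small result back to the CPU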
  • 3
    OPAQUE
    OPAQUE Systems
    OPAQUE Systems offers a leading confidential AI platform that enables organizations to securely run AI, machine learning, and analytics workflows on sensitive data without compromising privacy or compliance. Their technology allows enterprises to unleash AI innovation risk-free by leveraging confidential computing and cryptographic verification, ensuring data sovereignty and regulatory adherence. OPAQUE integrates seamlessly into existing AI stacks via APIs, notebooks, and no-code solutions, eliminating the need for costly infrastructure changes. The platform provides verifiable audit trails and attestation for complete transparency and governance. Customers like Ant Financial have benefited by using previously inaccessible data to improve credit risk models. With OPAQUE, companies accelerate AI adoption while maintaining uncompromising security and control.
  • 4
    Deeplearning4j
    DL4J takes advantage of the latest distributed computing frameworks, including Apache Spark and Hadoop, to accelerate training. On multi-GPU systems, its performance is on par with Caffe's. The libraries are completely open source (Apache 2.0) and maintained by the developer community and the Konduit team. Deeplearning4j is written in Java and is compatible with any JVM language, such as Scala, Clojure, or Kotlin. The underlying computations are written in C, C++, and CUDA. Keras serves as the Python API. Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs. There are a lot of parameters to adjust when you're training a deep-learning network. We've done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala, Clojure, and Kotlin programmers.
  • 5
    StreamFlux
    Fractal
    Data is crucial when it comes to building, streamlining, and growing your business. However, getting the full value out of data can be a challenge: many organizations are faced with poor access to data, incompatible tools, spiraling costs, and slow results. Simply put, leaders who can turn raw data into real results will thrive in today’s landscape. The key to this is empowering everyone across your business to analyze, build, and collaborate on end-to-end AI and machine learning solutions in one place, fast. StreamFlux is a one-stop shop to meet your data analytics and AI challenges. Our self-serve platform gives you the freedom to build end-to-end data solutions, use models to answer complex questions, and assess user behaviors. Whether you’re predicting customer churn and future revenue, or generating recommendations, you can go from raw data to genuine business impact in days, not months.
  • 6
    AI Squared
    Empower data scientists and application developers to collaborate on ML projects. Build, load, optimize and test models and integrations before publishing to end-users for integration into live applications. Reduce data science workload and improve decision-making by storing and sharing ML models across the organization. Publish updates to automatically push changes to models in production. Drive efficiency by instantly providing ML-powered insights within any web-based business application. Our self-service, drag-and-drop browser extension enables analysts and business users to integrate models into any web-based application with zero code.
  • 7
    Zepl
    Sync, search, and manage all the work across your data science team. Zepl’s powerful search lets you discover and reuse models and code. Use Zepl’s enterprise collaboration platform to query data from Snowflake, Athena, or Redshift and build your models in Python. Use pivoting and dynamic forms for enhanced interactions with your data using heatmap, radar, and Sankey charts. Zepl creates a new container every time you run your notebook, providing you with the same image each time you run your models. Invite team members to join a shared space and work together in real time, or simply leave comments on a notebook. Use fine-grained access controls to share your work: grant others read, edit, and run access to enable collaboration and distribution. All notebooks are auto-saved and versioned. You can name, manage, and roll back all versions through an easy-to-use interface, and export seamlessly into GitHub.
  • 8
    Yottamine
    Our highly innovative machine learning technology is designed specifically to accurately predict financial time series where only a small number of training data points are available. Advanced AI is computationally demanding. YottamineAI leverages the cloud to eliminate the need to invest time and money in managing hardware, significantly shortening the time to benefit and raising ROI. Strong encryption and protection of keys ensure trade secrets stay safe. We follow AWS best practices and utilize strong encryption to secure your data. We evaluate how your existing or future data can generate predictive analytics to help you make information-based decisions. If you need predictive analytics on a project basis, Yottamine Consulting Services provides project-based consulting to accommodate your data-mining needs.
  • 9
    Amazon SageMaker Feature Store
    Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams, and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it’s hard to keep the two feature stores synchronized. SageMaker Feature Store provides a secure and unified store for feature use across the ML lifecycle. Store, share, and manage ML model features for training and inference to promote feature reuse across ML applications. Ingest features from any data source, streaming or batch, such as application logs, service logs, clickstreams, and sensors.
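    A hedged sketch of creating and ingesting into a feature group with the SageMaker Python SDK; the bucket, IAM role, and feature names are placeholders.

        import pandas as pd
        import sagemaker
        from sagemaker.feature_store.feature_group import FeatureGroup

        session = sagemaker.Session()
        df = pd.DataFrame({
            "song_id": ["s1", "s2"],
            "avg_rating": [4.5, 3.8],
            "event_time": [1700000000.0, 1700000000.0],
        })
        df["song_id"] = df["song_id"].astype("string")  # pandas string dtype maps to a String feature

        fg = FeatureGroup(name="playlist-features", sagemaker_session=session)
        fg.load_feature_definitions(data_frame=df)  # infer feature types from the frame
        fg.create(
            s3_uri="s3://my-bucket/feature-store",  # offline store location (placeholder)
            record_identifier_name="song_id",
            event_time_feature_name="event_time",
            role_arn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
            enable_online_store=True,  # serve the same features online for real-time inference
        )
        fg.ingest(data_frame=df, max_workers=2, wait=True)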
  • 10
    Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data you want from a wide variety of data sources and import it quickly. Next, you can use the Data Quality and Insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over 300 built-in data transformations, so you can quickly transform data without writing any code. Once you have completed your data preparation workflow, you can scale it to your full datasets using SageMaker data processing jobs, and then train, tune, and deploy your models.
  • 11
    Apache Mahout
    Apache Software Foundation
    Apache Mahout is a powerful, scalable, and versatile machine learning library designed for distributed data processing. It offers a comprehensive set of algorithms for various tasks, including classification, clustering, recommendation, and pattern mining. Built on top of the Apache Hadoop ecosystem, Mahout leverages MapReduce and Spark to enable data processing on large-scale datasets. Apache Mahout™ is a distributed linear algebra framework with a mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, and Mahout can be extended to other distributed back-ends. Matrix computations are a fundamental part of many scientific and engineering applications, including machine learning, computer vision, and data analysis, and Mahout is designed to handle them at scale by leveraging the power of Hadoop and Spark.
  • 12
    Determined AI
    Distributed training without changing your model code: Determined takes care of provisioning machines, networking, data loading, and fault tolerance. Our open source deep learning platform enables you to train models in hours or minutes, not days or weeks, and frees you from arduous tasks like manual hyperparameter tuning, re-running faulty jobs, and worrying about hardware resources. Our distributed training implementation outperforms the industry standard, requires no code changes, and is fully integrated with our state-of-the-art training platform. With built-in experiment tracking and visualization, Determined records metrics automatically, makes your ML projects reproducible, and allows your team to collaborate more easily. Your researchers will be able to build on the progress of their team and innovate in their domain, instead of fretting over errors and infrastructure.
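    A hedged sketch of launching a hyperparameter search with Determined's Python SDK; it assumes the determined package, a reachable Determined master, and a hypothetical train.py entrypoint, and the config fields follow Determined's experiment-config schema (exact fields may vary by version).

        from determined.experimental import client

        config = {
            "name": "churn-model-asha",
            "entrypoint": "python3 train.py",  # hypothetical training script
            "hyperparameters": {
                "lr": {"type": "log", "minval": 1e-5, "maxval": 1e-1, "base": 10},
            },
            "searcher": {  # adaptive search instead of manual tuning
                "name": "adaptive_asha",
                "metric": "validation_loss",
                "smaller_is_better": True,
                "max_trials": 16,
            },
            "resources": {"slots_per_trial": 2},  # 2 GPUs per trial, no code changes
        }

        experiment = client.create_experiment(config=config, model_dir=".")
        print(f"started experiment {experiment.id}")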
  • 13
    Qlik Staige
    QlikTech
    Harness the power of Qlik® Staige™ to make AI real by delivering a trusted data foundation, automation, actionable predictions, and company-wide impact. AI isn’t just experiments and initiatives; it’s an entire ecosystem of files, scripts, and results. Wherever your investments, we’ve partnered with top sources to bring you integrations that save time, enable management, and validate quality. Automate the delivery of real-time data into AWS data warehouses or data lakes, and make it easily accessible through a governed catalog. Through our new integration with Amazon Bedrock, you can easily connect to foundational large language models (LLMs) including AI21 Labs, Amazon Titan, Anthropic, Cohere, and Meta. Seamless integration with Amazon Bedrock makes it easier for AWS customers to leverage large language models with analytics for AI-driven insights.
  • 14
    ModelOp
    ModelOp is the leading AI governance software that helps enterprises safeguard all AI initiatives, including generative AI and large language models (LLMs), whether built in-house, sourced from third-party vendors, or embedded in other systems, without stifling innovation. Corporate boards and C‑suites are demanding the rapid adoption of generative AI but face financial, regulatory, security, privacy, ethical, and brand risks. Governments at the global, federal, state, and local levels are moving quickly to implement AI regulations and oversight, forcing enterprises to urgently prepare for and comply with rules designed to prevent AI from going wrong. Connect with AI governance experts to stay informed about market trends, regulations, news, research, opinions, and insights to help you balance the risks and rewards of enterprise AI. ModelOp Center keeps organizations safe and gives peace of mind to all stakeholders. Streamline reporting, monitoring, and compliance adherence across the enterprise.
  • 15
    Unity Catalog
    Databricks
    Databricks Unity Catalog is the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. With Unity Catalog, organizations can seamlessly govern both structured and unstructured data in any format, as well as machine learning models, notebooks, dashboards, and files across any cloud or platform. Data scientists, analysts, and engineers can securely discover, access, and collaborate on trusted data and AI assets across platforms, leveraging AI to boost productivity and unlock the full potential of the lakehouse environment. This unified and open approach to governance promotes interoperability and accelerates data and AI initiatives while simplifying regulatory compliance. Easily discover and classify both structured and unstructured data in any format, including machine learning models, notebooks, dashboards, and files across all cloud platforms.
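    A minimal sketch of Unity Catalog's three-level namespace (catalog.schema.table) from a Databricks notebook, where the spark session is predefined; the catalog, schema, table, and group names are placeholders.

        spark.sql("CREATE CATALOG IF NOT EXISTS ml")
        spark.sql("CREATE SCHEMA IF NOT EXISTS ml.features")

        df = spark.table("ml.features.customer_churn")  # governed table access
        print(df.count())

        # Grant a group read access; Unity Catalog enforces this across workspaces.
        spark.sql("GRANT SELECT ON TABLE ml.features.customer_churn TO `data-scientists`")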
  • 16
    MLlib
    Apache Software Foundation
    Apache Spark's MLlib is a scalable machine learning library that integrates seamlessly with Spark's APIs, supporting Java, Scala, Python, and R. It offers a comprehensive suite of algorithms and utilities, including classification, regression, clustering, collaborative filtering, and tools for constructing machine learning pipelines. MLlib's high-quality algorithms leverage Spark's iterative computation capabilities, delivering performance up to 100 times faster than traditional MapReduce implementations. It is designed to operate across diverse environments, running on Hadoop, Apache Mesos, Kubernetes, standalone clusters, or in the cloud, and accessing various data sources such as HDFS, HBase, and local files. This flexibility makes MLlib a robust solution for scalable and efficient machine learning tasks within the Apache Spark ecosystem.
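    A minimal PySpark example of an MLlib pipeline; the toy data and column names are illustrative.

        from pyspark.sql import SparkSession
        from pyspark.ml import Pipeline
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LogisticRegression

        spark = SparkSession.builder.appName("mllib-demo").getOrCreate()
        train = spark.createDataFrame(
            [(1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
            ["f1", "f2", "label"],
        )

        assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
        lr = LogisticRegression(maxIter=10)
        model = Pipeline(stages=[assembler, lr]).fit(train)  # trains on the cluster
        model.transform(train).select("label", "prediction").show()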
  • 17
    Botify.cloud
    Botify.cloud is an innovative platform designed to streamline and simplify cryptocurrency automation through a certified, all-in-one AI agent marketplace. With Botify.cloud, users can explore a diverse range of agent categories, including trading, volume management, social media, and utility agents. Our instant agent creation tool allows users to customize agents to their needs quickly and easily. It offers features such as agent creation, selling agents on the marketplace, Botify certification for every agent, diverse agent categories, and easy editing of agents' names and profiles. Users can also save their favorite agents for later use. For every agent that is sold, a token is created, and users earn rewards on every transaction on the platform. Building an agent is straightforward: simply choose a category, fill in the required fields, choose a large language model, and set the temperature of your agent.
  • 18
    Oracle AI Data Platform (AIDP)
    The Oracle AI Data Platform unifies the complete data-to-insight lifecycle with embedded artificial intelligence, machine learning, and generative capabilities across data stores, analytics, applications, and infrastructure. It supports everything from data ingestion and governance through to feature engineering, model training, and operationalization, enabling organizations to build trusted AI-driven systems at scale. With its integrated architecture, the platform offers native support for vector search, retrieval-augmented generation, and large language models, while enabling secure, auditable access to business data and analytics across enterprise roles. The platform’s analytics layer lets users explore, visualize, and interpret data with AI-powered assistance, where self-service dashboards, natural-language queries, and generative summaries accelerate decision making.
  • 19
    Unravel
    Unravel Data
    Unravel makes data work anywhere: on Azure, AWS, GCP, or in your own data center, optimizing performance, automating troubleshooting, and keeping costs in check. Unravel helps you monitor, manage, and improve your data pipelines in the cloud and on-premises to drive more reliable performance in the applications that power your business. Get a unified view of your entire data stack. Unravel collects performance data from every platform, system, and application on any cloud, then uses agentless technologies and machine learning to model your data pipelines from end to end. Explore, correlate, and analyze everything in your modern data and cloud environment. Unravel’s data model reveals dependencies, issues, and opportunities: how apps and resources are being used, what’s working, and what’s not. Don’t just monitor performance; quickly troubleshoot and rapidly remediate issues. Leverage AI-powered recommendations to automate performance improvements and lower costs.
  • 20
    DataNimbus
    DataNimbus is an AI-powered platform that streamlines payments and accelerates AI adoption through innovative, cost-efficient solutions. By seamlessly integrating with Databricks components like Spark, Unity Catalog, and ML Ops, DataNimbus enhances scalability, governance, and runtime operations. Its offerings include a visual designer, a marketplace for reusable connectors and machine learning blocks, and agile APIs, all designed to simplify workflows and drive data-driven innovation.